Latent Space: The AI Engineer Podcast - Amp: The Emperor Has No Clothes

Starting point is 00:00:03 Hey, everyone. Welcome to the Latinspace podcast. This is Alessio from our kernel labs. And today there's no Swix. He's in Europe with A Engineer, but I'm joined by Quinn and Thorsten from SourceGraph. Welcome. Thanks. Great to be here. Great to be here. So we already had origin of Sourcegraph with Bejong and Steve. So we'll put the link in there. And this was when you launched Cody. And today, I guess, Cody is a brand that is past and now you have AMP. Let's maybe start there. Obviously, Quinn, your CEO, Sourcegraph. What's your role, I guess, title? How do you describe what you know? CEO is much easier. I'm not going to name my internal title, but I'm... The dictator of AMP. That's internal title, yeah.

Starting point is 00:00:46 But it's, yeah, I'm the lead engineer and one of the creators of AMP, yeah. So where are you part of like the thumbs up, thumbs down on Cody? Brand. Like, how did you get to AMP? Let's tell that story. I mean, I'll start. You can jump in, but basically I came back to Social Square February. And then this was when Claude 3537 happened too.

Starting point is 00:01:06 And then Quinn and I started hacking on, you know, what if we just take Claude 37 and what if we give it just tools and let it go nuts? You know, like no constraints, no, a lot of the other stuff that we had in Cody, which works for Cody. Let's just start trying this out. And we started a new project. And we were, you know, I remember first weekend SF where I would stand up in the middle of the room. like, Quinn, you got to see this.

Starting point is 00:01:32 Like, this is crazy. And then he was like, okay, let me try this. And then we went off from there and then we realized relatively quickly that it's a different kind of product where Cody was very much, first of its kind with rack and assistant panels, assistant sidebar. But with, you know, a tool calling agent where I define an agent as a model, a system prompt and tools and tool prompts that go along with this that you give a lot of permissions for. so it can actually, you know, see the file system, interact with the file system, or your editor.

Starting point is 00:02:04 It's a different thing, and we realized we got to handle this differently. We got to reset expectations. We got to tell users that it's a different thing and they've got to use it differently in some sense. And also that we cannot make it work with a $20 subscription, which back then was seen as a, you know, offensive thing to say. And now they're charging money. Yeah, yeah, exactly. But now, you know, people are paying hundreds of dollars per month, which I've been saying this every day. for the last two weeks, that's crazy to me still, like, how far we've come. So, you know, this is

Starting point is 00:02:35 just how it started, like, okay, this is a different thing. We were astonished, surprised, amazed by what these models can do. So we decided let's reset expectations. Let's tell a new story. We can, we have enterprise customers for Cody, but they have expectations. We have contracts. These are large contracts, long running contracts. And you can't just say, guys, here's a new mode. It costs whatever, how many much dollars more. It works completely differently. You need to hold it in a different way. So in order to avoid this and to avoid being disrupted, you create a new thing that kind of disrupts the business on its own, you know? Yep. That's, I don't know, want to add? Yeah. The only thing that matters is building the best

Starting point is 00:03:16 coding agent. Nothing else matters because if you can build that, that's way bigger than anything else that came before. And to be clear, nobody has built that yet. We are getting better and better. But I think you've seen this treadmill of tools that you use as a dev. First, it started with copilot. And then Cody, we were really good at chat rag. And then cursor and windsurf showed that kind of IDE forks and partially agentic things could get better and better. And then, you know, the next generation, AMP and Claude Code. And now you're already seeing people say, oh, well, codex is better than ClaudeCode.

Starting point is 00:03:48 And there's not been any tool that is stuck with devs for more than six or 12 months or something. ones, yeah. And we saw that firsthand. We're now on our second iteration, and we're able to move so much faster, given that it has a totally different name, totally different brand. And some people don't even know that Sourcegraph or the people behind Cody made Am. That has been so good. So I do not know how if you had an AI tool that was relevant nine or 12 months ago, how you

Starting point is 00:04:15 can even bring the same brand and same customer contracts along with you and make a good product. It is so liberating to be able to say totally different. On the technical level, Cody was or is, you know, it's a softcraft product. So it's kind of, it works with the source craft platform. That means you're tied to the release cycle of the source graph platform. And softs craft is in the cloud. We have, you know, cloud versions of softs, but also on-prem for some customers. Completely different game.

Starting point is 00:04:42 And with AMP, we basically said, let's not do this. Let's build something that allows us to ship 15 times a day. And that's what we've been doing of the last six months. Like we're still doing this. And it's a game changer, not just, you know, anybody who's done this knows this. But internally and externally, you need to reset expectations that this is a new way of how we build software. And having a new project with a new way to do it is, I think, a better way to do it than to try and get like the old to move in this new way. Because it would take longer.

Starting point is 00:05:15 Are there any numbers that you share about developers, like, you know, AMP usage overall? It's grown really fast. It's growing more than 50. percent month over month, a lot faster in, you know, some weeks. And really, what we have seen, too, is there's a huge change in who's using it. So we have teams with like two or three people that are on annual run rates of like hundreds of thousands of dollars. So that's it. We also made a decision to not try to go to every single dev in an enterprise, which we had done with Cody. We pick off the people that want to move as fast as we want to move that want to stay at the model product frontier like

Starting point is 00:05:51 us. So it's all about just being able to move really fast. And I think that the way that agents work today, most of them are used in your editor or CLI interactively. You have one agent at most running with you at all times. That's going to be blown up with ASync agents when they're running 24-7 concurrently in the background. Then you can have 10 or 100 times as many. And that's going to dominate inference. That's going to dominate the output you get. So it's really, you know, AMP is growing really fast, but it's more about how do we get to be the first ones with that like 10 to 100 X improvement. And everything is about how can we move fast and learn along the way. It just so happens that we are positive gross margins along the way.

Starting point is 00:06:34 I would say that's one of the biggest axioms that we have with AMP is that we don't know where this right is going. But what we do know is that it's changing every few months. And, you know, start of the year, right, cursor was the king. and the biggest, fastest growing side of all time. Now, if you were to ask that off developers, what do you think is the Deaf Tool King? I don't think they would name Cursus the first one. And then in, I think a couple months later,

Starting point is 00:07:00 or maybe a couple months before somebody, this was from somebody in sales, they said, like, I don't know what it was, but they were basically saying, blah, blah, blah, makes Cursor look like GitHub co-pilot, you know, like makes it look old and boring and enterprising. And this is, like, just think about this. like copilot is not that old.

Starting point is 00:07:18 Like it was state of the art, I don't know, maybe two years ago or something. And now the world has changed completely. And we know that this is not over yet. Like the changes are still coming. So from an engineering and business perspective, this is priority number one. Position yourself in a way that you can react to these changes. And position your product and your expectations and your technical decode base in a way that lets you react to these things as fast as possible. And then everything else flows from it.

Starting point is 00:07:47 Everything else we've done is basically based on this, like that. Everything can change at, you know, release of another model or something. But how are you doing it internally from a team perspective? Because, you know, obviously you have a lot of customers already on the Source Graph product. There's kind of like this tension of, you know, going founder mode and kind of burning the bridge or maybe some of the old use cases versus having a smaller team and a dictator for a new product. How does that look like from like a building the company perspective?

Starting point is 00:08:14 When you have a really popular, successful product that's highly profitable, that funds a lot of this craziness. And we're able to do this also with the customer trust. So there's a lot of things on AMP that we do, like no consistent pricing, no user model choice, no checking off all the boxes that security and compliance and legal want that takes nine months. We're able to get away without doing that stuff because we have that customer trust. So, you know, that has been a big thing. it requires you to totally change how you think about an existing business. It's not a way to sell through that same channel to those same users. It's a way to use that trust and that revenue to fund crazy stuff that you got to do.

Starting point is 00:08:55 But it's something that we deal with all the time. And we've got really smart devs. And yet it is hard for people to throw away everything that they have learned about how to build software development. And so in some cases, it's been really refreshing to have people that have only ever been at tiny one-person companies. Yeah. And they come here and they have no preconceived notions about how you do planning or anything like that.

Starting point is 00:09:20 And that is, it's great because you can throw all of that out of the window. Yeah. We've had, this was radical in some sense that when we started, it was Quinn and I working on Maine, no code reviews, nothing and just pushing. And it was like a personal project. And I think we're both experienced engineers. So it would be everybody owns. stuff. You push and if you break CI, you go and fix it or if the other person is awake,

Starting point is 00:09:45 you fix it or something. And it seems like when you move this fast and you ship this often, you have, you know, throughout the day, there's like 15 decisions you have to make where you have to flip between the duct tape personal project mode, move fast and this is how they do it at Google mode. And it, you know, requires a certain expertise or it requires also to be free from like the thinking of the last 15 years of like always do it like Google. Like we always scale up. And the base assumption between like the whole Google thing was always that, oh, we found product market fit.

Starting point is 00:10:18 Now we have a product. Let's scale this up, right? Every company I ever worked in was based on this assumption that this is the product. Let's make it proper and engineer it up. But now with these changes, what's ingrained in AMP is the understanding that, well, even if it scales up, we have to be prepared that somebody pulls the rug and a new technology comes out and it kind of shifts everything. So we have to be prepared for this.

Starting point is 00:10:41 And again, it all flows from this. So now in our development mode, the team is super small, you know, compared to, I guess, other companies, but I think we're around eight people now on the Amcor team. And we still don't do formal code reviews. We still push to Maine. We still ship 15 times every day. We dock food this as much as possible. And it turns out that in a fast-moving environment like this, this beats a lot of other

Starting point is 00:11:07 things like fast feedback loops and using the product yourself and dogfooting it using the product to build a product beats a lot of established processes you know and we can get away with it because we can dog food it and how has it been internally received i think we have the luxury of you know making use of the infrastructure that we already have for example have a fantastic security team right security team comes in guys let us take care of the security stuff for amp you know so that's just fun. And I'm like, cool. Like, I don't have to worry about this. Then we have infrastructure people. Guys, that does take care of how to run this in the cloud. Cool. I don't have to worry about this. I can concentrate on the client or the UX audio. So this is a nice

Starting point is 00:11:48 spot to be in where we can move fast, but use platform teams to kind of make sure that it doesn't break or it scales up or whatever, but still have like the, you know, the tip of the iceberg can melt and be rebuilt basically while the thing beneath the water line is stable, you know? Yeah. Not the greatest analogy. But I think there's a distinction between, you know, like platform stuff that does work. But on the UX or product applicationally, you want to be able to kind of tear the thing down and rebuild it as fast as possible. And I think that's what we're doing.

Starting point is 00:12:23 One thing is you get a separate team. And then the other thing is how do you put that team to work, right? Like if you look at like the coding agent space, I mean, obviously you started with Cody. and I think there was maybe a thesis behind it. And then you had the rise of clock code. You had codec, CLA, which is trying to catch up. I would say there may be a little behind on the UX and all of that. But obviously have billions of dollars to drain a custom model,

Starting point is 00:12:47 so that kind of weighs a lot of the option. How did you decide about the structure? So you have both plug-in for IDs, so I use some code and cursor, but I can also go into the CLI and use some code. Was that an easy choice? like was there a lot of discussion on, we should just do one of the modes. Like supporting both is obviously more work, right?

Starting point is 00:13:08 And a lot of these products don't support both. So what was that initial design choice of the structure of the product? And then we'll dive into the models as well. So we started with the VS code extension because it was the easiest thing to get off the ground. Like when you have a VS code extension, you have a marketplace, you can ship this. You can update it 15 times every day. You don't have to think about updating stuff. you also are next to the editor.

Starting point is 00:13:33 And looking back, you know, it's been six months, the editor might be dying or you might do a lot of coding outside the editor. Back then, it sounded much more radical than it does sound right now. So we started with, like, let's explore this. And having the thing next to your editor is a good place to start. And you can see the curse. You can do selection and whatnot. But we were really like, from the start, we didn't want to have like a deeply integrated thing.

Starting point is 00:13:58 It was always like, ah, let's keep. the feature small, we've got to be able to move fast. And then we build up the CLI on the side as like a different client, which also gives us the ability to abstract like the core and the client stuff. So that's a nice boundary to have. But then to be 100% honest, we were also surprised by how many people were fine with using a CLI for cloud code, for example. Like if you had asked me half a year ago, I would have said no way, like a CLI tool.

Starting point is 00:14:29 and what we realized is, well, a CLI is not just, you know, it's a UI, sure, but also it's a CLI program. That means you can run it on SSH. You can run it in any other editor. You can run it in multiple split panes. You can run it in multiple tabs. If you want to do this in VS code, you have to rebuild a lot of stuff. And you have to rebuild the way you switch between conversation. You have to rebuild. I mean, SSH works out of the box in VES code, sure. But still, like, you're tied to this. And we had an experiment, an internal one, about a desktop application, so like a standalone application. And turns out, yes, that's great to have multiple agents.

Starting point is 00:15:10 But you also have to reinvent everything that right now a terminal gives you for free, right? If I use ghosty or I term or vest term or whatever, I can command and command T. I get tabs, you know, splits, different environments per tab. You can see D interdirectories. You can set end fires. You get this for free, right? And if you do it in a desktop application, then you run into the issue of, you know, what people see with like a lot of the async agents. Oh, you run and run the task, set the end for us, which directory you have to be in.

Starting point is 00:15:39 What's, you know, what's in the path, whatnot. You have to do this beforehand and in the terminal you get it for free. So that's kind of the short version of it that we started with DS code because it was easy. And it gave a lot of feedback. We could concentrate on the stuff that matters and not worry about stuff like distribution, which VS code takes care of. And then with the emergence of CLIs, we notice that it's a big, big improvement. Or there's other advantages to it. So now then we rebuild the CLI twice.

Starting point is 00:16:07 And now we have like a really nice tweet with our own framework. And one interesting thing is our VS code extension has a lot of advantage over the CLI. For example, it's easy to display diagrams. It's easy to display images. It's easy to render a bunch of stuff. Like we can do command return to submit messages, you know, all of that stuff. And turns out we had like an internal appall last week at our company meetup where Bayang was asking, who of you uses the CLI and who of you uses VES code?

Starting point is 00:16:36 And was that 50-50 split. And it's very strange that it comes out like this. And there's not a clear winner and both have advantages and disadvantages. And so right now we have both. But do you cut the data based on the level of the engineer or maybe the specialty, you know, maybe front end versus back end? How do you segment that? Or do you just take it?

Starting point is 00:16:58 I mean, we haven't really segmented it if I had to, you know, the gues them at here. There's also a generational divide. Well, I would say the younger people, you know, younger than 25, their terminal seems old to them. And they were much more inclined to use the stuff in the editor. But yeah, we don't have any fancy segmentation. I think not to sound too traumatic,

Starting point is 00:17:20 but like one of the other guiding principles that we've had from the start with AMP was whenever somebody is like, what's the data on this or do we have like analytics on this? It's like, well, did you look for it yourself? Like, did you try it out? Did you talk to customers? Like, we constantly talk to customers. That beats a lot of other stuff. So yeah, we don't have any segment analysis of who uses what and where and how. I use both. And this idea that everything is changing, it applies to this. We looked at this. We saw the way the things were going and how many more flexible a CLI was. And we about three weeks ago, we said, we think probably it's painful, but we will kill the VS code extension for

Starting point is 00:18:03 AMP. And we said that. I lay that out. And I didn't like it, but it seemed like that's how things were going. And then you think about async agents, which probably need to be on your phone and on the web, or maybe you use WhatsApp to interact with them. That's a whole other mode of interaction. Well, and if it's on the web, that's like the Vs code UI, not the terminal UI. And then there's this other thing that we're planning on doing that I can't share more about, but that also makes me think, well, actually, we really need to keep the VS code UI in. And so this thing that seems so obvious, actually, there's two other completely different things out of left field that totally overturned it. Yeah. So we're keeping it. And it's definitely adding some more complexity,

Starting point is 00:18:45 but there's a lot of things we can do to reduce that and simplify it. But there's always a hand hovering over the button to, can we get rid of this? Like, can you? Can we shed weight? Can we get rid of the last? Can we reduce complexity? So we're again in the spot of if a new model comes out, we can react quickly. And, you know, sure, it's good engineering and there's not a lot of duplication, but still updating one client is still faster than updating two clients.

Starting point is 00:19:10 So there's this constant tension between what's the most minimal product that we can have. And, and, you know, just to pick some other examples, there's a lot of niceties you can do in VS code where, for example, you have recent examples. not recent, but a common example. You know how in VS code, you can hover over diagnostic, and then you can say, you know, fix this or whatever. And then people would ask, can you add like a let AMP fix this button? And it's like, you can ask,

Starting point is 00:19:37 AMP knows about your selection, knows about the diagnostics, it can see all of this. So you can just ask, fix this for me. And if you type three words, it will usually do it. So that's something where it's like, well, you can already do it. It's a nicety, but let's remove the surface area. Let's remove this other thing that we have to backport. or keep working or whatnot.

Starting point is 00:19:55 And a tiny example, but there's, you know, 500 of these, what we say. But how do you think of that when the IDE is already AI IDE? So I use cursor, right? Yeah. There's already like fix and chat that pops up. And they want, obviously, that button to go to their chat versus like, you guys are on the left side and it's like, yeah, yeah, just do this here. Do you feel that in a way the IDVES code extension is more like for the people not using this,

Starting point is 00:20:19 like AI first tools and using the future, like most people, you know, I'm sure. GitHub, it's like eventually going to have something good to put in in VS code. How much do you think about VS code extension just being, you know, maybe a stepping stone to the thing you cannot talk about that you don't talk about? And then the bifurcation of the TUI versus like the fully async, you're not looking at anything. I think we're not trying to maximize our revenue, our user adoption, literally today with the state of today's models and today's tools because everything is changing so fast.

Starting point is 00:20:51 So, yeah, we're not trying to fight cursor for who's going to win the right to have users fix with our AI or their AI. Frankly, it doesn't really matter to us. I don't think that that interaction is a really important way that people are going to be interacting with AI in six months or 12 months. I don't think we learn anything from that. And we just said, we're not going to do it. And users, some have definitely asked for that. And the other thing is we have to figure out what are users actually want? and they say they want a lot of things.

Starting point is 00:21:21 And in the case of customers, a lot of times they'll say they want a lot of things. They'll say that they want bring your own key. They'll say that they want model choice. They'll say that they want a subscription for $100 a month or pricing to lock users out if they spend more than $30 in a day. But actually, what we've seen is they want the very best coding agent. Not everyone, not everyone.

Starting point is 00:21:41 But we're focused on the ones that want the very best coding agent. And when we tell them how that thing will slow us down, then that starts this conversation where they'd rather not have something they might use 2% of the time if that means that the tool is worse. And we alone among the entire industry, it feels like we are being really honest and really bold with that. And I am really concerned just for the rate of progress overall that a lot of these other tools that are great, like Claude Code and Codex and cursor and so on, that they've forgotten

Starting point is 00:22:12 what made them great and what made them grow so fast, which is building the very best product. And they built it in a way that's too overfit on the current capabilities. And so they're just going to peak and then it's going to be a slow fall. And zero of the software business model works if that happens. You need to have growth into the future. So I think it's best for our business, but also I think that we're trying to push the whole industry to just be radical about the changes that are coming. Yeah, when you said the best coding agent, I'm always like, is there a market for like the mid coding agent? You know, like, there's a, I think the model choice is a great example of, like, why would you want a model choice?

Starting point is 00:22:51 I think pricing, I guess, is like the only thing that people bring up. But I think to your point, it's like, you're really paying engineers a lot of money. Yeah. Like the cost of like Sonnet 4 versus Sonnet 3.5, it's kind of like minimal compared to like 150, 200, 300K once you do taxes and benefits and all that that you pay to employees. So yeah, I think we're like in this part of the market almost, probably. like people are not maxing these things. There's absolutely a market today, literally today. Someone will pay a monthly fee for that cheaper AI product today,

Starting point is 00:23:26 but they're not going to be paying that in six months. It's going to be a different product or they're going to be paying for something else. And if you have that much churn as a product, you simply cannot build software in that way. But a lot of people get tempted by that and they hear a lot of users ask for it. six months ago, it was still the game of, oh, a new model got released. And then everybody would tweet out, it's already available in their editor or whatever it is, their extension, right? And I think that's kind of over. Like, it's just people realize that, well, the benchmarks are one thing, right?

Starting point is 00:24:00 Oh, this is the best model. Turns out it's not in this editor, but it feels different than this editor. So the whole, like, you know, the models are the thing. I don't want to say that's over. But it's becoming less important. And people are now also waking up to the fact that it's not just the model. It's the system prompt. It's the tools.

Starting point is 00:24:21 It's the harness, the scaffolding around the model. So I can give you the choice to use Gemini 2.5 in AMP, but without the system prompt being tuned to it, without what I call before, like going with the grain of the model, the models are trained in different ways. So you want to optimize the tool and all around it for the specific model, without that happening, it does make a lot of sense. You get the wrong signal.

Starting point is 00:24:45 I can drop you in a new mod right now and have it available in 10 minutes, but that's not what you're after, right? You want the best possible version of this mall and this tool. And, you know, that's, I think, become more important, less like the model selectors and whatnot. Why do you mention the models at all? So you have Sonnet 4 for the agent. You have 03 for the Oracle.

Starting point is 00:25:05 We don't. We don't show them in the product. We don't mention them all at all. We put it in the manual. we have like an owner's manual because people kept asking us. Yeah. Well, but even then, it's like, why does it matter that they ask? Because you might not, now it's like if you want to change it tomorrow,

Starting point is 00:25:20 then it's like you got to help people, you change the model. It's like, where do you think we are on like the slope of like, hey, look, you guys should forget it all about what model is even running, what the difference is. So I think we're going towards a future where the model will become an implementation detail to some sense. And we will end up on a different. abstraction layer. And for example, you ask like, when would I use a mid model, right? When you put it like this, it sounds obvious like who wants to use the shitty version of the better version, but, you know, we're thinking actively about this. There's models who might not be as smart as sonnet for as the main

Starting point is 00:25:57 and genetic driver, but it might be 10 times as fast. And that doesn't mean that you think, well, now I need to go fast. Let's use this. But I think there's different modes of working in your day-to-day work with this model in a different harness or in a different configuration can then be another way to do or get things done versus talking to an agent in a back and forth. So in that sense, like we've seen this with like planning modes or people use different models, but it's still like pretty clear that it's a different model and whatnot. But I do think it will be pushed more and more in the background and that people will choose or have different ways to interact with models. and the specific model or its version will not be as visible anymore. Yeah. And I know Cody was using StarCoder for in-line edits, at least as well.

Starting point is 00:26:47 The Younger said publicly, so I'm not leaving in anything. Does this still seem interesting to you to figure out, hey, is there something in open source that we can use and maybe fine-tuned to like make better? Or is it, are you still like? Yeah, absolutely. We just want to be at the cutting edge and, you know, that's maybe in the back burner. So first, it took people eight or nine months to figure out what 3-5 Sonnet was capable of from when it was released last June. And this was around the time we were building AMP and Claude Code came out.

Starting point is 00:27:18 And you realized that, wow, like a tool-calling agent is incredible. And at that moment, everyone, all the smartest people in the world also realized that billions of dollars of money went into training new models and harnesses based on that. And now it's September 2025. and we're reaping the benefits of all that investment. And you have so many more models coming out. You have the open source models like Quint3 Coder and Kimi K2. And they're moving so fast. You have XAI's models.

Starting point is 00:27:44 You have GP5 that came out. And we're still figuring out how to use these things. But it would actually be an incredibly pessimistic outcome if all those smart people and all that money were not able to build anything that was better than Sonnet. So we in our internal team right now, and this could change, we have about half of our internal team using a different model other than Sonnet as their main way of using Amp. And that's a huge change. In the past, we had done that only to test and begrudgingly, but now we're using it. And there's a different way of interacting with an agent that's not the linear chat transcript that actually means you don't feel like you're getting a cheaper mid model. You feel like this is a different way of interacting where that speed is really beneficial and it's more constrained.

Starting point is 00:28:29 So things are changing so fast. Is the GVT5 Codex only being available in Codex, make you nervous about future availability of, like, cutting edge models? And, like, thus they put more emphasis and, like, figuring out maybe, like, an open source strategy. They make it available to API customers. It's delayed. And if they were doing that, I really think that, for the most part, I take these model houses at their word. And they wanted to get it out to their first-party product as quickly as possible because they honestly need to get. gather more data and they're iterating in public.

Starting point is 00:29:04 So, yeah, I would love it if all the model houses perfectly coordinated with us before they released anything. But I know that would slow them down and I don't want to slow them down like that in the same way that we want our customers to give us grace and help us iterate in public. Yeah. I think there's an interesting, just dynamic in the market. Like when cursors switch from Sonnet to GPD5 is like the default model that was like, you know, 200 million our revenue for Anthropic that kind of went away and like moved on to

Starting point is 00:29:29 GPD5. So they're kind of like, okay, we're all friends now, you know, but maybe later that's going to change. But yeah, it's an interesting. The other thing also is that, you know, if you're building an agent and you're not at one of the model houses, you can use multiple models from different providers, right? So which is what we do. Like we, when you use AMP, you're using a model from Anthropic. You're using a model from Open AI. And you're using a model from Google.

Starting point is 00:29:55 And we're also very close to shipping like a fast open source. model that we can use as a different sub-agent in there, too. And, you know, when you put it like this, it seems silly to say we only use one model of like this family, because they all have different strength and weaknesses. I think we are one or two months away from a possible news cycle that is the foundation model companies have spent billions of dollars in CAPEX and hired like crazy. And now, you know, they're no longer the best in this realm and there's a huge stampede away from them. That's very possible. I'm not saying anything new.

Starting point is 00:30:32 Just imagine last May when people were counting Anthropic out before Sonnet came out, things change so fast here. Yeah. Yeah. Yeah. And I think COVID-A, obviously, with Johnny I've and some of that, it's moving more in a consumer fashion as well. So it's been interesting to see the big push on Codex. I would have imagined them to go more towards education, kind of like big, I know they have a lot of big enterprise contracts for like chat, GPD for your enterprise kind of thing.

Starting point is 00:30:59 So yeah, you guys, I think, are in a good spot. Because you have both, like, the source craft trust, like you said, but also like AMP. I see a lot of great stuff on Twitter. You know, people are like, I just put all my AMP agents running. I come back. It's great. It's like, I think it's now on that wave of like, okay, this is like one of the best tools out there. And like if you're like a serious engineer, you should probably use AMP, at least in some capacity.

Starting point is 00:31:23 Yeah. And then make your own choice. How difficult is it to think about what goes in your harness versus? is like what people should build. So you have custom commands. You've done a great job on like the tooling where like people can put like executables as tools instead of having to define like an MCP server. It's like, yeah, how much of it you're like, hey, we're just giving you the tools versus how much you want to be opinionated. We think like, I mean, I think of like compacting conversation is like maybe one of the key commands that people have.

Starting point is 00:31:53 And like in clock code, you can give a custom prompt to compact. Yep. Like a what's that discussion like? The main assumption, again, everything is changing. We've got to be able to move fast. That means what you want is, I don't use the picture of a harness. Often what I use is like a scaffolding. Like you want to build a scaffolding around the mall, a wooden scaffolding that if the

Starting point is 00:32:16 model gets better or you have to switch it out, the scaffolding falls away. You know, like the bitter lesson, like embrace that a lot of stuff might fall into the model as soon as the model gets better, right? because then they can remember more, whatever. It doesn't, why invest three months in like a separate apply model when the next generation, you know, 0.7 version or 0.8 or whatever version of this model can now do all of the edits on its own. So that's, again, the bigger thing.

Starting point is 00:32:44 And with that in mind, we really try to restrict a lot of the features that we add around them all. And you can do a lot of stuff. Like, we could be busy all day adding stuff in our clients and whatnot, making a product more complicated, but we don't want to. So that's, that's the first thing. The other thing is we're living in strange times. We're living in strange times from a product development perspective where basically I think the old triangle of design product and engineering, it's kind of changing. It's not a triangle anymore. I don't know what shape it is and it's not a triangle anymore. And the reason for

Starting point is 00:33:22 this is because you can't build a roadmap. You can't say this is what you're we're going to build in the next six months. People don't know yet how these malls can be used to their full extent. Everybody's figuring this out on the go. That's another thing. The other third thing there is, we just talked about us while having coffee before coming here, is that the only UI basically is like a text UI. And you can use this in the wrong way. And the example I used earlier was if, you know, you buy gyra, for example, but you use it for your shopping list. Atlassian is happy about this. That's not what they built the product for, right? But you can use it in the wrong way and still get results.

Starting point is 00:34:01 The problem with LLMs and a lot of the models is that you can use it in the wrong way. And it looks like you're getting results, you know. Like you can use OpenAI, chat GPT to look up serial numbers or, you know, technical specifications for a camera or something. And it will tell you this, you know. But it might be wrong. Or 99% of the time or 98% or 95% of the time it might work. but in 5% it might not work.

Starting point is 00:34:27 So having non-deterministic LLMs as the heart of your product is something unprecedented that we have in software, I think. So with that in mind, a lot of the features, what we see, you know, where people build like elaborate workflows, like I have my custom slash commands and they trigger custom sub-agents and they in turn trigger custom MCP tool calls behind which again another model is doing inference again and taking the input and blah, blah, blah. I think a lot of this will and has resulted in hangovers where people realize, oh, like this looks like it's a deterministic workflow. It looks like it does the thing that I wanted to do.

Starting point is 00:35:06 But actually, I can't use it if it only does it in 98% of the time. So that's something we're really conscious of where I think everybody is experimenting, everybody's sharing the experiences, you know, the thread boy tweets about what to prompt where and how. but you have to be super strict about not giving users a false sense of what the product can do and how reliable it is, because I think it's dishonest in some way, and it doesn't lead to good results. And just as an example, I think of the last three months, I would say we're ahead of the curve, like using AMP internally. Like we're ahead of like the mainstream agentic adoption, like say a month or two where we've tried a lot of this stuff. realized, oh, this wasn't the best use of our time or the tokens. And now you see a lot of other people waking up to this, like famous on Twitter, Armin, Ronaker, the Python developer from

Starting point is 00:36:03 Austria. He's done a lot of good stuff with Claudecote and shared a lot of his learnings. And you could see that the way he tweeted was super excited, like a lot of things, I can now do this and this and this. And then a month later, it's like, oh, maybe, you know, having AIDS, remote control agents that I control with my phone and let them run for 20 hours. Maybe that's not as productive as I thought it would be. And yeah, it's something that we're super conscious about. What are those things? What are like the failure modes that you heard from customers where it's like,

Starting point is 00:36:34 hey, we tried AMP and it just didn't work at doing XYZ. Is there a collection of those that you guys use as almost like a North Star as you keep building? I think like one of the things is the whole vibe coding stuff where people just use it. and, you know, they're like, hey, I spend $10 in tokens and it didn't build me the fall app or something. The failure mode of outsourcing the thinking, but not the typing, which I think it should be the opposite. You still have to know engineering. You still have to know how to program.

Starting point is 00:37:05 You still have to know your application and its architecture, how it's deployed. And then basically use the agent to do the work that you would have done, but you have to know what the desired outcome is and whatnot. Like that's a common one where people just, you know, hands off the wheel, agent, you go. and write this for me and then turns out a couple hours later, oh, actually, nobody understands it's spaghetti code. AMP, it's different from the products that competes again. So we've had one head-to-head loss with AMP, where we lost against the usual players. And the reason why is one of them discounted their other product 100% for two years.

Starting point is 00:37:40 The other one discounted at 85% for two years, which is just crazy. And we wouldn't want to do that because are we really good? going to learn from that and then how is it going to be used? It's going to be used in a different way. So usually the way that we might lose is there's some other product that would go to 80% of the devs in a company that is like the base layer. Sometimes that's copilot or cursor. And AMP is more expensive. It's more powerful. And they'll give it to that 20% of devs that they trust more. And in a previous world, any software company would say, oh, no, we need to get 100%. We don't want our competitor getting in there. But actually, that means that we're able to

Starting point is 00:38:19 even more focus on being bold and crazy because all those devs can always fall back to a cursor or a co-pilot. So we actually really like that kind of deal. The other thing there, I think a bunch of questions already touched on this is that talking about segmentation or market or the ideal user, again, everything is changing. So what we try to do is we try to, you know, build a tool for people who are at the frontier or at least curious about it and want to forget how to use these agents in the best possible way. And that's based on the assumption that if you build for the mainstream user who not, you know, mainstream sounds like, I don't know, it sounds bad, but what I mean is,

Starting point is 00:39:04 what I mean is if you build a product for somebody who does not know what a good prompt looks like, you will fall behind right now. Because you will spend time in resources building stuff like the prompt enhancer and like blah, blah, blah, blah, blah. but then you will end up building this and you miss the next step change that might happen. So the way we think about it is

Starting point is 00:39:27 we build for the people who already get that a lot of stuff is changing, but we want to leave the door open. If you're open to learning new things and you want to learn how to use AI and agents in your workflow, please come with us. We're happy to have you. But if you're skeptical

Starting point is 00:39:41 and you think prompt engineering, that's a bullshit term, I don't care about this, we're not right now building a product for you because we would fall behind. Yeah, so prompt enhancer, that's a bullshit feature that doesn't actually work. The theory behind it is nuts because what helps LLMs is not tricks in phrasing your prompt in a certain way.

Starting point is 00:40:01 It's fundamentally information that you have in your head that you can bring into the prompt. And if you don't have that in a prompt enhancer, LM cannot magically conjure that up. It cannot narrow the search space for you. Customs subagents. The way that we disqualify that is something we wanted to build at this point is because you look at all of the tokens that you're sending to the model, and it's so many more. It's, you know, so much more convoluted. We don't think that these models are trained in a way that would support this use case and the output of this going in here. It's so much harder to debug.

Starting point is 00:40:36 And MCP is another thing. MCP has done a great job in getting products to expose the verbs that agents might want to, you know, interact with. Although in most cases, they don't actually actually get the right verbs exposed. But as a user-facing technology, it is such a common failure mode where a user will go and add in some MCP servers. Off is a huge pain, but let's say they get over that hurdle. Then they have, I don't know, 50 tools exposed that often are too low-level granularity. And it takes a ton of tokens in the model. It makes everything slower and more expensive. They're often misused. And it's just not a good experience. So, you know, there's all of these things that we've said no to and other tools are bringing them in and they're saying yes to all these

Starting point is 00:41:21 things. I think it feels like they're making progress in the meantime and people retweet and people talk about how they're able to do these amazing things. But just the simplest example that seems so obvious. And frankly, it confounds me that more people don't do this. You make it so that my Google docs and Notion and linear and GitHub issues are all accessible to my agent. The vast, vast majority of developers who use AMP or cloud code or anything else, they don't have all those context sources set up. That seems like such a slam dunk. So we built that, we ripped it out. Before we would move forward with that, we'd have to get an answer, even for our own usage.

Starting point is 00:41:55 Why are we not doing that? And it's frankly still puzzling to us, but we're not going to touch that until we get confident about that. And to come back to the example, you mentioned Compact. We have this in the product. But again, the hand is hovering over the rip-it-out button. Because I think Compact is such a alluring thing where people think, oh, you know, I ran out of context. I hit that button, now I'm back to the start. But you lose signal.

Starting point is 00:42:19 You lose data. And it's something where other models really good enough, is compacting good enough to really glance over this that the user doesn't have to worry about it? Or is it something where you would have to somehow make it clear to the user that, hey, look, your conversation has 50 messages back and forth. If you hit compact, this is all going to become blurry. You know, you're going to compress it and you lose signal. You use fidelity.

Starting point is 00:42:43 And then you put it in a new context window. Are you sure this is the right trade-off? And some users are, but again, like, it's strange times because now we have like this thing at the heart of our software, this, you know, all from outer space that can do sometimes whatever it wants. And it's strange to build on top of this. And it's strange to educate your users about this, that this is the thing, right? Like, imagine, you know, the end of the 90s, PC era you had to build Microsoft Word. And then you say, like, well, at the heart of this new person computer, depends. 3, whatever, there's a weird

Starting point is 00:43:17 op from outer space. And sometimes if you bowl text in word, it actually makes it italic, you know? But that's the situation we're in. Like, that's the fact. Like, it doesn't always bowl the text. I mean, it underlines it if you reach 150 tokens or 150,000 tokens or something. How do you teach this to the user?

Starting point is 00:43:34 Yeah, and, you know, we're in the church of context engineering at the film office. And when we had Jeff on the podcast, they talked about the context route paper that they did. And they mentioned specifically encoding, for example, showing previous failures was like not helpful at all to the agent. And so I think when you're compacting conversation, there's almost like, you know,

Starting point is 00:43:55 if you have a long conversation, it usually means something went wrong along the way. And you had to like go back and forth and like a bunch of things that didn't work. And you're keeping those in. But I've been trying to figure out what's like, what's that going to look like? In my mind, it's almost like if you take the idea of linear, which I use and I give to my agents, just to get, because then I have a canonical prom for one issue. Because often you have to restart.

Starting point is 00:44:19 Yep. Because it's like it just goes too much down the wrong path. A lot of people don't restart. A lot of people just try to keep going. Yes, that's bad. But how, in that, in that case, what can you take from that conversation as a learning and put it back in the upstream issue

Starting point is 00:44:35 so then the issue is like either more descriptive or as like more information that is not compacting, but it's almost like how you would do as an engineer. it's like you're doing it in your mind, right? You get an issue and then you start working and then you kind of update your mental model. Like it doesn't really work for agents, but people are not doing this like small increment in the initial issue. I would say in this case, it's still you cannot outsource your thinking, right?

Starting point is 00:44:58 Like in this case, I don't think you can expect right now a model to say out of this conversation, this is the most important thing. Let me put this back in the linear thing. Maybe, you know, if you phrase it like this and automated it like this and it's always a perfect conversation, maybe it works. But I think in this case, you still have to be mindful of the context. And what we encourage users to do, for example, in AMP is to start a lot of small threads and be really, you know, do context engineering and be really strict about what goes into context and what doesn't. And the other thing that I think touches this on is, you know, where a lot of CLI tools, for example, have super verbose output.

Starting point is 00:45:38 And Basil, for it to call this out, I'm not a big baseline. But you could just call Basil out. super-verbose output. So then the natural assumption is, oh, let's hide this from the user. You know, like, let's abstract this away and summarize the output or whatever, or whatever, just the exit code or something.

Starting point is 00:45:53 And then you get into this dangerous territory where what you see is what you get is not true anymore. And in the context, what you see is like some other thing in the context and that can lead to issues. But for me, the meta thing here too is, everything is changing. That means we're seeing this. CLI tools right now.

Starting point is 00:46:11 while also adopting to being used by agents. So they're changing the output too. So if you focus on the fact that basil will always be rebosed and build something for this issue, you might be outdated in half a year where somebody is like, no, no, no, we have a Basel agent wrapper and now this is not an issue anymore. One model that I have is if you are relatively on the cutting edge of using agents and there's some persistent problem like this, it feels kind of out of band, like how the model itself will update its memory or will update the linear issue.

Starting point is 00:46:41 the model needs to be trained in order to do that better. If it's something like your own coding conventions, that's different. But if it's something fundamental, it feels like about out of band from the agent, the model needs to be trained to deal with memory better or to accept the fact that it might have an incorrect view of its own history if you go back and edit it. And we're feeling these pains right now because people have all been using agenda coding tools for a matter of months.

Starting point is 00:47:10 Most people have been using them for like less than three months. And if we're only feeling them now, it takes a little bit of time for a team at a model house to go and do a fine tune of one of their really big models. Or they've got other big models, the new revisions that are being trained. And they can only fit a certain number of experiments like this in. They're probably going to get half of their approaches wrong. So you can only do so much. And that's Thurston's idea of going with the grain of the model. And I mean, you've seen this, I'm sure, where a lot of users are going through this lesson where they, let me just add this MCP server that does every.

Starting point is 00:47:40 thing I wanted to do. And then two days later, it doesn't use it. Like, it never calls the tools. And it's like, yeah, it wasn't trained to do this. And you can sense, you know, like they have different philosophies in the model houses. I think Anthropic is, from what I can tell, working a lot or training a lot to what's using memory, like storing information. One or JAT, TPD obviously has this open AI. So if you give it a memory thing, yeah, it might use this. But then you have the issue of, well, if I give it this other custom-made MCP that we build internally and our processes don't map to anything that Open AI and Anthropic have seen or trained for, it won't be used and you won't get good results.

Starting point is 00:48:21 And it's super strange, right? Yeah, I wrote this article for the GPD-5 release about models self-improving for coding. So I basically asked GPD-5, what are tools that would be useful to you to be a better software engineer? It's like, well, you know, give a list of like 10 tools. And I'm like, okay, implement them, wrote all the tools. And then I asked it to do the same task I'd done before, but with those tools. And then it goes through the old task.

Starting point is 00:48:46 And I'm like, which of the tools did you use? And it's like, oh, I didn't use any of them. And I'm like, what did you know? It's like, you know, to be honest, I don't really need the tools. I can just do this task, you know? And I think that's like a good metaphor just for like the trend of the models, which is like, hey, they're going to use less and less of this like custom made tools to fix today's issue. I think the things that we can bet on, and I'm curious to hear your thoughts, is like, they're always going to have some sort of like test runtime.

Starting point is 00:49:13 Like, I don't think there's going to be a world in which the model is not going to run test and say, I'm sure this is going to work. The other one is there's always going to be some sort of like infrastructure risk code to then handle the deployment side. So I think whenever there's going to be some runtime issue, they're going to need to understand where they're running, you know? So I think, like, you can put them in a box, having an actual dogrify. and whatnot, it's helpful for them to explain what they have access to. What do you think are other things that you don't expect the model to, like, in the model that you want to still expose to it? So we're going to assume it's going to test.

Starting point is 00:49:49 We're going to assume it's going to have some definition of its environment. Are there other things that come to mind? I think test is a big one, and there's many different kinds of tests. So we had subagents in AMP, you know, among the first that come out with this conception of subagents, which is a sever context window. more curated set of tools. And I think there's a lot of potential to take a tool like test. And right now, you invoke it by the bash tool and you have some complex invocation. Too often, it'll run all of your tests, which is noisy and it takes a long time. If you're in your editor and you've got something nice

Starting point is 00:50:22 set up, you can hit like a hotkey and then it'll only run the tests that you need, you know, at your cursor. So giving the LLM a tool like that seems to have a lot of potential. And then that could even potentially be a smaller model, a fine-tune model for that task. It could be multiple based on what projects or stack you're using. And that could eliminate a lot of the confusion. Even with a good agent's dot MD guidance about how to run tests, I still see with AMP. And I think, you know, we've tried to make this really good. It only gets it right maybe 90, 95% of the time.

Starting point is 00:50:56 Sometimes it'll run the wrong testing or it won't escape it correctly. And I think we can eliminate that with a sub-agent. So there's so much more potential to go deep in areas. like that. And then for every language, it's a little bit different. So handle all those cases. Do you feel like that will just be built by each company on their own? Or do you think there's like a same default that you guys are going to build for that that is going to be effective for most code bases and test structures? This is where scale helps. And we have a lot of scale. So, you know, increasingly we're able to see in this framework for this standard Go unit test package.

Starting point is 00:51:30 that's easy. V-test and JavaScript, that's easy. And once you start getting more of the long tail, then, you know, it might have to just fall back to a really good model. But I think that we could probably make something that's optimized for some of these more popular unit testing frameworks. And it's a combination of deterministic stuff and non-deterministic stuff. Because right now, in my VS code, I can hand Apple T if I'm positioned in a test file inside of one of those test blocks, and it's only going to run that one. So, you know, even that is a benefit. And now I'm mostly bottlenecked by your playwright. Yeah, it just takes a long time, man.

Starting point is 00:52:03 But the crazy thing is the vast majority of devs who are billing web applications with coding agents do not have Playwright. And if they have it, it is set up in such a shitty way where it cannot really log into their app. They don't have any pattern for that. So even something like that, that's another example of a sub-agent that's go and try this basic end-to-end testing flow described in natural language with the running application. And wouldn't it be great if it could also do it in parallel. So, you know, there's all these ways that you can improve. That's a great example. And I think touching on this, we have coding agents.

Starting point is 00:52:33 They are productive. They add value. We cannot assume that everything around the agent in deaf tooling or codebases will stay static. So I think people are already adopting their codebase to be better used by agents or they're adopting their tooling to be better used by agents. They're more descriptive help text or whatever it is. So I think, I don't know, we should have a counter, but everything is changing.

Starting point is 00:52:57 I don't know, saying this again, but we cannot build right now with, oh, this is the tool that's going to stick around, giving that all of the code basis and all of the processes and all of the deaf tools will stay the same. We have to assume that this stuff will change too, and we have to stay nimble. So we have to make like short bets or small bats and try and get us, you know, in small steps forward, but always be reactive to this stuff. That, you know, if people, again, let's not use basal again, but I think playwright is a good thing where the feedback loop is incredibly important to working with these agents, like that the agent can see whether what it's doing is actually working.

Starting point is 00:53:33 So what we've seen people now do is, well, instead of having the client lock and having the browser lock and having the database lock, let's have one unified lock, because then it's easier for the agent to just look at this log and make sense of it. And then it turns out it doesn't have to be nicely formatted. It can be verbose. You can just have like JSON line outputs and whatnot, because the agent can understand it much better than a human can. And I think that's just a little preview of more things that we will see, where you're like, wait a second, this is not made for human consumption anymore.

Starting point is 00:54:06 How can we optimize this for a gentic consumption? And then maybe the game changes. And there's some things that now we get, for example, in my VITES suite, I have a knock to record HGCP calls. So whenever, especially for inference, like you can't really mock. We do a classification, things like that. You just need to see what happens. And then we just save the whole interaction. And then the model can actually see what the API return in much detail.

Starting point is 00:54:32 And it can reference it back in the future. So when you add a new feature, it can look at the test and it can see what the API usually returns. And it's like, oh, okay, it's going to have that key and like the content and things like that. I think there's more of that to be done. I think there was maybe also a time in which having console logs was like really bad. And I think there's maybe not going to be a console log that is like only funneling to like not the actual console in the browser, but like some way for like the agent to see all of the details of like everything that is happening. What I haven't figured out is like how do you instrument that? Because you can not put a bunch of console logs that go somewhere else in the code because then you're also polluting the context window of the model.

Starting point is 00:55:13 So you need some other way to do it. But I think yeah, the more your login, the more the model can kind of like self iterate. And you just described like five approaches that seem absolutely worthwhile to go explore to improve how coding agents work. But somebody do it. We can do some of it at a kernel labs, but we can not do all of it. So somebody help. Again, like the world around us is also changing Jose Valim, the creator of Alexeer and, you know, contributor co-contributed rails. I can't remember the name.

Starting point is 00:55:44 But basically they have a new framework tooling out that is Phoenix. Yeah, it's for Phoenix, right? but it's the name of the, I can't remember. But it's about, well, what if you build a framework for an agent to? What if the agent is integrated into the framework so that you can, if the application fails to run, you can ask the agent that has access to all of the context. And that's going to be more and more, I think. Like a lot of, you know, developers will build stuff because they're fed up with

Starting point is 00:56:12 copy and pasting stuff around. So we're going to see this in developer tool. Well, I mean, Rails was like one of the first frameworks. And I know that in the error page, they had a CLI that you could like use the logo context. Yeah. And I think like more of that. Like in next you have the copy to markdown. Yeah.

Starting point is 00:56:31 Whenever you have an exception. You can copy the markdown. That's the first sign. Yeah. Yeah. But I think there should. And in their docs too, you can like copy to mark down. But then it's like you can only copy to mark down the whole page.

Starting point is 00:56:42 And it's like, well, you know, maybe I only want to do this section or like I want to do one, two, three. I don't know. I think that's why the mentally fies. of the world, staying less, all these companies that do kind of like API docs and API generation from docs are like getting a lot of interest. I think you'll get more of that, but it's hard to get people to move over, you know? I'm sure you see it with like some of the source graph customers. It's like, how am I supposed to re-instrument this old code base that is like 15 years old and like... It's true, but what we have said is we explicitly are building

Starting point is 00:57:12 for the people that do want to move. And that's been so liberating. And I think that that's the great thing about what you see in the market today, which is like, you have all these companies that are like so AI first and like just use it and do great. And then you go on hackers and it's like, I've never got a single good result from AI. And I'm like, well, obviously that's not true. And like maybe the extreme is definitely true though. I think to me that's kind of like the thing is like the people that are spending $100,000 a year on AMP with two people. Obviously they're getting value. It's not like they love burning money. Yeah. But the people that are negative, To me, that's not always true because it's easy to be negative and like it doesn't cost anything, right?

Starting point is 00:57:52 To put a comment that is bad. And so what's going to be the thing that forces the rest of the market to be whatever, man, I just get on Am and like make that work? They just have to see this work once or twice. You know, we've been in developer tooling for a long time with Saucecraft. And it's always been hard for the last, say, 10 years to get a company to adopt a company to adopt a developer tool that does not immediately fit into their codebase because the codebase, that's the standard.

Starting point is 00:58:23 Everything else has to adapt to our codebase and our processes and whatnot. What we're seeing now with agents is as soon as somebody has seen what it can do, they have such a multiplying effect or they bring so much value that people are willing to adopt the codebase for this. Like the first time in how many decades where people are like, maybe our code base is wrong. Like maybe we should change the way we develop code to make more use of this. So I think people have to see this and then the agents will pull them along or like the, you know, the value this brings will pull it along. Yeah, I'm curious.

Starting point is 00:59:01 So I was on the board of a company called Launchable, which was founded by Kuzuke and Kawaguchi built Jenkins. And the idea behind Launchable was like, well, instead of running all of your tests, we'll use machine learning to figure out what tests are impacted by your PR and just run the small subset. of them. And I think like what we found, then the company got bought by copies. But it was like in a lot of companies who go in there and they're like, oh, well, how can we trust it though? Let's do a POC. And then you do the POC. And it's like it works great for the subset. Well, you know, work for the subset. But like, it's going to work for like the whole test week. Then you do a whole process. And I think with coding, it's like for some companies, it's like they see a work on one task. And they're like, it's worth trying on every task. And then there's another subset of companies that are like, well,

Starting point is 00:59:45 you know, it works a little bit on the front end, but it doesn't work on like my Java service back there, so I'm not going to use it at all. I haven't quite figured out what's going to be the market pressure to make those people move along, you know, but it's like you said, it's like for some people it needs to work once, maybe for some other people, it's got to be one task that always feel. My one task that I always use, we have built this kernel gym product, which is like an MCP playground and tester. And I have a task, which is like add yolo mode, which is, you know, let a user toggle

Starting point is 01:00:15 between auto running, which sounds easy, but it's actually quite hard without LLM's work to stop inference to approve a tool and then run it again. And every model was failing until GPD5 Codex and Codex CLI was like the first time I got in like one shot. It made the whole thing. And I wonder if everybody should build some sort of like four or five tasks that are like, okay, if you can actually do this end to end, then I'm like, I'm in. But I feel like people are still in denial of like that's going to work.

Starting point is 01:00:43 The same people. They don't want to have the conversation. conversation at all. If you look at that early adopter, the laggards, that chart of technology adoption, there's a reason why the early adopters are the tiny little start of the curve, you know, 3%. And it feels like so many of these arguments are people saying, well, what if we made a product that was for the early adopters, but somehow made the laggards also adopted early? Why aren't we going after that big market? It's the vast majority of the area under the curve. And it's like because they fundamentally do not want what you are building. And maybe they should. Maybe they're going to realize that,

Starting point is 01:01:21 but you're not going to make them realize it. Or if you waste your time trying to make them realize it, you're going to be trounced by hopefully people like us that are only focused on the early adopters. It's a total mindset shift. And if you are just focused on building something for early adopters and you literally do not care and you set up your entire business and product to not care, not have to care about the people that are laggards, you can do a much better job. And that's what we're experiencing now. Let's talk about the outer loop, because I think that's kind of like the next step, at least for me.

Starting point is 01:01:51 It's like, I think the coding agents themselves do great on on task by task basis, but then there's like, you know, PR review, which GitHub is like so slow and so clunky and it's so like order by file versus like, I think we should get to a world which is like more semantic. It's like, hey, you know, these are really like the 50 lines of code that matter to look at and everything else. It's like, it's fine. You can like skim through it.

Starting point is 01:02:14 how do you think about that when you want to, especially when you think about async agents, you know, there should be an easy way to spin them up, which I think is fairly clear. But then I'm not sure if there's yet an easy way to catch up on what they're doing. You know, what I found when I used conductor like Vibe Camben, it's like I spin out five, six of them. And I'm working on them and I kind of jump between them. And then my wife is like, let's have dinner. And then we have dinner. And I go back.

Starting point is 01:02:39 And I'm like, what the fuck is going on here again? It's like, which one is doing what? And it's hard to like just at a high level see what each of them is working on, where it's getting blocked. Have you guys seen anything that works there? Have you been thinking about building any tools in that space? I agree, right? I feel this too. I think, you know, with our internal experiments, I think, you know, for example, this idea of, well, I just spawned an agent and they work and I control them. I think Stevie is doing this and he has like a whole workflow around this and it seems to work for him. But for me, I guess I'm a one tasker in my mind. Like I need to, I can't do this.

Starting point is 01:03:16 Like, I cannot control five agents at the same time. And then when I do it asynchronously, I realize that I need to be really strict about how a review what they've done and that I also don't jump between them. And then it's also, you know, making sure that you don't miss anything. Like, I spun up so many agents and then haven't checked back on them because I forgot that they actually run. So that's something you need to build in the product. But yeah, I don't think it's. figured out, you know, like it's a, there's so much to do still. Yeah, it's wide open. We think of it right now, like if you're playing chess, you can play one board at a time or the people in New York City

Starting point is 01:03:53 Central Park who play against 10 different tables at once and they go and they sit down in front a table, they get oriented, they make a move, and then they go. And that's what we're trying to build. And it turns out, even if you've got a coding agent running in your editor in the CLI and then it makes a big diff, you've still got to understand it. And it just becomes even more important when you have a lot running in the background. So we want to make it easier to orient yourself with what's the change. And there's a lot of stuff that is not in the realm of coding agents that would help. Like having a deploy preview consistently available so you can just click and click through it. And then we want to make it fast for you to make a move and then, you know, get on with your next

Starting point is 01:04:29 thing. Yeah. Or, you know, just UI. So at a glance, you can see, I don't know what it is yet, but at the first glance, so you can see what the agent actually did without having to go and read Like the emoji summary, finally we have it and blah, blah, blah, stuff like this. But to come back to your question of like the outer loop, I think, and, you know, if Beyang was here, he would talk for a long time about this because he's passionate about it that the inner loop has changed a lot in that, you know, right test review and whatnot. It's that you now review a lot more code. And what effects does this have?

Starting point is 01:05:05 For me, for example, we don't do any formal code reviews on the AMP team, but it doesn't mean that code isn't reviewed because we use, you know, Am to write 80 to 90% of our code base. But that means everybody should review the code that the agent wrote. So it's reviewed by at least one person, right? And that's not reflected at all in GitHub yet. Like GitHub is still based on this other mode where you tag somebody. But then it's like, well, I actually went through two agents to produce this code and I reviewed it three times. Do I now tag five other people? And right now we're stuck in this mode where people would say yes, but I don't think it's going to hold. much longer.

Starting point is 01:05:40 Yeah. The other thing I noticed is like merge conflicts. Like I used to have very little because it's like, you know, I know what I'm working on. And if I'm doing multiple tasks, I know how this is going to impact that and I'm going to build towards it versus the agents, especially when you run them parallel. It's like they just start to change whatever it's convenient to them. And then it's like across them.

Starting point is 01:05:59 They're like changing the same thing. And so one thing we've been thinking about building is like, you know, how do you do better cross-agent orchestration of like these changes? So I built for the GPD-D-5 post as like task manager. There's like CLA first. And basically any agent can like append what files they're touching. And then they can read what files other agents are touching and see what those tips are to like implement them back.

Starting point is 01:06:23 But then I think the question is like, well, maybe what they're doing now doesn't end up being the final thing. And now you're wasting all these tokens. Like we're getting all these changes before review. I think at this point is like, is Git well designed for this future world that we're going into. You know, there's like, I think everything is back on the table. I think maybe, you know, five years ago, it was like, you know, there was like a couple of YC companies doing, oh, we're like a new version control system. And I'm like, look, man, I'm not, I'm not really interested in listening at this stage.

Starting point is 01:06:50 And same way, programming languages. It's like, you know, when Chris Lattner even started working on Mojo, it's like, okay, because of AI, I understand why you need to build a superset of Python. And I think now with agents, it's like maybe clear why TypeScript should win because type-checking is very good for the model to do self-improvement. What are like the other things? I think the interesting flex here is people assume that coding agents meet the bar of writing the exact same kinds of software to the exact same standard.

Starting point is 01:07:21 And that is not necessarily an assumption that end users, consumers will apply. If they have software that's much faster, cheaper, much more personalized, if they can conjure it up on their own, then yeah, you're going to tolerate if the loading state of this thing doesn't quite, you know, work correctly. So changing user demands and standards is an interesting thing that you can flex here. Yeah, what do you think about that? You know, we've been thinking about enterprise software moving more towards user generated content, which is like, hey, you know, expenses are like a great example of like, you know,

Starting point is 01:07:52 all these expense tools. Where there's so many companies when like the core action that you're doing is like take one line of expense and tag it with different things. But then you have to like set up all these categories. and whatnot versus like just generate it for my company and for like each team separately because they have different things. And it's like to me, that feels like more and more of that will become true. And then the real value is like, you know, what's kind of like the underlying data store or like data stores that like you're feeding into this. And I know some enterprises are building already

Starting point is 01:08:24 built kind of like internal like lovable basically. Yeah. Where each employee can kind of like create a simple tool and then they connect the tool to like internal data stores. And they might be the only users of it. There's nobody else that does it. And I'm curious how you guys think about. I know that bold that new, for example, now has clock code integration. Like, where do you see the line move between like software engineers build software and like obviously AMP is like a great tool for that versus going more upstream, which is like any non-technical people can also plug into the code and like build things on top of it. That feels in a way very different, but also very similar in like the challenges that you need to solve for.

Starting point is 01:09:05 I think this idea of non-technical is the wrong way to look at it. There are always going to be people that are good at unambiguously specifying what they want out of a computer. And we've had non-coders, including one of our board members, who built something with AMP that replaced like 250K a year piece of software that he used for a lot of their internal fund tracking. He maybe took one computer science class. He hasn't really coded, but he's a really smart guy.

Starting point is 01:09:32 and he knows how to unambiguously specify what he wants to his CEOs, certainly, and now to a computer as well. So if you can get people like that, a tool that's really powerful, they don't think of themselves as a non-technical person. I think that's just such a bad mindset. So we want to build for the power user. And if that person has not been a coder, but they can pick it up really quickly, that's great. Again, we're completely focused on the people that know how to and want to get the very best out of this and that want the agent to win that aren't trying to be like, oh, you know, nada. Hey, it didn't do this thing.

Starting point is 01:10:06 Tell me when it does. Yeah, we had this at the start a lot where people, whenever you have like an AI tool, I think there's a natural tendency by engineers to get it in a gotcha moment. You know, like, oh, I asked this. And it didn't know this. And it's like, are you trying to get something out of it or are you trying to get it to fail? and, you know, it's not worthwhile to build for somebody who doesn't want to fail. Yeah, actually, if you fast forward how the world is going,

Starting point is 01:10:34 you're seeing already over the last few years, companies have really slowed down their growth in engineering headcount. This is a global phenomenon. You're seeing engineers like here on the AMP team and other companies that are using agents really heavily. They're cutting out the middlemen. They're putting the people who are building the product closer to the customer because you can go and hear an idea from a customer

Starting point is 01:10:55 literally in the meeting, you can kick off an agent to go and build it, and then you have a first draft of it. So overall, the person who's using the coding agent is getting so much closer to the problem. They're also going to share more in the rewards from solving the problem because without needing to share the profits with everyone else, there's naturally more to go to them. So I think if you fast forward this, it's not that the firm or big companies are going to completely go away, but you're going to have people that have an incredible vision in their head and that are so close to the problem and have an incredible incentive to go solve that

Starting point is 01:11:25 problem, equip them with a coding agent. If you're going to build a coding agent that those people want, that is way better and more valuable. And you're creating more value. You're allowing more new things to be created in the world than if you were building a coding agent that is for the median developer that makes them 30% better. So that's who we're targeting. And I don't think that that will necessarily look like vibe coding.

Starting point is 01:11:47 Vibe coding is this really unproductive thing to discuss because everyone has a different definition of it. And too often, it's having the agent write code with poor feedback loops and poor quality control. And I don't think that that's valuable. But it's giving that person the ability to build something truly great really fast when they're so incentivized and they will have every desire for it to work well. Yeah. And I know we're getting close to time, but a couple things I want to touch on. So Thurston, I was reading through your blog. You left Source Graph a year and a half ago, then you joined back. Good job, Quinn. Bring him in home. Thank you, Thurston. But when you've wrote a post about leaving, one thing you've wrote is that when you first

Starting point is 01:12:26 joined in 2019, one thing that Quinn told you is like, hey, search graph is your playground. And you have skills and talents. And I want you to use those skills to, like, you know, move the company forward. How do you take this idea of like the power user, getting close to the customer and like how people are going to build teams overall? Like there used to be engineering and product, like you were saying, the triangle. That's kind of going away. what are the type of people that you think are going to be most successful?

Starting point is 01:12:55 Like, how should people think about structuring teams? It's like, obviously you're doing this with AMP in a way, right? You're like building a sub-team and sub-product within a larger company. Any tips that you have for other founders and executives? Thurston is incredible and AMP would not exist in any way without him. He has strong internal constitution of how he uses it and what's real and what's not. And it's so easy to get carried away with the hype, the possibilities, especially when you see other people, a lot of other smart people who are getting carried away by it.

Starting point is 01:13:25 Thurston has this incredible ability to stay grounded. And that, with everything changing so fast, with it being such a hype cycle right now, that's really important. Also, just these first principles thinking, like how we've completely rethought, how we build everything in AMP, based on how should we actually do it rather than what has come before. Dorson is the rare person who's been at bigger companies who's seen how source graph, how we build enterprise software and not the Google way, but in a different way, and has taken the parts of it that work and not the parts that don't. So all of that combined with someone who's an incredible engineer,

Starting point is 01:14:02 incredible writer, communicator, that's a really powerful combination. So find those people. And then what I said when he rejoined is he is the dictator that made him feel really uncomfortable, as you can see. I hope you cut to his face. But that's exactly what you have to do and had just put so much trust in people like that. And that also shows everyone else in the company that they can do crazy stuff, that they can go way beyond. They can take it to the extreme. They can make mistakes. And that's still okay. Because we're not trying to build something that's going to go really big in the current state. Amp is growing incredibly fast. But the most important thing is

Starting point is 01:14:43 we're building the coding agent God, that thing in the future, and that's something that we're all in search of. So none of the mistakes, none of the successes in the month-to-month time frame really matter. It's all about getting ourselves in the right trajectory. And you've got to do crazy stuff. So equipping Thorsten to do crazy stuff and to take the ideas that he has and make them scale up with all the reach that Sourcegraph has. That's been my goal. On the first principle thinking, how, do you think about that? And so there's the word of EV-Alice and there's the world of vibes, right? Yeah. How do you approach it? Like, how do you look at the product and you're like, okay,

Starting point is 01:15:23 this is good, this is bad? This is what we need to improve. Is there something formal that you guys use internally or is it mostly you as the dictator directing? Well, two-part answer. I think the first part is to also answer the other question a little bit is what I've seen become more important or the shift I've seen is that, you know, I said the triangle of PM designer engineer, I think as an engineer or any of the three, you now need to know a lot more about the other parts. Like as an engineer, you cannot see yourself as the person anymore who types out a spec or turns a product, PRD into code. I think you need to be aware of business. You need to be aware of a product. You need to know and have some taste for software. Otherwise, I think the value of your work will diminish over time because the

Starting point is 01:16:16 pure typing out of code, for most of the code, you know, exceptions being a John Carmack and, you know, whatever, for most of the code, I think the value will diminish. And we've already seen this like compare a GitHub contribution chart today, it's value to save two years ago, right? And to come back to the second part, like vibes and, you know, whatnot, I think we don't have any said evals. We don't. And this was controversial up until a week ago, I think, when I I think Barr is from, or two weeks ago from Anthropics that they don't have for the coding agent too. But we don't and we haven't had them. I've built evals before. I fine-tuned models before. I know that they're good. I love evals. I was addicted to LLM as a judge. I wrote about

Starting point is 01:17:00 LLM as a judge. But for a coding agent who's supposed to work in many different code bases, who's supposed to work with many different types of prompt, who's supposed to work with many different type of tasks. It's a time investment that we cannot afford with everything changing and having to stay fast. And if you ship 20 times a day, you will get a lot of good feedback. I swear you, I could tune my system prompt a little bit now. And then I would say by this evening, people on our team would go, why does it call this tool so often? Like, what's going on? What did we ship? And that's incredibly valuable feedback. And that's incredibly valuable, you know, when people dog for the product and use it all day. And how do I make these calls? I don't know. Like I think it's experience of

Starting point is 01:17:45 like I think about software a lot. I love using software. I listen to a lot of business podcasts. I read a lot about business. I listen to a lot of software podcasts. I read a lot about software. And then I try to project like what does the business need? How can we get growth to 10x? How can we get our users to 10x. How can I use my engineering capabilities to serve as a function of the business to reach those goals? How can I organize the team or get the team to help me reach those goals or together reach those goals? And, you know, it's hard to explain, but it's like I feel like in this year, truly here at SoftCraft, like everything I learned over the last, say, 15 years of my career is, is coming together in the sense that all of the hours spent listening to the Acquired podcast,

Starting point is 01:18:34 to help me as much as, you know, reading hacker news for how many hundred hours and writing code for how many thousand hours, you know. And with code being now this tool that you can wield much easy or much fast or much more often, I think it's become much more important to how do you want to wield it and when and for what reason. I think the hard to explain is a great explanation why, you know, you just cannot one shot create these things because there's a lot of, you know, implicit preference. Awesome, guys. Anything to wrap call to actions? Are you hiring? Like, who should reach out to you, request for startups? What should people build that is going to be helpful to you guys? Yeah, I don't know. I don't know. We're always interested in talking to

Starting point is 01:19:18 fellow engineers who are interested in agented programming, figuring new stuff out. We want to hear from them, like what works and doesn't work. We're always willing to hire people with exceptional talents who are fully in this and realize that, you know, programming is changing a lot. And I don't know. What else? If you want to come on this journey with us and see where coding agents are going, then come along. Yeah, use AMP, send us your feedback. And we are just so excited.

Starting point is 01:19:49 We feel like kids in a candy shop, just that we get to go build the future of coding. It feels like the final boss. Yeah. Nice. Thank you guys for coming on. This was fun. Thank you.

Latent Space: The AI Engineer Podcast - Amp: The Emperor Has No Clothes

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.