The Pragmatic Engineer - Building Claude Code with Boris Cherny

Starting point is 00:00:00 What happens when you join one of the top AI labs in the world, and your first poll request gets rejected, not because the code was bad, but because you wrote it by hand. This is exactly what happened to Boris Churny when he joined Antrophic. Boris is the creator and engineering lead behind ClaudeCode. Before joining Antrofic, he spent seven years at Meta, where he led Code Quality Cross, Instagram, Facebook, WhatsApp, and Messenger, and was one of the most prolific code authors and code reviewers at the company. In today's episode we cover how Claude went from a side project to one of the first thing. fast as growing developer tools and the internal debate on traffic whether to release it at all.

Starting point is 00:00:34 Boris' daily workflow of shipping 2030 pull requests a day with zero handwritten code and how code review works when AI writes everything. Why Boris believes we're living through its high mass transformative as a printing press and which engineering skills matter more now and which ones do not. If you want to understand how one of the people closest to AI coding agents actually built software today and what that means for the rest of us engineers, this episode is for you. This episode is presented by Statsig, the Unified Platform for Flags, Analytics Experiments, and more. Check out the show notes to learn more about them and our other season sponsors Sonar and WorkOS. How did you get into tech, software engineering, and coding in general?

Starting point is 00:01:13 It starts a while back. I think there was kind of like two parable paths that crossed. So when I was maybe 13 or something like this, I started selling my old Pokemon cards on eBay. And I realized that on eBay, you can actually, like, write HTML. And I was looking at other people with Pokemon card listings. And I realized, like, some of them have, like, big colors and fonts and stuff like this. And then I discovered the blink tag. And I really was blink tag.

Starting point is 00:01:41 And if I put the blink tag on it, I could sell my card, you know, for, like, 99 cents instead of 49 cents or whatever. So I kind of learned about HTML this way. Then I got an HTML book and kind of learned about HTML. And then the second thing was this was, this was. also, I think, sometime in middle school, we had these old TI-83 graphing calculators, and we used them for math. And what I realized is I can get a better answer on the math test if I just program the answers to the math test into my calculator. And so I wrote these little programs, to just program the answers. And then the test got harder. So then I had to program solvers

Starting point is 00:02:14 instead of the actual questions because I didn't know what, you know, the coefficients and stuff would be ahead of time. And then the math got more advanced like the next year. And so I had to drop down from basic to assembly to just make the program run a little bit faster. Oh, so in high school, you dropped down to assembly. I think this is like middle school or high school, maybe like eighth or ninth grader. Something like this. Then the thing I realized is everyone in my class was starting to realize that I had the solver and they got kind of jealous.

Starting point is 00:02:40 And so I bought this little cereal cable so I can give it to them too. And then the next math test, everyone on the class just got A's. And the teacher was like, what's going on? Eventually she realized it. She was like, okay, you get away with it once. and knock it off. But for me, it was very practical. So, you know, in school, I studied economics.

Starting point is 00:02:59 I actually dropped out to start startups. And I never thought that coding would be a career at all. It was always very practical to me. Coding is a means to build things and to make useful things. This startup, the first one was, I think it's like my friends and I were trying to get weed. And so we started this like weed review startup. We made like a website. We called kind of different dispensaries, I think.

Starting point is 00:03:26 And then we just tried to get kind of like weed samples so we could like review it for them. And it actually kind of blew up. And then I actually got more interested in at the time no one was like testing this stuff. And so I got into kind of the like chemical testing kind of chemical analysis. And then after this kind of did a bunch of other startups. And then I joined YC actually pretty early. And that was the first hire of this website. YC startup up and up and Polo Alto after.

Starting point is 00:03:54 How does you decide to go to one start up after the other? Kind of vibes, vibes, I'd say. Because you know, you know startups, it's never a linear path. You always kind of pivot, pivot, pivot. You have to figure out what the market wants and what users want. And it's never the thing that you think. You always try a thing, but the idea is always a hypothesis. And then almost always you have to pivot once, twice, three times.

Starting point is 00:04:15 You know, at this medical software company, this is called Agile Diagnosis. This was kind of an early YC company. This was back in maybe 2011, 2011, 2012, something like that. It was medical software for doctors. And the idea was there's these clinical decision protocols. They vary a lot hospital or hospital. And our idea was there was one hospital in Chicago that had a really great protocol specifically for cardiac symptoms.

Starting point is 00:04:39 And so we're like, wouldn't outcomes be great if every hospital in the U.S. would use the same protocol? And so we tried to standardize it. And we made this like decision tree software for doctors, to use. And I wrote, you know, some of the software. The team was like, it was a, it was just a few of us. It was a pretty small team. And I wrote the software. It was in a web browser. And I remember this was back in the internet explore six days. That's what hospitals were using. And I wrote this like SVG render because it was this visual decision tree. And we launched it and then we had a DAU chart. And the DUs were flat and couldn't figure it out. And we were piloting it with a few hospitals at the time. And I, and I At the time, we were based in Paloota, we were piloting it with, you know, a few hospitals, including UCSF. And I rode a motorcycle at the time. So I rode my motorcycle up to, you know, UCSF.

Starting point is 00:05:28 And I shadowed doctors for a couple of days just to see how do they actually use this. And I realized that actually doctors don't have time to sit down and use a computer because you're seeing a patient. Then you have maybe five minutes until the next patient. And in those five minutes, you have to walk down the hall. You have to go to the computer station. You have to open up this totally legacy computer. By the time it boots up, that's like three minutes. Then you open up in our Explorer 6.

Starting point is 00:05:56 That takes like 30 seconds. Then you have to open up this like app that we built. You have to sign in. And your five minutes are up. You don't even have time to use it. And so we rewrote everything to run on Android and they still weren't using it. And the thing we realized is doctors are walking around with a bunch of residents behind them. In this kind of situation, it's like a social situation, right?

Starting point is 00:06:13 Like the thing that matters is they're seen as an authority. they don't want to be seen on their phones. And then we pivoted again. So at that point, we were like, okay, so maybe the doctor isn't the target user. Actually, we wanted to be used by maybe nurses or x-ray technicians or something like this. At that point, I left because I was like, this is actually pretty far off from kind of what I wanted to do. This is like the most fun thing for me is finding this product market fit because it's always surprising. You can't have one big idea because the idea is probably going to be wrong.

Starting point is 00:06:43 So you kind of form hypotheses. is you follow it down and you see what's right. Also, I find it's so interesting how you're telling us this story, because I feel behind a lot of startups success stories, we hear the success story, we hear the path of how it went, but first of all, a lot of starters are like this.

Starting point is 00:07:00 And second of all, what struck me is you were hired as a software engineer, right? And this was back before product engineers or anything was the thing, which we're now talking about, but you just like, you rode your motorbike, and you went there and you shadowed the people, and you understood how they're using it, why they're not using it,

Starting point is 00:07:18 and getting ideas. I feel, you know, this is what makes a great software engineer back then and even today, right? You weren't, it doesn't seem to me that you were focused on a technology. You were focused on the outcome, though. Yeah, I mean, look, there's different kinds of engineers

Starting point is 00:07:34 and there's different ways to do it. And, you know, even on our team right now, I look at an engineer like Jared Sumner, and he's just incredible technical mind. He understands systems better than anyone I've met. And, you know, you need people like this. You need people with this kind of depth. For me, engineering has always been a practical thing.

Starting point is 00:07:54 And, you know, for me, I've always been a generalist. And, like, it doesn't matter if I'm doing, you know, like design or, you know, if I'm doing engineering or user research or whatever. The investment thesis for AI in software engineering is straightforward. As AI writes more code, more code needs to be verified. But there's a catch. AI-generated code is, on average, harder to verify than human-working code. This is why they're Sonar, the makers of Sonar Cube. As a critical verification layer for the AI-enabled world,

Starting point is 00:08:23 Sonar ensures that speed and volume with AI does not compromise your code base. Sonar's competitive position is built on 17 years of specialized expertise that no foundational model can replicate. We're talking about deep analysis engines, like symbolic execution and cross-repositive data flow tracking, that simulate how code actually be able to. behaves, not just what it says. To bridge the divide between AI productivity and code quality,

Starting point is 00:08:47 Sonar has released the SonarCube server. This tool acts as a universal translator between AI applications and the SonarCube platform. By using the model context protocol, it gives AI tools like ClaudeCode, GitHub copilot, and cursor, direct access to SonarCube's analysis capabilities. Instead of context switching, your AI agent becomes a full-flesh code review and quality assurance co-pilot, capable of analyzing code snippets for issues, filtering bugs by severity, and even checking your project's quality gate status before you ever commit code. Whether you're working with coding assistance or scaling up with full-agentic workflows,

Starting point is 00:09:23 Sonar provides the automated verification that 75% of the Fortune 100 rely on. It's about giving your developers a freedom to innovate without the fear of breaking the code base. Head to SonarSource.com slash pragmatic to learn more about how Sonar enables the confidence to develop at the speed of AI. With this, let's get back to Boris's career and what he learned working at startups. My first job I ever had, I was like, I think I was 16, and I just wanted to buy an electric guitar.

Starting point is 00:09:50 And so what I did was I just started freelancing. And so I was like, okay, I guess I'll make websites. And I think Fiverr was not a thing back then. So there was some other freelancing websites. So I just started like, I put up a website, I started bidding on stuff. And my first paycheck, I just spent the entire thing on an electric guitar. But it was very practical, right?

Starting point is 00:10:07 Because it's like when you're in this kind of setup, you have to do the engineering, you have to do kind of the accounting, you have to do the design, you have to talk to customers. So it's just always been like that for me. After a couple of these startups, you ended up at Facebook now called Meta. And there, you spent seven years there. Can you just talk us through what you've worked there, what you've learned there? You've also had a very remarkable career growth in terms of four promotions over seven years. And what do you take away from that experience? experience. Yeah, so I started on Facebook groups. That was the first time I worked on, Vlad Kolesnikov hired me. I think he's actually still a Facebook. I think he's on some other team now. And it was cool, actually, there's a big group of people that I worked with that were these kind of early JavaScript people, too. And, you know, like, I did a bunch of JavaScript stuff. And it's funny, like, I kept crossing paths for these people. And so Vlad, he worked on Boltjs, which was the software, it was the framework that powered ads manager.

Starting point is 00:11:07 which later became React.js. I kept crossing paths with these people. And later on, yeah, later on, there was a bunch of more people like this. But anyway, so I was working on Facebook groups. I was really excited about it because because of this mission of connecting people to their community, this is the thing that drew me in.

Starting point is 00:11:26 And at the time, I was a big Reddit user. I became a Reddit user back when I was a teenager because I didn't know anyone else that coded. Even in college, I didn't really know anyone that coded. And honestly, I was always kind of embarrassed about it because I thought it was this nerdy thing. And I thought it was kind of this thing that I knew how to do. But I wanted, you know, I wanted to be like a cool kid. And, you know, like I couldn't like tell people that I coded.

Starting point is 00:11:49 It was like it was very nerdy. And at some point I discovered it was some like programming community on Reddit. And I was just shocked. Like there's other people that are into this thing. It's like such a weird hobby. It's so niche. And it was just so exciting to find like minded people like this and get this connection. And so I just wanted to.

Starting point is 00:12:07 work on this. I wanted to kind of contribute to this in some way. So I worked on Facebook groups for a while, and then, you know, there's a bunch of different projects. I have to kind of get into details for any of these. Eventually I became the tech lead for Facebook groups and kind of grew into this. And the org grew, the work really changed. It changed from kind of building to a lot of like dock writing and coordination and kind of delegating to other. The culture was changing at the time. So, you know, this early Facebook culture was disappearing. The docs were coming. in, the alignment meetings were coming in. There was a lot of, a lot more work around this kind of foundational stuff, like privacy,

Starting point is 00:12:44 security, things like this, that I think honestly early on, a lot of corners were cut in order to grow. But at some point, you just have to pay that debt. And that was the time when that happened. Then I spent a few years at Instagram after. And that was also a funny story. My wife got a job offer. And she was just really excited about it.

Starting point is 00:13:02 And she came to me and was like, hey, like, I got this offer, but we're going to have to move. Is that okay? And I was like, yeah, that's fine. You know, like I work in tech. We can work remotely anywhere. Where's the job? And she was like, it's a NARA.

Starting point is 00:13:13 And I was like, where's that? And NARA is like rural Japan. And this was a- Different time zone as well. Different time zone. Yeah, this was 12 hours or something different or something like that. Something like that. Yeah, it was like 2021.

Starting point is 00:13:25 Wow. And then I tried to kind of find a team that would sponsor me because there was there were these kind of arcane HR rules about like the time zone you have to be in and the team you have to be co-located with and so on. And so there was a little kind of nascent. team for Instagram in Tokyo. And Will Bailey was running the same. He was also the guy that made Instagram stories.

Starting point is 00:13:44 And so he was my manager for a while. And so we decided to grow that team together. And I worked remotely from NARA and then most of the team was in Tokyo. And during this time, I started hacking on Instagram and the stack was just insane. Like, Facebook was the single best web serving stack in the world. The way that each each, everything is optimized. Like from the hack language to the HHVM runtime to GraphQL as the transport layer to like the client libraries like relay and all the stuff. It was just in React.

Starting point is 00:14:16 It was just amazing. There's no other devstack in the world that was this good. And it is just fully optimized. And then I went to Instagram and it's like, you know, Python where the typechacker didn't work. And click to definition didn't work. And it was this like kind of hacked together Django and then like a fork of, you know, the Scython runtime. and just nothing really worked. And so I came to Instagram, I joined the labs team in Japan,

Starting point is 00:14:41 and the idea was to find the next big thing for Instagram. We tried some stuff, but what I very quickly realized is that I was just not effective at working on the stack because it was such a terrible stack. And so I just went and started working on Devinfra because we needed to fix it. And there's a few projects that we worked on. So one was migrating from Python to the big Facebook monolith,

Starting point is 00:15:02 another one was migrating from Rest to GraphQL. and these projects, they're actually in progress. You know, like these are things that involve. It takes hundreds of engineers many years to do this. It's a big code base. It's a big migration. Now it's much faster. Yeah, with these tools that we have, the AI tools.

Starting point is 00:15:19 Migrations are a pretty good use case for them, though. Yeah, it's like the, it's the perfect use case for it. And then I just started getting kind of deeper into this. And by the end, by the time I left Instagram, so I was working on this on David, friend, kind of leading a bunch of these migrations. That's also where I intersected with Fiona Fung. who is now the manager for the cloud code team. I just worked with her and she was just such an amazing leader,

Starting point is 00:15:39 this incredible depth and kind of history and tech. And I just thought like there's no better manager for this team. And then I also started working on code quality. And so the work on Instagram kind of expanded a bit. And by the time I left, I was leading code quality for all of meta. And so I was responsible for the quality of the code bases across Instagram, Facebook, Messenger, WhatsApp, reality labs, kind of all these code bases. is at meta, it was this program called Better Engineering.

Starting point is 00:16:06 And the idea was, I think it's sort of like 2016 or 2018 or something. But Zuck mandated that every engineer of the company, 20% of their time has to be spent fixing tech debt. Oh, interesting. And we called this Better Engineering. And some of this is kind of bottom up where, you know, a team knows best the tech debt that they have to fix. And then some of it is dropped down where you need to do, you know, very big migrations. You need to migrate to new language views. features, new frameworks, things like this.

Starting point is 00:16:35 And at Facebook scale, you know, there was tens of thousands of these migrations every year. And so I just sort of leading all this. And I realized very quick that you just needed a little bit more order to it. There was no goals. No one knew kind of like what the outcomes were. There wasn't any tracking. And so we developed a bunch of stuff. One of the ideas was a centralized way to prioritize the different kind of code quality efforts.

Starting point is 00:16:59 The second thing was figuring out the impact of code quality on engineering productivity, which turned out to be significant. How did you measure? What did you find there? There was a bunch of stuff. I think some of this has been published. I don't know if all of it has, but essentially you try to do like causal analysis and causal inference.

Starting point is 00:17:13 This is the methodology. You try to figure out like what are the factors that make it so engineers are more productive? Some of it is code quality. Some of it is outside of code quality. So for example, META went back to, you know, return to office instead of work from home.

Starting point is 00:17:26 Those partially driven by this. Because we just found some, you know, fairly strong correlations that we thought were causal. Yeah. about this. But code quality actually contributes like, you know, double-digit percent to productivity. It turns out, even at the biggest skill.

Starting point is 00:17:39 It's kind of comforting to hear because I think it's rare to have a place where you actually measure this, but I think we feel it. Like when you have a clean code base, modular, or it can get easier to work with. And I think, you know, reasoning could it also be easier for LMs to work with it? And my hint would be, yes, it should be, right? But I think there's just very little data But that's the feeling that I would have Yeah, I think a lot of the big companies have published about this

Starting point is 00:18:07 Like I think Facebook published something Microsoft publishes a bunch about this, Google does But yeah, totally. If every time that you build a feature, you have to think about do I use framework X or Y or Z? These are all options that you can consider because the code base is in a partially migrated state where all of these are around the code somewhere.

Starting point is 00:18:25 As an engineer, you're going to have a bad time. As a new hire, you're going to have a bad time. as a model, you might just pick the wrong thing. And then, you know, like the user has to, of course, correct you. So actually, you know, the better thing to do is just always have, you know, a clean code base. Always make sure that when you, when you start a migration, you finish the migration. And this is great for engineers. And nowadays, it's great for models, too.

Starting point is 00:18:46 And then you joined Entrophic. And I've heard the story, which you can confirm or give more color to it, that your first pull request was rejected by Adam Wolfe. He was my ramp up buddy. So I joined Anthropic. I was trying to figure out kind of like what to do next. and, you know, I met a bunch of people at all the different labs, and Anthropic was just the obvious choice for me because of the mission.

Starting point is 00:19:06 This is the thing that personally I know that I need the most. And also just kind of seeing all this change that's happening, it's important to have some sort of framework to think about this and to think about our role in it. I'm also a really big sci-fi reader. Like, that's definitely my genre. I'm a big reader to have, like, you know, a giant bookshelf at home and stuff. And I just know how bad this thing can go.

Starting point is 00:19:25 And I just felt like this is a place that has serious thinkers. people are taking this very seriously and thinking about what can we do to make this thing go better. So when I joined Anthropic, I did a bunch of ramp up projects, just, you know, various stuff that I was hacking on.

Starting point is 00:19:39 And I wrote my first pull request by hand because I thought that's how you write code. That used to be how you write code. They used to be how you write code. But even at the time, at Anthropic, there was this thing called Clyde, and it was the predecessor to QuadCode.

Starting point is 00:19:53 It was super janky. It was like, it was Python, you know, it took like 40 seconds to start up. It was a research code. It was not agentic. But if you prompt it very carefully and hold the tool just right, it can write code for you. And so Adam rejected my PR.

Starting point is 00:20:08 And he was like, actually, you should use this Clyde thing for it instead. And I was like, okay, cool. It took me like half a day to figure out how to use this tool because you have to like pass in a bunch of flags and like use it correctly. But then it sped out a working PR. It just one shot at it. Oh. And this was like 24. It was like September, 24, August, something like that.

Starting point is 00:20:31 And I think for me, this was my first field of AI moment at Anthropic, because I was just, oh my God, like, I didn't know the model could do this. Like, I was used to these, like, kind of tab completions, line level completions and an IDE. I had no idea that it could just make a working pull request for me. Boris just talked about how he had a true wow moment at work using their AI model. A very different wow moment is when you use a tool at work that makes things so, much easier than before. And this leads us nicely to our presenting sponsor, Statsig. Statsig offers engineering teams of tooling for experimentation and feature flagging that used to

Starting point is 00:21:07 require years of internal work to build. It's the kind of tool that was so complex to build that only large companies like META or Uber had their own custom advanced tooling for it. Here's what Statsig looked like in practice. You ship a change behind a feature gate and roll it out gradually, say, to 1% or 10% of users at first. You watch what happens, not just did it crash, But what did it do to the metrics you care about? Conversion, retention, error rate, latency. If something looks off, you turn it off quickly. If it's trending the right way, you keep it rolling forward.

Starting point is 00:21:37 And the key is that measurement is part of the workflow. You're not switching between three tools and trying to match up segments and dashboards after the fact. Feature flags, experiments, and analytics are all in one place, using the same underlying user assignments and data. This is why teams at companies like Notion, Brexon, Atlastion, use Statsig. Statsic has a generalist feature to get started. and pro-pricing for team starts at $150 per month.

Starting point is 00:22:00 To learn more and get a 30-day enterprise trial, go to statics.com slash pragmatic. And with this, let's get back to Boris and the origin story of ClaudeCode. And then when you joined Entrophic, we've covered this in a deep dive, but we could recap briefly on how ClaudecoteCode came to be out of what seemed like a side project

Starting point is 00:22:21 or just a cool hack. So, yeah, I started hacking on a bunch of different stuff. I was working on some things in product. I worked on reinforcement learning for a little bit just to kind of understand the layer under the layer of which I was building. This is still advice that I give to a lot of engineers is always understand the layer under. It's really important because that just gives you the depth and you kind of like you have a little bit more levers to work at the layer that you actually work at.

Starting point is 00:22:45 This was the advice 10 years ago. It's still the advice today. But the layer under is a little bit different now. You know, before it was like understand, you know, the Java. If you're writing JavaScript, understand the JavaScript VM and frameworks. stuff. Now it's like understand the model. So I was hacking on a bunch of different stuff. Something's shipped. Something's didn't ship. And at some point, I just wanted to understand the public Anthropic API because I'd never used it before. And I didn't want to build a UI.

Starting point is 00:23:10 I just wanted to, you know, hack something up quite quickly because we didn't have cloud code back then. We're still writing code by hand. And I wrote this little batch tool that all we did was it hit the Anthropic API. And it was essentially like a chat based application, but just in the terminal because that's what AI used to be. And, you know, I still think about it. Like, engineers are the first adopters. And so when we started to move out of conversational AI to agentic AI, it took a little bit, but engineers understood it up pretty quick. And I, I think now when you ask non-engineers about, like, what is AI? They would say it's this conversational AI. It's like a chat pot or something. And that's why I'm actually very excited for,

Starting point is 00:23:52 you know, co-work, this new product that we launched. because it's going to bring the same thing that engineer saw very early to everyone else. But when I think about, you know, co-work, I think back to this moment that we're talking about, like, very early on. Quad code originally wasn't quad code.

Starting point is 00:24:08 It was a chatbot because that's what I thought AI was. But we had to kind of figure out kind of what is the next thing. And so at the time, I build this chatbot, it was somewhat useful, but it was just a chatbot. And the next thing that I tried was I wanted it to use tools because tool use just came out and I didn't know what it was and I was like, let's experiment.

Starting point is 00:24:29 And I gave it a single tool, which was the bash tool and I didn't know what to do with the bash tool and so I asked it, you know, like, I actually didn't know if it could even do this, but I asked it like, what music am I listening to? And it just wrote a little Apple script program using like said or whatever to open up my music player

Starting point is 00:24:46 and then like query it to see what music it's listening to and just one shot at this. With Sonnet 3.5, this is actually my second field ajii moment very quickly after the first one and the model just wants to use tools that though that's that's just what I realized like this thing like if you give it a tool it will figure out how to use it to get the thing done and I think at the time when when I think about the way that people were approaching AI encoding everyone essentially had this mental model of you take the model and you put it in a box and you figure out like what is the interface

Starting point is 00:25:23 like how do you want to interact with this model? What do you need it to do? Essentially, it's like if you have a program, you stub out some modules, stub out some function, and you say, okay, this is now AI, but otherwise the rest of the program is just a program. And so this is just not the way to think about the model. The way to think about it is the model is its own thing.

Starting point is 00:25:40 You give it tools. You give it programs that it can run. You let it run programs. You let it write programs. But you don't make it a component of this larger system in this way. And I think there's just like, you know, this is a version of the bitter lesson. there's the bitter lesson is a very specific framing but there's many corollaries to it this is one of

Starting point is 00:25:58 the corollaries is just let the model do it do its thing don't try to put it in a box don't try to force it to behave a particular way one of the first ways you saw it was giving it tools giving access to the bash and then later to the file system and then to more tools right that's right yeah we we give it uh we give it bash then uh i say we it was just me the first three months but then the team grew. So it was bash, it was, and file edit, that was the second one. And one of the interesting thing we talked about last time for the deep dive is when you built it and it started to actually write code with the tool, with adult tools that you had, you've had an internal debate inside entrophic. Should we just keep it to ourselves? Because it's making, suddenly,

Starting point is 00:26:40 it spread across engineering and it was making all of you a lot more productive, right? Yeah, that's right. In the end, the decision was to release so that we can study safety in the wild. Because when you think about safety, and I keep talking about the word safety, the reason anthropic exists as a lab is safety. This is the reason it was founded. This is the reason it exists. If you ask anyone at Anthropic why they chose it, it's because of safety. And so if you think about model safety, you know, there's different layers at which to think

Starting point is 00:27:05 about it. There's kind of alignment and mechanistic interpretability. This is at the model layer. Then there's evils and this is kind of like a, it's kind of putting the model in a petri dish and synthetically studying it in this way. And then you can study it in the wild. And you can see how it actually behaves. You can see how users talk about it.

Starting point is 00:27:22 You can see, like, what are the risks in the while? Then you actually weren't a lot this way. And by doing this, we've been able to make the model much safer. So in hindsight, it was totally the right decision. It's amusing to hear about it from your perspective, because from the outside, what I saw and what a lot of engineers saw is like, oh, entrophic release, Claude Code. Oh, wow.

Starting point is 00:27:43 For the first release with, I believe it was with Sonet 4 release, did it come out with Sonat 4 originally or Sonat 4.5? I think it was for, that was the general availability in February, but I think it was research preview before that. Yeah, but when it came out, my interpretation was like, oh, this thing can write code pretty well, and over time it became a lot more capable. So from our perspective,

Starting point is 00:28:07 it was like this really capable coding tool that we just started to adopt and use and use for all sorts of increasingly productive part. And it has become, I believe, one of the fastest growing developer tools. And I'm always surprised to hear the story that it actually comes from research and the goal to understand how people use the model. Because at the other hand, like some startups have been trying to build developer tools

Starting point is 00:28:33 deliberately to get adoption. And yet this research tool is getting a lot more adoption. I mean, this is a, you know, Anthropic, we're a research lab, we're a safety lab. And, you know, product is this kind of thing tacked onto the side. Product exists so that we can serve research better. and so we can make the model safer. And this is kind of how we think about everything. There's also this funny moment early on

Starting point is 00:28:54 when we had this launch review and we were deciding whether to launch it. I remember this moment because we were in the room. I think there was Mike Krieger, there was Dario, there was some other folks in the room when we were deciding what should we do. We were looking at the internal adoption chart, which was just vertical.

Starting point is 00:29:10 Like since we said, it was just insane. It was, you know, like nowadays. It's 100%, right? Just 100%. Like nowadays, nowadays everyone at enthrer, Every technical employee at Anthropic uses quad code every day is pretty much 100%. For non-technical employees, it's also like, it's actually getting quite close to 100%. It's increasing very quickly.

Starting point is 00:29:28 Like, you know, like half the sales team uses quad code. And I think that's increasing. It's just, it's crazy. Dario had this question about like, how did it grow this fast? Are you like forcing people to use it? And I was like, no. We offer this tool. People vote with their feet.

Starting point is 00:29:42 And, you know, just like, let people use the tool that they prefer. Yeah. And they chose it. You don't seem like the person who's exactly forcing people to use your tool. Yeah, yeah. I mean, the way we did it, we just, we launched a thing. And then we just like listen to the users. And we talked to people.

Starting point is 00:29:57 We saw how they use it. We followed up. We made it better. And yeah, I mean, now we're at the point where quad code writes, I think, something like 80% of the code at Anthropic on average. And, you know, it writes all my code for sure. Yeah. And this started for you, it started the first time you mentioned, I think it was in November

Starting point is 00:30:13 when it started to write all of your code. when did that switch come? And what happened to made you trust it to write your code? Or how much you trust it? How much you reviewed that code, for example? So the switch was instant when we started using Opus 4.5. This was before it came out. You know, we were dogfitting it for a little bit.

Starting point is 00:30:32 And it was just right away. It's such a more capable model. I just found that I didn't have to open my ID anymore. I just uninstalled my ID because I just didn't need it at that point. I actually did that like a month later, I just didn't even realize that I wasn't using it anymore. Yeah, a lot of us had similar experiences once Opus 4.5 was out in the public, and especially over the winter break.

Starting point is 00:30:54 I had a similar experience. I just realized that this thing, it actually writes, if I'm being honest with myself, as good code as I would have written in the stack that I'm very familiar with and my code base, my side projects where I know it, and just a lot better than what I could for code base that I'm not as familiar or technologies I'm not as familiar with. Yeah, I'll be honest. It writes better code than I do.

Starting point is 00:31:15 I don't want to go there. I still like to keep my pride, but probably true. Yeah, yeah. I realized this because also in December I was traveling a little bit. I was like on a, I was on a coding vacation. We were talking about this before, but I went to Europe. We were just in a different time zone kind of nomadding around. And it was so fun because I was just coding all day every day, which is my favorite thing to do.

Starting point is 00:31:36 And I wrote maybe, you know, like 10, 20 pro requests every day, something like that. Opus 4.5 and Quad Code wrote 100% of every single one. I didn't edit a single line manually. And I realized at the end of that month, Opus introduced maybe two bucks. Whereas if I had written that by hand, that would have been, you know, like 20 bucks or something like that. Can we talk about your development work? So you have written threads about this, which is awesome. It's on social media on threads and on X.

Starting point is 00:32:04 But can you tell us how you use today clot code in terms of, you know, parallelism? And tips and tricks that you and the team have kind of learned and share across a, across the team? Yeah, I mean, look, there's no one right way to use quad code. So I can share some tips and things, but I think the wrong conclusion to draw would be to just copy, copy these and use it. The way we build quad code is we build it to be hackable because we know every engineer's workflow is different. There's no one way to do things. There's no two engineers that have the same workflow. It's just every engineer is. Same with workstation set up, right? Like a keyboard monitor place, but all that everyone has it differently. Yeah, it's like we're like

Starting point is 00:32:44 crafts people, right? Like, you choose your tools. Like, we care deeply about it. So there's no one right way to do it. So for me, the way that I do it generally is I have five terminal tabs. Each one of them has a checkout of their repository. So it's five parallel checkouts. And usually all kind of round robin and start quad code in each one, almost every time I start in plan mode. So that's like shift tab twice in the terminal. And I also overflow as I run out of tabs because there's only so many terminal tabs. I used to use web a lot for this, so like clod.a.ai slash code. That's the place that I overflow to. Nowadays, I actually use the desktop app. It's more convenient. So quad code, it's been in our desktop app for, you know, for many months. It's just the code tab in the in the

Starting point is 00:33:27 cloud app. And I actually really like it because it has built-in work tree support. So that's existed for a while. And that's quite nice for parallelism. So you have multiple, you don't need multiple checkouts. You just have one. And then we automatically set up get work trees for you. So you get this kind of environment isolation. The reason I do that is I actually just really hate fiddling with Git work trees on the command line, because it's kind of fiddly. Like you need to know the CD. And the Git work tree for those who are not as familiar with it, it's when you can check

Starting point is 00:33:57 out instead of having a separate local folder, it's almost like checks out a separate branch, right? And then you can work on it separately, but not have the complex only at like merch time. That's right. Imagine that you have a folder, but you have maybe like Git makes five cons. copies of that folder in a way that's very cheap and kind of easy to throw away. So you get this kind of isolation. You can work in parallel and the quads don't interfere.

Starting point is 00:34:20 Yeah. So you now have support for this, which I think you recently added like native support. But like for your workflow, you just stuck with the old one of checking out on separate folders, right? Yeah, exactly. I actually find that over time I'm using the desktop app more and more for this. Just because I don't need these separate checkouts and, you know, I just have a bunch of quads running in parallel and I don't have to think about it.

Starting point is 00:34:40 The other surprise hit is the iOS app. for me. Every day I wake up and I just start a few agents on my phone. Oh, the native one, yeah. The native one, yeah. It's just like, it's the quad app. It's the code tab in the quad app. And it's the same exact quad code. Except it runs into cloud, right? It runs in the cloud. Yeah, so you have to kind of configure the environment.

Starting point is 00:34:58 Lucky our environment's pretty simple. So, you know, and we just use hooks for it. So you just use the session start hook and configure it. This is kind of one of the benefits of making quad code really hackable. It's very easy to do this kind of configuration. And this is something honestly, I would never have predicted because, you know, like I code on a computer. If you told me six months ago, I'd be writing, I don't know, a third, I haven't told the data, maybe like a third half, something like this of my code on a phone. That's crazy. But that's what I'm doing today.

Starting point is 00:35:29 And you're using parallel agents. At what point did you start using them and how has it changed your work? Because one thing that I know is on myself, I don't really use that many parallel agents. I may be like two at a time, but I'm someone who, well, I like to be in charge, and especially with Claude, Claude is a tool that you can follow it along. It tells you what it's doing. You can also have, for example, learn mode, which this was shipped a lot earlier, where you can actually follow along. It gives you tasks. I feel that like staying in one tab and following along, the model is pretty fast as well. I can kind of keep in touch. I'm assuming at some point you must have done this, but then what happened when you changed to parallel?

Starting point is 00:36:10 And do you feel you're losing any control or it doesn't really matter that much? Yeah, I think there's kind of like two modes to think about, or kind of like two kind of workflows to think about. So when you're new to a code base, highly recommend. Learn mode is awesome. Highly recommend it. For people that are onboarding to the QuadCode team, people that onboard to Anthropic, the thing that we recommend is, so you do, for people that haven't tried it,

Starting point is 00:36:33 you do slash config. In QuadCode, you pick the output style. And you can do learn. or explanatory. We usually recommend explanatory because that tends to be better for new codebases that you kind of haven't been in before.

Starting point is 00:36:44 For me, once you're familiar with the code base, you just want to be productive, right? Like, you just want to ship as much as you can and you want to kind of be effective doing that. So the road really switches. I don't really go deep into tasks anymore. I start a quad in plan mode.

Starting point is 00:36:59 I'll have it kick something off. With Opus 4.5, I think it got there. With 4.6, it just really, really does it. Once there is a good point, plan, it just, it will one shot the implementation almost every time. So the most important thing is to go back and forth a little bit to get the plan right. So what I do is I start one, I enter plan mode, I give it a prompt. As it's chugging along, I'll go to my second tap and I'll start the second quad, also in plan mode, get it chugging along, then go to the third tap, go to the fourth one. Then maybe I'll

Starting point is 00:37:26 go back to the first one when I get notified that it's done. And then I'll kind of do you have a little pick is on or you turned them off? I actually operating both modes. Sometimes I do like, you know, focus mode on the Mac. So I just have it off. But also, sometimes I use the system notifications. And you're very, very productive with PRs. I mean, I think it was very visible. Even around

Starting point is 00:37:47 the holiday breaks, on social media, you actually were responding to, I think, someone reported a bug or a feature request. I'm not sure what it was. And then an hour or two later, it was done because you did it. You've also talked about, like, number of requests you've done on a day, not to, like, show up,

Starting point is 00:38:04 but just as context. What What does a pull request typically involve in terms of complexity? Are these, like, are some super trivial or some actually like larger pieces of work as well? Yeah, pull requests, each one varies a lot. Sometimes it's a few lines, sometimes it's a few hundred or a few thousand lines. They're all just very, very different. It's changed so much. Like, back when I was at Instagram, I think I was one of the top two, maybe top three,

Starting point is 00:38:29 most productive engineers at Instagram just by volume of code written. Oh, wow. So I've always, you know, for me, I've always just coded. a lot. Like, this is, uh, coding is like a way that I can express myself. And it's just like, it's a way that my brain thinks also. And so now I just get to do it. But I think with quad code, the, the kind of code that you write, if you are very productive, it tends to be even, it's just the number of PR sort of undersells what, what's happening. Because I think people that used to be very productive in the old days before AI assistance. A lot of the code maybe was like

Starting point is 00:39:02 code migrations or something like this. So like people that shipped, you know, 20, 30 PRs every day. A lot of it was like pretty, you know, like a one liner or kind of migrating A to B or whatever. Nowadays, I ship, you know, 20, 30 PRs every day. But every PR is just completely different. Some of them are thousands of lines. Some of them are hundreds. Some of them are dozens. Some of them are one liners. It's none of these are kind of code migrations because actually Klaude just does those. And I don't need to be part of that. Shipping this much code or this much more productive, the obvious question that comes up for any, I guess, software professional as well, the review, the way teams used to work, and I'm not sure if Instagram

Starting point is 00:39:37 did this, but a lot of other companies, that this is, you make a poll request, you put it up there, there's a mandatory human reviewer. At Google, there's actually two, because there's one on cool quality as well. How has this workflow changed? How does the code team think about code review and how has it changed over time? Yeah, I'll start by thinking, I'll start by talking about how code review used to work for me. So the way that I used to do it is every time, I, I also used to be one of the most prolific code reviewers. Oh, okay. So both.

Starting point is 00:40:05 I met it. Yeah, yeah. Right. Is it code reviewers? That's actually, and that's one of the benefits of being in a different time zone. Like, I'm not super human. I just didn't have any meetings. And the way that I approach code review is every time that I would have to comment about

Starting point is 00:40:17 something, I would drop it in a spreadsheet. And I would, like, describe the issue. So let's say, you know, like someone named a parameter, you know, in a function badly. I would like put that in a spreadsheet. If someone did some bad React pattern or something, I would put that in a spreadsheet. And then over time, I would just kind of, kind of tally up the spreadsheet.

Starting point is 00:40:33 And anytime that a particular row had more than three or four instances, I would write a lint roll for it. So just automate it with kind of synodontosis. And so that's what it used to look like. For me, I've always tried to automate myself away because there's just so many things to do. And this is one of our superpowers as engineers is we're able to automate all of the tedious work. There's very few other fields where you're able to do this thing.

Starting point is 00:40:56 This is a thing uniquely that we're able to do. And this is a thing that I've just always enjoyed because it gives me more free time. time and I get to do the work I actually enjoy. And so today, the way this looks is a little different, but it mirrors this a little bit. So when Quad Code writes code, it generally, it will run tests locally, and this is something Cloud just often decides to do when it's relevant or it'll write new tests. So you kind of do this kind of verification. When we make changes to Cloud Code, Claude will also test itself. So it'll launch itself kind of in a sub-process, it'll verify itself and it'll test itself end to end. This is for your internal,

Starting point is 00:41:32 code implementation so you have like this test suite so they can test itself. Yeah, that's right, that's right. But it'll literally launch itself just in a bash process and kind of just see like, hey, do I still work? Wow. Okay. So it'll do this. And this is something that we just didn't code in.

Starting point is 00:41:45 Like it just with Opus 4.5 especially, it just started spontaneously doing this. It just wants to kind of check. So we do this. And then we also run Claude dash P. So this is the Claude Agent SDK in CI. So every pull request at Anthropic is code reviewed by quad code. And that actually catches maybe like 80% of bug, something like this. And it's the first round of kind of code review.

Starting point is 00:42:09 Claude will automatically address some of these. Some of them it'll leave to a human because it's not sure what to do. There's always an engineer that does the second passive code review. And, you know, there always has to be a person in the loop approving the change. So on the team, before anything goes into production, if you will, an engineer does look at it. Yes. As you're thinking of code review, would you do this for, every type of project or this is specifically because you know that this actually has real world

Starting point is 00:42:35 impact, people depend on it, you know, there's a lot of users. Let me put it the other way around. Like, can you see places where you would just not have an engineer review code? What situation would that be in? I think it depends how it's used. Yeah, I'd agree with that. Like, you know, if you're building some personal side project, like you can just yowl street to Maine, you know, like. Even before AI, you would have not reviewed. You just trust yourself or, you know, just shipped up production or cessation to production and do some changes that kind of stuff, right?

Starting point is 00:43:05 Exactly, exactly. The very first versions of quad code that were internal, like, you know, I committed straight to Maine. But then, you know, as soon as you have users and, you know, for Anthropic, our main customer base is enterprises, this is what we care about the most. For us, for safety reasons, security is really important, privacy is important.

Starting point is 00:43:20 These are all related. It's also very important for our customers. And so because this is an enterprise product, it has to be secure. It has to be, we have to make sure that it meets a certain bar. So we definitely use a lot of automation, but at least for now, there has to be a human in the loop just to make sure. One thing that is just known about LMS is they're non-deterministic. And by putting the LM as a reviewer, Claude, doing a review, it will give good feedback. But how would

Starting point is 00:43:50 you deal with the fact that you can be sure if it's always giving the feedback, you cannot be sure that even if it's capable of catching an issue that it will necessarily catch that. that. Are you doing anything in this loop to do deterministic thing? For example, linting is very deterministic, as you will very well know. Like, have you thought of marrying some of these ideas or are using, for example, are using linters on the code base where you found no need to for it? Yeah, absolutely. Absolutely. Yeah. This is just a, yeah. Yeah, we have type checkers. We have linters. We run the build. Cloud is actually so good at writing lint rules. So actually what I do now, I used to tally stuff up in the spreadsheet. Now what I do is when a coworker puts up a poor request and

Starting point is 00:44:27 I'm like, this is ventable. I'll just be at clod. Please write a wind roll for this in that PR on their PR. And we have, you know, you just run like slash. I think it's like set up GitHub or something like this. You can do this in quad code and it'll install the GitHub app, which then makes it so you can tag at Cloud on any poor request, any issue. I use this every single day. So very, very useful. So you want these deterministic steps. Also, though, there are there are ways to get Cloud to be a little bit more deterministic. So for example, you can do best. event, you can have it do multiple passes. And this is actually quite easy to do. So, you know, for example, the Coderview skill that we use internally, it's open source. And it's available in the Quad Code repo. And so all we do is, you know, we launch parallel agents to do stuff. And then we launch parallel de-duping agents to check for false positives. But essentially, best event, the way you implement it is all you say is cloud, start three agents to do this.

Starting point is 00:45:23 And that's it. Orr has just talks about building that enterprise infrastructure layer. the auth, the permissions, the security, that has to all work before you can ship to real customers. This makes it a great time to speak about our season sponsor, WorkOS. If you're building any SaaS, especially at AI product one, then authentication, permission, security, and enterprise identity can quietly turn into a long-term investment, samile edge cases, directory sync, audit logs, and all the things enterprise customers expect. It's a lot of work to build these mission-critical parts, and then some more to maintain them.

Starting point is 00:45:55 But you don't have to. WorkOS provides these building blocks as infrastructure, so your team can stay focused on what actually makes your product unique. That's why companies like Antrophic, OpenAI and Cursor already run on WorkOS. Great engineers know what not to build. If identity is one of those things for you, visit Workoas.com. And with this, let's get back to building CloudCode with Boris. How does CloudCode work in terms of architecture? So as an engineer, how can I imagine it's set up? It's, we can remember. some of this in the deep dive. And I think you told me that you had some pretty complex ideas when you started and you just simplified a lot of it. Yeah, yeah. It's very simple. Like, you know, there's not much to it. There's like, there's a core query loop. There's a few tools that it use, that it uses. We, we delete these tools all the time. We had new tools all the time. We're just always experimenting with it. So there's kind of this core kind of agent part of it. Then there's the, the two-e part of it. And then there's actually a ton of different pieces around security and making sure that everything that Quadco does is safe and that there's a human in the loop for when it happens. And by safety, do you mean as a user when it's doing stuff on my computer or also as entropic monitoring use cases that could be deemed unsafe?

Starting point is 00:47:14 Yeah, there's kind of a couple versions of this. Safety, there's just many, many layers. And for things like safety and security, there's no one perfect answer. So it's always a Swiss cheese model. You just need a bunch of layers. And with enough layers, the probability of catching anything goes up. And so you just have to kind of count the number of nines in that probability and pick the threshold that you want. And so for something like prompt injection, for example, we do this generally at three different layers.

Starting point is 00:47:38 So let's think about something like web fetch. So cloud fetch is a URL and it reads the contents of that web page and then it does something in quad code. So one of the risks for something like this is prompt injection. Maybe there's an instruction on that website to be like, hey, quad, delete all the folders or something like that. So we think about this in a number of ways. The most basic way is it's an alignment problem. And so Opus 4.6 is the most aligned model we've ever released because we've taught the model how to be more resistant to prompt injection. And so you can read about this on the model card and I think that was part of the release.

Starting point is 00:48:12 The second part is that we have classifiers at runtime where if there is a request that seems to be prompt injected, we block it. and we just make the model try again. And then the third layer is for something like WebFetch, we actually summarized the results in using a sub-agent, and then we return that summary back to the main agent. So again, this kind of reduces the probability of prompt injection. And so you can kind of see how this isn't just one mechanism. It's a layer, and by having a bunch of these different layers,

Starting point is 00:48:39 it just reduces the probability a lot. One interesting technical choice that you also mentioned is using rag or not Iraq retrieval, argumented generation. And you mentioned how in the earlier version of CloudCode, you use the local vector database to get some, to speed up search, and you layer through this away. Can you talk about how this one, because this was another example where, I guess,

Starting point is 00:49:02 did the model get better? Yeah, I mean, this is one of those things where we try so many different things. We try so many different tools. And just statistically, most of them we throw away. Even something like the spinner in QuadCode, I think it's gone through like 100 iterative. I want to say. Just the spinner.

Starting point is 00:49:20 And, you know, out of those, we landed maybe like 10 or 20 in production, and like 80 of them I probably just threw away because it didn't feel good enough. So just statistically, almost all the code we write, we throw away because it's just so easy to write this code and try stuff and see what feels good. So for something like RAG, we tried a bunch of different approaches early on. So the first one was RAG for retrieval because I think this, I was just like reading up, like how people were doing retrieval. and it seemed like all the papers were talking about RAG.

Starting point is 00:49:47 And so the way I did it was it was like a local vector database. I think it was like written TypeScript and you just looked on the user machine. And then I was using some like embedding model those in the cloud to compute the embeddings before storing it. And that worked like pretty good. But there's a lot of issues with RAG. So for example, I was finding that the code drifted out of sync. Like if I make a local function, it's not yet indexed. And so RAG isn't going to find it.

Starting point is 00:50:13 there's also this question of like how exactly is the index permissions so who can access it I can access it but then how do we like encode that in kind of permission policies how do we make sure no one else can access it how do we make sure that like if there's a rogue IT person within the company they can't access someone else's data this is really really important that we think about this yeah and so we just decided like it was sort of working but it also it also has a lot of downsides and so we tried a bunch of other stuff one of them was just using the model to kind of index everything recursively. That was kind of a cool idea.

Starting point is 00:50:48 There was another version where we just tried Glob and Grap. We tried a bunch of different stuff. It turned out that Agentic Search just outperformed everything. And when I say Agentic Search, it's a fancy word for glob and grep. That's all it is. Nice. So the model both caught good enough

Starting point is 00:51:03 and you realize that it can use these tools pretty efficiently. Yeah. And it was partially inspired, honestly, by my experience at Instagram. Because at Instagram, click to definition didn't work because the dev stack was just borked like half the time. And I think now it's better. And so what engineers learned to do instead is, let's say you're looking for the definition

Starting point is 00:51:24 of the function fu, instead of click to definition, what you would do is you would use the global index, which is quite good at meta, and then you would search for foo opening parentheses. And this worked pretty well. And it's funny because like this works for the model pretty well, too. interesting how one idea from one area can come to the other. One of the more advanced parts of cloud code that we also previously talked about

Starting point is 00:51:49 is the permission system. Can you talk about what was complex about it? And also you recently opened source sandboxing, right? Permissioning is really complex. There's like everything else that has to do with security, it's a Swiss cheese model. There are a number of classifiers that run to make sure the command is safe.

Starting point is 00:52:11 And there's also static analysis that we do to make sure the command is safe. As a user, you can also allow list particular patterns that you know to be safe. So, for example, some standard Unix utilities, we pre-allow because we know they're read-only, because we know they can't export data or anything like this. So we just won't prompt you for permission. But actually quite you tools fall into this category, because even something like the find command, there's actually a way to execute arbitrary. code as part of that command, because there's like system flags that you can use for this,

Starting point is 00:52:43 or even something like the said command, there's ways to use this. So there's just like all this like arcania about these various Unix utilities where it's actually not as safe as you think. And so we want to be by default fairly conservative about what we allow by default. As a user, though, you can configure in a while list. So you can say, for example, like these patterns are wild, these patterns are not allowed. And so we let you define that and we also check this allow list to make sure that it's safe. Yeah, and then you have this, like, neat permission system where every time we run a command that needs permission, you can decide to run it once,

Starting point is 00:53:17 or run it for either this session or whatever it makes sense or just to globally allow it going forward, right? That's right. This is a funny artifact. This was actually in the very, very first version of Quad Code. This is the way permissions worked. This is the very first release. This was like September 2024, the first internal release. I remember at the time, we weren't sure whether agentic safety could even be solved.

Starting point is 00:53:38 And so there was actually a lot of pushback and turn away from safety teams because they were like, okay, like, you can't just let the model run bash commands. Like, that's unsafe. So, like, what do you do? Like, this is not a solvable problem. So, like, we can't launch this. I brainstormed with Ben Mann. And Ben was, he started the labs team. He's one of the founders at Anthropic.

Starting point is 00:53:57 He's actually, he's the person that hired me to Anthropic. We just come up with permission prompts as the way to do this. You put the, if you're not sure, just ask the human and they can decide. No. I want to ask you about how. software engineering is done in general in terms of Antrophic. And one of the first questions,

Starting point is 00:54:14 which is a, I guess, a more formal one, but or from the outside, is titles or lack of them. Everyone at Antrofic has the same title member of technical staff. Why did this happen? And what does this result in this kind of like everyone, there's basically no titles,

Starting point is 00:54:30 right, except for one? I think it's kind of an acknowledgement that everyone just is figuring a stuff out. And if you kind of squint and look at the work people are doing, it's all quite similar. And it's kind of quite generalist. And if you talk to the average software engineer, they might not just be doing coding. They might also be doing a little design. They might also be talking to users. They might be writing their own product requirements. They might be writing software and also, you know, doing research. They might be writing product

Starting point is 00:55:02 code and also infrastructure code. At Anthropic, there's a lot of generalists. This is also, you know, from my background, this is one of the reasons that I gravitated towards it. And I think member of technical staff just kind of encodes this in the way that people talk to each other, even if they don't know each other. Without this title, the default would have been, I see your name on Slack and under your name, it's a software engineer. And then I'm like, well, okay, I guess you're like, you're the coding person. And so I'm not going to ask you like product questions. But when everyone's title is member of technical staff, by default, you assume ever, you assume ever does everything. And so it kind of inverts this, this relationship between people, even if you

Starting point is 00:55:37 don't know each other well yet. In a way, it's kind of this like optimism built into the, built into the structure. I think it's also a glimpse of the future, because I think this is where software engineering is going. I think this is where every discipline is going is more of this generalist model. It definitely feels like it in software engineering. And I heard this funny, common by Mark Andresen, how we said that there's this Mexican standoff happening in the tech world where the designers are saying that they're actually now doing like PM and engineering work. The engineer are saying we're doing design and like everyone thinks they're doing the work of the others and they're kind of standing there like I'm doing your work as well. But the reality is everyone's role is

Starting point is 00:56:20 expanding. Most of it thanks to AI because it makes easier for an engineer to do product work or for a person to engineer work and so on. So just what you've said. I remember back in the back in June or July of last year, I walked into the office and the data, there's a row of data scientist that's right next to the quad code team, at least at the time. And I walked in and our data scientist for the quad code team had quad code up on his monitor. And he was using it. And I was like, this is interesting because you're your data scientist. Did you have like, why are you using a terminal? Like, you didn't have no JS installed because we depended on no JS back then. I was like, are you, are you dog fooding it? Like, are you just like trying to like figure out

Starting point is 00:57:00 how this thing works or something. He was like, no, no, I'm like, I'm using it to run queries. He was just, like, using it to run SQL and it has, like, little, like, ASCII visualizations in the terminal. And then the next week, the entire row of data scientists had quad code running on their computers. And this expanded. And so if you look at the team today, on the quad code team, everyone codes.

Starting point is 00:57:22 The engineer's code are engineering manager codes, designers code, data scientist code, our finance guy codes, everyone on the team codes. And I think part of it is quad code just make it so easy. So you don't really have to understand the code base. You can just like dive in and kind of make small changes quite easily. But I think another thing is people are able to use quad code to do their jobs more, whether it's, you know, financial forecasts or, you know, data science or whatever. And by doing this, it's actually quite an easy crossover to just use it to write a little bit of code also.

Starting point is 00:57:57 So it's just a way to dip your toe in the water. One other interesting thing about how you work is Katu was talking about. She is, I guess, the title is the same, but people might gravitate for a role a bit more. I understand she's a little bit more on a product role. But you said that PRDs are just not really written inside entroping. And PRD's product requirement document. It's a well-known artifact across big tech and increasingly over larger startups where you write a spec. And the idea is that you write down your thoughts, people align, you send it over.

Starting point is 00:58:26 and now you know what to build. But apparently you're not doing much of this or at all. Some of this, I think, is because Anthropic is still, you know, it's still a startup. So you don't actually have to align with that many people. Usually you can just kind of talk about it or do it in Slack or whatever. But yeah, also part of it is, you know, like Kat used to be an engineering manager. She's extremely technical.

Starting point is 00:58:45 And I think this is this is the way that, you know, our product team thinks about it too is, you know, better to send a PR. You're doing a lot of prototyping instead. So, like, that's also something where when we talked about how you're building cloud code early on you were showing actually you had a whole thread about the number i think you did like 15 or 20 prototypes for the the to-do list and all of them interactive working and what surprised me compared to my past tech experience and you said that well you did this in like a day and a half all 20 tried it out got a feeling for it which incomprehensible for me it would have taken a week or

Starting point is 00:59:19 two weeks and people would have not done 20 they would have done three yeah so like are you seeing this is there an increase in prototyping and building and showing instead of, you know, writing things? Yeah, absolutely. I mean, on our team, the culture is we don't really write stuff. We just, we show. It's a little hard to reflect back on the time before because I think now just prototyping everything is so baked into the way that we build. Just everything is prototyped multiple times. Like, you know, we launched agent teams over this week. This is our implementation of swarms. It's very exciting because it just lets Quad do more work for long. more autonomously. You have a bunch of different uncorrelated context windows and you have this kind of communication between agents. They can just do more. This is something that Daisy and Suzanne

Starting point is 01:00:04 and other folks on the team and Karen, they prototype this for months. And they tried all in all probably hundreds of versions of this before they got a user experience that felt really good. It was just really, really hard to get right. There's just no way we could have shipped this if we started with, you know, like static mocks in Figma or if we started with a PRD or something like this. It's a thing that you have to build and you have to feel and you have to see how it feels. And to me, one of the big takeaways, even from there was like we probably should prototype more and just be more daring or just release your priors of how long it took to build a prototype or who needed to build back then it was always an engineer that needed to build, but it's probably

Starting point is 01:00:45 not true anymore. Yeah, that's right. I mean, we're in this world right now also where we just, we don't know what the right answer is. You know, I think back in the old way of building, you, the cost of building was high. And so you had to actually spend a lot of effort to aim very carefully before you take your shot. Because after you take your shot, it's very hard to course correct. You can only take so few shots. But now it's changed. The cost of building is very low. But also we don't know where we're aiming. So we just have to like, we have to try and we have to see what feels good. And it's just very,

Starting point is 01:01:14 very exploratory. And I think also a big part of it is humility where, you know, personally, I'm wrong. like half the time, I'd say like most of my ideas are bad. At least half of them are bad. And I don't know which half until I try it. And I don't get feedback from others as well sometimes. That's right. It's like I have to try it myself and then I have to see what others think

Starting point is 01:01:33 because, you know, my intuition does not always match others. When you were showing these prototypes of just how the tasks were built, you were telling me that you build the prototypes and then your process was always, you first looked at it, you tried it out, you got to feel for it. And then for the ones that you, felt were good, you showed it to others, and sometimes they gave you feedback, like, nah, this doesn't work. And then sometimes when it felt good, then you share it even broader.

Starting point is 01:01:57 So I feel like, you know, like, it's a mix, right? Where, like, sometimes you can decide already. And then sometimes you get feedback and then eventually some good ideas come out of it. Yeah, and there's a lot of examples of this. Like, we, we launched this kind of condensed view for file reads and file search, just because the model is just so egentic. Like, no, like, I felt like half the screen is these, like, file reads. And I actually don't care, like, you know, you read a thing.

Starting point is 01:02:18 I don't really care what it is. And so we condensed this down to make the output a little bit more readable. I really liked it after probably 30 prototypes or something like this. It took so much effort to make that feel really good and clean. We rolled it out to employees at Anthropic for about a month, and we had everyone dog fooded. And I fixed another probably dozen, dozen bugs, dozen tweaks based on all this feedback.

Starting point is 01:02:40 We launched it externally and almost all users liked it. But there were a few users that didn't because they want more expanded output. And so on the GitHub issue, I was just going back and forth with people to be like, you know, like, what don't you like? And people gave a lot of feedback. I shipped another version. Then some people liked it. Some people didn't.

Starting point is 01:02:56 And so I iterated again and kind of made it good. And it's actually, I think, almost there where people can configure it the way that they want, but still the default is really good. But this is just the process. You know, we get it right some of the time. We have to learn from our users. We want to hear from people so we can get it right. Do you use ticketing systems for your work where, you know, where you capture like, all right, here's the work I want to?

Starting point is 01:03:17 Or do you just pretty? much do the work as it comes in. So at Anthropic, we leave it up to teams on the quad code team. We leave it up to every person. Different people use this differently. For example, I don't use a ticketing system. Some people like to use Asana or notes or something like this. One of the coolest things that I saw, this is maybe like three months ago or something. We launched plugins. And the way we launched that is Daisy for a weekend. She had a very early version of swarms. and she let the swarm run, and she told that your job is to build plugins.

Starting point is 01:03:49 You have to come up with a spec, then you have to make an Asana board and split up into tasks, and then all the different agents have to build it. And she set up a container, and she set up a quad in dangerous mode. And she let it run for the entire weekend. It spawned a couple hundred agents.

Starting point is 01:04:05 They made 100 tasks on the Asana board, and then they implemented it. And that's pretty much the version of plugins that we shipped. These kind of coordination systems they used to be for humans, but I think nowadays it's just as much for models. Let's talk about Cloud Co-work. It's one of the very impressive things about this. It looks great, so I tried it out.

Starting point is 01:04:25 Inside Cloud, you have the Co-Work tab there, and you can, I feel it's a lot more visual way of running agents interacting with them. One of the surprising things I heard that it was built in 10 days. Can you take us through, like, what it took to build it and what does actually mean? Was it from the idea or like from the decision? of building it and how big was a team building it? The team was really small. It was just a few people.

Starting point is 01:04:47 For a long time, we felt that there is some product to be built for non-engineers. The reason we felt this is for a long time, people that were using quad code are non-engineers. And so, you know, in the product world, when you see latent demand, you see people jumping through hoops to use a product that was not designed for them. That's a really good sign. It's time to build another product that is built just for them. There's all these people on Twitter that there's this one guy that was using quad code to like monitor his tomato plants. I just, I love this.

Starting point is 01:05:19 It was like, get like a webcam set up and the quad was like, oh my God, I'm so happy that our plant is budding. And because it had like a webcam and just like every day it was like monitoring it. And it was so happy that the tomatoes were growing. There was someone that was using quad code to, you know, recover photos off of a corrupted hard drive. And it was like his wedding photos. Wow. You know, like I said, our entire finance team at Anthropic uses quad code, our sales. team uses quad code. So there's just all these people that are non-engineers that we're using it.

Starting point is 01:05:47 And at that point, quad code, it's available in a lot of form factors, right? Like, we started in a terminal. Then we expanded and we added support for IDEs. So we have extensions for, you know, every VS code-based ID, every JetBrains-based ID. There's also iOS and Android apps. There's the desktop app. There's web. So then there's like Slack and GitHub apps. So we kind of expand it to all these places to make quad code easier for engineers. But ultimately, none of these are built still for non-engineers. And so cloud code evolved a lot, but it still felt like there's a, there's kind of a gap. And there's a product that could make this even easier for people. And so for the last couple months, the team was kind of hacking around and just saying, like, what is the right product? And at some point,

Starting point is 01:06:31 someone came up with this idea of, like, what if we just take quad code, add some guardrails. So, for example, co-work ships with a virtual machine. This is one of the many ways that we make sure it's really safe, especially for non-technical users that don't want to read like bash commands to figure out what it's doing. And they were hacking on this. I think it was something like 10 days until end or something. It was just fully built with quad code. And then we shipped it. And can you give us a sense of like the complexity behind an app like this? And if we can walk through like what parts needed to be built, because from the outside, it's a little bit hard to tell, like, is there just a nice UI wrapper? That's, you know, like, I don't know, like a few hundred lines.

Starting point is 01:07:10 of code. I'm just being obviously I'm provocative here or behind the scenes it's actually really complex piece of software. The reason I ask is like Uber is a great example where people look at the app. It looks really simple. I work there and I know it's it's really, really complex because you don't see a lot of the complexity. There's a lot of regional things. There's a lot of back end things that are all hidden. So from just from looking at it, cloud co-work, it's hard to tell how much of this is additional business logic that needed to be carefully thought out versus it's actually just a nice little thin wrapper on top of the model. In some places, I think there's less complexity than you would think.

Starting point is 01:07:44 In some places, there's more complexity. So on the product side, it's quite simple because it's just the quad desktop app. So you know, you download the quad app. It's a single desktop app. It has a tab for cowork. It has a tab for code. It has a tab for chat. So it is just one app.

Starting point is 01:07:57 And we're able to inherit a lot of that product logic. There's some UI rendering code. Under the hood, you know, it's just the same quad code running. It's the same quad agent SDK that powers quad code. A lot of the complexity actually is about safety. because we know, like I said, we know the user is non-technical, and so we just want to make sure they have a good experience.

Starting point is 01:08:16 And so, for example, if someone launches the app and then, you know, like they delete a bunch of family photos, that's really not good. And so we wanted to make sure that we protect against this, so you can't accidentally do that. And so that's where a lot of the guardrails came from. So there's a bunch of classifiers running on the back end.

Starting point is 01:08:30 This is for safety and, again, extra mitigations for things like prompt injection and, you know, risks like this or on security. On the front end, there's an entire virtual machine that we ship, there's a bunch of operating system, system level integrations to make sure people don't accidentally delete things. So just around safety, there's a lot there.

Starting point is 01:08:50 And then we also have to rethink the permission system because we inherit the permission system from quad code. But also for co-work, actually a big part of the value is not just running locally, but it's using all of your tools the way that quad code uses it. But the thing is, for non-technical users, your tools aren't really available with CLIs. some of them are available over MCP. Many of them are available in a browser.

Starting point is 01:09:13 And so co-work is really, really good when you pair it with a Chrome extension. And this is the way that I usually use it. So, you know, for example, I use it every week to do project management for the team. We have like, we have a spreadsheet that tracks kind of at a really high level what everyone's working on. And this is kind of my personal way of project managing, you know, other people, like I said, use Asana, other people use notes or whatever. For my own test, I don't use anything. But kind of for the team overall, I have the spreadsheet. and I have co-work kind of check-in

Starting point is 01:09:38 and I just ask co-work every week, hey, can you look at the rows for any status that has not been filled out? Can you just ping the engineer on Slack? And so it'll open one tab in Chrome for the spreadsheet. It'll open another tab with Slack. And then it'll just start messaging engineers in Slack. And it just one-shots it.

Starting point is 01:09:56 There's like one engineer's name for some reason they can't autocomplete, but everything else it just gets. And so this is actually like, from a safety point of view, we also thought pretty deeply about this Chrome extension and how this works and how the permissioning model should interact with this local permissioning model. So there's also a bunch of code to kind of make sure that that feels smooth.

Starting point is 01:10:15 And what's the text tag behind this? I assume a lot of will be similar to the cloud app, but is it is it electron, type script, those kind of things or something else? Yeah, just electron and typescript. Actually, some of the people working on it are early electron folks. So Felix, who's, you know, the creator of co-work, he was a really early engineer on electron. he helped build it. Oh, amazing. And co-work launched macOS only. What was the reason for, both for choosing this platform first and for now only choosing this platform? Yeah, so Windows coming soon. I think probably by the time this podcast comes out, we will have Windows support. We just wanted to start early and start learning. You know, like everything we do at Anthropic,

Starting point is 01:10:56 it's kind of like the way that I told my own story, one of the things I like about Anthropic is it just really, really matches the way that people here think about it. You know, back to this point where, like, we don't have high certainty about the things that we build. And our intuition is often wrong. And so we just have to, like, learn from users and figure out what people actually want. And you just spend a lot of time listening to people and understanding the feedback deeply. This is the way that we build a product. And so we always launch a little bit before it's ready.

Starting point is 01:11:24 We did this for Quad Code. When we launched Quad Code, initially, it didn't even support Windows. Also, it didn't support, you know, like a lot of different stacks. and then over the coming weeks we added support for every stack. Now quad code supports every single stack. You know, like Windows, whatever weird Linux destroy use, MacOS, we support everything. And so for core work also, we just wanted to launch early. We wanted to start with Mac because that was just the starting point.

Starting point is 01:11:48 But yeah, it's going to support everything. One thing you mentioned is getting feedback. I'm curious, both for cloud code and for cloud cowork, how do you go about things like observability, monitoring, when you're rolling out, do you use any feature flags? And I'm more interested in, like, did you build custom tools for this, or did you decide to use certain vendors? Because especially for observability, I'm sure that this is both important, but it also sounds like pretty high scale in terms of the number of users that we can derive or it's this will not be a small operation. Yeah, there's some off-the-shelf vendors that we use.

Starting point is 01:12:23 There's some custom code that we use. So it's actually, it's a mix of both. There's nothing too surprising about it. There's one thing about Anthropic that's kind of interesting is because we're, enterprise company and we care a lot about privacy and security, we can't see people's data. And so, you know, like if someone reports a bug, like, I actually can't pull up your logs to kind of see what's going on. A lot of work goes into kind of figuring out how to log events and things like this in a privacy preserving way. This is just very important to the way that we operate.

Starting point is 01:12:50 For co-work, what kind of learnings have you had so far? It's been out for, I think, a few weeks now. Did you see something unexpected? Are you shaping the product based on feedback that you're getting? Yeah. Every day the team is landing so many fixes. The most surprising thing is just how much people are loving it, to be honest. When Quadicode first came out, it actually wasn't an overnight hit. This is something people think it was, but it was sort of a slow takeoff at the beginning. And I think the first big inflection was in May when we released Opus 4 and Sonnet 4, that's when it really clicked. And that's when our growth became exponential. But at the beginning, it was sort of a research preview. People didn't really know how to use it. Some people got it immediately, but most

Starting point is 01:13:31 people didn't. It took a little while. For co-work, it's a much steeper growth trajectory than Quad Code was at the beginning. So it's just been an instant hit. And that's actually a bit very surprising. I didn't really expect that. One of your new releases, which came out just very recently, it was, I think yesterday, a day before when we're recording this podcast, was Agent Teams. And as I understand the idea with what Agent Teams, Agent Teams, Agent Swarms, instead of single agent, you can have a lead agent and it can delegate to its different teammates. How do you start experimenting with this and how do you decide to ship it now? We're always doing experiments, right? There's all sorts of ways to get more mileage out of quad code.

Starting point is 01:14:16 One way you can do it is by extending context. Another way is auto-compacting context. So it's essentially infinite context. And that's what we have right now. Another way is using sub-agents. So you have multiple agents kind of working together. There's just like a lot of different approaches to get a little bit more mileage out of the context window. There's this one idea called uncorrelated context windows. That's what we call it. And the idea is you have multiple context windows, but they essentially start fresh, so they don't know about each other.

Starting point is 01:14:44 And so an example of this is like a correlated context window is if you have one, if you have the model and it does a task, and then you have it just do a second task in that same context window. And in this case, the second task knows about the first one because it's in the same window. But for something like a sub-agent, it's uncorrelated. because the main agent prompts the sub-agent's context window is fresh. Besides that prompt, it doesn't know what's in the parent context window.

Starting point is 01:15:07 And you can see this actually a little bit in, for example, like sub-agents versus skills. Because when you run a skill, you know, or slash command, it sees the parent context window versus for a sub-agent. It doesn't. So it's uncorrelated. There's some cases where you want that context. There's some cases when you don't. And there's this kind of interesting thing where uncorrelated context windows and

Starting point is 01:15:29 just throwing more context at the problem and throwing more tokens at it, when the windows are uncorrelated, it gives you better results. It's actually a form of test time compute to do this. And for something like teams, we've been experimenting with this for a while, I think, since maybe like October or September or something like this. And it really just felt like with Opus 4.6, it clicked, where the model figured out really how to use this. And sometimes you see these kind of cute exchanges where the agents are talking to each other

Starting point is 01:15:58 and they're like discussing something. And it's just very cool to see. It's very like humanistic in a way. But there's other times where you just get very good results. And so we had a bunch of internal evaluations, for example, where we have quad build something very, very complex, something more complex than what a single quad would build. And we saw the results just really, really improve with Opus 4.6 with teams.

Starting point is 01:16:19 And that's why we felt it's the right time to release it. We also wanted to be careful. And the reason you have to opt into it, the reason it's a research preview is it uses a ton of tokens because it's just a bunch of clods that are running. Not everyone wants this all the time. So just excited to see how people use it and, you know, to hear the feedback. It's something you want for fairly complex tasks. You don't probably want this for every task. The main quad decides the roles for the subclods. We don't have a kind of a regimented way to do this. It's context specific. I wouldn't say there's one right

Starting point is 01:16:49 way to do it. I think actually a lot of the magic of this comes out of this idea of uncorrelated context windows. It's less about the specific configuration of the agents. But it, it's a lot of the magic You know, it's something that people should experiment with. I don't think there's a one-size-fits-all. Have you seen use cases even in, even, I know it's still recent, but have you seen use cases where it could look, it looks promising this approach, the swarm approach? Well, you know, like I said, before, plugins were fully built with swarms.

Starting point is 01:17:13 There's a bunch of other features since that were built in this way. So yeah, I think for anything where you see a single quad struggling, swarms can help. It's an interesting to look at. Talking about change in general, with under carpet that you had a really, interesting exchange back in December where when he posted that he's never felt as much behind as a programmer as he is now because of the progress with AI. And then you share the story about how you start to debug a memory leak, the old-fashioned way, and then Claude just one shot at it.

Starting point is 01:17:46 I think it was a reflection of like how everyone is feeling that things are changing so fast. And in the holiday break, I started to feel that things have ever really shifted. How did you, I guess, come to terms with this or start to embrace this change? This is something I really struggle with. The model is improving so quickly that the ideas that worked with the old model might not work with a new model. The things that didn't work with a new model might work or with the old model might work with a new model.

Starting point is 01:18:16 And it's weird because there's just not a lot of other technologies like this. So I just don't really have a lot of experience to draw on to figure out how I should approach this. And it's been this new skill that I've had to learn. In a way, it's like you just always have to bring this beginner mindset. Honestly, like, I'm using the word humility a lot, but you always just have to bring this kind of intellectual humility because just all of these ideas that were bad before are now good and the inverse. I think that's honestly it. It's something I constantly have to remind myself about. And back in the, it's funny, back in the old world, when someone tries an idea again and we've tried it in the past and it didn't work.

Starting point is 01:18:58 Usually the feedback is like, why are you doing this again? Yeah, yeah. You should learn. This is, I mean, we used to call a bit of a gatekeeping, but it was somewhat valid where I know with architecture, someone came and said, like, why don't we do microservices and someone said, we tried it and it didn't work. And if you tried it a year or two or three years ago, it was kind of valid, right? Because not much has changed.

Starting point is 01:19:16 Yeah, that's right. And it's something like Microsoft, it says, it's funny because it's like every 10 years, it goes in and out of style. But yeah, now it's, I think, the first time ever where it's actually not crazy. to just try the same idea every few months because the model improves and it just works. And I actually see this with engineers on the team, like, people that are newer to the team, people that are newer to engineering, sometimes do things in a better way than I do. And I just have to look at them and I have to learn and I have to adjust my expectations.

Starting point is 01:19:45 You know, like an example of this is, you know, when we release features, sometimes I'll like screenshot myself using them on, you know, on X or on threads or whatever just to kind of talk about it. But recently, Tarek, our, you know, our Devereaux guy, he actually codes a lot. He's amazing. And he just started automating this. So he's having, like, Cloud Code generate its own videos for its launches. And he just started doing this. And, you know, this is something like I thought would be, you know, maybe it's possible.

Starting point is 01:20:12 It's not something I would have tried because I wouldn't have thought the model was ready, but he just did it and it just kind of worked. One thing that I felt like just a bit, like, odd about. And I think a lot of the developers kind of relate is, I've, come to terms with this, starting from Opus 4.5, and also similar models, like I think GPT 5.2 gave me similar vibe as well. The models have been just really good at writing code, and I realize that I don't think I will handwrite the code when I want to get stuff done, if I actually want to, you know, get the pleasure of writing, I can still do it. But one thing I reflected on is,

Starting point is 01:20:49 it's just been so much effort to get good at coding. I remember when I, when I was learning, when I started from like kind of hacking around to go into university to learning C and C++, and it was just bloody hard and actually going through my first few jobs where I started to become better at. It became better at debugging. And there's a point where like a lot of my identity was tied to being good at coding. That's how we used to get jobs or higher paying jobs. When I was an engineering manager, when we designed the interview loop at Uber, we had talk with

Starting point is 01:21:18 managers of what we need to screen for. And we would talk like, well, what do developers do most of their time about? 50% of time they code, therefore we place about 50% of a signal was all about coding. So there was a lot of things identical coding because it is just hard. I think we all know that it takes grit, it takes some level of intelligence to get good at it. And there's a sense of loss of like, well, I think it's great on one end that the model can do it. But it feels that something really quickly got taken away that I don't think I personally thought it would happen this quickly. and I'm, I think a lot of other people are feeling like this.

Starting point is 01:21:54 Some people move on a bit easier, but there's definitely the sense of grief. How did you think about it? Because, again, you're an example of you wrote so much code at Facebook, also outside of it. I know it was just a tool of doing it, but not many people could do what you did. And now the models can also work as good as you have, or if not better. That's the challenge. Yeah, I think it's something that used to. be a thing that we do as software engineers, it's becoming a thing that everyone is able to do.

Starting point is 01:22:25 There was a moment, you know, like, when I started coding, it was a very practical thing and it was a way to get things done. And at some point, I just fell in love with the art of coding and like languages and kind of the tools themselves. And at some point, I kind of fell down this rabbit hole. I wrote this, like, I wrote a book about, you know, a programming language. TypeScript. You were the first ever type script book with O'Reilly. Yeah, yeah, yeah, that's right. It was funny, actually, there was this, like, there was this amazing moment for me in my little town in Japan.

Starting point is 01:22:56 I went to the bookstore and I found that book translated in Japanese. No, in this tiny town. That was just like the coolest moment. And then I actually realized I don't remember typescript at all because I was only writing Python for a couple years at that point. Yeah, and like at some point I started the first, the biggest typescript meet up in the world. That was in those in SF and I got to meet kind of a lot of my heroes. There was like Chris Colwell who were like general theory of reactivity. There was a Ryan Dahl, the guy that made Node.

Starting point is 01:23:22 One of the first times that I went really deep into this community and just the language itself and the tools themselves. And for something like TypeScript, there's this beauty in the type system because Heilsberg is just like he's just brilliant. Like the idea of like conditional types and just like anything can be a literal type. And there's these very deep ideas that even the most hardcore function. languages do not have. Like, even in something like Haskell, like, it doesn't go this far. And Anders just took it and he pushed it much further than had been pushed. And, you know, like Joe Pamer and a bunch of other folks kind of explore to a lot of these

Starting point is 01:24:03 ideas and thought of this. And I think for them it was also very practical, right? Because they had these large on-type JavaScript code bases. How do you gradually migrated to something typed? And you have to come up with these very beautiful ideas to do this. For me, Ischala was another kind of rabbit hole that I fell into and kind of like, this functional programming world. And still, when I write code and when the model writes code, I always think in the types

Starting point is 01:24:24 first. That's what matters is what is the type signature. That matters more than the code itself and getting that right. So there is this beauty to it. There's an art to it for sure. But in the end, it's a practical thing. And in the end, this is a thing that we use to build things. And, you know, it's a means.

Starting point is 01:24:44 It's a means to an end. It's not an end to itself. I think one metaphor I have for kind of this moment in time that we're in is the printing press in, you know, like the 1400s or whatever. Because at that moment, it was actually quite similar, right? Like there was a group of scribes that, you know, knew how to write. And it was, as I understand, of course, we never lived there. But as I imagine, it was a hard process to learn. You needed to learn.

Starting point is 01:25:10 You needed to get the equipment. You probably needed some sponsorship or being selected. Yeah, yeah. Practicing because you needed to produce the same thing. over and over again, and few people could do that, and I assume it was either high prestige or highly paid, or who knows, let's assume it was, but then the printed card press came along. Yeah, yeah, and at least in Europe, like, you have to, like, a lord or a king or something had to employ you, and then you had to go through, you know, years of training.

Starting point is 01:25:35 And there was this class of scribes that knew how to write. They were employed by someone like this. Often the king themselves, like, or, you know, the queen was not literate. So it was this very, very niche skill, and it was like less than one percent of the population was literate in Europe, you know, back then. And then the printing crust came out. And what happened? So the cost of printed material went down something like 100x over the next, I think,

Starting point is 01:26:00 30 years, 50 years or something. The quantity of printed materials went up like 10,000 X in the next 50, 100 years. This was the first effect. Literacy, it took a little while for it to catch up. So I think global literacy, it went up to something like 70%, but that took like another 200 years, 300 years. Because learning to read is just very hard. Learning to write is hard. It takes a lot of effort. It takes education system. It takes, you know, infrastructure to have paper and ink in the free time to do

Starting point is 01:26:28 this instead of working on a farm. So it kind of, it took early stage of industrialization to actually get there. But I think this effect of making it so this thing that was locked away in ivory tower and now it's accessible to everyone, this is just, you know, like none of the things around us would exist today without this. Like, if, if, if, if, you know, if, we weren't literate if the people that built, you know, this microphone weren't, weren't literate, it would have just been very hard to have a modern economy. None of these things would exist. And I just kind of think about back then if people had to predict what would happen when the printing press came out. No one would have predicted that the microphone would become a thing. So I

Starting point is 01:27:06 just feel like this is the best, the best analog for the moment that we're in right now. Yeah, it's interesting that you say that some of the kings were ill, who are employing the scribes because if we're being honest with ourselves, we have business owners who know what they want to build and there are employing software engineers because they themselves cannot write code. And I think we'd like to mock the CEOs who are coming there, coming to the team. They might even have a drawn prototype or whiteboard and saying this should be easy, but of course they don't understand how difficult it is.

Starting point is 01:27:41 But there seems to be a bit of analogy where there's a person who wants what they want, but until now they needed to hire a software, a specialist who can build that, and there's always that disconnect between the idea and the person. And just like with the printage press, like what would happen if they could actually express them? Like the king could actually read or write their own letters. They wouldn't need that middleman. And things become more efficient.

Starting point is 01:28:05 I mean, of course, for the scribe, it's not the best news necessarily. But, I mean, smart scribes can also do, you know, so someone needs to like write the books, run the press, etc. Yeah, exactly. And if you think about what happened to the scribes, right, like they cease to become scribes, but now there's a category of writers and authors. Like, these people now exist. And the reason they exist is because the market for literature has expanded a ton.

Starting point is 01:28:29 And I guess also if we think about, like, back then a scribe's work was read by a few people. And with the printing press and author, there's a lot more authors and some of them are not really read, but some of them have wider reach than they could imagine. There's new careers that exist because of that. Yeah. I love the analogy. And the most exciting thing for me is it's just so impossible to say today what will happen after this happens and after this transition happens. Just, you know, the economy, as we know, it would not have existed without it.

Starting point is 01:29:03 So what's next? Like what is the thing that we can't even predict today that will exist because anyone can do this? Well, we cannot predict, but I think we can. look at what is working right now. If you look around in your environment, may that be the team across Entrophic, who are software engineers or builders or members of technical staff, however we call them who to you or standout? What are they doing? What skills have they built up? And how have they changed the way they work? It's hard to name individuals because honestly, this is just the strongest, these are the strongest people I've ever worked with in my career.

Starting point is 01:29:41 There's all sorts of different archetypes. There's some people that are really amazing prototypers. So take something from zero to point five. Just, you know, figure out, like, what are some cool ideas? What did the technology unlock? There's other people that are amazing at finding product market fit. So kind of point five to one or maybe zero to one. There's other people that span different disciplines. And I'm just seeing more and more of these people. Like I said, like people that span product engineering and infrastructure engineering or, you know, product and design or design and engineering. I think I'm just seeing a lot more of these of these hybrids. What's a belief that changed from last year to this year?

Starting point is 01:30:17 Something that, you know, like you either believed or a conviction that you had, that you either revised or completely through a way? I think one thing I wasn't sure about is how big a problem is safety, to be totally honest. I joined Anthropic because, like I said, I read a lot of sci-fi and I kind of, I know how bad this thing can go if it goes bad. It wasn't something I was sure about. but seeing it from the inside and then seeing how the new risks that have arisen in the last year, it just makes me much, much more worried about it.

Starting point is 01:30:50 So I think it's, it was kind of an important thing for me. Now it's just the most important thing for me is how do we make sure this thing goes well? I think it's safe to say you were a really great software engineer even before all the AI thing started. And you seem to be a very productive engineer, of course, part of a team as well, but also individually. what are some skills of like, you know, before being a software engineer that are still as valuable or maybe even more valuable than before? And what are ones that are maybe just not as much and they're best left behind? Probably, okay, so stuff that's left behind is, best off behind is maybe like very strong opinions about like code style and languages and things like this. Like I can't wait to get past like these endless language debates and framework debates and all the stuff.

Starting point is 01:31:38 because the model can just like, you know, use whatever language and framework, and if you don't like it, it can just rewrite it for you. So it just doesn't matter anymore. I think something that still matters a lot today is thing, it's being methodical and hypothesis driven. This matters both in product design in this world where everything is being disrupted and we need to figure out what to build next. And this is something everyone is thinking about. But it also matters for engineering day today, you know, like something like debugging. You just have to be very methodical about it. And the model can do this and it can help a lot. But I think still we're in this transition point where you still need to have the skill. I don't know if you're still going to need to have it in six months. Other skills that I think are more valuable are being curious and being open to doing things beyond your swim lane.

Starting point is 01:32:28 So, you know, if you're working on engineering but you really understand the business side, you can just build really awesome products. And I think the next, you know, billion dollar product, you know, like after quad code, whatever the next startup is that, you know, becomes the next trillion dollar startup, it might just be like one person that has some cool idea. And their brain just is able to think across, you know, engineering and product and business or, you know, like design and finance and something else. Like it's people are going to become more and more multidiscipline and this will become more and more rewarded. So in some ways, I think this will be the year of the generalist. I think the other skill that's actually been rewarded of it is having a short attention span. I was being rewarded now. Oh, yeah. It's, you know, like teenagers are using, you know, like, TikTok and all this stuff. And I think in some ways it's kind of dangerous for society because like you want people that can think deeply and can contemplate ideas and aren't just moving on to the next idea very quick. But in some ways, I think this year is kind of the year that is going to reward. It's like, year of ADHD. Because the work for me has become jumping between quads. It's become managing

Starting point is 01:33:41 clods. And so it's not so much about deep work. It's about how good am I about context switching and, you know, jumping across multiple different contexts very quickly. Could I add that from what I understand, what all you said, maybe we could add one thing, which is adaptability, because you're saying, of course, that ADHD and you can jump across. But of course, earlier, you were very good focusing deeply on one thing as well. And what strikes me about you, and maybe this is true for other people as well, you're just kind of very open to adapting your working style

Starting point is 01:34:11 and seeing what works well for this stage, especially when things are changing. I think the one certain thing we can be sure is whenever the next model comes out, they'll change again. And you need to be curious and open to adapting how you work, right? Yeah. And as closing, what's a book or books that you would recommend?

Starting point is 01:34:28 I've gone down a Sishin-Lew rabbit hole. So he's the three-body problem guy, but he actually has like a lot of other really great books. I really love his short stories. He has a couple books of short stories. I'm a big fan. For people that are new to sci-fi and you want like a little bit like harder sci-fi,

Starting point is 01:34:44 I really love Accelerondo by Strauss. This is a book I would totally recommend. It's like essentially the product roadmap for the next 50 years. It starts with takeoff kind of starting to happen and kind of AI singularity. And then it ends up with like these kind of like group lobster consciousnesses. orbiting Jupiter. And it's just like amazing. And the thing that I think it really captures is just the pace,

Starting point is 01:35:08 this like quickening, quickening, quickening pace of how this feels. It really matches the feeling right now. And then on the technical side, I would strongly recommend functional programming and Scala. Even if language choice just doesn't matter as much anymore, I think there is this art to functional programming that just teaches you how to code better. And it'll just teach you how to think in types. If you read this book,

Starting point is 01:35:30 I think what's really important is to do the exercises. also, and I've gone through, and I've done all of them probably like three times over. And it's just amazing. It really just like knocks this idea of functional types into your head. And it's just a thing you can't stop thinking about. Boris, thank you so much. This was awesome. Yeah.

Starting point is 01:35:48 Thanks, Garege. This was a really interesting conversation. And the thing that I keep coming back to is to Boris's prontic personality. The idea that medieval scribes were this tiny elite who could write, employed by kings who themselves were often illiterate. and that we soft rangers might be in a similar position today. We are the scribes. We spent years mastering this craft,

Starting point is 01:36:08 and now the printer press is arriving. But what Boris told me is that the scribes did not disappear. They became writers and authors, and the entire market for written work expanded beyond anything anyone could have predicted. I do find this hopeful and also appreciate that Boris didn't sugarcoded. The other thing that struck with me is just how differently the cloth code team built software. No PRDs, no mandatory ticketing system, designers and data scientists, and finance people all writing code

Starting point is 01:36:33 and building dozens or hundreds of prototypes before shipping a feature. And Boris is shipping 20 to 30 poor requests a day without editing a single line by hand. And there are different verification systems in place, ClaudeCode reviewing its code, automated lint rules, Best of End Passes, and Human Code Review.

Starting point is 01:36:50 If you've enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also leave a rating on the show. Thanks and see you on the next one.

The Pragmatic Engineer - Building Claude Code with Boris Cherny

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.