The Changelog: Software Development, Open Source - The era of the Small Giant (Interview)

Episode Date: January 22, 2026

Damien Tanner (founder of Pusher, now building Layercode) is back for a reunion 17 years in the making. Damien officially returns to The Changelog to discuss the seismic shift happening in software development. From the first sponsor of the podcast to frontline builder in the AI agent era, Damien shares his insights on why SaaS is dying, why code review is a bottleneck (and non-existent for some), and how small teams can now build giant things.

Transcript
Starting point is 00:00:01 My friends, welcome back. This is The Changelog. We feature the hackers, the leaders, and those living in this crazy world we're in. Can you believe it? Yeah, Damien Tanner's back on the show after 17 years. Wow. Okay, some backstory. Damien Tanner, founder of Pusher, now building Layercode. He returned to the podcast, technically, officially for the first time, but he sponsored the show. He was one of our very first sponsors of this podcast 17 years ago, almost, I want to say. I'm estimating, but it's pretty close to that. I think that's so cool. So he's back officially talking about the seismic shift happening right now in software development. I know you're feeling it. I'm feeling it. Everyone's feeling it. So from first-time sponsor of the podcast to a frontline builder in the AI agent era,
Starting point is 00:00:49 Damien shares raw insights on why SaaS is dying, why code review is becoming a bottleneck, maybe nonexistent, and how small teams can build giant things. A massive thank you to our friends, our partners, our sponsor, yes, talking about Fly.io, the home of changelog.com. Love Fly, and you should too. Launch a Sprite, launch a Fly Machine, launch an app, launch whatever, on Fly. We do, you should too. Learn more at fly.io. Okay, let's do this.
Starting point is 00:01:24 Well, friends, I'm here again with a good friend of mine, Kyle Galbraith, co-founder and CEO of depot.dev. Slow builds suck. Depot knows it. Kyle, tell me, how do you go about making builds faster? What's the secret? When it comes to optimizing build times, to drive build times to zero, you really have to take a step back and think about the core components that make up a build. You have your CPUs, you have your networks, you have your disks. All of that comes into play when you're talking about reducing build time. And so some of the things that we do at Depot:
Starting point is 00:01:59 We're always running on the latest generation of Arm CPUs and AMD CPUs from Amazon. Those in general are anywhere between 30 and 40% faster than GitHub's own hosted runners. And then we do a lot of cache tricks. Way back in the early days when we first started Depot, we focused on container image builds. But now we're doing the same types of cache tricks inside of GitHub Actions, where we essentially multiplex uploads and downloads of the GitHub Actions cache inside of our runners, so that we're going directly to blob storage with as high of throughput as humanly possible. We do other things inside of a GitHub Actions runner,
Starting point is 00:02:37 like we cordon off portions of memory to act as disk so that any kind of integration tests that you're doing inside of CI that's doing a lot of operations to disk, think like you're testing database migrations in CI. By using RAM disks instead inside of the runner, it's not going to a physical drive, it's going to memory. And that's orders of magnitude faster. The other part of build performance is the stuff that's not
Starting point is 00:03:01 the tech side of it, it's the observability side of it: you can't actually make a build faster if you don't know where it should be faster. And we look for patterns and commonalities across customers, and that's what drives our product roadmap. This is the next thing we'll start optimizing for. Okay, so when you build with Depot, you're getting this. You're getting the essential goodness of a relentless pursuit of very, very fast builds, near-zero build times. And that's cool. Kyle and his team are relentless in this pursuit. You should use them. Depot.dev.
Starting point is 00:03:35 Free to start. Check it out. One-liner change in your GitHub Actions, depot.dev. Well, friends, I'm here with a long-time friend, first-time sponsor of this podcast, Damien Tanner. Damien, it's been a journey, man. This is the 18th year of producing The Changelog. As you know, when Wynn Netherland and I started this show back in 2009, I corrected myself recently. I thought it was November 19th. It was actually November 9th, was the very first, the birthday of The Changelog, November 9th, 20... yeah, 2009. And back then,
Starting point is 00:04:17 you ran Pusher, Pusher app. And that's kind of when sponsoring a podcast was kind of like almost charity. Right. Like you didn't get a ton of value because there wasn't a huge audience, but you wanted to support the makers of the podcast and, you know, we were learning. And, you know, obviously, open source was moving fast, and we were trying to keep up, and GitHub was one year old. I mean, like, this is a different world. But I do want to start off by saying, you are our first sponsor of this podcast. I appreciate that, man. That's very kind of you.
Starting point is 00:04:49 You know, reflecting on Pusher, we kind of just ended up creating a lot of great community, especially around London and also around the world with Pusher. Yeah. And I really love everything we did. And we started an event series. And in fact, another kind of like coming back around: Alex Booker, who works at Mastra. He's coming to speak at the AI Engineer London Meetup branch that I run. And he started and ran the Pusher Sessions, which became a really well-known talk series in London. Okay. Were you at the most recent AIE conference? I was in SF, yeah. Okay. What was that like? I kind of jumped the shark a little bit, so I kind of want to talk, I want to juxtapose like
Starting point is 00:05:42 Pusher then, that timeframe of being a developer, to like now, which is drastically different. So don't, let's not go too far there, but how was AIE in SF recently? It was a good experience, always a good injection of energy going to SF. I live just outside London. But you know what, the venue is quite big, and it didn't have that like together feel as much as some conferences. But it was the first time, though, I sat in, you know, a huge conference hall, and I think it was like Windsurf or something chatting, and I was like, this is, this is really like we're all miners at a conference about mining automation. And we're like, we're engineers,
Starting point is 00:06:23 so we're super excited about it, but, right, it's kind of weird, like it's going to change all of our jobs. All right. It's like, I'm working right now to change everything I'm doing tomorrow, right? I mean, that's kind of how I viewed it. I was watching a lot of the playback. I wasn't there personally this time around, but I do want to make it the next time around.
Starting point is 00:06:44 But, you know, just Shawn "swyx" Wang, the content coming out of there, everybody speaking. I know a lot of great people were there, obviously pushing the boundaries of what's next for us, the frontier, so to speak. But a lot of the content, I mean, almost all the content was like top, top notch. And I feel like I was just watching the tip of humanity, right? Like just experiencing what's to come, because in tech, you know this as being a veteran in tech. We shape, we're shaping the future of humanity in a lot of cases because technology drives that.
Starting point is 00:07:17 Technology is a major driver of everything. And here we are at the precipice of the next, the next, next, next thing. And it's just wild to see what people are doing with it, how it's changing everything we know. Everything, I feel like, is like a flip. It's a complete, not even a one-eighty, it's like a 720. You know what I mean? Like it's three spins or four spins. It's not just one spin around to change things.
Starting point is 00:07:43 I feel like it's a dramatic, forever, don't-even-know-how-it's-going-to-change-things, changing-things thing. And, you know, bringing it back to the Pusher days, it's the vibe we had then. You know, there was this period around just before Pusher and the first half of Pusher, I felt like, where we were going through this, maybe it was called like the Web 2.0 era. But there was a lot of great software being built and a lot of, you know, the community. And I think the craft that went into it, especially like the Rails community, and we're just, we were able to build incredible web-based software.
Starting point is 00:08:24 And then, you know, we've gone through like the commercialization, industrialization of SaaS. And what gets me really excited is now when we're, you know, we run this AI engineer London branch and incredible communities come together. And it's got that energy again. And I guess the energy is very exciting. There's new stuff. Everyone can play a part in it. And we're also just all.
Starting point is 00:08:49 completely working it out. And it's like, you've got the, you know, folks on the main stage of the conference. And then you've got, we'll chat about it later maybe, like Geoffrey Huntley posting his memey Ralph Wiggum blog post. It's like the crazy ideas
Starting point is 00:09:07 and innovation is kind of coming from anywhere, which is brilliant. Yeah, some satire happened, too. I think there was a conversation that was quite comedic. I can't remember who the talk was from, but I was really just enjoying just the fun nature of what's happening and having
Starting point is 00:09:25 fun with it, not just being completely serious all the time with it. For those who are uninitiated, and I kind of am to some degree because it's been a long time, remind me and our listeners, what exactly was Pusher, and I suppose the tail end of that, how are things different today than they were then? Pusher was basically a WebSockets Push API. So you could push anything to your web app in real time. So just things like notifications into your application. We ended up having a bunch of customers, maybe in finance or crypto or any kind of area where you needed live updating pricing.
Starting point is 00:10:04 In the early days, at one point Uber was using Pusher to update the cars in real time in the app before they built their own infra. And it was funny. I remember the stand-up, because we ran a consultancy, where we were chatting about the WebSockets in browsers, and we're like, oh, this is cool, how can we use this? And the problem is, you know, we were all building Rails apps. So like, okay, we need like a separate thing, which manages all the WebSocket connections to the client. And then we can just post an API request and say, push this message to all the clients. It was a simple idea, and we took it seriously and built it into a pretty formidable dev tool used by millions of developers and still used a lot today.
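The flow Damien describes (a separate service holding the WebSocket connections, driven by a plain API request) looks roughly like this with Pusher's server and browser libraries; a minimal sketch, with invented channel and event names and placeholder credentials:

```typescript
// Server side (shown in Node): one HTTP call to Pusher fans the message
// out to every connected client subscribed to the channel.
import Pusher from "pusher";

const pusher = new Pusher({
  appId: "APP_ID",      // placeholder credentials
  key: "APP_KEY",
  secret: "APP_SECRET",
  cluster: "eu",
});

pusher.trigger("prices", "tick", { symbol: "BTC", price: 97123.5 });

// Browser side: pusher-js keeps the WebSocket open and surfaces pushes.
import PusherClient from "pusher-js";

const socket = new PusherClient("APP_KEY", { cluster: "eu" });
socket.subscribe("prices").bind("tick", (data: { symbol: string; price: number }) => {
  console.log(`${data.symbol} is now ${data.price}`);
});
```

The design point is the split he mentions: the Rails app never holds a socket; it fires one stateless API request, and the push layer owns all the long-lived connections.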
Starting point is 00:10:51 And we eventually exited the company to MessageBird, who are a kind of European Twilio competitor. Actually, at one point, we nearly sold the company to Twilio. That would have been a very different timeline. According to my notes, you raised $9.2 million, and that's a lot of money back then. I mean, it's a lot of money now, but like, that was tremendous. That was probably 2010, right? 2011, maybe. The bulk of that we raised later on from Balderton.
Starting point is 00:11:21 Okay. The first round was maybe half a million. Very, very, very, it was, and it started out in the agency. So we built the first version in the agency. Just for fun, I suppose. And maybe, maybe some tears on your part. Juxtapose the timelines, right? You got an acquisition ultimately, but you mentioned Twilio was an opportunity. How would that have been different, if you can, like, branch the timeline?
Starting point is 00:11:51 It would have been a great experience to work with the team at Twilio. There's incredible people who have worked at Twilio and moved through Twilio. Um, I don't know, I haven't calculated it, but, uh, we didn't sell because the offer wasn't good enough in our minds. It was a bit of a lowball, and it was all stock. In hindsight, the stock hasn't done very well. So it turns out it was a good financial decision, but yeah, I would have loved that experience. I think, yeah, Twilio became the kind of OG for DevRel, right, and dev community. And how we got to know them is we did a lot of combined events with them and hackathons with them. That was a fun time. Yeah, they were like the origination.
Starting point is 00:12:40 Danielle Morrill was, you know, very much quintessential in that process of a whole new way to market to developers. And I think that might have been the beginning of what we call DevRel today. Would you agree with that? I mean, if there was a seed, that was one of many, probably. But I think one of the earliest seeds planted of what DevRel is today. Crazy times, man.
Starting point is 00:13:04 So how do you think about those times of Pusher, and the web, and building APIs and building SaaS services, et cetera, and, you know, pushing messages to Rails apps. How are today's days different for you? It's exciting because the web and software is just completely changing again. Like I feel like we had that with Web 2.0, right? That was the birth of software on the internet, hosted software on the internet.
Starting point is 00:13:41 And it's such an embedded thing in our culture, in our business. As developers, a lot of us work on that kind of software. Most businesses run on SaaS software now. And I have to remind myself, like, there was a time before SaaS. And therefore, there can be a time after SaaS. And there can be a thing that comes after SaaS. And it's not a given that SaaS sticks around. I mean, like any technology, we tend to kind of go in layers, right?
Starting point is 00:14:14 We still have a bunch of copper phone lines around the place, and we use them for other things, and we're slowly replacing them. These changes, you know, in the aggregate, take a lot of time. But I guess, you know, the thing that can shift more quickly is the direction things are going. And really in the last,
Starting point is 00:14:35 few months, I think I've been more and more convinced, by my own experiences and things I've seen playing with stuff, that it's entirely possible and probably pretty likely that there is a post-SaaS. And I think, I don't know if everyone realizes it, or everyone is doing it with that intention, but like, all of us playing with agents and LLMs, whether it's to build the software or to do things, we are doing that. We're probably doing that instead of building a SaaS, or we're using it to build a SaaS, right? It's already playing out amongst the developers. Yeah.
Starting point is 00:15:19 It's an interesting thought experiment to think about the time before SaaS and the potential, as you may say, the potential time after SaaS. I'm curious because I hold that opinion to some degree. I think there's, you know, what SaaS stays and what SaaS goes if it dies. And you said in the pre-call, to burst the bubble a little bit here, you did say, and I quote, all SaaS is dead. Can you explain in your own words? All SaaS is dead. I think I should probably go through my journey to here to kind of illustrate it. Because it, give us the TLDR first, though.
Starting point is 00:16:00 Give us the clip, and then go into the journey. Okay, okay. The TLDR is: SaaS. So there's a few layers, as like, the building of software, or parts to software. There's the building of software, and then there's the operating of software
Starting point is 00:16:14 to get something done. And I think most developers are very familiar with how the building of software is changing now. But the operating of software, the operating of work, the doing of work in all industries and all knowledge work
Starting point is 00:16:31 can change like we've changed software. And SaaS is made for humans, slow humans, to use. The SaaS UI is made for a puny human to go in, you know, understand, you know, work at this complex thing, and it has to be in a nice UI. If it's not a human actually doing the work that they do in the SaaS, if it's an AI doing that work, why is there a UI? Why is there a, why is there a SaaS tool? Right.
Starting point is 00:17:03 The AI doesn't need a SaaS tool to get the work done. It might need a little UI to tell you what it's done. But the whole idea of humans using software, I think, is going to change. It can. Yeah. Well, you've been steeped in, and I still want to hear your journey, but I'm going to step in for one second. You've been steeped in APIs and SaaS for a while. So I hold that opinion that you have, that I agree
Starting point is 00:17:31 that if the SaaS exists for a UI for humans, that's definitely changing. So I agree with that. What I'm not sure of, and I'm still questioning myself, is like, what is the true solution here? There are SaaS services that can simply be an API. You know this, you built them. You know, I don't really need the web UI. Actually, I kind of just prefer the CLI.
Starting point is 00:17:53 I kind of prefer just JSON for my agents. You know, I kind of prefer Markdown for me, because I'm the human. I want that good prose. I want all of it local, so my agents can, you know, mine it and, you know, create sentiment analysis and, you know, all this fun stuff you could do with DuckDB and Parquet, and just super, super fast stuff across embeddings and, you know, pgvector, all those fun things you can do on your own data. But where I stop is, I do agree that the web UI will go away, or some version of it. Maybe it's just there as like a dashboard for those who don't want to play in the dev world with CLIs and APIs and MCP and whatnot. But I feel like SaaS shifts.
Starting point is 00:18:32 Like my take is, CLI is the new app. That's my take. Is that SaaS will shift, but I think it will shift into a CLI for a human to instruct an agent, and an agent to do the work. And it's largely based on an API, JSON, you know, clearly defined endpoints, great specifications,
Starting point is 00:18:53 standards that get more and more mature as a result of that. Yeah, I guess we should probably kind of tease apart SaaS the business and SaaS the software. Okay. Because, yeah, I agree. The interface is changing. The interface that we use, whether it's visual, a CLI, or a chat conversation or something, but the way we communicate with the software is changing, right?
Starting point is 00:19:19 It's a much more natural language thing. We don't have to dig in the UI to find the thing to click. But also, so much of the software we use that we call SaaS, that we access remotely: if you can just magic that SaaS locally or within your company, right, there's no need to access that SaaS anymore, right? You just have that functionality. You just ask for that functionality, and it's been built. But yeah, SaaS the business, I guess this is the challenge for companies today, is they're going to have to, if they want to stay in business,
Starting point is 00:19:56 they're going to have to shift somehow. Because, yeah, I mean, there's still got to be some harness. Harness is the wrong word, because you use that in coding agents, but like some infrastructure, some cloud, some coordination, authentication, data storage. There's still a lot to do. And I think there's going to be some great opportunities for companies to do that. And maybe a CRM, you know, a Salesforce,
Starting point is 00:20:23 something, you know, manages to say, hey, we are, you know, and that, you know, people like Salesforce are trying to do that. Like, we are the place to run your sales agents, write your, you know, magically instantiated CRM code that you want just for your business. Maybe there'll be some winners there. But I think the thing that's going to change SaaS the business and SaaS the software is the idea that, like, everyone has to go and buy the same version, you know, of some software which they remotely access and can't really change. Okay. I'm feeling that for sure.
Starting point is 00:21:04 Take us back into the journey then, because I feel like I cut you off, and I don't want to disappoint you by not letting you go and give the context, the key word for most people these days, the context for that blanket statement that SaaS is dead or dying. Yeah, okay, I'll give you a bit of the story. So my company Layercode, I'll just give you a little short on that. We provide a voice agents platform. So anyone can add voice to their agent. It's a developer tool, a developer API platform for that. And we're now ramping up our sales and marketing. And we kind of started doing it the normal ways. We kind of got a CRM. We got some marketing tools. And I was just finding, we went through a CRM or two, and I was just finding them like, these are like the new CRMs, right, that are supposed to be good. They were just really, really slow. And then I just couldn't work out how to do
Starting point is 00:22:08 stuff. It was like, I had to go and set up a workflow. And it was like, I needed training to use this CRM tool. And I'd been having a lot of fun with Claude Code and Codex, kind of both flipping between them, kind of getting a feel for them. And so I just said, build me. I just voice dictated, you know, a brain dump for like 10, 15 minutes. Here's the CRM I need. And also, it wasn't just like a boring CRM. It was like, I need you to make a CRM that kind of engages me as a developer who doesn't wake up and go, let's do sales. You know, gamify it for me. And then here are the ways I want you to do that. And it just did it. That was my kind of like coding agents moment. And I think you have that moment when you do a new project, when you use an
Starting point is 00:23:05 LLM on a completely greenfield project. And there's no kind of existing code it's going to kind of mess up or get wrong. And the project's not too big. And just build a whole freaking CRM. And it was really good. It was a good CRM and it worked really well. And so that was like my kind of like level one
Starting point is 00:23:26 awakening, which was like this idea that you can just have the SaaS you want, instantly. It suddenly felt true, because I had done it, and I have cancelled the old CRM system now. And there's a bunch of other tools
Starting point is 00:23:42 I plan to cancel. Not because they're all crap, but because it's harder to use them than it is to just say what I want. Because I kind of have to learn how to use those tools. Whereas I can just say, make me the thing, make me the website I want, instead of using a website builder tool, or make me the CRM that I want to use. And then there's this like different cycle that you have, the like loop that you have of improvement, where it's not a one-off, it's not build and then use the software. It's like, as you're using the software, you can improve the software at any time. And we've still got to work out how this works.
Starting point is 00:24:31 Like, who has the power to change the software? And how do you share that amongst a team, right? And do I have a branch of the software, or do I have different, like, my own views or something in the CRM that I can mess around with? But just within our team of three doing this stuff in the company, it was like, oh, you're annoyed with this part of the software. Just change it. Just change it, yeah.
Starting point is 00:24:57 Yeah. When it annoys you, at the exact point in time, and then continue with the work. Right. And I assume you're probably still doing like a GitHub or some sort of like primary GitHub, not literally GitHub, but a Git repository as a hub for your work, right? And you probably have pull requests or merge requests. So even if your teammate is frustrated, improves the software, pushes it back,
Starting point is 00:25:21 you're still using the same software and you're still using the same traditional developer tooling, which is pull requests, code review, merging, builds. Yeah, but I think that's going to have to change as well. Okay, take me there. I woke up this morning with that feeling. Okay, that's changing too. How's it changing?
Starting point is 00:25:39 With the CRM and with something we've been building this week, these were new pieces of software. They weren't existing codebases. I didn't have any prior ideas and taste and requirements about what the code should look like. I think this is the thing that slows people down with coding agents. You use it on an existing repo, and LLMs have bad taste. They just give you kind of the most common denominator, kind of bad-taste version of anything, whether it's like writing a blog post or coding, right? And so when you use it on an existing project
Starting point is 00:26:20 and then you review the code, you just find all these things wrong with it, right? It's like, you know, right now they love doing all this, like, really defensive try/catch in JavaScript, or really verbose stuff, or rewriting a utility function that exists in a library already. But when you start on a new project and you just use YOLO mode, and you're just, you know, you're building something for yourself as well, right? And it works. Like, where's the code? Why review the code?
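To make the "defensive try/catch" complaint concrete, here's an invented before-and-after in TypeScript; the first function is the style coding agents often emit, the second is what a reviewer with taste would usually prefer:

```typescript
interface User { id: string; name: string }

// The verbose, defensive style LLMs tend to produce: swallows errors,
// double-checks things that can't happen, and hides failures behind null.
async function getUserDefensive(id: string): Promise<User | null> {
  try {
    if (!id) {
      console.error("getUser called without an id");
      return null;
    }
    const res = await fetch(`/api/users/${id}`);
    if (!res || !res.ok) {
      console.error("request failed");
      return null;
    }
    const data = await res.json();
    return data ?? null;
  } catch (error) {
    console.error("unexpected error in getUser:", error);
    return null;
  }
}

// The terser version most reviewers want: let errors propagate to a
// boundary that can actually handle them.
async function getUser(id: string): Promise<User> {
  const res = await fetch(`/api/users/${id}`);
  if (!res.ok) throw new Error(`getUser(${id}) failed: ${res.status}`);
  return res.json();
}
```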
Starting point is 00:26:55 I think we're only in this temporary weird thing where we're like trying to jam. Like, we have these existing software processes that ensure we deliver high-quality software, secure software, good software. I think it's hard. We can't throw that out. We've got SOC 2. We can't throw those out the window for everything that exists today. But for everything new that you're building, you've got an opportunity to kind of pull
Starting point is 00:27:24 apart, question, and collapse down all of these kinds of processes we've built for ourselves. Processes that were built to ensure humans don't make mistakes. Right. Right. And help humans collaborate, and help humans manage change in the repository and everything. It's like, if the humans aren't writing the code anymore, we need to question these things. Are you moving into the land of agent-first then? It sounds like that's where you're going. I feel like I'm being pulled into it by, yeah. I'm, I'm kind of like, there is a tide I can't resist.
Starting point is 00:28:07 I'm falling in the hole. And we're kind of like, we're dipping our toes in, right? Trying to try out an LLM, try out Cursor tab. And then we're kind of in there and we're swimming, trying to swim the way we normally swim
Starting point is 00:28:19 in the way we want to go. And suddenly I've just gone, just like relax and just let the river take you. Just let it go, man. Just let it go. It's new. It's scary. Mm-hmm.
Starting point is 00:28:33 It feels kind of terrifying, and I don't have the answers to how we do code review. But, you know, if you look at like a lot of, you know, teams talking about using AI coding agents on their existing projects, everyone's big problem now is code review, right? Because everyone using coding agents is producing so many PRs. It's like, it's piling up in this review process that has to be done. The new teams that don't have that process in place, they are going multiple times faster right now.
Starting point is 00:29:11 This is the year we almost break the database. Let me explain. Where do agents actually store their stuff? They've got vectors, relational data, conversational history, embeddings, and they're hammering the database at speeds that humans just never have done before. And most teams are duct-taping together
Starting point is 00:29:32 a Postgres instance, a vector database, maybe Elasticsearch for search. It's a mess. Our friends at Tiger Data looked at this and said, what if the database just understood agents? That's Agentic Postgres. It's Postgres built specifically for AI agents, and it combines three things that usually require three separate systems: native Model Context Protocol servers, MCP, hybrid search, and zero-copy forks. The MCP integration is the clever bit: your agents can actually talk directly to the database. They can query data, introspect schemas, execute SQL, without you writing fragile glue code. The database essentially becomes a tool your agent can wield safely. Then there's hybrid search. Tiger Data
Starting point is 00:30:20 merges vector similarity search with good old keyword search into a SQL query. No separate vector database, no Elasticsearch cluster: semantic and keyword search in one transaction. One engine.
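Setting the sponsor's actual API aside (this is ad copy, not documentation), the hybrid-search idea just described, vector similarity blended with keyword rank in one SQL query, can be sketched in plain Postgres with pgvector; the table, columns, and score weights below are all invented:

```typescript
import { Pool } from "pg";

// Hypothetical schema:
//   memories(id bigint, body text, body_tsv tsvector, embedding vector(1536))
// One query, one engine: pgvector cosine distance blended with full-text rank.
const HYBRID_SEARCH_SQL = `
  SELECT id,
         body,
         1 - (embedding <=> $1::vector)          AS semantic_score,
         ts_rank(body_tsv, plainto_tsquery($2))  AS keyword_score
    FROM memories
   WHERE body_tsv @@ plainto_tsquery($2)
      OR (embedding <=> $1::vector) < 0.5
   ORDER BY 0.7 * (1 - (embedding <=> $1::vector))
          + 0.3 * ts_rank(body_tsv, plainto_tsquery($2)) DESC
   LIMIT 20`;

const pool = new Pool(); // connection settings come from the PG* env vars

// Stand-in embedding; in practice this comes from your embedding model.
const queryEmbedding: number[] = new Array(1536).fill(0);

const { rows } = await pool.query(HYBRID_SEARCH_SQL, [
  JSON.stringify(queryEmbedding), // pgvector accepts the '[...]' text format
  "database migrations",          // the keyword half of the hybrid query
]);
console.log(rows);
```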
Starting point is 00:30:48 Okay, my favorite feature: the forks. Agents can spawn sub-second, zero-copy database clones for isolated testing. This is not a database they can destroy. It's a fork. It's a copy off of your main production database, if you so choose. We're talking a one-terabyte database fork in under one second. Your agent can run destructive experiments in a sandbox without touching production, and you only pay for the data that actually changes. That's how copy-on-write works. All your agent data, vectors, relational tables, time series metrics, conversational history, lives in one queryable engine. It's the elegant simplification that makes you wonder why we've been doing it the hard way for so long. So if you're building with AI agents and you're tired of managing a zoo of data systems,
Starting point is 00:31:24 check out our friends at Tiger Data at tigerdata.com. They've got a free trial and a CLI with an MCP server you can download to start experimenting right now. Again, tigerdata.com. What is replacing code review if there's no code review? Is it just nothing? I think, like, us as developers, we need to think more like, we need to put ourselves in the shoes of PMs, designers, managers.
Starting point is 00:31:56 Because they don't look at the code, right? They say we need this functionality. We build it. We do our code reviews. We ensure it works. And the PM, whoever, goes, oh, yeah, great. I've used it, it meets the requirement. It's great, right.
Starting point is 00:32:13 They're comfortable not looking at the code. They're moving along. They're closing the deal. They're with the customer. They're integrating. They're like, I am confident that the intelligent being that created this code did a good job. Now, I think the only reason we're kind of stuck in these old processes is
Starting point is 00:32:30 because many of them are set in stone, but also because LLMs aren't quite, quite smart enough yet. They still make stupid mistakes. Right. You still need a human in the loop and on the loop. Yeah, I mean, they're still a bit dumb, and they do silly things, and they mess stuff up, right? They'll go the wrong direction for a while, and I'm like, no, hang on a second.
Starting point is 00:32:51 That's a great thought here, but let's get back on track. This is the problem we're solving, and you've sidequested us. It's a fun sidequest, if that was the point, but that's not the point. But this is going to change, right? And this is one of the hard things, is trying to put ourselves in the mindset of what it's going to be like in a year. And I think I've only been, you know,
Starting point is 00:33:16 after us being able to play with LLMs for several years, it feels like I've, I can feel the velocity of it now, right? Because I've felt ChatGPT 3, 4, 5, Claude Code, Codex. And now I can go, oh, okay, that's what it feels like for it to get better. And it's going to keep getting better for a few more years.
Starting point is 00:33:44 And so it's kind of like self-driving cars, right? They're like not very useful while they're worse than humans. But suddenly when they're safer than a human, like, why would you have a human? Yeah. And I think it's the same with coding. Like all this process is to stop humans making
Starting point is 00:34:08 mistakes. We make mistakes. Like our mistakes are not special, better mistakes. They're still like, we f*** up stuff in code. We cause security incidents. And so I think as soon as the LLMs are twice as good, five times as good, ten times better at outputting good code that doesn't cause these issues, we're going to start to let go of this concern, like these things, right? We're going to start to trust them more. That's something I've leaned on recently, and it was really with Opus 4.5. I feel like that's when things sort of changed, because I'm with you on the trend from ChatGPT or GPT-3 onto now and feeling the incremental change. I feel like Opus 4.5 really changed things. And I think I heard it in an AIE talk, or at least the intention of it, if it wasn't verbatim, was trust the model. Just trust the model.
Starting point is 00:35:01 As a matter of fact, I think it was one of the guys, man, they were building an agent. And it was maybe his agent layer or layer agent, something like that, maybe borrowed something from your name, Layercode. I have to look it up. I'll get the talk. I'll put it in the show notes. But I think it was that talk. And I was like, okay, the next time I play, I'm going to trust the model.
Starting point is 00:35:23 And I will sometimes, like, stop it from doing something, because I think I'm trying to direct it in a certain direction. And now I've been like, wait, hang on a second. And like, this code's free, basically. It's just going to generate anyways. Let's see what it does. Worst case, I'm like, you know, roll it back. Or a worse case, it's like, just generate better. You know what I mean? Like, ultrathink. Right. You know, what's the worst that could happen?
Starting point is 00:35:42 Because it's going faster than I can anyway. So let's see, even if it's a mistake, let's see the mistake. Let's learn from the mistake, because that's how we learn even as humans. I'm sure LLMs are the same. And so I've come back to this, this philosophy or this, this thought, almost the way you describe it, like falling into this hole, slipping in via gravity. Not excited at first, but then kind of like excited, because like, well, it's good in there. Let's just go.
Starting point is 00:36:10 Just trust the model, man. Just trust the model. And it can surprise you. And I think that still gives me that like dopamine hit that I would have coding, right? When I was coding manually, you know, you'd get a function right
Starting point is 00:36:29 and you'd be like, oh, it works. Yes. And now it's like you've got like the whole application, right? And you're like, I just did a prompt and the whole thing works. That's right. Yeah. It's really exciting. And yeah, it's fun right now. And I mean, it's going to keep changing. This is just a bit of a temporary phase for now. But I think for many of us building software, we love the craft of it, which you can still do. But also the like, the making a thing is also one of the exciting bits of it.
Starting point is 00:37:09 And the world is full of software still. Like you think about so many interactions you have with like government services or whatever. Not saying that they're going to adopt coding agents particularly quickly, but there is a lot of bad software in the world. And software has been expensive to build. And that's because it's been in high demand. And so I don't think we're going to run out of stuff to build. I think even if we get 10 times faster or 100 times faster,
Starting point is 00:37:41 there's so much useful software and products and things and jobs to be done. Close this loop for me. Then you said SaaS is dead, or dying. I'm paraphrasing, because you didn't say "or dying." I'm just going to say "or dying." I'll parenthesize. That's my parentheses. I'll add it to your thing.
Starting point is 00:38:00 How is it going to change then? So if we're making software, there's still tons of software to write, but SaaS is dead. What exactly are we making then if it's not SaaS? I know that not all software is SaaS, but you do build something, a platform, and people buy the platform. Is that SaaS? What changes? You mentioned interface. It's like where do you see it going? I think we're moving.
Starting point is 00:38:22 And so this is the next level. The next kind of revelation I had was, I started using the CRM. And I was like, this is cool. This is super fast. This is better than the other CRM. You know, and I can change it. Cool. I'm doing some important sales work. I'm enriching leads.
Starting point is 00:38:43 And then I kind of woke up a few days later and was like, why am I doing the work? Like, what's going on here? I created an interface for me to use, right? Why can't Claude Code just do the work that I need to do for me? I know it's not going to be with the same taste that I have, and I know it's going to make mistakes, but I can have 10 of them do it at the same time. And it's not a particularly fun idea, fully automated sales
Starting point is 00:39:13 and what that means for the world in general. But it's the particular, like, vertical where I had this kind of... Well, the enriching certainly makes sense for the LLM to do, right? The enriching is like, come on, I'm just the API. I'm copying things. And a lot of it is so manual still. And so the revelation was just waking up and then going, okay, Claude Code's going to do the work for me today. Like it does for software, it builds the software for me.
Starting point is 00:39:42 I'm going to give it a Chrome browser connection. That's still an unsolved problem. There's a lot of pain in LLMs chatting to the browser. But there's a few good ones. And I'm going to let it use my LinkedIn. I'm going to let it use my X, and I'm going to connect it to the APIs that I need that
Starting point is 00:40:05 aren't pieces of software, but are like data sources, right, to get enrichment and search things. And then I just started getting it to just do it. And it was really quite good. It was slow,
Starting point is 00:40:21 but it was really quite good. And that was a kind of, that was like that moment where we typed, build this feature, in Claude Code. Build this. But it was suddenly like, this thing can just do anything a human can do on a computer. The only thing holding it back right now is the access to tools and good integrations with the interfaces, like the old software it still needs to use to do what a human does. And a bigger context window, and it would be great if it was faster,
Starting point is 00:40:56 but I can run them in parallel. So the speed's not a massive, massive problem. And in the space of a week, I built the CRM. And then I got Claude Code to just do the work, but I didn't tell it to use the CRM. I just told it to use the database. And I just ended up throwing away the CRM. And now we have this little Claude Code harness
Starting point is 00:41:21 that overrides the Claude Code system prompt, sets up all the tools, and gives it a SQLite database. And I've just got, like, I need to vibe code a new CRM UI, but I've just got like a database viewer that the non-technical team use to kind of look at the leads and stuff like that. It's just a kind of Beekeeper Studio kind of database viewer. And now Claude Code is just doing the work. We've only applied it there.
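A minimal sketch of that kind of harness, under stated assumptions: it shells out to the Claude Code CLI in non-interactive print mode; the prompt, the SQLite setup, and the exact flag set are assumptions to verify against claude --help for your version:

```typescript
// Spawn Claude Code non-interactively with an overridden system prompt and
// access to a local SQLite CRM, in the spirit of the harness described above.
import { execFileSync } from "node:child_process";

const SYSTEM_PROMPT = `
You are our sales agent. The CRM is the SQLite database at ./crm.db;
query and update it with the sqlite3 CLI. Enrich every lead added today,
then print a short summary of what you changed.`;

const output = execFileSync(
  "claude",
  [
    "-p", "Work through today's new leads.",   // print mode: run once, no REPL
    "--append-system-prompt", SYSTEM_PROMPT,   // the system-prompt override
    "--allowedTools", "Bash,Read,Write",       // tool allowlist (assumed flag)
  ],
  { encoding: "utf8" },
);

console.log(output);
```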
Starting point is 00:41:55 This is just, like, Claude Code is like this kind of little innovation in an agent that can do work for a long time. And we already know people use ChatGPT for all sorts of different things beyond coding, right? And so suddenly, I think these coding agents are a glimpse of how all knowledge work can be sped up or replaced. Administration work can be replaced with these things now. Yeah. These non-technical folks, why not just invite them to the terminal and give them CLI outputs that they can easily run and just up-arrow to repeat, or just teach them certain things that maybe they weren't really comfortable with doing before. And now they're also one step from being a developer or a builder, because they're already in the terminal. And that's where Claude's at. Yeah, I mean, that's what we've done now. I've seen some unexpected kind of teething issues with that. I think there's just, the terminal feels a bit scary to non-technical people,
Starting point is 00:43:04 even if you explain how to use it. Like, when they quit Claude Code or something, they're just kind of like lost. They're like, oh my gosh, where'd Claude go? Yeah. Yeah. And I was onboarding one of our team members. It's like, okay, open the terminal, and then I'm like, okay, we'll cd into it. What if the terminal was just Claude Code, though? What if you built your own terminal that was just in Claude Code? I actually think that specific UI, whether it's terminal or web UI, it's kind of neither here nor there. But there is, the magic is a thing that can access everything on your computer, or a computer. Right.
Starting point is 00:43:42 And they're doing that with, I think it's called Cowork. Have you seen Cowork yet? So it's like, I haven't played with it enough to know what it can and can't do. I think I unleashed it on a directory with some PDFs that I had collected that was around business structure. And it was like an idea I had like four months ago with just, you know, a different business structure that would just make more sense, primarily around tax purposes, you know. And I was like, hey, you know, revisit this idea. I haven't touched it in forever. And it was a directory.
Starting point is 00:44:14 And I think it went and it just did a bunch of stuff. But then it was like coming in with ideas, and I'm like, no, those are not good ideas. I don't know if it's like less smart than Claude Code is in intent or whatever, but it's kind of, I think that's what they're trying to do with Cowork. But, you know, you could just drop them into essentially a directory,
Starting point is 00:44:36 which is where Claude Code lives. And it lives in a directory of maybe files that is an application, or knows how to talk to the database, as you said your CRM does. And they can just be in a Claude Code instance just asking questions. Show me the latest leads. Yeah. It could use a skill if you want to go that route. Or it can just be smart enough to be like, well, I have a Neon database here.
Starting point is 00:44:56 The neonctl CLI is installed. I'm just going to query it directly. Maybe I'll write some Python to make it faster. Maybe I'll store some of this stuff locally, and it'll do it all behind the scenes. But then it gives this non-technical person a list of leads. All they had to do is be like, give me the leads, man, you know?
Starting point is 00:45:19 And then you mentioned enabling them as builders. I think it then is a window into that, because then when they want something
Starting point is 00:45:31 Oh, they get curious, right? They'll be like us. They're just going to be like, hey, build me a report for this, build me a web app for this. Help me make this easier. Yeah. You'd be surprised how easy that is. Help me make it easier is one of those weird ones. And Claude Code will also auto-complete and just let you tab and enter. And I've noticed that those things have gotten more terse. Like, maybe
Starting point is 00:45:46 I think the last one I did was like, that's interesting. It was like super short. It was like, I like it, comma, space, implement it. That was the completion for me. I like it, comma, space, implement it. I was like, okay, is that how easy it's gotten now? To like just spit out a feature that we were just riffing on, that you know the problem, you understand the bug we just got over. And now your response to me, to tell you what to say, because you need me, the human, to get you back in the loop, at least in today's REPL,
Starting point is 00:46:16 is, I like it, implement it. You know what I mean? I found myself just responding with the letter Y. And a lot of the time it just knows what to do, right? Even if it kind of like is a bit ambiguous, you're kind of like, you'll work it out. So I think it's very exciting that Anthropic released this Cowork thing, because they've obviously seen that inside Anthropic, all sorts of people are using Claude Code. And, you know, when we,
Starting point is 00:46:41 When we think about, okay, someone starts there for non-coding purposes, but stuff is done with code and CLI tools and some MCPs or whatever, APIs. And then the user says, well, make me a UI to make this easier. So, for instance, I had to review a bunch of draft messages that I wrote. I was like, okay, this is kind of janky in the terminal. Make me a UI to do the review. Then it just did it.
Starting point is 00:47:11 And I think that's, right, where software has changed. Because when the LLM is 10 times faster, I mean, if you use the Groq, with a Q, endpoints, right, they're insanely fast. It's going to be fast. Then if you can have any interface you want within a second, why have static interfaces, right? Yeah, I'm camping out there with you. What if, what if everything was just-in-time? I think like that interface. What if it,
Starting point is 00:47:51 what if I didn't need to share it with you? Because you're my teammate, but what if it could do the same thing for you? And it solves your problem. And you're in your own branch, and what you do in your branch, it's like Vegas. And it stays there. It doesn't have to be shared anywhere else.
Starting point is 00:48:03 Right. Like just leave it in Vegas, right? What if in your own branch, in your own little world as a sales development representative, for example, an SDR who's trying to help the team, help the organization grow and all they need is an interface? what if it was just in time for them only? And it didn't matter if it was maintainable.
Starting point is 00:48:23 what if it was just-in-time, for them only? And it didn't matter if it was maintainable. It didn't matter how good the code was. All that mattered was that it solved their problem, got the opportunity, and enabled them to do what they've got to do, to do their job. And you just take that and multiply it, or copy and paste it, on all the roles that make sense for that just-in-time world.
Starting point is 00:48:38 It completely changes the idea of what software is. It also completely changes how we interact with a computer, and what a computer does, and what it is for. I just love this notion that every user can change the computer, can change the software as they're using it, as they like it. I think that's a very, it's essentially everyone's a developer.
Starting point is 00:49:07 Yeah, I mean, it's the ultimate way to use a computer. Like all the gates are down, right? There's no, there's no, there's no gatekeeping anymore. If I want software the way I want software, so long as I have authentication and authorization, I've got the keys. Right. To my kingdom I want to make. And I think also the agents can preempt, right? I haven't tried this yet, but I was thinking of giving it the little sales thing. We have a little prompt where it says, where it's like, if a web UI is going to be better for the user to do this, review this, then just do it.
Starting point is 00:49:46 So then instead of you asking it, you know, you ask it to do some work, and then it comes back and is like, oh, look, I've made you this UI where I've like displayed it all for you. Have a look at it. Let me know if you're happy with it. I mean, this is getting kind of wild, this idea. But it's kind of how we can, like, think about how we communicate with each other as humans, as employees, right? We have back-and-forth conversations. We have email, which is a bit more asynchronous. You know, we put up a preview URL of something. Like, I think all of those communication channels can be enabled in the agent
Starting point is 00:50:26 you're chatting to. And like, I haven't liked this kind of, like, product companies that sell, you know, the initial messaging where these are sort of like digital employees, right? But something like that's going to happen. And I think the exciting bit for me is the human-computer interaction. It's like, and this is how it's kind of exciting in the context of Layercode, and why we love voice, is voice is this OG communication method. As humans, we started speaking before we were writing.
Starting point is 00:51:01 And it's, it's kind of quite a rich communication medium. And it's terrific, like, if your agents can be really multimedia, whether it's you're doing voice with them, text with them, they create a web UI for you, you interact with the UI with them, there doesn't have to be these strict modes or delineations between those things. Well, let's go there. I didn't take us there yet, but I do want to talk to you about what you're doing with Layercode. I obviously produce a podcast, so I'm kind of interested in speech-to-text to some degree, because transcripts, right? And then you have the obvious version, which is like you start out with speech, you get something, or even a voice prompt.
Starting point is 00:51:45 What exactly is Layercode? And I suppose we've been 51 minutes deep on nerding out on AI, essentially, and not at all on your startup and what you're doing, which was sort of the impetus of even getting back in touch. I saw you had something new you were doing. And I'm like, well, I haven't talked to Damien since he sponsored the show almost 17 years ago. It's probably a good time to talk, right? So there you go. That's how it works out.
Starting point is 00:52:11 Has your excitement and your dopamine hits on the daily, or even minute by minute, changed how you feel about what you're building with Layercode? And what exactly are you trying to do with it? Well, we've talked a lot about the building of a company and the building of software now. And I think for founders today, that is as important as the thing that they're building, right?
Starting point is 00:53:08 Because if you just head into your company and operate it like you did even a few years ago, right? Using no AI, using all your kind of slow development practices, using your slow sales and marketing practices, you're going to really get left behind. And so there is a lot to be done in working out and exploring what the company of the future
Starting point is 00:53:58 And what we build and what we provide right now is a, and our first product is a voice infrastructure voice API for real-time building real-time voice AI agents. And this is currently a pretty hard problem. We focus a lot on the real-time conversational aspect. And there's a lot of kind of wicked problems in that. Conversations are dynamic things. And there's a lot of state changes and interruptions and back channeling and everything that happens.
Starting point is 00:54:39 And what we build and what we provide right now, our first product, is voice infrastructure, a voice API for building real-time voice AI agents. And this is currently a pretty hard problem. We focus a lot on the real-time conversational aspect. And there's a lot of kind of wicked problems in that. Conversations are dynamic things. And there's a lot of state changes and interruptions and backchanneling and everything that happens.
Starting point is 00:54:58 We can kind of predict where they are on that journey, right? Because there's a bunch of problems that you, that you don't kind of preempt. And then you just quickly, SLAM into them. And so we've solved a lot of those problems. And so with layer code, you can then just take our API, plug it into your existing agent backend.
Starting point is 00:55:15 slam into them. And so we've solved a lot of those problems. And so with Layercode, you can then just take our API, plug it into your existing agent backend. So you can use any backend you want, and you can use any agent LLM library you want, and any LLM you want. So the basic example is a Next.js application that uses the Vercel AI SDK. We've got Python examples as well. You connect up to the Layercode voice layer and put in our browser SDK. And then you get a little voice agent microphone button and everything in the web app. We also connect to phone over a telephony layer.
Starting point is 00:55:53 And then for every turn of the conversation, whenever the user finishes speaking, we ship your backend that transcript. You call the LLM of your choice. You do your tool calls, everything you need to do to generate a response, like you normally do for a text agent. Then you start streaming the response tokens back to us. And then as soon as we get that first word, we start converting that text to speech and start streaming back to the user.
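One turn of that loop maps naturally onto a small Next.js route handler. A minimal sketch, assuming a webhook body of { text } (the real Layercode payload shape may differ; check their docs), using the Vercel AI SDK mentioned above:

```typescript
// app/api/agent/route.ts: handle one conversational turn.
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

export async function POST(req: Request) {
  // The finished user utterance, already transcribed by the voice layer.
  const { text } = await req.json();

  // Generate the reply exactly as you would for a text chatbot.
  const result = streamText({
    model: openai("gpt-4o-mini"),
    system: "You are a concise voice assistant. Answer in short sentences.",
    prompt: text,
  });

  // Stream tokens straight back; the voice layer can start speaking as soon
  // as the first words arrive instead of waiting for the full response.
  return result.toTextStreamResponse();
}
```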
Starting point is 00:56:15 to make that really low latency, make that a real-time conversation where you're not waiting more than a second or two for the agent to respond. So we put a lot of work into refining that. And there's also a lot of exciting innovation happening in the model space for voice models, whether it's the transcription
Starting point is 00:56:32 or the text-to-speech. And so we give you the freedom to switch between those models, right? So you can try out some of the different voice models, some that are really cheap and really, you know, got really casual voices, and some, like ElevenLabs, that are much more expensive, but they're very professional, clean voices. And you can find the right one for the kind of experience that you want. There's a lot of trade-offs, right, in voice, between latency, price, quality.
Starting point is 00:56:58 So users explore that and find the right fit for their voice agent. That is interesting. So Next.js, SDK, streaming, latency, so you're meant to be the middleware between implementation and feedback to the user? Yeah, we handle everything related to the voice, basically. And we let you just handle text, like a text chatbot, basically. No heavy MP3 or WAV file coming down. Just text.
Starting point is 00:57:36 Yeah, and everything's streaming. And so it's a very interesting problem to solve, because the whole system has to be real-time. So the whole thing, we call it a pipeline. I don't know if that's a great name for it, because it's not like an ETL loading pipeline or something, but we call it a pipeline. But the real-time agent system, our backend,
Starting point is 00:57:58 when you start a new session, it runs on Cloudflare Workers. So it's running right near the user who clicked to chat with your agent with voice. And then from that point on, everything is streaming. So the microphone input from the user's browser streaming in, that is then getting streamed to the transcription model in real time. The transcription model is spitting out partial transcripts. We send that partial transcript back to you, so you can show the user what they're saying
Starting point is 00:58:29 if you want to show them that. And then the hardest bit in this whole thing is working out when the user is finished speaking. It's so difficult, because we pause. We make sounds. We pause, and then we start again. And conversation is such a dynamic kind of, it's like a game almost, right?
Starting point is 00:58:59 Yeah. So we have to do some clever things, use some other AI models, to help detect when the user has finished speaking. And when we have enough confidence, like, there's no certainty here, but when we have enough confidence to think the user has finished their thought,
Starting point is 00:59:14 then we finalize that transcript, you know, finish transcribing that last word, and ship you that whole user utterance, whether it's a word, a sentence, a paragraph the user's spoken. The reason we have to kind of, like, we can't stream at that point, right? We have to bundle up this user utterance and choose an end, is because LLMs don't take a streaming input. I mean, you can stream the input, but you need the complete thing, the complete question, to send to the LLM, to then make a request to the LLM to start generating a response, right? There is no duplex LLM that takes input and generates output at the same time.
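A toy sketch of that end-of-turn decision; every threshold and the completeness scorer here are invented, but the shape (no certainty, just accumulated confidence) is the point:

```typescript
// Decide whether the user has finished their thought: blend how long
// they've been silent with a model's guess at semantic completeness.
type TurnState = { partialTranscript: string; silenceMs: number };

async function shouldFinalizeTurn(
  turn: TurnState,
  scoreCompleteness: (text: string) => Promise<number>, // 0..1 from a small model
): Promise<boolean> {
  if (turn.silenceMs < 250) return false; // probably just a mid-sentence pause

  const semantic = await scoreCompleteness(turn.partialTranscript);
  const silence = Math.min(turn.silenceMs / 1500, 1); // longer pause, more confidence

  // Finalize the transcript and ship the utterance once confidence is high enough.
  return 0.6 * semantic + 0.4 * silence > 0.75;
}
```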
Starting point is 00:59:50 Technically, what if you constantly wrote to a file locally, or wherever the system is, and then at some point it just ends, and you send a call that sends the end, versus packaging it up and sending the whole thing once it's done? Like, you incrementally just, line by line. It's like, maybe even like, I don't know, I'm not sure how to describe it, but that's how I think about it. What if you constantly wrote to something, and you just said, okay, it's done. And what was there was the done thing? Yeah, yeah, yeah.
Starting point is 01:00:26 So we can do that, because we have the partial transcripts. Yeah. We can stream you the partial transcripts and then say, okay, now it's done, now make the LLM call. Then you make the LLM call. But interestingly, sending text is actually super fast in the context of a voice conversation, right? And actually the default example is crazy. I didn't think this would work until we tried it,
Starting point is 01:00:50 but it just uses a webhook. When the user finishes speaking, the basic example sends your Next.js API route a webhook with the user text. And it turns out that sending a webhook with a few sentences in it, that's fine, that's fast.
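As a rough sketch of that receiving end, here is what a Next.js App Router route handler for such a webhook could look like. The payload field and the response contract are assumptions for illustration, not Layercode's documented schema.

```typescript
// app/api/voice-agent/route.ts
// Hypothetical payload shape: { text: string } for the finished utterance.

export async function POST(req: Request): Promise<Response> {
  const { text } = (await req.json()) as { text: string };

  // Hand the utterance to whatever LLM you like; a canned reply keeps this
  // sketch self-contained and runnable.
  const reply = `You said: ${text}`;

  // Only text crosses this hop (no MP3s or WAV files), which is why the
  // webhook round-trip stays cheap even though it's plain HTTP.
  return Response.json({ response: reply });
}
```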
Starting point is 01:01:08 It's all the other stuff, like then waiting for the LLM to respond. Yeah, that's actually not the hard part. I mean, you have maybe a millisecond-ish, or a few milliseconds, but it's not going to be a dramatic shift. Right, the way I described it versus how you... Yeah, and we've got a WebSocket endpoint now,
Starting point is 01:01:23 so we can shave off that HTTP connection and everything. But yeah, then the big, heavy latency items come in. So, generating an LLM response: most of the LLMs right now, the ones we're using as coding agents, right, they're optimized for intelligence,
Starting point is 01:01:43 not really speed. And when the LLM labs do optimize for speed, they tend to optimize for just token throughput. Very few people optimize for time to first token. And that's all that matters in voice: I give you the user utterance, and how long is the user going to have to wait before I can start playing back an agent response to them? Time to first token is exactly that: how long before I get the first word or two that I can turn into voice and they can start hearing?
Starting point is 01:02:17 The only major LLM lab that actually optimizes for this, or maintains a low TTFT latency, is Google, with Gemini Flash.
Starting point is 01:02:32 Most voice agents doing it this way now are either using GPT-4o or Gemini Flash, and GPT-4o, the OpenAI endpoints, have some annoying inconsistencies in latency. And that's kind of the killer in voice, right? It makes for a
Starting point is 01:02:47 bad user experience: the first few turns of the conversation are fast, and then suddenly on the next turn the agent takes three seconds to respond. You're like, is the agent wrong? Is the agent broken? But once you get that first token back, then you're good, because then you can start streaming text to us, and we can start turning it into full sentences. And then, again, we get to this batching problem. The voice models that do text-to-speech, again, they don't stream the input in. They require a full sentence of input before they can start generating any output, because how you speak, how things are pronounced, depends on what comes later. And so you have to buffer the LLM output into sentences, and ship it sentence by sentence to the voice model. And then as soon as we get the first of the 20-millisecond chunks we slice the audio into, we stream it straight back down WebSockets from the Cloudflare worker, straight into the user's browser, and can start playing the agent response.
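Here is a minimal sketch of that sentence-buffering step, assuming a token stream in and a hypothetical sendToVoiceModel sink; the real pipeline and provider calls will differ.

```typescript
// Buffer streaming LLM tokens into whole sentences for a TTS model that
// cannot accept partial input. sendToVoiceModel is a made-up stand-in.

async function bufferIntoSentences(
  tokens: AsyncIterable<string>,
  sendToVoiceModel: (sentence: string) => void,
): Promise<void> {
  let buffer = "";
  for await (const token of tokens) {
    buffer += token;
    // Flush on sentence-ending punctuation: pronunciation depends on what
    // comes later, so the voice model needs a complete sentence at a time.
    let m: RegExpMatchArray | null;
    while ((m = buffer.match(/^(.*?[.!?])\s*(.*)$/s))) {
      sendToVoiceModel(m[1]);
      buffer = m[2];
    }
  }
  if (buffer.trim()) sendToVoiceModel(buffer.trim()); // flush the tail
}
```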
Starting point is 01:03:34 Friends, you know this, you're smart. Most AI tools out there are just fancy autocompletes with a chat interface. They help you start the work, but they never do the one thing you need, which is finish the work. That's what you're trying to do: the follow-ups, the post-meeting admin, the I'll-get-to-that-later tasks that pile up until your Notion workspace looks like a crime scene. I know mine did.
Starting point is 01:04:17 I've been using Notion Agent, and it's changed how I think about delegation. Not delegation to another team member, but delegation to something that already knows how I work: my workflows, my preferences, how I organize things. And here's what got me. As you may know, we produce a podcast. It takes prep, there are a lot of details, there are emails, there are calendars, there are notes here and there, and it's kind of hard to get all that together. Well, now my Notion agent helps me do all that. It organizes it for me. It's got a template that's based on my preferences, and it's easy. Notion brings all your notes, all your docs, all your projects into one connected space that just works.
Starting point is 01:04:52 It's seamless. It's flexible. It's powerful. And it's kind of fun to use. With AI built right in, you spend less time switching between tools and more time creating that great work you do. The art, the fun stuff. And now with Notion Agent, your AI doesn't just help you with your work. finishes it for you based on your preferences.
Starting point is 01:05:09 And since everything you're doing is inside Notion, you're always in control. Everything Agent does is editable. It's transparent. And you can always undo changes. You can trust it with your most precious work. And as you know, Notion is used by us. I use it every day. It's used by over 50% of Fortune 500 companies.
Starting point is 01:05:27 And some of the fastest-scaling companies out there, like OpenAI, Ramp, and Vercel. They all use Notion Agent to help their teams send fewer emails, cancel more meetings, and stay ahead doing the fun work. So try Notion, now with Notion Agent, at notion.com slash changelog. That's all lowercase letters, notion.com slash changelog, to try your new AI teammate, Notion Agent, today. And when you use our link, as you know, you're supporting your favorite show, The Changelog. Once again, notion.com slash changelog. You chose TypeScript to do all this. We were pretty set on Cloudflare Workers from day one, and it just solves so many infrastructure problems
Starting point is 01:06:13 that you're going to run into later on. I don't think we'll need a DevOps person, ever. It's such a... That's interesting. It's such a wonderful... There are constraints you have to build to, right? You're using V8 JavaScript, browser JavaScript, in a Cloudflare Worker, right?
Starting point is 01:06:35 Tons of Node APIs don't work. There is a bit of a compatibility layer. You do have to do things a bit differently. But what do you get in return? Your application runs everywhere, in 330 locations around the world. There is essentially zero cold start. Cloudflare Workers start up while the SSL negotiation is happening; by the time that's done, the worker has already started. And you have very few limitations on your scaling, extremely high concurrency. Every instance is very isolated. That's really important in voice as well. There are often quite big spikes, like at 9 a.m., everyone's calling up.
Starting point is 01:07:24 Everyone's calling up somewhere that's got a voice agent, asking to book an appointment or something. You get these big spikes, and you want to be able to scale, and scale very quickly, because you don't want people waiting around. If you throw tons of users onto the same system and start overloading it, then suddenly people get this problem where the agent starts responding in three seconds instead of one second, and it sounds weird. But yeah, Cloudflare gives you an incredible amount of that for no effort. And compared to Lambda and stuff, the interface is also pretty nice:
Starting point is 01:08:02 it's just an HTTP interface to your worker, there's nothing in front, and you can do WebSockets very nicely. And there's this crazy thing called Durable Objects, which I think is a bad name, and it's also a kind of weird piece of technology, but it's a little JavaScript runtime that is persistent, basically,
Starting point is 01:08:31 and has a little SQLite database attached to it. And it is, I don't know the right word for it, it's not quite the right word for JavaScript, but basically, think of it as thread-safe. So you can have it
Starting point is 01:08:45 take a bunch of WebSocket connections and do a bunch of SQL writes to the SQLite database it has attached, and you don't have to do any special stuff dealing with concurrency and atomic operations. So the simple example is to just
Starting point is 01:09:03 implement a rate limiter or a counter or something like that. You can do it very simply in Durable Objects. You can have as many Durable Objects as you want. Each one of them has a SQLite database attached to it; you can have 10 gigabytes per object. And you can then shard it however you want, right? You could have a Durable Object per customer that tracks something you need done in real time. You could have a Durable Object per chat room, as long as you keep in mind that each Durable Object has a set amount of compute. But you can use it for all sorts of magical things, and I think it's a really under-known thing that Cloudflare has.
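Here is a minimal counter in that style, sketched against Cloudflare's SQLite-backed Durable Objects; treat the exact API surface as approximate and check Cloudflare's docs before copying.

```typescript
import { DurableObject } from "cloudflare:workers";

// One Durable Object instance = one little SQLite database. The runtime
// serializes access per instance, so no explicit locking is needed.
export class Counter extends DurableObject {
  constructor(ctx: DurableObjectState, env: unknown) {
    super(ctx, env);
    ctx.storage.sql.exec(
      "CREATE TABLE IF NOT EXISTS hits (id INTEGER PRIMARY KEY, n INTEGER)",
    );
  }

  increment(): number {
    this.ctx.storage.sql.exec(
      "INSERT INTO hits (id, n) VALUES (1, 1) " +
        "ON CONFLICT(id) DO UPDATE SET n = n + 1",
    );
    const row = this.ctx.storage.sql
      .exec("SELECT n FROM hits WHERE id = 1")
      .one();
    return Number(row.n);
  }
}
```

The sharding being described falls out of how you pick the instance: a Worker routes each key to its own object (and its own database) with something like env.COUNTER.idFromName(customerId), one object per customer or per chat room.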
Starting point is 01:09:47 Coming from Pusher, it's like the real-time primitive now. A lot of the stuff we'd reach for something like Pusher for, Durable Objects covers, and especially when you're building a fully real-time system it's really great. Yeah. You chose TypeScript based on Cloudflare Workers, it sounds like, because that gave you 330 locations across the world, Durable Objects, a great ecosystem, no DevOps. For those who'd choose Go (I don't think you'd choose Rust for this, because it's not the kind of place you'd put Rust, but Go would compete for the same kind of mind share for you), how would the system have been different if you chose Go? Or can you even think about that? I haven't actually written any Go, so I don't know if I can give a good comparison. But what we do have out there is similar real-time voice agent platforms in Python, and I think that's because a lot of the people building the models, the voice models, then built coordination systems like Layercode for coordinating the real-time conversations.
Starting point is 01:10:59 Python was the language they chose. And I think what's more important is the patterns rather than the specific languages. We actually wrote the first implementation with RxJS, and that has implementations in most popular languages. I hadn't used it before, but we chose it because it was built for stream processing. It's not really for real-time systems, but it gives you subjects, channels, these kinds of things; it has its own names for all of them.
Starting point is 01:11:37 But basically it's like a pub-subby kind of thing, and then it's got this functional chaining thing where you can pipe things, filter messages, and things like that. And that did allow us to build the first version of this quite dynamic system.
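For flavor, here is roughly what that first RxJS-style version looks like in miniature; the event shape is made up, but the Subject-plus-pipe pattern is the one being described.

```typescript
import { Subject, filter, map } from "rxjs";

interface TranscriptEvent {
  text: string;
  isFinal: boolean;
}

// A Subject is the pub-subby piece: many producers push in, many consumers
// subscribe out.
const transcripts = new Subject<TranscriptEvent>();

// The functional-chaining piece: pipe, filter, map.
const utterances = transcripts.pipe(
  filter((e) => e.isFinal && e.text.trim().length > 0),
  map((e) => e.text.trim()),
);

utterances.subscribe((text) => {
  console.log("user said:", text); // hand off to the next stage here
});

transcripts.next({ text: "hello ther", isFinal: false }); // ignored
transcripts.next({ text: "hello there", isFinal: true }); // logged
```

It works, but multiply the subjects by a few dozen crisscrossing each other and you get the cables-everywhere problem that comes up in a minute.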
Starting point is 01:12:06 We didn't touch on it, but interruptions are the other really difficult dynamic part, where, whilst the agent is speaking its response to you, if the user starts speaking again, you then need to decide in real time whether the user is interrupting the agent. Or are they just going, mm-hmm, yep, and agreeing with the agent? Oh, gosh, yes. Or are they trying to say, oh, stop? I bet it's a hard problem to solve. We have to still be transcribing audio,
Starting point is 01:12:20 even while the user is hearing the agent's response. And we've got to deal with background noise and everything. And then, when we're confident the user is trying to interrupt the agent, we've got to do this whole state change where we tear down all of the in-flight LLM requests and in-flight voice generation requests, and then as quickly as possible start focusing
Starting point is 01:12:45 on the user's new question, especially if their interruption is really short, like stop. Suddenly you've got to tear down all the old stuff, transcribe the word stop, ship that as a new LLM request to the back end, generate the response, and then get the agent speaking back as quickly as possible. And it's all happening down one pipe, as it were, at the end of the day, right?
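A sketch of what that teardown can look like, using one AbortController per agent turn; the surrounding types are invented for illustration.

```typescript
// Invented shapes: one in-flight agent turn, with everything cancellable.
interface AgentTurn {
  controller: AbortController; // covers the in-flight LLM and TTS requests
  audioQueue: Uint8Array[];    // 20 ms audio chunks waiting to be played
}

function interruptTurn(
  turn: AgentTurn,
  startNewTurn: (signal: AbortSignal) => void,
): void {
  turn.controller.abort();    // kill in-flight LLM and voice generation
  turn.audioQueue.length = 0; // drop queued chunks so no stale audio from
                              // the old response sneaks into the new one
  const next = new AbortController();
  startNewTurn(next.signal);  // focus on the user's new question
}
```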
Starting point is 01:13:09 It's like audio from the browser microphone coming in, and then audio playing back. And we would have bugs like: you'd interrupt the agent, but then when it started replying, there'd still be a few chunks of 20-millisecond audio from the old response snuck in there. Or the old audio would be interleaved with the new audio from the agent, and you're in Audacity or some audio editor trying to work out, why does it sound like this? And you're rearranging bits of audio going, ah, okay, the responses are taking turns every 20 milliseconds. It's interleaving the two
Starting point is 01:13:50 responses, and you're trying to work out what's going on. A real pain to debug. Yeah. When you solve that problem with the interruption, do you focus on the false examples, the true examples? Do you have these, like, if it is an interruption, you can tell it's an interruption by these 17 known cases? How do you detect that interrupt? It really depends on the use case. How you configure the voice agent really depends on how the voice agent is being used, right? A therapy voice agent needs to behave very differently than, you know, a vet appointment booking phone agent. Yeah, a lot of dogs barking in the background.
Starting point is 01:14:39 Yeah, there's that. We call that the audio environment, and it's often an early issue users have. Interesting. They're like, well, my users call from cafes and it really misunderstands them. And the big problem with audio transcription is that it just transcribes any audio it hears, right? So if someone's talking behind you, it doesn't know that,
Starting point is 01:14:59 the model doesn't quite know that's an irrelevant conversation; it just transcribes it all. But if you imagine, you know, the therapy voice agent, it actually needs to not respond too quickly to the user. It needs to let the user have long,
Starting point is 01:15:14 pondering thoughts, long sentences. Big pauses. Yeah. You know, maybe tears, or crying, or just some sort of human interrupt that's not a true interrupt. It's something you should maybe even capture in parentheses. And so you can choose a few different levels of interruption, right?
Starting point is 01:15:34 You can just interrupt when you hear any word. By default, we interrupt when we hear any word that's not a filler word, so we filter those out, things like that. And then, if you need some more intelligence, you can actually just ship off the partial transcripts to an LLM in real time. So let's say the user starts speaking, starts interrupting the agent. Every word you get, or every few words, you fire off a request to Gemini Flash and you say,
Starting point is 01:16:08 here's the previous thing the user said, here's what the agent said, here's what the user just said; respond yes or no, do you think they're interrupting the agent? And you get that back in about 250 to 300 milliseconds. And as you get new transcripts, you cancel the old requests. You just constantly remake that request until the user stops speaking. Then you get the response from that.
Starting point is 01:16:31 And then you can make quite an intelligent decision. These things feel very hacky, but they actually work very well.
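That cancel-and-retry classifier can be sketched in a few lines. Here, askModel stands in for whatever low-latency LLM call you make (Gemini Flash in the conversation above); only the cancellation pattern is the point.

```typescript
let inflight: AbortController | null = null;

async function classifyInterruption(
  previousUserText: string,
  agentText: string,
  partialTranscript: string,
  askModel: (prompt: string, signal: AbortSignal) => Promise<string>,
): Promise<boolean | null> {
  inflight?.abort(); // a newer transcript supersedes older requests
  inflight = new AbortController();

  const prompt =
    `User previously said: "${previousUserText}"\n` +
    `Agent said: "${agentText}"\n` +
    `User just said: "${partialTranscript}"\n` +
    `Answer yes or no: is the user interrupting the agent?`;

  try {
    const answer = await askModel(prompt, inflight.signal);
    return /yes/i.test(answer);
  } catch {
    return null; // aborted or failed: no decision this round
  }
}
```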
Starting point is 01:16:52 Well, the first thing I'm thinking about there is that Gemini Flash is not local, so you do have to deal with an outage, or latency, or downtime. Or, in Claude's case, I would say Claude on the web most recently, a lot of downtime because of usage, like really heavy usage. The last two days I've had more interruptions on Claude on the web than ever. And I'm like, it's the Ralph effect. Yeah, it's the Ralph effect. And I'm like, okay, cool.
Starting point is 01:17:09 I get it. You know, I'm not upset with you, because I empathize with how in the world you scale those services. So why does your system not allow for a local LLM that might be just as smart as Gemini Flash at answering that very simple question? Like an interrupt. It's a pretty easy thing to determine. Yeah, yeah, yeah.
Starting point is 01:17:29 I think smaller LLMs can do that. Gemini is just incredibly fast. I think because of their TPU infrastructure, they've got an incredibly low TTFT, time to first token, which is the most important thing. But I agree, there are small LLMs. And actually,
Starting point is 01:17:49 I think maybe one of the Llamas on Groq, Groq with a q, might actually even be a bit faster. We should try that. But you make a point about reliability. People really notice it in voice agents when it doesn't work, right? Especially if a business is relying on it to collect a bunch of calls for them. And so that is one of the other helpful things that platforms like ours provide. Or even the cost.
Starting point is 01:18:13 I imagine over time, cost. Right now you're probably fine with it, because you're innovating and maybe you're finding out customer fit, ability, reliability, all those things. You're sort of just-in-time building a lot of this stuff, and you're maybe okay with the inherent cost of innovation. But at some point you may flatten out a little bit, and you're like, you know what, if we had been running that locally for the last little bit, we'd have saved 50 grand. I don't know what the number is. But, you know, the local model becomes a version of free when you own the hardware, you own the compute, you own the pipe to it, and you can own the SLA latency to it as well, and the reliability that comes from that. And there's something cool: there's a new transcription model from NVIDIA, and they've got some voice models as well. And there was a great demo of a fully open source local voice agent platform that was done with Pipecat, which is the Python coordination agent infra open source project I was mentioning. And they've got a really great pattern: a plugin pattern for their voice agent. I think that's the right pattern, and other frameworks have done that too. We've adopted a similar pattern ourselves when we rebuilt recently.
Starting point is 01:19:29 And the important thing is that the plugins are independent things you can test in isolation. That was the biggest problem we had with RxJS: the whole thing was kind of, you know, like those audio mixing setups where you have cables going everywhere. It was kind of like that, right? With RxJS subjects going absolutely everywhere. It was hard for us as humans to understand. It was the kind of code where you come back a week later and go, what was happening here?
Starting point is 01:20:05 Yeah. And oftentimes we'd end up writing code where the code at the top of the file was actually the thing that happened last in the execution. Basic stuff like that, just because that's how RxJS was guiding us and how we had to initialize things. But that was one of the key things.
Starting point is 01:20:32 We moved to a plugin architecture. It's very basic; there's no RxJS-style stream processing. It's all very simple JavaScript with async iterables, and we just pass a waterfall of messages down through the plugins. And it's so much better. We can take out a plugin if we need to.
Starting point is 01:20:54 We can unit test a plugin. We can write integration tests and mock out plugins up and down the chain. We're about to launch that, and it's just such a game changer. And interestingly, tying back to LLMs, we ended up here because, with the first implementation, we found it hard as developers to understand the code we'd written. The LLMs were hopeless. They just could not hold the state of this dynamic, crazy, multi-subject stream system in their head.
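Here is a minimal sketch of the waterfall shape being described, with a deliberately tiny made-up message type; the real plugin interface is surely richer.

```typescript
interface Msg {
  type: string;
  data: unknown;
}

// A plugin is just an async transform over the message stream, which is
// what makes each one unit-testable in isolation: feed it a hand-built
// iterable, assert on what comes out.
type Plugin = (input: AsyncIterable<Msg>) => AsyncIterable<Msg>;

function pipeline(
  source: AsyncIterable<Msg>,
  plugins: Plugin[],
): AsyncIterable<Msg> {
  // Messages waterfall down through the plugins in order.
  return plugins.reduce<AsyncIterable<Msg>>((stream, p) => p(stream), source);
}

// Example plugin: drop filler-word transcripts before later stages see them.
const dropFillers: Plugin = async function* (input) {
  for await (const msg of input) {
    if (msg.type === "transcript" && /^(um+|uh+)$/i.test(String(msg.data))) {
      continue;
    }
    yield msg;
  }
};
```

Swapping a plugin for a mock is just swapping an entry in the array, which is what makes the integration tests cheap.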
Starting point is 01:21:29 Well, context was everywhere, right? It was here, it was there. I would do things like take the whole file, copying and pasting files into ChatGPT Pro, being like, you definitely have the context here, fix this problem. And part of the problem was that complexity. Not having the ability to test things in isolation meant we couldn't have a TDD loop, whether it was with a human or with an agent. And because we couldn't use agents to add features to
Starting point is 01:22:04 this platform, to the core of the platform, it was slowing us down. And so that's when we really started to use the coding agents, Claude Code and Codex, properly and hard. I spent like two weeks just in Claude Code and Codex, and the mission was:
Starting point is 01:22:26 if I can get the coding agent to write the new version of this, and it was not even a refactor, it had to be rewritten. Start from scratch, yeah. First principles. Then, by virtue of writing it, it'll understand it. And then I'll be able to use
Starting point is 01:22:46 coding agents to add features. And I started with literally the API docs for our public API, because I didn't want to change that, and the API docs of all the providers and models we implement against, like the speech-to-text and text-to-speech model provider endpoints. And just some ideas, like, I think we should just use a simple waterfall pipe and pass messages through the plugins. And that experience was really interesting, because it felt like molding clay. I really cared about how the code looked, because I wanted it to be readable for humans as well as for the agents.
Starting point is 01:23:26 The agents aren't quite good enough to build this whole thing from a prompt, but I think they will be in a year or two, right? But it did an okay job, and it needed a lot of reprompting: refactor this, re-architect this. But it felt like clay in one sense, because, and you mentioned this earlier, you can just write some code, and even if it's wrong, you've kind of gained some experience. Yeah. I was able to just say, write this whole plugin architecture, and it would do it, and I'd be like, oh, that seems a bit wrong, that's hard to understand.
Starting point is 01:24:05 I was like, write it again like this, write it again like this. And I suddenly got that experience of throwing away code, because it hadn't taken me weeks and weeks to write this code. Yeah. It'd taken me 10 minutes, and therefore you didn't have a sunk cost. Just threw it away. You still had your chat session, too. So even if you had to scroll back up a little bit, or maybe even copy that out to a file for long-term memory if you needed to,
Starting point is 01:24:26 you still have that there as a reference point. Yeah, I find myself doing similar things, where it's just like, trust the model, throw it away, and do it again. If you need to learn from the mistake, go down the wrong road for the learning. Make the prompt better. And it did a terrific job. And then the bit that really got it over the finish line was when I gave it this script that we used to have to run through manually to test our voice agent.
Starting point is 01:24:55 You know, it's like: connect to the voice agent, say this to the voice agent, tell it to tell you a long story, now interrupt the story, you shouldn't hear any leftover audio from the long story. All these things; there were like 20 different tests you had to do. I gave it that script and was like, write the test suite for all of these tests. And it did. And I gave it all these bugs we had in our backlog, and was like, write tests for this, and I just started doing TDD on our backlog, and it was great. Then I did like a chaos monkey thing. I was like, write a bunch of tests for crazy stuff that you
Starting point is 01:25:31 could do with the API. Found a bunch of bugs and issues, security issues. And then it got it working, had a bunch of unit tests, and I was still having to do a bit of manual testing. And then one day I was like, you know what, I really want, like, no one's made an integration test thing for voice agents. There are a few observability platforms and eval platforms. I was like, I just wanted to simulate conversation. And, you know, it's, like,
Starting point is 01:25:58 that's part of the magic: trying something where you're like, this is a pain in the ass to build, or, how is this even going to work well? I just got it to build it. And I recorded some WAV files of me saying things, and I gave them to it.
Starting point is 01:26:14 And it was like, make an integration test suite for this, and feed the WAV files in like you're having a conversation, and check the transcripts you get back. Wow. And it did a great job. And then it was actually able to fully simulate those conversations to do all the tests. And then, I mean, we've got these practices like TDD, which are going to hold value, right?
Starting point is 01:26:36 It was so valuable for the model, for the agent, to be running the tests, fixing the tests, running the tests, fixing the tests. And that feels a bit like magic when you get it working. So much to cover in this journey. While I'm so glad we had this conversation, I kind of feel like a good place to begin to end, not actually end, is back at this idea that is on your about page. And I just got a reMarkable, because I love to write and I really hate paper.
Starting point is 01:27:10 Because this thing has Linux on it. And I wrote an API, so I can now API to my reMarkable Pro tablet. So amazing. You need to be able to Claude Code or Codex from your tablet. That's next. I just got it. I just got it.
Starting point is 01:27:24 So that's the next thing: I'm going to have this. It's a little playground for me, basically. But it's real time, so if you see me looking over here writing, audience, or even you, Damien, I'm not not paying attention; I'm writing things down. And the one thing I wrote down earlier from your about page was the era of the small giant, which you alluded to, but you didn't say those exact words. And the reason I think it might be a good place to begin to end is I think you might be able to encourage the single developer that maybe in the last couple of months has just begun to touch this, and not resist falling into this gravity hole, or however we describe it, this resistance we've had as developers, loving to read our own code, and code review, and all the things, as humans.
Starting point is 01:28:10 And now to, you know, not resist as much, or at all, and just trust the model. Give them this word of encouragement: hey, you're a single developer. And in your case, Damien, you don't need a DevOps team. It's not that they're not valuable or useful, but you chose a model, a way to develop your application to solve your problem, that didn't require an ops team. Give them that encouragement. What does it mean to be in this era of the small giant? I think the hardest thing is our own mindset, right? I just found this with coding agents. You start off putting in things where you kind of have an idea, you know what to expect out of it. And then you start just putting in stuff that seems a bit ridiculous and ambitious,
Starting point is 01:28:55 and oftentimes it fails, but more and more it's working, and that's a very magical feeling. And it's a very revealing kind of experience. And so I think we can all be more ambitious now. And especially as engineers, we know how the whole thing works, right? There is a lot of power everyone's being given with vibe coding, being able to vibe code, and there are a lot of security issues.
Starting point is 01:29:28 I think they'll be solved over time. But as engineers, we have the knowledge to be able to take things fully through: deploy things, scale them, fix the issues that the LLMs still get stuck on. But we can do so much more now. We can be so much more ambitious now. And I think the thing that
Starting point is 01:29:54 you know, every engineer should be doing now is not only trying out Claude Code and Codex and doing something new and fun. I mean, the great thing is, it's so low risk, it's so easy to do, that you can build something ridiculous and fun that you've always wanted to do, right? Yeah. You can build something for a friend, for your wife. That's really exciting. And I think this Ralph Wiggum thing, the very basic idea is: give it a spec.md, a todo.md, just an ambitious task or a long list of tasks in a markdown file. And you run a shell script, and all it does is say to Claude Code, do the next bit of work.
Starting point is 01:30:52 When there's no more work to do, return complete. And the shell script just greps for complete, and if it hasn't seen that word in some XML tags, complete, it just calls Claude Code again. And like many of these things, it seems like a terrible idea, it seems ridiculous, but it is also incredible what it can do.
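The conversation describes this as a shell script; here is the same loop sketched in TypeScript for Node instead. The claude -p print-mode invocation is an assumption about the CLI; adjust it to whatever your installed agent accepts.

```typescript
// ralph.ts: call the agent, check for the completion marker, repeat.
import { execSync } from "node:child_process";

const PROMPT =
  "Read spec.md and todo.md, then do the next bit of work. " +
  "When there is no more work to do, reply with <status>COMPLETE</status>.";

for (;;) {
  // Assumed invocation: `claude -p` runs one non-interactive turn.
  const out = execSync(`claude -p ${JSON.stringify(PROMPT)}`, {
    encoding: "utf8",
    stdio: ["ignore", "pipe", "inherit"],
  });
  console.log(out);
  // The shell version greps; here we just look for the agreed XML tag.
  if (out.includes("<status>COMPLETE</status>")) break;
}
```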
Starting point is 01:31:19 And so I think that's probably one way to feel what the future is going to be like. I feel like you write down something very ambitious in a markdown file, or transcribe an idea you've been thinking about for a while, and you set a Ralph Wiggum script off on it. And you just go for a long walk, or go and have lunch. And when you come back, I mean, it's a very exciting feeling. And as a developer, it's very fun, because then you get to go through all this code and be like, why did it do that? And you're like, oh, that's pretty specific, that it did it like that. Okay, that was quite a good idea. I mean, it messed up this bit, but I just feel like that's a very, very exciting experience.
Starting point is 01:32:04 Very cool. I agree with that. I'm looking forward to writing that todo.md or spec.md and just going for it, because I haven't done it yet. I've only peeked at some of the videos and some of the demos, but I haven't tried the Ralph Wiggum loop. I'm going to post a one-liner Ralph on X as well, because then you can just copy and paste and go. There's the blog post to read, too. Yeah. Well, I feel like, with everything, I want to make it more ceremonious. Not because it needs to be, but because I want to know. I want to give myself space to think of something that I
Starting point is 01:32:46 think would be challenging even for me, you know, and then give it to the thing and go away, like you said, and come back happy. I want to save space to do that when I can give it full mind share, versus the incremental 20 minutes or 10 minutes or whatever it might be that I have available to give it. I kind of want to give it a bit more ceremony, not because it deserves it, but because I want to actually do it for myself. So I'm just in this constant learning scenario. It's a pretty wild era to be a developer, and to be an enabled developer, you know. Like these non-technical folks that may get introduced to a terminal-like thing that's basically just Claude in a directory, where they can ask questions and get a
Starting point is 01:33:34 just-in-time interface that's made for them only. Like, that's a really, really cool world to be in. And it doesn't mean that software goes away. It just means there's going to be a heck of a lot more of it out there. And I do concur that maybe code review doesn't matter anymore. Maybe it won't in a year. Maybe it won't in six weeks. I don't know how many weeks it will take.
Starting point is 01:33:56 Let's truly end with this. What's over the horizon for you? What's over the horizon for Layercode? What is coming? The show will release next Wednesday, so you've got a week, given that horizon. And no one's listening to this right now; it's a week from now.
Starting point is 01:34:12 What's on the horizon for you that you can give us a peek at? Is there anything? We are working really hard to bring down the cost of voice agents. There is a magic number of $1 an hour for running a voice agent where suddenly a huge number of use cases open up, whether it's consumer applications, gaming. There are so many places where voice AI will be super valuable, super fun, and isn't implemented yet. And with the choices we made, being on Cloudflare with the
Starting point is 01:34:50 system we've built, we're going to be able to bring out the lowest-cost platform. I'm very excited for that. And most of all, very excited just to see voice AI everywhere. Voice AI is just such a wonderful interface, right? I find myself dictating all the time to Claude Code, and you can get your thoughts out so much better. And I'm excited to see how many applications we can enable to add voice AI into their application. And then we get an insight into the future of voice AI as well, with the companies that are working with us; a lot of them are startups, and they're building some crazy, crazy new things with voice AI on our platform. So there's going to be some amazing stuff with voice coming out this year.
Starting point is 01:35:45 What's the low-hanging fruit? What's the sweet spot for Layercode right now that you can invite folks to come and try? Well, the great thing is we've got a CLI, a single command you can run, and you'll get a Next.js demo app all connected to your Layercode voice agent. And you can get a voice agent up and running within a minute. So it's super fun, worth trying. And from that point you can use Claude Code or Codex and just start building from there.
Starting point is 01:36:37 of this podcast. What a wild world it is to be this deep in years and experience in history in software and to just still be enamored by the possibilities. I hope you enjoyed today's conversation with Damien and we'll see you next time. Well, friends, the yellow mode philosophy is out there. The code review is a bottleneck, maybe non-existent,
Starting point is 01:37:03 Well, friends, the YOLO mode philosophy is out there. Code review is a bottleneck, maybe non-existent. SaaS may be dying, or dead. It's time to trust the model, building a CRM just in time. What kind of world is this we're living in? Did you think that the beginning of 2026 would be this kind of year?
Starting point is 01:37:22 You should read the tea leaves. And that's just me being honest. But seriously, you can't deny the impact that AI is having. Everyone is talking about it. Everyone is using it. And those who aren't, well, we'll see. I know our friends over at Depot, our friends at Notion, and our friends at Tiger Data. They're all loving this podcast, just like you.
Starting point is 01:37:43 Check them out, depot.dev, notion.com slash changelog, and of course, tigerdata.com. Much thanks, much love. Appreciate the support. But hey, friends, this show's done. This show's over. I'm glad you listened. We'll see you again real soon.
