Screaming in the Cloud - Coding Agents, Chaos, and the Future of Dev Work with Dexter Horthy

Episode Date: May 28, 2026

In this episode, Corey Quinn sits down with Dexter Horthy, CEO and Co-founder of HumanLayer, to unpack what engineers are getting wrong about AI, especially when it comes to coding agents. From the obsession with “just throwing more tokens at the problem” to the reality of building scalable AI workflows, Dexter shares hard-earned insights on how to actually push models to their limits. They dive into the evolution of developer workflows, the rise of AI-powered software factories, and why understanding context and verification matters more than raw model power. If you’re building with AI or trying to, this episode will challenge how you think about what these systems can (and can’t) do.

Show highlights:
(00:00) Throwing Tokens Too Far
(01:04) Meet Dexter Horthy
(01:52) Personal AI Benchmarks
(04:12) Human Layer Race Condition
(05:59) Rewrites and Tech Debt
(07:19) Software Factories Mindset
(10:20) Verifiable Problems and Token Limits
(13:45) Agents in the Trenches
(18:05) GitHub at Agent Scale
(26:23) Safety Ethics and Closing Thoughts

About Dexter: Dexter Horthy is the CEO and Co-Founder of HumanLayer, where he helps engineering teams tackle complex problems in large codebases using coding agents. Previously, he worked in DevOps, SRE, and Solutions Engineering at Replicated, and contributed to lunar navigation software at NASA JPL. Outside of work, he’s a fan of tacos and burpees, though not necessarily in that order.

Links:
LinkedIn: https://www.linkedin.com/in/dexterihorthy/
Website: https://humanlayer.dev

Sponsored by: duckbillhq.com

Transcript
Starting point is 00:00:00 I regret saying this because in many ways this is a good idea, but I think people are going way too far on the like throw more tokens at the problem. Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined today by Dexter Horthy, the CEO and co-founder of HumanLayer. And by all accounts, he appears to be human. Dex, thanks for joining me. Dude, I'm so stoked to be here. This episode is sponsored in part by my day job, Duckbill. Do you have a horrifying AWS bill? That can mean a lot of things. Predicting what it's going to be. Determining what it should be.
Starting point is 00:00:40 Negotiating your next long-term contract with AWS. Or just figuring out why it increasingly resembles a phone number, but nobody seems to quite know why that is. To learn more, visit duckbillhq.com. Remember, you can't duck the Duckbill bill, which my CEO reliably informs me is absolutely not our slogan. So for those who have not had the pleasure of encountering your particular, we'll call it perspective, what is it you say it is you do here? Amazing. So I am obsessed with getting the most out of AI. How do we take whatever the current models we have outside of training and fine-tuning and like task-specific stuff?
Starting point is 00:01:24 What can we as engineers who are not working in a big lab do to push these models to their limits? Most recently in the last like six to nine months, most of that has been around coding agents because I think it's one of the most misunderstood and also has the highest ceiling if you do it right. It seems to me like this is one of those areas where you're taking a half hour out of your day to have this conversation with me
Starting point is 00:01:47 and during that half hour, the whole game is going to change again. This isn't an area where you can hold still. A year ago, I had a whole bunch of problems that, oh, these are things that the coding tools will struggle with. I'll just keep that as sort of a personal benchmark. Well, I ran out. You ran out of personal benchmarks? What did your benchmarks used to be? Do some analysis of 150 megabytes of JSON so I can have discussions with models about my Twitter corpus from a seven-year run. There were the "build weird backend systems for me" ones that just
Starting point is 00:02:18 sort of started working. I replaced my Adobe Creative Cloud subscription by building a custom podcast recorder into a web app that I use for the Monday podcast that I record for the Last Week in AWS podcast. It's basically a bunch of workflow tools of things that, well, that's hard. That's what smart people do. I still have some, though. I mean, I have a Bloomberg keyboard on my desk at work,
Starting point is 00:02:40 which has a fingerprint reader that if you don't pay Bloomberg, you can't read. There's nothing on it on the Mac. Claude Code went nuts on it, and apparently there's some encryption thing. It needs to basically be able to break through. So, you know, I either need to get someone with an actual Bloomberg subscription and do a wire capture on it,
Starting point is 00:02:55 or I can just put that back on the, well, until cryptography falls. I suppose I'll have to live with it. Yeah, and you probably don't want to get caught asking a frontier model to reverse engineer something that you're supposed to be paying for. It's a good way to get banned from Anthropic for a while. Seriously, interesting, because there's nothing about this that I use this as a standard keyboard. It has a fingerprint reader in it. I want to use the fingerprint reader, the end.
Starting point is 00:03:21 This is not about stealing things from Bloomberg, to be clear. There's nothing unethical in this request. It's, I would love to be able to use the fingerprint reader built into my keyboard. The end. Yeah, it's getting harder to find good evals that the models can't solve. I still do have a couple, and like I have built this actually this like sort of personal mental model. Like every time I'm doing something with AI that becomes so hard that I either end up spending like a ton of time going back and doing like 30 different sessions just to understand the problem and then another 10 sessions to actually figure out the solution, I will like flag that and I have
Starting point is 00:03:56 a little journal of things that AI is not good at solving. And then I come back to that Git repo at that Git SHA. And every time there's a new model, I see, can you one shot this problem? Can you actually go figure out the problem? Do you have an example? And I'm sure this example will age like fine milk. It's been working for about six months. There is a race condition bug in the current version of the HumanLayer open source.
Starting point is 00:04:18 We ended up forking that open source repo and making a closed source for now just because open source is a little bit. It's going through its own weird moment right now and ours. Yeah, open source will be going through weird moments for 30 years, but I hear you. Yeah. Our vision does not require us to be open source. It's an extra set of distractions that we just don't want to worry about right now so we can focus. But if you currently pop open the current version of HumanLayer, if you can get a model to
Starting point is 00:04:41 one shot the race condition between the Tauri Rust native app, the Vite front end that it serves, the Golang daemon that runs locally, that launches Claude Code sessions, that launch a standard I/O MCP server that loops back to the daemon, that serves approval requests to the front end and all the way back through all that chain and your model can one shot the solution to that race condition. I know what it is. We haven't pushed the fix to it. We fixed it in our closed source. But that is one of my evals that, every time I want to test a new model or test a new workflow, we throw it at that. And the correct answer is that workflow is insane. Have you considered not doing that? I mean, so this is the other problem with AI
Starting point is 00:05:23 slop is we haven't talked about problems with AI slop, but we tried the like don't read the code thing for about six months and found ourselves running away from it with our hair on fire. And this may be a skill issue. I find that it's odd because when I do back end stuff or infrastructure stuff, I often have to slap the chainsaw out of the thing's hands. But on front end, I don't know anything about front end, so I assume it's right. It feels like the blast radius might be smaller. A little bit, but also front end is very like, once your front end becomes super tangled,
Starting point is 00:05:53 I mean, it was both back end and front end and how they talked together that caused us to throw out this entire code base. We could have fixed it, but we decided there were other architecture things we needed to rethink anyways. So it would be easier to start Greenfield and throw it out and start over, which is the thing you were never supposed to do. With AI, you can do more. AI makes that a lot better. I found that, oh, this thing that I built to serve a particular purpose and fix a problem that I have no longer serves that purpose because of requirements change or something. Great. Throw it out, baby bathwater and all. The baby's floating face down. It's fine. And we're going to go ahead and start over from scratch. That used to be a three-week project.
Starting point is 00:06:31 Now it's, it'll be done by the end of my coffee break. I remember the second job I ever had. I started and I came into a three-month refactor that was on month six. And it was like, we're going to upgrade all the frameworks. We're going to pause feature dev. The CTO convinced the CEO that it was going to be okay. It would be over quickly. And it had to happen no matter what. He had like bargained with the product leadership of the company to be allowed to spend a couple months like upgrading and cleaning things up and removing tech debt. And of course, it went twice as long. And, like, my first week was like, okay, this thing is due on Friday.
Starting point is 00:07:03 Everyone has lost patience. And it is now a death march for the next two weeks to actually get this thing out. And, of course, shipped a million bugs. And we eventually, like, recovered. But, yeah, like, you're not supposed to do that. When an engineer says we need to rewrite this thing, you're supposed to tell them to go read a book about why you shouldn't do that. You have a background doing the DevOps SRE dance, which means that you're often the voice of moderation
Starting point is 00:07:24 in dev environments, where everyone wants to build features and do exciting things, you're like, hey, let's make this sustainable, let's slow down, let's be conservative with things like databases, file systems, the stuff that leaves a mark when it breaks. Now it seems like you're almost championing
Starting point is 00:07:39 acceleration of features. What was that transition like? You say I'm like a DevOps SRE. I have done plenty of DevOps and SRE. I did a ton in the Kubernetes world. I was at a startup called Replicated for like seven years where we helped people package up their Kubernetes app and ship it to other people.
Starting point is 00:07:54 data centers. But I would frame it less as like the voice of reason. I've always been a like impatient, fast, like let's ship value. Let's, you know, be scrappy and like figure out like what risks are tolerable and what corners should never be cut, of course. How do we be responsible in our irresponsibility? I played a lot of StarCraft II growing up, or StarCraft one and two. And I forget who said this, but like it's, it's an incredible exercise in like early stage companies, not obviously large debt, like not just like seed, like all the way through A, B, C, whatever, because it forces you to make hard decisions with incomplete information. And it forces you to do that hundreds of times a minute. Oh, absolutely. One of the hard lessons for me when we're building
Starting point is 00:08:39 Skyway over at Duckbill has been: we are willingly accepting technical debt. That is something we are doing with our eyes open on it. And we're making the decisions that will not ideally screw us over later. But if we get to that point, we can fix the technical debt. And if we don't, it won't matter anyway. So that took a bit of change in my perspective, because historically, I was never at a company this early. I was in after product market fit. Okay, developers have taken the environment as far as they can. Everything's on fire all the time. Can you help us? Yes, I can. Basically, my entire job and career have been paying off technical debt. Yeah. And it's really fun. I love paying off technical debt. I mean, so come back to your question of like, how did you go from
Starting point is 00:09:19 the more conservative voice of reason to like, hey, we need to figure out how to accelerate things, is like, I would frame it less as DevOps SRE. I would frame it as like, I've been building software factories my entire career. Like, not on purpose, but I always looked up the most to the engineers that maintain the software factory, whatever part of it it was, whether it was the environment, the like system that allowed you to spin up like temporary testing sandboxes with the full stack so that a PM could look at it, or the CI/CD pipeline, or the thing that did the automated testing. That was always the most fascinating thing for me because I saw early on the people who invested in that would have compounding returns. You write the feature, you get a feature,
Starting point is 00:09:59 you improve the factory 10%, where you get 20% of your time back the next day, and you can spend half of that making the factory even better, and the other half of it writing more code. And this is, like, Will Larson's An Elegant Puzzle. There's like this part of the curve where you have invested so much in the thing that builds the thing that you're now just like leaving everybody behind in the dust. So I am curious, when you take a look now, since what you do, more or less, is telling people how to effectively work with AI coding agents,
Starting point is 00:10:28 what are people getting wrong the most? What can we take away from this as far as, oh, I'm going to get better results with Claude Code after listening to you? I regret saying this because in many ways this is a good idea, but I think people are going way too far on the like throw more tokens at the problem. Are we talking about G-Stack without mentioning G-Stack?
Starting point is 00:10:44 We're talking about Gas Town, G-Stack, Ralph Wiggum, any number of good ways to throw more tokens at a problem. And in general, if you design the problem correctly, throwing more tokens at it may be helpful, especially if you can create good deterministic back pressure, right? The reason why Ralph Wiggum was able to create this cursed programming language with a model that was not that, you know, like a Sonnet 3.7 or like pre-like everyone else thinks AI is good model, is because it was building a programming language. And a programming language is infinitely verifiable. You write code in the language. You try to compile it. Compiler breaks. You go fix the compiler. The
Starting point is 00:11:18 compiler works, you run the program, program breaks, you go fix whatever the compiler is putting in. But it's like, it's very easy for the model to check its work and tell if it's done a feature right. Not a lot of problems have that characteristic. And people are trying to apply these techniques that worked really well, throwing more tokens at the problem for these like very verifiable problems, at problems that are not verifiable. It also feels like that is what everyone is doing to a point where now we're seeing token capacity constraints from the major providers. Anthropic, as of this recording, has done some strange things with session windows and double usage. Part of me wonders if that is a byproduct of people throwing tokens at problems. That's interesting. The whole Anthropic thing of like, okay, we need to control
Starting point is 00:12:05 OpenClaw usage and we need to make sure that, hey, people are taking our subsidized inference and only my general take on that whole thing is like if Anthropic wants to give a discounted plan and tell you how you can and can't use it. Like, that's their prerogative. Everybody I know who is serious, all of our enterprise customers, they're paying per token anyways. And it's like, cool. Like, no one, no one promised you cheap inference.
Starting point is 00:12:27 Nobody owes you cheap inference. You can say what you will about anti-competitiveness, right? Like, the example that Theo gave me was actually pretty good is like, Amazon wants to kill Diapers.com. So they just take the same product and sell it cheaper. They sell it at a loss because they can afford to. And then one day when that, when all those like, you know, one-off businesses are out of business, then they can charge whatever they want.
Starting point is 00:12:48 That's why I am interested in a lot of the local LLM research that's being done. I want to be able to have a coding agent that runs locally and uses, makes tool use. And sure, it's going to be slower and it might not be as great. But a lot of what I do isn't that complicated. "Go ahead and modernize the version of Python this dumb little script is written in" is the sort of thing that, okay, that takes half an hour and basically heats up my laptop, I don't care as much. Yeah, that makes sense. So what are you seeing as emerging trends these days other than throwing tokens at things? I don't know. Every other person I talk to is like
Starting point is 00:13:25 accidentally reinventing Gas Town from first principles. But I don't know if I want to say that's a trend. It's just a like there is a thing that engineers like to do, which is to glue systems together and see how they work and improve them over time. And you start with three prompts and then you wake up the next day and suddenly you have a hundred. And you're the only one that knows how to use it? For me, something that I have begun to deeply appreciate about agents is one of the things I looked for when I was interviewing SREs once upon a time, where you start throwing a problem at them and seeing how deep they go. And the right way to get through an interview like that is never give up, never surrender. So I will see these things, oh, I can't, I don't have access to that. So here's what
Starting point is 00:14:08 I'm going to do instead to get to the reason that I'm, that this thing is misbehaving. I've seen it start pulling tcpdumps. I've seen it start packet crafting. It's doing ridiculously in-depth things. I haven't seen strace yet, but I'm waiting for it, where it's using very deep tools to get at the answer. In many cases, past a point of reason. But it's doing a lot of the stuff that I would do if I weren't lazy. I care about figuring out why I have this non-deterministic delay on an API that I build, but not enough to actually go diving into it. But I can turn this thing loose and it'll tell me. This episode is sponsored by my own company, Duckbill. Having trouble with your AWS bill, perhaps it's time to renegotiate a contract with them. Maybe you're just wondering how to
Starting point is 00:14:55 predict what's going on in the wide world of AWS. Well, that's where Duckbill comes in to help. Remember, you can't duck the Duckbill bill, which I am reliably informed by my business partner is absolutely not our motto. To learn more, visit duckbillhq.com. The adoption of Claude Code was the first thing that made me believe that CloudWatch was actually useful. CloudWatch is incredibly powerful, incredibly useful with a user interface that is garbage. It's the data structure underneath everything good, but it itself, it is terrible to work with, but agents do not care. Exactly. Agents don't care what it looks like because they're just plumbing through JSON anyways.
Starting point is 00:15:38 I remember a tweet I saw when I first got back on Twitter in like 2015 or 2016. And it was a tweet from Coda Hale. And the picture was like, it was one of those CloudWatch charts where you just have like three little dots and one line because it's like not filling in the gaps between everything.
Starting point is 00:15:55 And like the caption was like, CloudWatch was a technical marvel. Like it's incredibly powerful. But how did anyone look at this and say, yes, this is good. This is what we should ship to customers. In October in 2018, Cloudwatch is of the devil,
Starting point is 00:16:09 but I must use it. And I wound up talking about how it violated every one of AWS's then 14 leadership principles. And that was how I met the then GM of CloudWatch. And they fixed a lot of it. It's still not great, but it's not the nightmare tire fire that it was back in those days. I do miss aspects of this. Of old CloudWatch?
Starting point is 00:16:33 Yeah, back then, when you got something like this working back then, it was because you really cared. You suffered for it to get it out the door. Now it feels like that barrier has been lowered, which is, I want to be clear, a good thing. But it's having a bunch of knock-on effects. GitHub is on fire based upon the sheer number of commits and agents stuffing things into it. They're not helping themselves by whenever it comes back up for half a second, babbling about Copilot and then it falls over. People can draw connections that aren't necessarily there. I do think that they
Starting point is 00:17:02 finally showed up in a way, and maybe this is just like me being too terminally online. But like, some VP from GitHub came online and on Twitter. He's like, here's the problem. Here's what we're doing about it. We know it's an issue. Like, here's what I can say about it. Yeah. And it was like, oh, I'm no longer worried about this problem. It's a shame that it took people complaining online for 24 hours a day, for weeks straight for them to come out and do that. There's a corporate comms lesson in here, and that's very Microsoft, where my issue with Azure security for a long time was not the security issues, which aren't great, let's be clear here, but my problem was the complete stonewalling silence coming out of Redmond. I yell at AWS
Starting point is 00:17:39 about this all the time. When they say nothing, they are far too big now to get the benefit of the doubt. They're a nearly $3 trillion company that is going to have the worst assumed about them until they start talking, at which point, oh, okay. Now, sure, some people aren't going to believe what they say. Some people are always going to want to needle them. And I get that. But at least they're trying at that point instead of, well, maybe if we shut up, they'll go away. Do you think we're going to get an agent optimized GitHub, or do you think someone else is going to have to build that? I am cynical in that this is going to make me sound ancient, but Git was a marvel.
Starting point is 00:18:21 It was a distributed tool for source control, and the first thing we did is centralize it again. Awesome. It is not that hard in isolation to run a Git repo. It is a static web server with a few extra bits. It's all the ecosystem stuff on top of it that starts getting tricky. It's the fact that it sparks off agents, the fact that it does webhooks, the RBAC, which is no small thing, the fact that it can track issues, the pull request model, the discussions around it.
Starting point is 00:18:47 A part of the problem even now is describing what GitHub is exactly. So some aspects, trivial to replace for agent scale. Others? I don't know, boss. That's a heavy lift. I have a couple friends who are like crazy system engineers. And like last year they built a Git server from scratch in Rust that is like fully protocol compliant and also has like REST APIs for every Git protocol operation,
Starting point is 00:19:12 and it's like super performant. They built it for like vibe coding infrastructure. It's like every single project on v0, Lovable. They don't, those aren't there. Companies like that every single time someone opens a browser, you need to create a Git repo. Now, there are two problems with this.
Starting point is 00:19:25 They have a great shot, but there are two problems with this. Oh, several, actually. One is everyone can build a tool that solves their particular problem. Hell is other people's requirements. I've been down that road enough.
Starting point is 00:19:36 So here's my pitch for you. Is like, what is the minimal set of APIs needed to create a headless GitHub so that anybody who wants to can kind of vibe code the front end part which is like you know code still matters but like you can't break everybody else's infrastructure and you can't like and you can throw it out
Starting point is 00:19:55 and rebuild it pretty quickly what is the bare set of operations you need to create something that I can build I'm not going to rebuild GitHub I'm not going to vibe code my own Git server but if you give me a really reliable backend that fits the right interface I'll happily like build my own front end on it
Starting point is 00:20:11 and integrate it into my vibe-coded CRM manager plus project manager, plus like the thing I'm using to run my business of like my custom SaaS that is built on like solid bones and the back end, but I bring the information together how I like. JGit, out of the Eclipse project, supports a native Git repository backend of an S3 bucket or other object store. So technically that would qualify. Like S3 is pretty solid. You're not going to beat that from a raw infrastructure perspective. Okay. And if you don't have too much traffic because you're only hosting your own version of it, you could just run Git on top of S3.
Starting point is 00:20:47 And as long as the interface is right. You could run Git on top of a Linux box on a Pi somewhere and just use SSH as your interface. I guess if you were going to build this as a product for other people to do. Right. Hell is other people's requirements. Well, that's where it gets tricky is because, okay, why? So you have your friends building this in Rust for vibe coding purposes. Awesome.
Starting point is 00:21:04 Great. Why would I use that instead of vibe coding my own? Well, so they didn't vibe code this. They, like, wrote every token by hand. A year ago, I was like, you've got to get on this Claude Code thing. And they were like, no, it's not good enough. Our code is perfect. And I'm like, now I'm like, wow, there are a shrinking number of pieces of software that meet that standard. There's also a network effect to GitHub. Everything integrates with it. The ecosystem is the hard part. This is why you'll never
Starting point is 00:21:26 replace Salesforce either. It's not the API on top of a database. It's the ecosystem. I'll take it a step further. I don't like MCPs for most things. Like AWS has five or six MCPs that I'll find useless because you've already got the AWS CLI. And in theory, the models already know how to do this, which is awesome. Watching it stumble through trying to get the parameters right, just like I do, it's like, oh, computers, they're just like us, is fun from my perspective in a cynical sad way. Sort of the ant farm situation, right? Yeah.
Starting point is 00:21:57 It can do everything it needs to do without going down the MCP path that clutters the context window. So yes, and I think this is one of the most common complaints about MCP, I think my pushback on that would be like, that is only true if you have a bash tool. And in a lot of cases, you may want to run an agent without a bash tool for safety, security, reliability. I actually think one of my predictions is by the end of 2026, most agents are going to remove the bash tool and replace it with something either like more narrow and scoped or some minimal bash-like thing that has a lot less flexibility. I think we're going to find out because that's a really interesting point of view. A challenge that I would have here in your shoes trying to help people use these tools better,
Starting point is 00:22:42 why don't I just put on my enterprise pants, do an evaluation, that's 18 months, and by that point, we're in a brave new world again because this stuff is iterating so quickly. Why wouldn't I just wait for the foundation models to improve and solve these problems for me? Well, if you need 18 months to make a decision, then you probably should. I think the reason that I wrote that paper about context engineering a year ago that was like, hey, look, I built a thing for the agent ecosystem. Turns out nobody who's shipping vertical AI to the enterprise and actually like delivering results is using any of that stuff. They're all ignoring the bitter lesson.
Starting point is 00:23:20 They're all building very specific prompts and pipelines and workflows to improve the capabilities of today's models. was because I really believe now that there will always be a frontier for the model, right? And it's very jagged. You have certain things that can do 40% accuracy, certain things you can do 99% accuracy, and everything in between for every single task under the sun, from coding to health care, to law, to every single thing you could want to do, right? Except for the thing that whatever listener is listening to this and saying, well, that's the thing I do, therefore it could never truly be replaced by a computer.
Starting point is 00:23:56 Yes, many such cases, probably our entire pitch, right, is like, hey, there's things the models are good at and the things that the models aren't good at and we don't think they're going to get good at them anytime soon. And so we are obsessed with building workflows of like, how do you give humans more leverage, right? Where are the parts where like, yes, a model may eventually get this right or if you throw enough tokens at the problem, the model might get it right. But the performance is still low enough that like if you put a human in here, it is high leverage for a human to, for example, read a 200-line markdown doc that summarizes a code change we're going to make and re-steer at the 25,000-foot level before going down into the weeds and writing the thousand or 2,000 lines of code or whatever it is. So we've encountered an inflection point recently where it happened very quickly,
Starting point is 00:24:43 where open source projects got a bunch of security reports that were AI-powered slop nonsense. And that was terrible. And at some point now, they're still getting a bunch of them, but they're all valid and good and actual security problems. People are turning off their bug bounty programs just because they need to deal with the influx of this. And cynically, they didn't budget for this, which I get. But it's wild now where it feels like I could take Claude Code, throw it at some well-known tool. Like, great, find the following type of security problem. Go. With a little bit of steering. Yeah. The supply curve for discovered CVEs has shifted way to the right. It's become much, much cheaper,
Starting point is 00:25:25 faster and easier to find vulnerabilities. And so basic macroeconomics, right, the price must fall then. Like, everyone's going to need to cut their bug bounty from $200 a finding to $2 a finding. And then at some point, it's like, well, all right, I have a zero day that gets me remote access to any EC2 instance out there. Like, I don't care what the bug bounty is because that's worth millions and millions and millions of dollars a zero day on certain markets, similar to I have an iPhone zero day. Okay. Maybe that's basically, do you want to do the right thing, or do you want to be rich? I would like to believe there's a path to do both.
Starting point is 00:26:00 I do, too. I have to sleep at night. Yes. But this does tie back to something you said at the beginning, where as I'm using this to figure out what those USB codes are, whenever I swipe my finger on the fingerprint reader built into the keyboard, you're right. If I'm starting to try to steal Bloomberg stuff, as you mentioned, that could wind up getting me turned off by Anthropic. Security research, though, clearly that is not happening at scale.
Starting point is 00:26:22 How is this being navigated by the providers? I listened to a really good podcast with Boris Cherny and Ryan Peterman, and he talks about just some of the safety. It was a very short snippet of it, but they're talking about the safety requirements, and safety is not just like, is the model going to go Terminator and kill us all? It's like they have test environments. They have models they haven't shipped because someone found out that the model would, if you had prompted it, like not even that hard, you could get it to help you develop a biological weapon. It's for a novel. Yes. Yeah, exactly. I'm writing sci-fi. How would you do this? It's the same problem you have in all security scenarios, right, where there's a huge asymmetry of like an attacker has to find one tiny hole and the defender has to cover all infinite potential holes in the security boundary.
Starting point is 00:27:12 I do not envy the model providers here. We are dealing with what is in many ways a frontier ethics problem. Frontier ethics. Right versus wrong. For example, putting content, even the training of the models, putting a blog post that you wrote by hand out on the internet for anyone who comes by to read, great, awesome. Models come and train on all of it. Well, okay, now, is that acceptable use? Is it not? Because that is how humans wind up learning things. It's only a question of scale. Maybe that doesn't make sense, but it does seem to me that we're pushing ethical boundaries and frontiers all the time in ways that copyright wasn't designed to deal with. Yeah, it's super interesting. There's like a price that's now baked into our ethics of like what is acceptable reuse of someone else's material
Starting point is 00:27:59 there is a like price we put on of like, hey, if you're going to go read an article and then spend three hours yourself slaving over a blog post that has some quotes and citations and it's well made and it's well written and you put a lot of effort into it, that's okay. But if someone else just slops out a bunch of copy, that's like, I don't want to say it's unethical, but it's like, it's not valued human behavior. Like we're all smart enough to realize that we as humans value like effort and investment, and like what makes art good is not what the thing looks like. I mean, part of it is it has to look good. But like you look at a painting in a museum, part of what makes it good is the story that went into it and the emotion and energy that
Starting point is 00:28:36 went into it that makes you appreciate it. Yeah, that's how it makes you feel. Yeah. I mean, we talked about technical writing a lot. I do want to quickly come back to your question because I think, like, we both love tangents and this is my third cold brew of the day. But you asked something about like, why invest in all of these workflows and prompting and getting the most out of the models today if they just get smarter in a generation and then all of that is now irrelevant. Yeah, I got my 2024 book, Chat-Gippity for Dummies. Why can't I just use that for all my prompting types? Well, so I think there's, there's an interesting, like, set of skills that are translatable across models. They're not translatable
Starting point is 00:29:13 across, like, building harnesses or workflows around models for a specific task, but understanding, like, how transformer-based attention works, and the quadratic nature of attention, and the like increasing cost and decreasing quality of results you get as you put more and more into the context window, is a skill set that will be relevant as long as we have transformer-based attention. And nobody has been able to come up with an attention model that beats transformers. They have linear attention, we have Mamba, Jamba. It's like, yes, you have achieved linear attention, but you have somehow regressed on everything else. Like all the tasks and the usefulness is not there yet. And so I think there's this skill set that like if people are working with AI, you have kind of three options. You can kind of like YOLO out prompts and just be like, cool, it's not worth trying anything more than just take the smartest model and do the minimum effort and see what it can do and be happy with that. Or you can like learn how to push those models 10 to 15 percent further on specific tasks, right? And maybe you make them worse at certain tasks and
Starting point is 00:30:17 better at other tasks by the way that you prompt them or the way you like stitch together context windows and a workflow. And then the next frontier model comes out and it's better in every way than all of the custom code you wrote. But those skills of understanding how context windows work and how attention works and how to get more out of a model today are still going to translate, and it's going to enable you with a little bit of work. But if you're constantly like at the frontier trying to push things to their limits, if you understand these things and you invest in this like core intuition about LLMs, you will always be able to generate a solution that is 10, 15% better, maybe 50% better at a specific task because you're kind of applying these base concepts. And so people tell me, like,
Starting point is 00:30:58 Dex, this is all going to get bitter-lessoned. And I'm like, I think that's how we get to AGI. I mean, swyx said this too, is like, the way we get to AGI is we continually like ignore the bitter lesson and try to make these things better. And that's how we learn what the next generation of model needs to do over and over again. That is fractally weird. If that makes sense. It's a little weird. We'll see how it plays out. The cynical thing you could say is, like, here we are engineers trying to make sense of this crazy new world that's moving so, so fast, and trying to figure out how we can add value to a thing that's there. And then retcon-justifying of like, no, it's worth putting in this effort because the next models will be smarter, but I'll
Starting point is 00:31:32 be able to make them even smarter over and over again until AGI. If people want to learn more about what you're up to and how you view the world, where's the best place for them to find you? If you want the cutting edge stuff, just follow me on Twitter. I'm dexhorthy, D-E-X-H-O-R-T-H-Y. And then, you know, we're building products in this space. You can go to humanlayer.dev. We will be launching soon. I know, you can come hang out at our Discord, but it's literally just a wall of angry people asking me, like, when the heck are you going to launch this thing?
Starting point is 00:31:58 We're kind of in private preview with a small group. We are looking forward to giving it to more people soon. But if you go to humanlayer.dev, you can sign up on the list. You'll get the launch announcements and you can see some of the fun stuff we're hacking on. And we'll put links to that in the show notes. Dex, thank you so much for taking the time to speak with me. I appreciate it.
Starting point is 00:32:15 This was a delightful journey around a bunch of places I did not expect to be talking about, but I had fun the whole way. That's the entire point. Dex Horthy, CEO and co-founder of Human Layer. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this episode, please leave a five-star review on your podcast platform of choice, and then have your model write a dumb comment on that platform, and then just wait for a smarter model to come along that can dunk on you right back. Thank you.
