Screaming in the Cloud - Coding Agents, Chaos, and the Future of Dev Work with Dexter Horthy
Episode Date: May 28, 2026

In this episode, Corey Quinn sits down with Dexter Horthy, CEO and Co-founder of HumanLayer, to unpack what engineers are getting wrong about AI, especially when it comes to coding agents. From the obsession with “just throwing more tokens at the problem” to the reality of building scalable AI workflows, Dexter shares hard-earned insights on how to actually push models to their limits. They dive into the evolution of developer workflows, the rise of AI-powered software factories, and why understanding context and verification matters more than raw model power. If you’re building with AI or trying to, this episode will challenge how you think about what these systems can (and can’t) do.

Show highlights:
(00:00) Throwing Tokens Too Far
(01:04) Meet Dexter Horthy
(01:52) Personal AI Benchmarks
(04:12) Human Layer Race Condition
(05:59) Rewrites and Tech Debt
(07:19) Software Factories Mindset
(10:20) Verifiable Problems and Token Limits
(13:45) Agents in the Trenches
(18:05) GitHub at Agent Scale
(26:23) Safety Ethics and Closing Thoughts

About Dexter: Dexter Horthy is the CEO and Co-Founder of HumanLayer, where he helps engineering teams tackle complex problems in large codebases using coding agents. Previously, he worked in DevOps, SRE, and Solutions Engineering at Replicated, and contributed to lunar navigation software at NASA JPL. Outside of work, he’s a fan of tacos and burpees, though not necessarily in that order.

Links:
LinkedIn: https://www.linkedin.com/in/dexterihorthy/
Website: https://humanlayer.dev

Sponsored by: duckbillhq.com
Transcript
I regret saying this because in many ways this is a good idea, but I think people are going way too far on the like throw more tokens at the problem.
Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined today by Dexter Horthy, the CEO and co-founder of HumanLayer.
And by all accounts, he appears to be human. Dex, thanks for joining me.
Dude, I'm so stoked to be here.
This episode is sponsored in part by my day job, Duckbill. Do you have a horrifying AWS bill?
That can mean a lot of things.
Predicting what it's going to be.
Determining what it should be.
Negotiating your next long-term contract with AWS.
Or just figuring out why it increasingly resembles a phone number,
but nobody seems to quite know why that is.
To learn more, visit duckbillhq.com.
Remember, you can't duck the Duckbill bill,
which my CEO reliably informs me is absolutely not our slogan.
So for those who have not had the pleasure of encountering your particular, we'll call it perspective, what is it you say it is you do here?
Amazing. So I am obsessed with getting the most out of AI. How do we take whatever current models we have and, outside of training and fine-tuning and task-specific stuff,
what can we as engineers who are not working in a big lab do to push these models to their limits?
Most recently, in the last like six to nine months,
most of that has been around coding agents,
because I think it's one of the most misunderstood areas
and also has the highest ceiling if you do it right.
It seems to me like this is one of those areas
where you're taking a half hour out of your day
to have this conversation with me
and during that half hour, the whole game is going to change again.
This isn't an area where you can hold still.
A year ago, I had a whole bunch of problems where it was,
"oh, these are things that the coding tools will struggle with.
I'll just keep that as sort of a personal
benchmark." Well, I ran out. You ran out of personal benchmarks? What did your benchmark used to be?
Do some analysis of 150 megabytes of JSON so I can have discussions with models about my Twitter
corpus from a seven-year run. There was "build weird backend systems for me," which just
sort of started working. I replaced my Adobe Creative Cloud subscription by building a custom
podcast recorder into a web app that I use for the Monday podcast that I record for
the Last Week in AWS podcast.
It's basically a bunch of workflow tools of things that,
well, that's hard.
That's what smart people do.
I still have some, though.
I mean, I have a Bloomberg keyboard on my desk at work,
which has a fingerprint reader that if you don't pay Bloomberg,
you can't read.
There's nothing on it on the Mac.
Claude Code went nuts on it,
and apparently there's some encryption thing.
It needs to basically be able to break through.
So, you know, I either need to get someone
with an actual Bloomberg subscription and do a wire capture on it,
or I can just put that back on the,
well, until cryptography falls.
I suppose I'll have to live with it.
Yeah, and you probably don't want to get caught asking a frontier model to reverse engineer something that you're supposed to be paying for.
It's a good way to get banned from Anthropic for a while.
Seriously. Interesting, though, because there's nothing more to this: I use this as a standard keyboard.
It has a fingerprint reader in it.
I want to use the fingerprint reader, the end.
This is not about stealing things from Bloomberg, to be clear.
There's nothing unethical in this request.
It's, I would love to be able to use the fingerprint reader built into my keyboard.
The end. Yeah, good evals that the models can't solve are getting harder to find. I still
do have a couple, and I have actually built this sort of personal mental model.
Every time I'm doing something with AI that becomes so hard that I end up spending
a ton of time going back and doing like 30 different sessions just to understand the problem
and then another 10 sessions to actually figure out the solution, I will flag that, and I have
a little journal of things that AI is not good at solving.
And then I come back to that Git repo at that Git SHA.
And every time there's a new model, I see, can you one shot this problem?
Can you actually go figure out the problem?
Do you have an example?
And I'm sure this example will age like fine milk.
It's been working for about six months.
There is a race condition bug in the current version of the HumanLayer open source repo.
We ended up forking that open source repo and making it closed source for now, just because open source is a little bit...
it's going through its own weird moment right now, and ours...
Yeah, open source will be going through weird moments for 30 years.
but I hear you.
Yeah.
Our vision does not require us to be open source.
It's an extra set of distractions that we just don't want to worry about right now so we can focus.
But if you pop open the current version of HumanLayer, see if you can get a model to
one-shot the race condition between the Tauri Rust native app, the Vite front end that it serves,
the Golang daemon that runs locally, that launches Claude Code sessions, that launch a standard
I/O MCP server that loops back to the daemon, that serves
approval requests to the front end, and all the way back through all that chain. If your model can
one-shot the solution to that race condition... I know what it is. We haven't pushed the fix to it. We
fixed it in our closed source. But that is one of my evals. Every time I want to
test a new model or test a new workflow, we throw it at that. And the correct answer is that
workflow is insane. Have you considered not doing that? I mean, so this is the other problem with AI
slop is we haven't talked about problems with AI slop, but we tried the like don't read the code
thing for about six months and found ourselves running away from it with our hair on fire.
And this may be a skill issue.
I find that it's odd because when I do back end stuff or infrastructure stuff, I often have
to slap the chainsaw out of the thing's hands.
But on front end, I don't know anything about front end, so I assume it's right.
It feels like the blast radius might be smaller.
A little bit, but also front end is very like, once your front end becomes super tangled,
I mean, it was both back end and front end and how they talked together that caused us to throw out
this entire code base. We could have fixed it, but we decided there were other architecture
things we needed to rethink anyways. So it would be easier to start Greenfield and throw it out
and start over, which is the thing you were never supposed to do. With AI, you can do more.
AI makes that a lot better. I found that, oh, this thing that I built to serve a particular
purpose and fix a problem that I have no longer serves that purpose because of requirements change
or something. Great. Throw it out, baby bathwater and all. The baby's floating face down. It's
fine. And we're going to go ahead and start over from scratch. That used to be a three-week project.
Now it'll be done by the end of my coffee break. I remember the second job I ever had. I started and I came
into a three-month refactor that was on month six. And it was like, we're going to upgrade all the
frameworks. We're going to pause feature dev. The CTO convinced the CEO that it was going to be okay.
It would be over quickly. And it had to happen no matter what. He had like bargained with the product
leadership of the company to be allowed to spend a couple months like upgrading and
cleaning things up and removing tech debt.
And of course, it went twice as long.
And, like, my first week was like, okay, this thing is due on Friday.
Everyone has lost patience.
And it is now a death march for the next two weeks to actually get this thing out.
And, of course, shipped a million bugs.
And we eventually, like, recovered.
But, yeah, like, you're not supposed to do that.
When an engineer says we need to rewrite this thing, you're supposed to tell them to go read
a book about why you shouldn't do that.
You have a background doing the DevOps SRE dance, which means that you're often the voice of moderation
in dev environments,
where everyone wants to build features
and do exciting things,
you're like, hey, let's make this sustainable,
let's slow down, let's be conservative
with things like databases, file systems,
the stuff that leaves a mark when it breaks.
Now it seems like you're almost championing
acceleration of features.
What was that transition like?
You say I'm like a DevOps SRE.
I have done plenty of DevOps and SRE.
I did a ton in the Kubernetes world.
I was at a startup called Replicated for like seven years
where we helped people package up their Kubernetes app
and ship it to other people's
data centers. But I would frame it less as like the voice of reason. I've always been a like
impatient, fast, like let's ship value. Let's, you know, be scrappy and like figure out like what
risks are tolerable and what corners should never be cut, of course. How do we be responsible in our
irresponsibility? I played a lot of StarCraft growing up, both one and two. And I forget
who said this, but it's an incredible exercise for early-stage companies, obviously not
large ones, but not just seed either, like all the way through Series A, B, C, whatever, because it
forces you to make hard decisions with incomplete information. And it forces you to do that
hundreds of times a minute. Oh, absolutely. One of the hard lessons for me as we're building
Skyway over at Duckbill has been: we are willingly accepting technical debt. That is something we are
doing with our eyes open. And, ideally, we're making decisions that will not screw us
over later. But if we get to that point, we can fix the technical debt. And if we don't, it won't
matter anyway. So that took a bit of change in my perspective, because historically, I was never
at a company this early. I was in after product market fit. Okay, developers have taken the
environment as far as they can. Everything's on fire all the time. Can you help us? Yes,
I can. Basically, my entire job and career have been paying off technical debt. Yeah. And it's really
fun. I love paying off technical debt. I mean, so come back to your question of like, how did you go from
the more conservative voice of reason to like, hey, we need to figure out how to accelerate things,
is like, I would frame it less as DevOps SRE. I would frame it as like, I've been building
software factories my entire career. Like, not on purpose, but I always looked up the most to
the engineers that maintain the software factory, whatever part of it it was, whether it was
the environment, the like system that allowed you to spin up like temporary testing sandboxes
with the full stack so that a PM could look at it, or the CI/CD pipeline, or the thing that did
the automated testing. That was always the most fascinating thing for me because I saw early on that the
people who invested in that would have compounding returns. You write the feature, you get a feature.
You improve the factory 10%, and you get 20% of your time back the next day, and you can spend
half of that making the factory even better, and the other half of it writing more code. And this is,
like, Will Larson's An Elegant Puzzle. There's this part of the curve where
you have invested so much in the thing that builds the thing that you're now just
leaving everybody behind in the dust.
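The compounding argument above can be sketched numerically. This is a toy model, not something from the episode: the function name `simulate`, the 10% daily investment, and the 0.2 gain per unit invested are all assumptions chosen only to show the shape of the curve.

```python
def simulate(days: int, invest_fraction: float, gain_per_unit: float) -> float:
    """Return total feature output over `days` (hypothetical toy model).

    Each day the team has `productivity` units of effective work. A fixed
    fraction goes into the "factory", which raises productivity for every
    following day; the rest ships features.
    """
    productivity = 1.0
    shipped = 0.0
    for _ in range(days):
        invested = productivity * invest_fraction
        shipped += productivity - invested          # features shipped today
        productivity += invested * gain_per_unit    # factory gets better
    return shipped


# Assumed numbers: 10% of each day goes to the factory, and each unit
# invested adds 0.2 units of daily productivity from then on.
feature_only = simulate(days=60, invest_fraction=0.0, gain_per_unit=0.2)
with_factory = simulate(days=60, invest_fraction=0.1, gain_per_unit=0.2)

print(f"feature-only team over 60 days: {feature_only:.1f}")
print(f"factory-investing team over 60 days: {with_factory:.1f}")
```

Under these assumed numbers the investing team ships less in the first week and more over two months, which is the "part of the curve" being described.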
So I am curious, when you take a look now,
since what you do, more or less,
is telling people how to effectively work with AI coding agents,
what are people getting wrong the most?
What can we take away from this as far as,
oh, I'm going to get better results with Claude Code
after listening to you?
I regret saying this because in many ways this is a good idea,
but I think people are going way too far
on the like throw more tokens at the problem.
Are we talking about G-Stack without mentioning G-Stack?
We're talking about Gastown, G-Stack, Ralph Wiggum,
any number of good ways to
throw more tokens at a problem. And in general, if you design the problem correctly, throwing more
tokens at it may be helpful, especially if you can create good deterministic back pressure, right?
The reason why Ralph Wiggum was able to create this cursed programming language with a model that
was not that good, you know, like a Sonnet 3.7 or like a pre-"everyone thinks AI is good" model,
is because it was building a programming language. And a programming language is infinitely verifiable.
You write code in the language. You try to compile it. Compiler breaks. You go fix the compiler. The
compiler works, you run the program, program breaks, you go fix whatever the compiler is putting in.
It's very easy for the model to check its work and tell if it's done a feature right.
Not a lot of problems have that characteristic.
And people are trying to apply these techniques, which worked really well for very verifiable problems, to problems that are not verifiable.
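The "deterministic back pressure" idea can be sketched as a loop in which every candidate has to get past a mechanical check before it counts as done. This is a minimal illustration under stated assumptions: the names `compiles_and_passes` and `run_with_back_pressure` are hypothetical, and the fixed list of candidates stands in for a model that would normally be re-prompted with the collected errors.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def compiles_and_passes(source: str) -> Optional[str]:
    """Deterministic check: return None on success, else an error message."""
    namespace: dict = {}
    try:
        # Step 1: does it compile and load at all?
        exec(compile(source, "<candidate>", "exec"), namespace)
        # Step 2: does the program behave correctly?
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def run_with_back_pressure(
    candidates: Iterable[str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int,
) -> Tuple[Optional[str], List[str]]:
    """Try candidates until one verifies or the attempt budget runs out."""
    feedback: List[str] = []
    for attempt, candidate in enumerate(candidates, start=1):
        if attempt > max_attempts:
            break
        error = verify(candidate)
        if error is None:
            return candidate, feedback
        feedback.append(error)  # in a real loop this goes back to the model
    return None, feedback


# Stand-ins for three successive model outputs (assumed for illustration).
candidates = [
    "def add(a, b) return a + b",          # does not compile
    "def add(a, b):\n    return a - b",    # compiles, wrong behavior
    "def add(a, b):\n    return a + b",    # compiles and passes
]
result, feedback = run_with_back_pressure(candidates, compiles_and_passes, max_attempts=5)
print(result is not None, feedback)
```

The loop only works because the check is cheap and unambiguous. For problems with no such check, extra attempts add cost without adding any signal about whether the work is right.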
It also feels like that is what everyone is doing, to a point where now we're seeing token capacity constraints
from the major providers. Anthropic, as of this recording, has done some strange things with
session windows and double usage. Part of me wonders if that is a byproduct of people throwing
tokens at problems. That's interesting. The whole Anthropic thing of like, okay, we need to control
OpenClaw usage and we need to make sure that, hey, people are taking our subsidized inference
and... my general take on that whole thing is, if Anthropic wants to give a discounted
plan and tell you how you can and can't use it,
Like, that's their prerogative.
Everybody I know who is serious, all of our enterprise customers, they're paying per
token anyways.
And it's like, cool.
Like, no one, no one promised you cheap inference.
Nobody owes you cheap inference.
You can say what you will about anti-competitiveness, right?
Like, the example that Theo gave me was actually pretty good. It's like, Amazon wants to kill
diapers.com.
So they just take the same product and sell it cheaper.
They sell it at a loss because they can afford to.
And then one day, when all those, you know,
one-off businesses are out of business, then they can charge whatever they want.
That's why I am interested in a lot of the local LLM research that's being done.
I want to be able to have a coding agent that runs locally and makes use of tools.
And sure, it's going to be slower and it might not be as great.
But a lot of what I do isn't that complicated.
"Go ahead and modernize the version of Python
this dumb little script is written in" is the sort of thing where, okay, if that takes half an hour and basically heats up my
laptop, I don't care as much. Yeah, that makes sense. So what are you seeing as emerging trends these
days other than throwing tokens at things? I don't know. Every other person I talk to is like
accidentally reinventing Gastown from first principles. But I don't know if I want to say that's a
trend. It's just a like there is a thing that engineers like to do, which is to glue systems together
and see how they work and improve them over time. And you start with three prompts and then you wake up
the next day and suddenly you have a hundred. And you're the only one that knows how to use
it? For me, something that I have begun to deeply appreciate about agents is one of the things I
looked for when I was interviewing SREs once upon a time, where you start throwing a problem at them
and seeing how deep they go. And the right way to get through an interview like that is never give up,
never surrender. So I will see these things go, oh, I don't have access to that. So here's what
I'm going to do instead to get to the reason that this thing is misbehaving. I've seen it
start pulling tcpdumps. I've seen it start packet crafting. It's doing ridiculously in-depth things.
I haven't seen strace yet, but I'm waiting for it, where it's using very deep tools to get at the
answer. In many cases, past the point of reason. But it's doing a lot of the stuff that I would do
if I weren't lazy. I care about figuring out why I have this non-deterministic delay on an API that I
built, but not enough to actually go diving into it. But I can turn this thing loose and it'll tell
me. This episode is sponsored by my own company, Duckbill. Having trouble with your AWS bill?
Perhaps it's time to renegotiate a contract with them. Maybe you're just wondering how to
predict what's going on in the wide world of AWS. Well, that's where Duckbill comes in to help.
Remember, you can't duck the Duckbill bill, which I am reliably informed by my business
partner is absolutely not our motto. To learn more, visit duckbillhq.com.
The adoption of Claude Code was the first thing that made me believe that CloudWatch was actually useful.
CloudWatch is incredibly powerful, incredibly useful with a user interface that is garbage.
It's the data structure underneath everything good, but it itself, it is terrible to work with, but agents do not care.
Exactly.
Agents don't care what it looks like because they're just plumbing through JSON anyways.
I remember a tweet I saw when I first got back on Twitter in like 2015
or 2016.
And it was a tweet from Coda Hale.
And the picture was like,
it was one of those CloudWatch charts
where you just have like three little dots
and one line because it's like not filling in the gaps
between everything.
And like the caption was like,
CloudWatch was a technical marvel.
Like it's incredibly powerful.
But how did anyone look at this and say,
yes, this is good.
This is what we should ship to customers.
In October of 2018:
"CloudWatch is of the devil,
but I must use it."
And I wound up talking
about how it violated every one of AWS's then 14 leadership principles.
And that was how I met the then GM of CloudWatch.
And they fixed a lot of it.
It's still not great, but it's not the nightmare tire fire that it was back in those days.
I do miss aspects of this.
Of old CloudWatch?
Yeah, back then, when you got something like this working back then, it was because you
really cared.
You suffered for it to get it out the door.
Now it feels like that barrier has been lowered, which
is, I want to be clear, a good thing. But it's having a bunch of knock-on effects. GitHub is on fire
based upon the sheer number of commits and agents stuffing things into it. They're not helping
themselves by, whenever it comes back up for half a second, babbling about Copilot, and then it
falls over. People can draw connections that aren't necessarily there. I do think that they
finally showed up in a way, and maybe this is just like me being too terminally online. But like,
some VP from GitHub came online and on Twitter. He's like, here's the problem. Here's what we're doing
about it. We know it's an issue. Like, here's what I can say about it. Yeah. And it was like,
oh, I'm no longer worried about this problem. It's a shame that it took people complaining
online for 24 hours a day, for weeks straight for them to come out and do that.
There's a corporate comms lesson in here, and that's very Microsoft, where my issue with Azure
security for a long time was not the security issues, which aren't great, let's be clear here,
but my problem was the complete stonewalling silence coming out of Redmond. I yell at AWS
about this all the time. When they say nothing, they are
far too big now to get the benefit of the doubt. They're a nearly $3 trillion company that is going to have the
worst assumed about them until they start talking, at which point, oh, okay. Now, sure, some people aren't
going to believe what they say. Some people are always going to want to needle them. And I get that.
But at least they're trying at that point instead of, well, maybe if we shut up, they'll go away.
Do you think we're going to get an agent optimized GitHub, or do you think someone else
is going to have to build that.
I am cynical in that, and this is going to make me sound ancient, but Git was a marvel.
It was a distributed tool for source control, and the first thing we did is centralize it again.
Awesome.
It is not that hard in isolation to run a Git repo.
It is a static web server with a few extra bits.
It's all the ecosystem stuff on top of it that starts getting tricky.
It's the fact that it sparks off agents, the fact that it does webhooks, the RBAC,
which is no small thing, the
fact that it can track issues, the pull request model, the discussions around it.
A part of the problem even now is describing what GitHub is exactly.
So some aspects, trivial to replace for agent scale.
Others?
I don't know, boss.
That's a heavy lift.
I have a couple friends who are like crazy system engineers.
And like last year they built a Git server from scratch in Rust that is like fully
protocol compliant and also has like rest APIs for every Git protocol operation.
and it's like super performant.
They built it for, like, vibe coding infrastructure.
It's like every single project on v0, Lovable.
They don't,
those aren't there.
Companies like that every single time someone opens a browser,
you need to create a Git repo.
Now, there are two problems with this.
They have a great shot,
but there are two problems with this.
Oh, several, actually.
One is everyone can build a tool
that solves their particular problem.
Hell is other people's requirements.
I've been down that road enough.
So here's my pitch for you.
Is like, what is the minimal set of APIs
needed to create a headless GitHub
so that anybody who wants to
can kind of vibe code the front end part,
which is like, you know, code still matters,
but you can't break everybody else's infrastructure,
and you can throw it out
and rebuild it pretty quickly.
What is the bare set of operations you need
to create something that I can build?
I'm not going to rebuild GitHub. I'm not going to vibe code
my own Git server but if you give me
a really reliable backend that fits the right
interface I'll happily like
build my own front end on it
and integrate it into my vibe-coded CRM manager plus project manager, plus like the thing I'm using to run my business of like my custom SaaS that is built on like solid bones and the back end, but I bring the information together how I like.
JGit, out of the Eclipse project, supports a native Git repository backend of an S3 bucket or other object store.
So technically that would qualify.
Like S3 is pretty solid.
You're not going to beat that from a raw infrastructure perspective.
Okay.
And if you don't have too much traffic because you're only hosting your own version of it,
you could just run Git on top of S3.
And as long as the interface is right.
You could run Git on top of a Linux box or a Pi somewhere and just use SSH as your interface.
I guess if you were going to build this as a product for other people to do.
Right.
Hell is other people's requirements.
Well, that's where it gets tricky is because, okay, why?
So you have your friends building this in Rust for vibe coding purposes.
Awesome.
Great.
Why would I use that instead of vibe coding my own?
Well, so they didn't vibe code this.
They, like, wrote every token by
hand. A year ago, I was like, you've got to get on this Claude Code thing. And they were like,
no, it's not good enough. Our code is perfect. And I'm like, now I'm like, wow, there are a
shrinking number of pieces of software that meet that standard. There's also a network effect
to GitHub. Everything integrates with it. The ecosystem is the hard part. This is why you'll never
replace Salesforce either. It's not the API on top of a database. It's the ecosystem.
I'll take it a step further. I don't like MCPs for most things. Like AWS has five or six
MCPs that I find useless because you've already got the AWS CLI.
And in theory, the models already know how to do this, which is awesome.
Watching it stumble through trying to get the parameters right, just like I do, it's like,
oh, computers, they're just like us, is fun from my perspective in a cynical sad way.
Sort of the ant farm situation, right?
Yeah.
It can do everything it needs to do without going down the MCP path that clutters the context window.
So yes, and I think this is one of the most common complaints about MCP, I think my pushback on that would be like,
that is only true if you have a bash tool. And in a lot of cases, you may want to run an agent
without a bash tool for safety, security, reliability. I actually think one of my predictions
is by the end of 2026, most agents are going to remove the bash tool and replace it with
something either like more narrow and scoped or some minimal bash-like thing that has a lot less
flexibility. I think we're going to find out because that's a really interesting point of view.
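One way to picture "something more narrow and scoped" than a bash tool is an allowlisted command runner. The sketch below is purely illustrative and is not any agent framework's actual API: the names `ALLOWED`, `ToolCallRejected`, `validate`, and `run`, and the particular commands on the allowlist, are all assumptions.

```python
import shlex
import subprocess
from typing import Dict, List, Set

# Hypothetical allowlist: program name -> subcommands the agent may call.
ALLOWED: Dict[str, Set[str]] = {
    "git": {"status", "diff", "log"},
    "aws": {"s3", "cloudwatch"},
}


class ToolCallRejected(Exception):
    """Raised when a requested command falls outside the allowlist."""


def validate(command: str) -> List[str]:
    """Split a command string and check it against the allowlist."""
    argv = shlex.split(command)
    if len(argv) < 2:
        raise ToolCallRejected("expected a program and a subcommand")
    program, subcommand = argv[0], argv[1]
    if subcommand not in ALLOWED.get(program, set()):
        raise ToolCallRejected(f"not allowed: {program} {subcommand}")
    return argv


def run(command: str) -> subprocess.CompletedProcess:
    """Run an allowlisted command without a shell.

    Passing a list to subprocess.run means pipes, semicolons, and
    redirects are never interpreted; they arrive as literal arguments.
    """
    return subprocess.run(validate(command), capture_output=True, text=True, check=False)


for attempt in ["git status", "rm -rf /", "git status; rm -rf /"]:
    try:
        print("ok:", validate(attempt))
    except ToolCallRejected as exc:
        print("rejected:", exc)
```

A real tool would also need per-argument checks, since an allowed subcommand can still accept flags that write files or reach the network. The trade-off is the one raised in the conversation: without a general shell, the agent loses the ability to improvise with the CLI, which is part of the case for narrower, purpose-built interfaces.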
A challenge that I would have here in your shoes trying to help people use these tools better,
why don't I just put on my enterprise pants, do an evaluation, that's 18 months, and by that point,
we're in a brave new world again because this stuff is iterating so quickly.
Why wouldn't I just wait for the foundation models to improve and solve these problems for me?
Well, if you need 18 months to make a decision, then you probably should.
I think the reason that I wrote that paper about context engineering a year ago, the one that was like,
"hey, look, I built a thing for the agent ecosystem.
Turns out nobody who's shipping vertical AI to the enterprise and actually delivering results is using any of that stuff.
They're all ignoring the bitter lesson.
They're all building very specific prompts and pipelines and workflows to improve the capabilities of today's models,"
was because I really believe now that there will always be a frontier for the model, right?
And it's very jagged.
You have certain things it can do at 40% accuracy, certain things it can do at 99% accuracy,
and everything in between for every single task under the sun, from coding to health care, to law,
to every single thing you could want to do, right?
Except for the thing that whatever listener is listening to this and saying,
well, that's the thing I do, therefore it could never truly be replaced by a computer.
Yes, many such cases, probably our entire pitch, right, is like, hey, there's things the models are good at and the things that the models aren't good at and we don't think they're going to get good at them anytime soon.
And so we are obsessed with building workflows of like, how do you give humans more leverage, right?
Where are the parts where like, yes, a model may eventually get this right or if you throw enough tokens at the problem, the model might get it right.
But the performance is still low enough that if you put a human in here, it is high leverage for a human to,
for example, read a 200-line markdown doc that summarizes a code change we're going to make
and re-steer at the 25,000-foot level before going down into the weeds and writing the
thousand or 2,000 lines of code or whatever it is.
So we've encountered an inflection point recently where it happened very quickly,
where open source projects got a bunch of security reports that were AI-powered slop nonsense.
And that was terrible.
And at some point now, they're still getting a bunch of them, but they're all valid and good.
and actual security problems. People are turning off their bug bounty program just because they need to,
they need to deal with the influx of this. And cynically, they didn't budget for this, which I get.
But it's wild now where it feels like I could take Claude Code, throw it at some well-known tool.
Like, great, find the following type of security problem. Go, with a little bit of steering.
Yeah. The supply curve for discovered CVEs has shifted way to the right. It's become much, much cheaper,
faster and easier to find vulnerabilities. And so basic macroeconomics, right, the price must fall then.
Like, everyone's going to need to cut their bug bounty from $200 a finding to $2 a finding.
And then at some point, it's like, well, all right, I have a zero day that gets me remote access
to any EC2 instance out there. Like, I don't care what the bug bounty is because that's worth
millions and millions and millions of dollars, a zero day on certain markets, similar to
"I have an iPhone zero day." Okay. Maybe that's basically, do you want
to do the right thing, or do you want to be rich?
I would like to believe there's a path to do both.
I do, too. I have to sleep at night.
Yes.
But this does tie back to something you said at the beginning, where as I'm using this
to figure out what those USB codes are, whenever I swipe my finger on the fingerprint
reader built into the keyboard, you're right.
If I'm starting to try to steal Bloomberg stuff, as you mentioned, that could wind
up getting me turned off by Anthropic.
Security research, though, clearly that is not happening at scale.
How is this being navigated by the providers?
I listened to a really good podcast with Boris Cherny with Ryan Peterman, and he talks about just some of the safety.
It was a very short snippet of it, but they're talking about the safety requirements, and safety is not just like, is the model going to go Terminator and kill us all?
It's like they have test environments.
They have models they haven't shipped because someone found out that the model would, if you had prompted it, like not even that hard, you could get it to help you develop a biological weapon.
It's for a novel.
Yes.
Yeah, exactly. I'm writing sci-fi. How would you do this? It's the same problem you have in all security scenarios, right, where there's a huge asymmetry of like an attacker has to find one tiny hole and the defender has to cover all infinite potential holes in the security boundary.
I do not envy the model providers here. We are dealing with many ways what is a frontier ethics problem. Frontier ethics. Right versus wrong. For example, putting
content, even the training of the models, putting a blog post that you write out, that you wrote by hand
out on the internet for anyone who comes by to read, great, awesome. Models come and train on all of it.
Well, okay, now, is that acceptable use? Is it not? Because that is how humans wind up learning
things. It's only a question of scale. Maybe that doesn't make sense, but it does seem to me that we're
pushing ethical boundaries and frontiers all the time with ways that copyright wasn't designed
to deal with this. Yeah, it's super interesting. There's like a price
now baked into our ethics of like what is acceptable reuse of someone else's material.
There is a like price we put on of like, hey, if you're going to go read an article and then spend
three hours yourself slaving over a blog post that has some quotes and citations and it's well made
and it's well written and you put a lot of effort into it, that's okay. But if someone else just
slops out a bunch of copy, that's like, I don't want to say it's unethical, but it's like
it's not valued human behavior. Like we're all smart enough to realize
that we as humans value like effort and investment, and like what makes art good is not what
the thing looks like. I mean, part of it is it has to look good. But like you look at a painting in a
museum, part of what makes it good is the story that went into it and the emotion and energy that
went into it that makes you appreciate it. Yeah, that's how it makes you feel. Yeah. I mean,
we talked about technical writing a lot. I do want to quickly come back to your question because I think,
like, we both love tangents and this is my third cold brew of the day.
But you asked something about like, why invest in all of these
workflows and prompting and getting the most out of the models today if they just get smarter
in a generation and then all of that is now irrelevant. Yeah, I got my 2024 book, Chachipity for Dummies.
Why can't I just use that for all my prompting types? Well, so I think there's, there's an
interesting, like, set of skills that are translatable across models. They're not translatable
across, like, building harnesses or workflows around models for a specific task, but understanding,
like, how transformer-based attention works, and the
quadratic nature of attention, and the like increasing cost and decreasing quality of results you get as you put more and more into the context window, is a skill set that will be relevant no matter how, like, as long as we have transformer-based attention. And nobody has been able to come up with an attention model that beats transformers. They have linear attention, we have Mamba, Jamba. It's like, yes, you have achieved linear attention, but you have somehow regressed on everything else. Like all the tasks and the usefulness
is not there yet. And so I think there's this skill set that like if people are working with AI,
you have kind of three options. You can kind of like yolo out prompts and just be like, cool,
it's not worth trying anything more than just take the smartest model and do the minimum effort
and see what it can do and be happy with that. Or you can like learn how to push those models
10 to 15 percent further on specific tasks, right? And maybe you make them worse at certain tasks and
better at other tasks by the way that you prompt them or the way you like stitch together context
windows and a workflow. And then the next frontier model comes out and it's better in every way than
all of the custom code you wrote. But those skills of understanding how context windows work and how
attention works and how to get more out of a model today is still going to translate and it's going to
enable you with a little bit of work. But if you're constantly like at the frontier trying to push
things to their limits, if you understand these things and you invest in this like core intuition about
LLMs, you will always be able to generate a solution that is 10, 15% better, maybe 50% better at a
specific task because you're kind of applying these base concepts. And so people tell me, like,
Dex, this is all going to get bitter-lessoned. And I'm like, I think that's how we get to AGI.
I mean, swyx said this too, is like, the way we get to AGI is we continually like ignore the
bitter lesson and try to make these things better. And that's how we learn what the next
generation of model needs to do over and over again. That is fractally weird. If that
makes sense. It's a little weird. We'll see how it plays out. The cynical thing you could say is,
like, here we are engineers trying to make sense of this crazy new world that's moving so, so fast,
and trying to figure out how we can add value to a thing that's there. And then retcon justifying
of like, no, it's worth putting in this effort because the next models will be smarter, but I'll
be able to make them even smarter over and over again until AGI. If people want to learn more
about what you're up to and how you view the world, where's the best place for them to find you?
If you want the cutting edge stuff, just follow me on Twitter. I'm dexhorthy, D-E-X-H-O-R-T-H-Y.
And then, you know, we're building products in this space.
You can go to humanlayer.dev.
We will be launching soon.
I know I get, you can come hang out at our Discord, but it's literally just a wall of angry
people asking me, like, when the heck are you going to launch this thing?
We're kind of in private preview with a small group.
We are looking forward to giving it to more people soon.
But if you go to humanlayer.dev,
you can sign up on the list.
You'll get the launch announcements and you can see some of the fun stuff we're hacking on.
And we'll put links to that in the show notes.
Dex, thank you so much for taking the time to speak with me.
I appreciate it.
This was a delightful journey around a bunch of places I did not expect to be talking about, but I had fun the whole way.
That's the entire point.
Dex Horthy, CEO and co-founder of Human Layer.
I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice.
Whereas if you've hated this episode, please leave a five-star review on your podcast platform of choice,
and then have your model write a dumb comment on that platform, and then just wait for a smarter model to come along that can dunk on you right back.
Thank you.
