The Standup with ThePrimeagen - What Mythos Means for Software Security
Episode Date: April 18, 2026
Sentry: Catch, trace, and fix bugs across your entire stack. Use code: prime for $100 in free Sentry credits → https://trm.sh/sentry
AI, zero-days, and a whole lot of hot takes. The crew dives into the controversy around powerful new AI models and whether they’re making cybersecurity better or way more dangerous. From bug bounties to “anyone can hack now” fears, it’s a mix of serious debate, wild hypotheticals, and classic Standup chaos.
Transcript
So Low Level Learning is on the clock.
He has things to do, security to secure.
And so therefore we need to get rocking.
Now, before we get started with today's standup,
I was reliably told that Casey has a blocker this week.
In spirit of a standup, Casey has some things he needs to talk about.
Yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, sorry.
What I would like to say.
Wait, before we do that, I didn't even have a chance to interrupt it.
That was way too fast.
I was sorry.
I was trying to be on the
ball here. Okay. Go ahead, Casey. That was great, T. That was a perfect
introduction to get us started. Uh, I do have a blocker, actually. Uh, and this
blocker, I've been blocked for a while on this, and it is as follows. Uh,
especially because I'm someone who on this podcast is known for, uh, double entendres.
I will say, I will say this blocker as I really like racks. So,
a lot of people want to put their computers in a tower, right?
They have like a tower for their computer.
They want like a big, like they put glass on the side and they've got all this weird
RGB lighting or whatever, right?
RGB makes it faster.
It's proven.
There's some papers on that.
Anyway, go ahead.
It's absolutely proven.
Like, if you get memory that has RGB lighting at the top of it, that memory will just
return results faster, like when it is queried.
Everyone knows that.
Yeah, I read that at Tom 6, 7.
And the photons are at very specific wavelengths.
The R, the G, and the B are picked specifically to ensure the fastest travel of light.
That's just, you know, Einstein said it, and we just follow it.
So I don't really like that because I like to have, like, lots of computers.
I just, I like computers and I want a lot of them around.
And I don't run, like, virtual machines and stuff like that, usually.
If I'm like, I need another thing for this, I'm like, that's a great excuse for me to build another
computer, right? So I like to have racks. I like to have my computers in racks so I can just
slot them in. Not virtual racks, got it. Not virtual racks. Actual physical kind. Actual physical
racks. And honestly, I think a lot of people would actually like this better for their computer
if they had the option of buying things that were reasonable to do it. Because it's like, look,
you're doing streaming, right? You're starting to become more of a like,
pro streamer or something, or you're doing a lot of Twitch streams.
You want to have one streaming
computer and one gaming computer
all of a sudden. Well, now
what do you get like two tower machines
sitting here? You got this one on a junky setup.
Exactly.
Well, if you just have a little rack...
That's why his desk is 18 feet long.
I don't have this problem. I just have a rack. I slot
things into it. I slot them out. It's totally
fine. The power strip
is a thing that slots into the rack.
You want a NIC that slots into the rack.
It's this very organized, everything is bundled, and they wheel around.
You can just move them, right?
Are you a little bit worried that you're going to get a hot rack, though?
I do think that, you know, if you were maybe doing some really, really, really sort of unusual stuff, you could end up with some hot racks.
I do think that's a legitimate problem.
But our solution.
But here's the thing.
So here's my blocker.
Nobody wants to sell racks into commercial.
They just don't.
It's all, like into consumer, I should say.
They all want to sell just like, you know, it's like data center stuff.
And this has two problems.
One, the stuff is artificially expensive.
So the average consumer is going to look at a rack mounting case and go like, I don't want to buy that.
Just like a cake on your wedding day.
With the little topper that has the bobblehead that looks like you.
Did you find the errors?
I don't even know what they look.
What do they even look like?
They're in the phone.
In the phone?
Yeah, they're definitely in there.
I just don't know how we labeled them.
I got it, don't worry.
You've got to figure it out.
We're running out of time, prime.
You've got to find them and meet me at the stand-up.
Roger.
Get all the context you need to debug your problem, because code breaks.
So fix it faster with Sentry.
But here's the bigger problem.
Even if you've got the money, even if you're...
Because let's be honest, they went ahead and raised...
computer prices so much, so only AI data centers can afford them now anyway. So why are we,
why are we even talking about these, uh, the unwashed masses and what they need for computers?
Because everyone knows they don't deserve computers, right?
True.
True.
Sam Altman,
Sam Altman deserves the computers.
None of us should have them.
But so let's say you do have that extra money because you're super wealthy.
It doesn't really matter because if what you want is like an attractive piece of, uh,
if you want a nice looking rack, Prime.
Yep, trust me, I'm listening.
This does not...
This does not exist, right now.
This does not exist, right?
They only come in weird, like, industrial kind of like, okay, it's ugly.
You know, no one who didn't buy this thing is going to want to look at it, right?
They're not going to want to see this rack.
That's the kind of rack you have access to.
And also, the specs, like the things that they make for rack-mounted cases,
none of them are what a consumer would want.
They don't make them to look nice.
they don't make them to have good acoustic qualities.
They're loud.
They use the freaking honeycomb pattern cutout that whines like an SOB when the fans kick in.
So because they know the data center doesn't care.
They're just like, ah, it's loud in there anyway.
Who cares how many decibels is in the data center?
Like put your plugs in.
To be fair, I don't think Sam Altman likes racks.
So, you know.
Fair enough.
Good point.
So maybe he's not... You heard it here first, baby.
The Standup.
Not getting acquired by OpenAI.
Keep going.
That's why they need to...
Not...
Damn.
Dario?
Dario.
So that's my block.
My blocker is, I want to start...
I want somebody to start looking at consumer rack stuff.
Like, good.
Consumer racks.
Just like, let's play this.
Let's play this out for just one minute.
Who's the target market?
Every...
Everyone.
So here's my thing.
I think that's a huge...
That's a huge TAM.
100% TAM, baby.
Everyone.
Because here's the thing.
Right now.
Now case people, they're falling all over each other to figure out how they can sell the same old thing.
There's nothing new there.
They've got nothing to sell.
You want yet another thing with glass on the outside.
It doesn't matter.
RGB lights, we've seen it before.
Racks are a whole new space, right?
They can innovate here so much, right?
Razer can sell subscriptions to your own rack that you have to buy back for some reason.
There's a whole new...
Wait, what?
This is just completely wide open.
Can you fill me in on that lore that I'm missing?
Oh, sure.
Like, in general, if you buy, like, a Razer mouse,
like, you can usually not even install the drivers without creating a Razer account.
Oh, yes.
You got to install the Razer first.
I had to use my email to sign into something to use my mouse.
And I was very frustrated.
Never, I'm onto a three-hour mouse now.
So, uh, so I think you could create all sorts of really cool things where it's like,
hey, do you want to organize your space better?
Do you want to have more cooler computing gear, uh, that all works together and is made
to kind of create this really cool looking thing?
You could even have them all stylized, like,
Okay, the power supply.
I mean, you imagine companies like, you know, Razer or something like this.
The power supply looks like Razer and it slots in.
The NIC looks like Razer and it slots in.
The RGB lights are synced through the whole thing.
You could put those LCD panels on the front.
And like, the whole thing could be freaking amazing.
And a whole new category for people who are probably right now pretty desperate
to figure out reasons why people are going to buy this.
But no one is looking at this.
Look at this.
It is the right way to mount
computers, and music
people know this. Like all music
stuff has like crappy racks that
can't mount computers because the components are all
lighter so they don't have like a back
they don't have like a back rail kind of a thing
but there's like there's like furniture that has
rack mounting in it that's meant to look good in a studio
all of their stuff has cool knobs on the front
and meters and stuff. We could be doing this and we're not.
Do it. Hardware people come talk to me.
I will convince you that this is the right thing.
I want to buy it. There you go.
Casey, I already know the marketing pitch.
We have two angles.
This will cover 100% of software developers at the very least.
Option number one is a campaign, be like Casey.
That's obviously number one.
That one works.
People want to buy it.
That works for a large, a large demographic.
Option number two, run your OpenClaw on this bad boy.
We've now got 100% of the software developer market covered by these two campaigns.
You want to run AI? We'll slot in as much AI
stuff as you want here, right? Just like you can just put in GPU host after GPU host.
Yeah, that's true. Maybe you can tell your OpenClaw, like, hey, I just pushed you into the sunny
spot of the house. Yep. Like, maybe that makes you feel better. It works a little harder for you.
You said, ooh, you didn't get the right deal on the hotel. Put you in the closet. Back in the basement.
Back in the basement. It puts the lotion in the basket. You're going to have two editions, right? You're going to have one rack that's
slightly more affordable. That's just for taking pictures on Twitter. I do think that you should
have the about-to-be-an-influencer edition.
Yes.
Yep.
Yep.
The starter kit influencer edition where like it looks like the rack is filled out,
but it's just cardboard on like eight out of the seven slots.
Yep.
Yeah.
Like a padded rack.
Is kind of what you're talking about?
Yeah.
Yeah.
Yes.
That's a great.
That's a really good.
The best rack available.
Oh, Lord.
ERA.
Anyway.
So what are we actually talking about today?
because there's my block.
That's short and sweet.
That's what it was.
It's game over, unfortunately.
Okay.
All right, we're going home.
The podcast is over forever.
I would just like to say, sorry that we did not make fun of Sam Altman earlier,
and now the podcast is shut down.
Okay.
So, yeah.
I believe.
Acquired.
Not.
Shut down.
Acquired to shut down.
Do that be sweet.
Hey, yo.
Aquififier.
Aquifired.
You want to aquifer us.
We invented a whole new category.
Only a couple million and you could aquifer.
Okay, so with that in mind, I believe today we're going to be talking about this lovely piece of lore.
I try to guess where it would be.
I obviously guess wrong.
Okay, hold on.
Sorry, I'm doing a new setup, and the new setup actually makes it kind of hard.
There we go.
So George Hotz, we've invited Low Level on to help us kind of work through this because, honestly, I would just like to say,
George Hotz sounds like an anime villain in this post, and it's very exciting, and it makes me just want to high-five him so bad, but it says the following.
What if I release a zero day a day until a big new
model is released? Will this finally make OpenAI and Anthropic shut up about
cybersecurity risk? Question mark. Like, these things are not that hard to find in most
software. I heard something about costing 20K in tokens. I'd do it for less if it wasn't for
some whiny bug bounty program. The reason there aren't zero days everywhere is because
nobody seriously looks, because hacking other people's shit with them is illegal, and criminals
are usually not very skilled or they would choose a different line of work. Want more zero days to
be found? Make hacking legal. Until then, don't try to claim it's hard. It's just not incentivized. I want to say,
first off, I don't think criminals are dumb or unskilled. Please don't hack me. I just want to get that
out of the way. You guys are smart and handsome and you're my favorite people. I just want to make
sure that that's clear. Please. Anyways, Ed, proceed. I do want to say one thing too that has nothing
to do with the actual content of this, which Ed will take over.
And that's just like, if I were George Hotz, I would never have been able to, like, resist naming my X feed "Hotz Takes."
Because it's so, like, you know what I mean?
Like, good on him for not going there because I would absolutely.
Like, I would have prefaced that tweet before I typed it with, here's another Hotz take for you.
Hotz take for you, right?
It would be so good.
Anyway.
It's so good.
Take it away.
Okay.
Hold on. Hold on. There's one more thing. Before we get started, there's just one more small thing I want to say.
Let me just take this quick thing and I'm going to put it up here. And then it's time for the big reveal.
Low Level responds with, holy fuck, this is the dumbest take I have ever read. I just wanted to make sure, just in case anyone was wondering.
Yeah. Yeah, I mean, I do kind of feel that way. So let me just preface this. First of all, it was called the Cold War because the Cold War was cold. Because Russia is cold.
It's a George Hotz reference. If you're an OG,
that makes sense. Um, yeah, I mean, okay, first of all, I have no problem with geohot. This isn't like
some weird drama-farming thing. I want to kind of set the table straight with that. Um, but yeah, I think
the argument that geohot is trying to make here is that the only reason more zero days
are not found is because there's no incentive. Um, okay, well, I don't agree with that. First of all,
there are plenty of bug bounty programs out there that will literally pay you to find
vulnerabilities. And some of them pay very well. Like, for example, the Apple iPhone zero-click
RCE bug bounty will pay you literally $2 to $3 million if you can find a zero-click RCE in the iPhone.
And then even something lower, like on Microsoft, like I think MSRC's payout for like Windows
RCE is like 250K to 500K right now for like a zero-click on Windows. So there is money to be made
in the vulnerability research space, right? I think all geohot is trying to say
here is something something something, the Mythos press release was bad, right? It's a marketing
campaign, whatever you want to say about it. Um, and so I understand why people are making that
argument, right? Like, you know, it's very, I think, bad PR for company that sells exquisite tool to
hold on to exquisite tool and then not give access to it and say only special people can have
our tool, because it makes you look like an asshole. Um, but I think regardless of your thoughts on the
marketing of that, it is important to recognize the fact that if you go, Prime, can you go to
Cybergym.com real quick and go to the graph that's on the homepage there? I'm gone right now. While he's
doing that: the ability for AI models, in both closed-source and open-source software, to find
vulnerabilities by literally just giving it access to the code and saying, hey, find me bugs in this
code, go, is becoming better and better and better. To the point where, like, Mythos, I'm very close to
some people that are, like, actively using Mythos at work.
And it is causing like like issues based on how good that shit is, right?
Yeah.
So CyberGym basically is a collection of bugs that exist in software, right?
So like bugs in, I think FFmpeg is one, bugs in curl is another.
And so what CyberGym does is it takes a model and with a set of prompts says,
hey, go and find bugs in this stuff, right?
And the success rate is how many of the bugs that are known to exist get found by the
model in this.
And you can see a pretty, no, not exponential,
but straight-line curve going up to the Anthropic model
that recently got previewed by some people,
that's at an 83% success rate.
Of the bugs that are known to exist in these code bases,
they can find 83% of them.
Again, we don't know the cost data in those.
We don't know if, like, the models are being, like,
backfed the information,
so they're, like, training themselves
on previous CyberGym runs.
We don't know any of that.
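For reference, the success rate described here is just the share of known bugs that a model rediscovers. A minimal Python sketch of that arithmetic, assuming a made-up list of bug IDs: the project names come from the discussion, but the IDs and counts are hypothetical, not CyberGym data, and this is not CyberGym's actual harness.

```python
def success_rate(known_bugs, reported_bugs):
    """Fraction of known bugs that also appear among the model's reports."""
    known = set(known_bugs)
    if not known:
        raise ValueError("need at least one known bug to score against")
    # Only reports that match a known bug count; extra reports are ignored.
    rediscovered = known & set(reported_bugs)
    return len(rediscovered) / len(known)


# Hypothetical bug IDs, for illustration only.
known = ["ffmpeg-001", "ffmpeg-002", "ffmpeg-003",
         "curl-001", "curl-002", "curl-003"]
reported = ["ffmpeg-001", "ffmpeg-003",
            "curl-001", "curl-002", "curl-003", "curl-999"]

rate = success_rate(known, reported)
print(f"{rate:.0%} of known bugs rediscovered")
```

A number like this says nothing about token cost or about whether the benchmark leaked into training data, which are exactly the caveats raised here.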
But there is this really weird issue happening
where, like, any Joe Schmoe with not a ton of security
research work or not a ton of security knowledge can, with a couple hundred bucks, like, worst
case, find bugs in software. And I think that is like an existential security threat to software right now
as we know it. I'm kind of curious what you guys's take on that is. What do you guys think about
the Mythos situation? I know how I feel. I'm not sure if I actually asked Prime what he
thinks about that, the Mythos thing. Oh, I have ideas and I have thoughts about it. Oh yeah. So I guess the
first thing is that there's two, there's kind of like three, there's three problems here. First problem is:
is Mythos really as good as they say? And obviously I have no internal information. I've just seen
some graphs. Uh, dirty data is like a huge, gigantic problem in all benchmarks. All benchmarks are
being fed back into the models. It's really actually hard to tell, like, what does a 20% improvement
on software engineering bench actually mean, especially given the fact that you could write zero lines
of actual solution code and get 100% on software engineering bench. It turns out there's other
benches that are also horribly inaccurate.
There's a whole paper about why all the major benches are just completely fudgeable
and made-up-able.
So it's very hard for me to understand from a bench perspective.
Second, I guess the middle ground would be like, so if, if Claude Mythos is as good as it
is, then yes, that is going to inevitably cause problems because we're going to go from
not too capable to hyper capable in a moment.
Thus, everybody can go through and hack everything.
And thus, Dario will be able to get his ultimate goal, which is regulations.
And so that kind of worries me, pull up the ladder really quickly and make sure that
humans can't code because human coding, that's dangerous right there.
And so that's, you know, so I think that that's true.
There's the second one, which is this is just another C compiler again from Anthropic
where they hype up this gigantic thing, like, oh my gosh, it's written a C compiler.
And then you go look at the details.
It's like, well, it can't write a bootloader because we didn't, we could not seem to
spend enough tokens to convince it to write it within 32K.
It could only write it within like 67K or whatever it was to be able to actually fit into it.
And also we tested it recursive, or we iteratively tested it off of like the 30 years of
tests that the GNU C compiler already had.
We also gave it all the answers and then it figured out all the questions.
It was crazy.
It was like it played Jeopardy and it was really good at it.
And so it's like there's this whole marketing buzz, which is, it's really hard to kind
of cut through that.
And then obviously the last one, which is they're just downright lying.
I somehow doubt that they're, they're downright lying.
I think they're just overstating it.
If they're downright lying, then, you know, this is just going to be business as usual.
It'll just be yet another disappointing model release.
And that's that.
And so for me, that's kind of how I, I'm on middle ground, which is I think it's
more hype than reality, but of course, I haven't seen it because I just don't know because they
won't let me see it. I'm too dangerous. I think there's a similar model that, um, ChatGPT, or OpenAI,
just released. Like it's like ChatGPT 5.4 Codex or something. Their model naming
convention is actually starting to align, though. At least I know, like, the higher the number,
we're good. Yeah, right, right. And they don't have like a random O to it now. Um, but I think
there is a comparable model that you can get access to, like, just by uploading your driver's
license, if you're into that, you know, proving that you're a real person. So there's,
there's models to test out. But yeah, I don't know. It's just, it is, it is concerning because
we have kind of two forks we can go down. There's a one where everyone gets access to it.
Everyone can create zero days. We kind of enter this like really dangerous cyber no man's land.
But the other side is like, Anthropic keeps the access to themselves forever. And now like only
this list of like 10 companies can make zero, like can find zero days in the software.
And what does that do?
They moved to the Cayman Islands
and then they just take over every government
by hacking all the software, and Dario
finally realizes his role as the bad guy.
Like, that would be, I mean,
supervillain is right there if this is true.
That's true.
Casey, what's your take? I saw you were going to chat before.
I'm sorry?
The chat? What was the chat?
That you were going to say something before.
What's your take on... Was I? Really?
Well,
I definitely could say something
but I think the thing I would say is probably not
very interesting and that is that I
think I probably agree with both George and Ed at the same time here, which should be impossible
because they're supposed to be disagreeing, but I don't know, it kind of sounds similar to me.
And the reason I say that...
A secret third thing.
It's not really a secret third thing.
It's just like, let me offer a different interpretation, or slightly different interpretation,
which is to say.
So I feel like machines are pretty good at pattern matching, actually.
And so, like, I don't think it's...
Like, put aside whether Claude Mythos is good or not, because I realize that's hard to independently verify this time.
But, like, I think it's reasonable to expect that at some point, because we are spending at this point, like trillions of dollars probably, on doing computation for these things, at some point, they should be able to pattern match bugs reasonably well and at a very high rate.
Meaning as long as you're willing to pay for the compute time, we can scan lots of software for a lot longer than
we're currently having humans do it, right?
I think that's a pretty reasonable thing to expect.
Whether Claude Mythos has done it or not shouldn't really be the question because somebody
can do this eventually if we keep spending this much money.
It should get there.
Among the things that AI could eventually do, that one doesn't sound that implausible to me.
And so what I would say is I think it's reasonable to expect that that either has or will
occur. Two, I do think humans were doing this very well before. Individual humans, like some of them.
They were finding things that probably Claude Mythos still could never find. I mean, like things like
Rowhammer attacks and things like that that are just like way out in kind of crazy land.
Or attacks through like old legacy stuff like the APIC and things like that. So humans were actually
very good at this task, but there weren't very many of them, right? And so what I would say is moving to
something like Claude Mythos or whatever that thing happens to be that can do this is kind of like
what George Hotz was saying. It's kind of like saying, hey, everybody, from now on, if you just
like hack people's bank accounts, you get the money. All the great humans at this in the world who
are currently doing something else would now be incentivized to do this thing. And we would have found
way more zero days. I mean, there are so many programmers who, if they had been raised in some
kind of a way in a society and a religion where stealing people's money was considered virtuous,
we would have found so many more zero days right now than we have. And so I think I'm kind of
in a way, I think I see, I think both people's points are actually totally valid. Like, like, I think,
like, yeah, we could have found way more zero days if we didn't heavily disincentivize people from, like,
making hundreds or billions of dollars off of hacking, which is what they could have. And we said,
now you get 50k, 100k, maybe if it's something crazy like an RCE, you can actually get a million.
It's like, come on, guys, that's not equivalent to what they could already make working at a startup or something like that if they're that good, right?
Yeah, there's no guarantee on that side either.
Like, they don't actually get the guarantee.
It's like you work in a startup, at least you get some money.
Or even just not even startup, just go to Google and you get that stock or whatever, right, or something like this.
So anyway, in general, I would say, I can see both, I can see both points.
I don't think, I don't really think they're in as much tension as it would sound, if that makes sense.
I agree.
I thought geohot was saying more like, he was making an econ argument about it of like, we put a lot of costs on hacking already.
So that's what's stopping it from happening.
Like what you're saying, Casey, right?
In the sense that like, yeah.
Okay, so now we're going to have another way to do it.
It also costs money, but then we still have the other cost of like, you could
go to jail for doing it.
Like that's the social cost we impose on people doing it.
I mean, I just took him to be saying like it's not that impressive that it found zero days because
if you gave me, you know, if you gave me 50 great programmers who are all doing other stuff,
we could crank out so many zero days you wouldn't even believe it.
And I kind of, and I kind of believe him because, you know, you look around the world and
there are, you know, some really good security teams out there.
And they do crank out zero days pretty effing fast.
And they don't even tell us about all of them, right?
Yeah.
Right.
Or like in North Korea keeps on making money.
Like, obviously, they're successful.
So anyway, I'm not trying to say that either person is 100% right and somehow you can marry the two completely.
I'm just saying there's, I think there's some merit to both things.
So I'm actually, I'm happy either way.
I'm happy with either take.
So your point about if you got a room of 50 good programmers together and they'd find zero days,
is actually kind of the argument that the article "Vulnerability Research Is Cooked" makes on sockpuppet.org that I referenced in a video,
and I think Theo did too.
One paragraph that he calls out is basically... The O? Sorry, "the O" referenced in a video?
Okay, I don't know what that is.
Okay.
Spell it.
Casey, spell it out in your head and it'll make sense.
The O, which is right.
Oh, the O, maybe.
So software security, a lot of the time, can be chalked up to the fact that a lot of software
just has not had elite attention, or what does he call it?
Yeah.
Advanced attention.
I would say basic attention is missing from many software projects.
Now, for sure, but more complex platforms, right?
So his assertion is that, like, software security has been a talent problem for so long,
where it's like, it's not that there aren't people that know how to find bugs.
AI isn't solving a unique problem.
The AI is solving the scalability problem,
where it's like you can train the AI to do a thing that Joe knows how to do,
and now you have 100 mediocre, but 100, Joes,
right? Um, and that's an issue for kind of the econ of cybersecurity, for sure. And yet I want to be
very clear: like, I don't disagree with geohot from the perspective of, like, more people equals more
bugs, right? But like, obviously, that is the problem, that we just don't have more smart
people. That has been the entire industry's plight for a long time, is that like there just aren't
people who have not only security knowledge but knowledge of, you know, uh,
web server stacks and hypervisors and drivers and OSes.
Like you get these very niche skill sets.
And when you divide them up into those skill sets over and over again,
you're left with like 10 or 20 people on planet Earth
who know how to like attack a certain technology.
So AI, you know, if you know security,
now you can talk to the AI, learn about hypervisors in a week.
And then suddenly you can find bugs in ESXi, you know, Hyper-V, etc.
So I guess I agree.
Like the dumbest take thing was more, I was mad at
geohot's ego, because it basically came off as like, fuck you, I'm so smart, I know all the zero days,
I can do this myself in my sleep. And it's like, no, you couldn't. Like, you're telling me you could
drop a zero day every day in macOS until someone paid you? Like, no, you couldn't. Shut up. But I hear
what... I really hope he takes this as a challenge. I want to see a zero day a day, geohot, dude. If he does it,
I'll eat a sock on stream. I'll straight up, I will do it. I don't care. You shouldn't say that.
Nice. George Hotz, you heard it here: Ed will eat a
sock on stream
if you do a week of zero days.
Okay, a week is actually possible.
I'm talking a month.
Okay, a month.
One month.
So yeah.
I would also add like, just, you know, because I'm, I constantly harp on this point,
but I want to bring it up pretty much every time.
It's just that this is also why AI company behavior, like, is a problem.
Because this is generally a good thing, meaning like, we, like, we
do actually want the ability for us to get 100% coverage for security, and we know that we can't
get enough people to do it, really, right? Like, not in a white hat sense, right? Maybe, maybe you
could take George Hotz's suggestion seriously and just go like, make hacking legal, and then we just
have a crap ton more black hats and that eventually sorts it out. But I mean, you know, that wouldn't
necessarily be, yeah, that wouldn't be, yeah, that's exactly, they're white hats now. Everyone's a white hat
now. So we do, I think in general this is solving a good, you know, this is, this is a way AI could
solve a problem usefully if it actually can just spit out lists of pretty well curated potential
bug places that we can go look. That's very helpful, right? And so the problem is, like,
the only reason they were able to make that is lots and lots of extremely talented security
researchers who are getting literally zero dollars from Anthropic for this. And that is not
acceptable. It's just not.
Like, I'm sorry, but like, you know,
Ed should be getting a check for this.
And everyone like him. That's just kind of
how it is because it's like you
used their, it's all of their
expertise and all you're really doing
is very slowly and cumbersomely
and kind of clumsily,
eventually building a machine that can
deploy the same analysis
somewhat reliably based
on all of their work. And like,
I just don't like it. I don't
like the fact that they're not getting a check.
And I'm never going to like it.
You can talk to me all day long about how someday we're going to live in a post-scarcity society.
And Ed will be getting a UBI check or something like this or whatever it is, right?
And hopefully I'll be getting one too, although I didn't do any security research.
So I don't know.
Maybe I won't be getting that check.
I don't know how universal the U in universal basic income is.
But like, I don't like this.
They should be getting paid now because Claude is, you know, getting huge.
Like everyone at Anthropic is getting paid very well.
So it's not like there isn't money being disbursed, whether they're making or losing money or anything else you want to talk about.
It's like money is being disbursed to people.
It's just not the people who did most of the work.
Also, you got to throw.
Casey, would.
You can go.
I was just going to ask Casey if he was going to be happy about it, though, if Anthropic spun out a consumer rack business.
Yeah, now we're talking.
If they were like AI racks, like we got racks for your AI service.
Hot AI racks in your local area.
It's right.
Exactly.
We will send you some hot racks.
Also, by the way, not only are they taking,
your whole argument about them taking and not properly attributing,
or, you know, the people who put in all the work not benefiting from it.
They're also making it so that I can't buy a GPU or RAM or CPUs now or anything.
You can't buy a GPU or RAM.
And also, I believe, Ed literally just said he doesn't have access to this freaking model.
So, like, a bunch of security research.
I don't know exactly what subset, but like a bunch of security researchers, many of whom probably did some pretty cool stuff, they don't even get to use this thing. That's how ridiculously backwards it is. Like WTF guys.
Yeah, that's why they called it Mythos, though.
Yeah, that's why it's called Mythos. Um, Anthropic would argue that it is too dangerous for little old me to have access to it, right? Depending on, you know.
Who knows what you'll do, man? Who knows what you would do? I'll find that zero day and I'll hack into, $1.com.
Mario's phone. No, I don't know, man. I understand where they're coming from, but at the same time, I understand why it looks like a huge marketing ploy, and I'm not sure which way to lean, honestly.
Yeah. Okay. Yeah, that's true, too. I think it just...
That's a whole other angle, honestly. I would think that they'd have so much more credibility if they just quit, like, giving us shaken baby syndrome constantly with their marketing. It's constantly going back and forth. Like, every couple of months, you're getting hit with the new, "Hey, we're all out of jobs here shortly. Hey, this thing is super dangerous." I mean, you've got to remember that Dario was at ChatGPT, or OpenAI, I like to call the company ChatGPT. He was at ChatGPT during the GPT-2 days, and the official language around GPT-2 seven years ago was, "GPT-2 is too dangerous to release to the public."
That's what the two stood for.
So, like, this is not the first time that we've been on this roller coaster. And I think that's one thing that's largely hurting the credibility: you can only cry wolf so many times, and then when a real wolf happens, like if this is a real wolf, everyone's like, "Yeah, okay, okay, C compiler boy, tell me all about it."
But they don't care, right? They don't care because the baby that they're shaking is called an investor. That's whose pockets they have to shake the money out of, right? They don't care what we think, right? Because we're not going to write them the next $100 billion that they need to keep going. And they're kind of locked in this, you know, bitter, bitter winner-take-all kind of war for this, like, core technology part, right? And so they have to be the last AI company standing,
because whoever is that company takes all the money
and the other people kind of go to zero, right?
Like, unless there's some real differentiation soon
where it's like, oh, the AIs bifurcate
and like Claude is only for code
and can't do anything else anymore
and like ChatGPT is only for like, you know,
the humanities or something like, good luck
and good luck raising money for that.
Humanity.
Yeah.
Yeah.
Yeah.
Yeah.
Yeah.
Yeah.
Uh, so maybe that's not true.
But you know what I mean?
If there's some kind of really severe bifurcation, then maybe they could both survive.
But, you know, they're in a winner-take-all battle right now.
And so they've got to keep saying this.
Every release has to be the one that's, "This is the one that will take over the world."
And if it doesn't quite, well, you know, it'll be an extra.
You know that, Claude?
Sorry, just one quick thing.
Go ahead.
Do you know that Red Bull in 2007, was it?
2011?
No, 2013, maybe.
Oh, Red Bull was too dangerous to release as well?
No, Red Bull claimed that it gave you wings.
Because it gives you wings.
it was sued successfully, I believe, for $10 million
because it in fact did not give you wings.
It was not superior to coffee.
And so I'm pretty sure in college,
I got a check for like $2.30 from that case.
And so I am curious.
Ed, you sued Red Bull and won, bro.
You should make a video about it.
Call me the lawyer low level, okay?
Listen.
Low level, low legal.
Low legal.
Let's go low legal.
But I'm actually curious: if they keep saying that and
it doesn't happen, do they open themselves up to a false advertising class action? Like,
can you keep saying this and then not get sued? Like, Red Bull made claims and then they got sued.
Why not other people? Why can't other people get sued for that? I think the problem with,
like, with Red Bull is like the case was so obvious, right? Like, Red Bull does not give you wings.
End of case. Like, okay, fine. Like, any judge over the age of whatever. I would have liked to hear
the defense for that one. They're like, "God. Your Honor." The problem is they had, like, these wings, like,
strapped to their back and they'll go like, "I drank your Red Bull this morning and here are my wings."
We ship you wings.
But the problem with anything technological when it comes to the government or legislation or, you know, judicial process is that, like, boomers and higher run the world right now when it comes to these levels of, like, jurors or making legal decisions.
And you couldn't explain to anybody at that age, unfortunately, like, right now, just the people that are, like,
running these processes, what it even means to find a bug, and then show them Mythos's claims and,
like, make a sound legal argument that would, like, go well in court.
You're right.
You're right.
Because Kamala Harris did actually think computing was in the literal clouds.
And so it's my favorite clip of all time.
Yeah, there's a clip of her.
Josh, put the clip in.
So you're now, no longer are you necessarily keeping those private files in some file cabinet
that's locked in the basement of the house?
it's on your laptop
and it's then therefore
up here in this cloud
that exist above us
right
she'll have the last laugh though
when like SpaceX is launching
AI data centers into space
and Kamala's like that's what I was talking about
that's what I was talking about
duh it's cloud storage so you're probably right
a great clip where she's talking about the cloud
and she literally points above and goes like
the cloud it's like above us and stuff or something
like that. She should have known that it wasn't
there because you don't see a series
of tubes. There's a series of
a series of tubes. I learned
that recently. It's true.
Okay, I got a question for you,
Ed, like, in this vein,
about your thoughts on it. Yeah.
So right now I get that there's
basically like the argument
like, okay, I'm a company, I
release my thing. I run
some models as like a preventative thing
to look for zero days. The bad guys
run models to try and look for
zero days. We kind of fight it out
and it's whatever, right? So I think
like everyone's saying like if the hackers can use it, I can
use it. That's fine. But the thing that
makes me like a little bit
more like I don't really know
is like for the state of like
a bunch of open source stuff. Like
I'm an open source maintainer,
and I already can't convince a company
to send me $100 a month
to maintain this thing
for them.
There's no chance I'm getting them to
well I'm definitely not going to spend
20K of compute every time I release something and decide that now it's safe, right?
Right.
And like, I can't get any companies to pay for that and sponsor it.
But like if I'm a, you know, if I'm the one little pin in the XKCD comic that's holding up from Nebraska, the bad guys only need to do mine once.
So I'm wondering like kind of how you see that as like the landscape affecting open source, things like that.
Because it seems very asymmetric in that way.
I mean, I think it's asymmetric for that reason, right?
Like, the reason why you can make the argument that Anthropic is afraid is because you are the linchpin on the infrastructure of the internet and no one has funded you so far. You have had zero security audits or zero security work done on your stuff.
And so, like, if you give access to these models, if you really are the linchpin in the internet, you already aren't getting money from Netflix, Google, whoever, that's using your software,
and the black hats know that you're the linchpin keeping the internet up, they're going to make use of that model to do the exploitation, right? Um, does that answer your question? I mean, like, I think it's just, yeah, the amount of power that it gives to a single organization, given the current state of open source software in particular, is very dangerous. To be very clear, these models are also very good at doing closed source software, right? Like, my recommendation to anybody interested in this, by the way, is, like, go take a capture-the-flag problem
from, like, CTFtime or crackmes.one or whatever and hook up Ghidra to Ghidra MCP and then use
Claude Code on Ghidra MCP. It will reverse engineer and find a bug in that problem in a
matter of minutes. Like, Opus 4.6 is a better reverse engineer than I am, and I've
been doing this for, like, coming on 14 years. It's honestly terrifying to watch it work. So if you're
even remotely interested in this, go give it a shot and you'll kind of see what I'm
talking about. It's scary how fast it moves.
Because, so that part, that's where I'm like, you know, whether it's Mythos or not, I feel like right now
a bunch of stuff you could just, maybe it'll cost more tokens or it'll take longer or something,
but, like, a lot of stuff you still could find.
Yeah, it seems like so. Yeah, models also, like,
any model does this, obviously, but the current models are really bad about, like, false
positives. Like, I've done security research, uh, in my free time on, like, Chrome, ESXi, and some other, like,
routers that I've, like... Regular weekend activity.
Classic weekend activity. Um, and the amount of times
I've gotten, like, "Critical finding: buffer overflow in, like, the RPC
handler for this thing," and it's like, okay, all right, dude, like, write me an ASan harness
that tests that, and you'll see very quickly, "Oh, sorry, just kidding, it's not actually there."
And so the magic is, like, if Mythos is able to make less false positives, you
increase the signal-to-noise ratio in this process, which is scary, right?
Because it just means you need less people to triage the reports
and ultimately find real bugs faster.
So I have another question with this mythos thing,
and maybe I'm curious.
I'm curious about your security expertise.
Isn't this whole withholding a model
kind of like a doomed proposition to begin with,
meaning that if OpenAI has a similarly powerful Mythos model,
and they're competing for the zero,
like, for the zero-sum game kind of outcome of who is the best model,
doesn't it mean that when OpenAI has it,
they will just release it.
Like, and then aren't we just forced to go out?
Because whoever kind of releases it gets the customers,
and then by having the customers, you win,
and so then you just get out ahead.
Like, doesn't this kind of cause, like, a weird thing where,
yeah, we're like, oh, we can't do this.
You know, Dario's like saying we can't do it,
but won't we just kind of fall right into it the moment?
There's two people that have it.
Yeah, I mean, that's, I'm not, like, shitting on capitalism.
I'm just saying that's more of, like, a capitalism problem
than it is, like, a security problem, right?
But, yeah, your point is basically like,
if actor A says thing too dangerous,
but could make a lot of money.
Open source model, shall we say.
And actor B has same thing and wants to make money with slightly less ethics potentially.
Yeah, actor B is going to release it.
Or yeah, exactly.
Chinese model, a Russian model, whatever.
Well, I mean, that's literally what, I mean, Dario quit OpenAI because he's like,
bro, they keep making models that can kill humanity.
Right.
Okay, so I'm starting a company where we make models that could kill humanity.
But they're mine.
Also, Chinese models.
I mean, yeah, they're usually right
after OpenAI or Anthropic releases one.
So I think that that might be a little bit difficult.
They might be a little bit behind.
Has anyone seen Robocop?
I mean,
Riverside chat.
But yeah, I mean, OpenAI literally has a model that they claim,
well, they haven't made any claims,
I don't think, about Mythos equivalence, right?
But they're doing effectively the same thing
where it's KYC, know your customer,
so you have to, like, upload your ID
and talk about what work you do.
And you get access to GPT-5.4 Cyber,
which I'm assuming is just a model
that's trained better on
bug patterns, right? Use-after-free, out-of-bounds
reads, et cetera. Now,
if it's actually better than Mythos, who knows, right?
But, you know, it's, I think we're all just trending.
Regardless of what Anthropic wants to do, I think we're trending towards
every person on planet Earth with a couple bucks having access to models that are
very good at bug hunting. And the question is, what does that mean for software, right?
Does software get more secure?
Does the world just get more scary for a long time and it never really, like, resolves
itself? Like, what do we do with that information? And that's a tough question to answer.
I'm interested to know how expensive it's going to be.
That's the other question.
And I mean, this is obviously the question kind of that we've been talking about for a while on the pod.
And in life in general: what are token costs going to look like if OpenAI and Anthropic both get all of the customers that they would like to have?
Because the cost won't be the same.
Yeah.
Demand 10 or 100 or 1,000 X's.
It won't be.
So I'm not.
The price will not be.
I'm not super well read on this.
Is it true that inference is currently run at a loss?
Like, I've heard.
I've heard both.
Some people are so confident.
I have been looking to try and find a definitive answer.
I'm the confident one, by the way.
He's referencing.
Okay.
Oh, no, no.
I mean, I'm not going to reveal my sources.
I asked ChatGPT
and I asked Claude.
Bebo said, of course not.
Yeah, yeah, right.
I've heard, though, that some people are saying they are running it at a loss,
or it's a bit complicated because, like,
pretty sure Anthropic's probably running some percentage of
accounts on the $200 plan at a loss, right?
But like is API pricing at cost or below?
And then how do you factor in like training and stuff?
So I don't know.
My personal take is that inference itself just looked at in the myopic view of just inference,
it makes a lot of money.
But you also then once you zoom out now, you start saying hardware and all the incidental
stuff around it, probably still makes money.
But then when you zoom out to say, like, every time you release a model, you make your
previous model defunct, and that
has a very large burden.
And they keep on not making money
and needing to raise more money.
So I have a sneaking suspicion
that part of it is very hard
to make money in the current state.
All right.
OpenAI is, like, publicly
losing money, right?
But is Anthropic also negative?
They just had another big raise as well.
So I'm assuming,
I thought they just raised like $6 billion or something.
Could be wrong about that.
Chat, fact-check me.
I know OpenAI did $120 billion.
Yeah, that's so much money.
Yeah, cash.
This is the one that I
actually was really curious to see.
This is the only benchmark that I was super curious to see if they're going to do well.
Anthropic Opus 4.6 Max cost approximately $9,000 and got a 0.5% score on ARC-AGI.
So this is like the super test.
And humans get into the high 90s.
AIs get, like, Jipity 4 high cost $5,000 and got 0.2%.
Gemini 3.1 did 0.4% for $2,200.
And so it's like this really difficult.
it's a really difficult test for AIs to pass.
And so Mythos did not add itself to this one.
So this is the reason why I largely think it's more hype marketing than it is anything.
Because to me, this is like a really great indicator, at least into some sort of better model improvement.
And so I didn't see it.
Let me, can I just give a counterpoint to that, though?
Sure.
Yeah, yeah, yeah, yeah, yeah.
Once again, with the huge disclaimer that I don't do any AI stuff.
so this is just off the cuff.
But ARC-AGI, if I'm not mistaken, is a benchmark specifically to test how well AIs perform on learning completely arbitrary new things that don't exist anywhere in their training data.
That's the only thing that it's made to test.
Yes, it's the test for intelligence of this all.
Exactly.
And so the only reason I would want to point out that I don't think that test says very much about this particular security thing is security is not that.
True.
Like, nobody, nobody is claiming that Claude Mythos came out and discovered a whole new set of
classes of security exploits that no one had ever come up with before.
What it's saying is that it went and found a bunch of the exact same kinds of zero days
that someone like Ed would find if they went and spent a week on that piece of software,
right?
Like, so they're not claiming that this thing is somehow more intelligent than the predecessor in that way.
it's claiming that it's got better pattern matching
and like stringing things together to create exploits, right?
That process, which is well known.
And so I don't think ARC-AGI necessarily tells us very much about whether it can do those things
because those things are very well-known tasks that security researchers know how to do
and we kind of know the process that you do to do them, right?
So just going to point that.
I will concede that point most certainly, that the security,
at least known and obvious security vulnerabilities such as use-after-free
and all the fun stuff, like the stuff that happened in
FFmpeg with jumping ahead somewhere
in the buffer based on...
Yeah, these things are very common kinds of bugs.
They're not like unusual.
The things that they've talked about are like very, very standard.
And so that seems like a more plausible claim.
Like, hey, we just were able to scale up
the sort of security checking that a security researcher would do.
It can do that thing and find, you know, potential places for that.
A lot more plausible than AGI.
Yeah.
Yeah, the thing too for, I feel like for the security side of it, as opposed to constructing a product or a new product or like building a feature where you have to get like in some ways all the things right.
For a security thing, I only need to find one of the things that are wrong.
Yeah.
Which is like that's a much like you can test a bunch of the scenarios like you're saying, Casey, that already exists.
And I only need one thing to be wrong in the program for then me to be able to take.
control of it.
Well, and it's combinatorial, right? Like, a lot of what security research is doing is, like, A, it's pattern matching for these kinds of bugs. And then B, going like, okay, if I did this one followed by this one, would that produce an exploit? What if I did it in the opposite order? What if I did this one and then this one and then that one? Okay, what if I did this one, right? And again, these are things computers are good at. Like, you don't have to believe in some kind of weird, like, supernatural, like, "AGI achieved internally" Sam Altman nonsense to believe that this is something a computer could do.
It's much more plausible, if anything, than some of the other claims.
So that's why I would like say I'm not that.
Like when I saw this, I wasn't like, that's got to be false.
I was like, okay, yeah, I can believe that.
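[Editor's sketch] The "what if I did this one, then this one" search described above can be illustrated with a toy brute-force loop in Python. Everything here is hypothetical: the three primitives, their names, and the attacker-state model are invented for illustration and do not come from the episode or from any real tool. The only point is that trying every ordering is mechanical work.

```python
from itertools import permutations

# Hypothetical toy model: each "primitive" transforms a set of attacker capabilities.
def leak_address(state):
    # An info leak only tells the attacker where things are.
    return state | {"knows_address"}

def arbitrary_write(state):
    # A write is only useful once the attacker knows where to aim it.
    if "knows_address" in state:
        return state | {"controls_pointer"}
    return state

def trigger_callback(state):
    # Calling through the pointer only helps if it was overwritten first.
    if "controls_pointer" in state:
        return state | {"code_execution"}
    return state

PRIMITIVES = {
    "leak": leak_address,
    "write": arbitrary_write,
    "trigger": trigger_callback,
}

def find_chains(primitives, goal="code_execution"):
    """Try every ordering of every subset of primitives; keep the ones that reach the goal."""
    winning = []
    names = sorted(primitives)
    for length in range(1, len(names) + 1):
        for order in permutations(names, length):
            state = frozenset()
            for name in order:
                state = frozenset(primitives[name](state))
            if goal in state:
                winning.append(order)
    return winning

chains = find_chains(PRIMITIVES)
print(chains)
```

With three primitives there are only 15 candidate orderings, and in this toy model exactly one of them reaches the goal. Real targets have far more candidates, which is why automating the search matters.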
Yeah.
I don't know.
Most of vuln research is like, you know, take a function that gives user input, like, define
your threat model and then do source-to-sink analysis on some vulnerable function or failure
to gate a function on, like, a length check.
And, like, does user data get there?
Bug confirmed.
And, like, yeah, that's
literally just pattern matching that we've solved a lot of the time previously with, like,
satisfiability solvers, right? Like angr and, like, Z3. Like, take the graph of a function, turn it into a
math problem. Can you solve the math problem? Cool, bug confirmed. Well, now with AI, it's just, like,
that process of doing source-to-sink on, like, text, it can do incredibly fast, right? It's very good.
Now, obviously, because it's stochastic, it creates a lot of false positives. But if you can figure out a way
to reduce the false positives or automate the validation of those false positives, then yeah,
It's crazy.
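[Editor's sketch] The source-to-sink idea described above can be shown as a small graph search in Python. This is a simplified illustration, not how angr or Z3 work: the data-flow graph, the function names, and the rule that a length check fully blocks a path are all assumptions made up for the example.

```python
from collections import deque

# Hypothetical data-flow graph: an edge A -> B means "data from A can reach B".
FLOW = {
    "recv_packet": ["parse_header", "log_size"],
    "parse_header": ["check_length", "copy_name"],
    "check_length": ["copy_body"],
    "copy_name": ["memcpy"],
    "copy_body": ["memcpy"],
    "log_size": [],
    "memcpy": [],
}
SOURCES = {"recv_packet"}      # where user-controlled data enters
SINKS = {"memcpy"}             # the dangerous operation
SANITIZERS = {"check_length"}  # assumed to make everything downstream safe

def unsanitized_paths(flow, sources, sinks, sanitizers):
    """Breadth-first search from each source; report paths that hit a sink with no sanitizer."""
    findings = []
    for source in sorted(sources):
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                findings.append(path)
                continue
            for nxt in flow.get(node, []):
                # Skip sanitized branches and avoid revisiting nodes on this path.
                if nxt in sanitizers or nxt in path:
                    continue
                queue.append(path + [nxt])
    return findings

findings = unsanitized_paths(FLOW, SOURCES, SINKS, SANITIZERS)
for path in findings:
    print(" -> ".join(path))
```

In this toy graph the only reported path goes through `copy_name`, which reaches `memcpy` without passing the length check. A real analysis would still need to validate that finding, which is the false-positive problem mentioned above.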
And I think what says...
Have they thought about asking Mythos?
What's that?
Have they thought about asking Mythos?
I know.
Come on, man.
Can you just...
No mistakes, please.
The thing that sets Mythos apart,
according to the Anthropic report
is its ability to chain together primitives, right?
So the scary part from like a cyber crime perspective
is like you have a gadget A that gives you an arbitrary read
and gadget B that gives you an arbitrary write.
Okay.
Like those two separate things are like not super important if they're not used together
well.
What Mythos is able to do is, out of 100 tests,
I think it's like 83% of the time,
find exploit primitives in a vulnerable code base
and chain them together to get RCE.
Right, that's the scary part.
Because then that's true, like,
end-to-end exploit creation for a bad actor.
And that's, I think, what scares Anthropic the most.
Now, I know there's an argument
where, like, Firefox wasn't in the sandbox
for that experiment, so it doesn't actually matter.
But, I mean, just apply that process to the sandbox
and the same thing applies.
you know, it's just, I think, they wanted to prove a point that it could do that.
Well, and also, I mean, again, like, as I said many times, I can't stand AI companies,
so I'm not trying to defend them or anything, but I'm just trying to point, I'm just trying to point out how plausible this stuff is to me from a neutral observer standpoint.
Classic Casey defending AI companies.
Yeah, I know, right?
I know.
If you think about it, it's like, look, security researchers, who do not number that many, were already cranking out zero days at a
much too alarming rate for me, right?
Like, you know, there's a hack every other day, right?
It's not, like, CVEs are piling up like there's no tomorrow.
And yeah, not all of them are actually all that bad or whatever,
but, like, it's not like security researchers
were having trouble producing a fair number of critical vulnerabilities,
even with the limited resources that they had.
So it's also not weird to think that, like,
if you had more automation, you would find a lot more of them.
Like, there's clearly just a lot of bugs, guys. Like, there's a lot of
freaking bugs, and it just doesn't seem that unusual that if you have more sophisticated
pattern matching, more sophisticated combinatorial checking, where the security researcher doesn't have
to spend a lot of time setting up the tool because it can just kind of ingest the code and it
knows roughly what it means, yeah, I mean, their rate's going to increase. If nothing else, existing
security teams' rates of finding exploits, it has to. I mean, it just has to, unless this thing is just a
complete pile of crap. It's got to.
The other thing, too, we've been seeing from each, like, new generation of model is that
they're getting, at least from my experience in what I'm reading from people and everything.
They're getting better at calling other tools. So, like, they call out to stuff more regularly.
Right, yeah. And they can pay attention for longer. Right. Like, recompile this code and see, you know,
make this exploit and run it against this thing or whatever, right? Like, those are all things that
if you automate them, a security researcher gets much faster at finding a bug because they're not
having to set up the tooling themselves to, like, go work on this exploit.
Like, whatever, whatever those steps were, they don't have to do them anymore, right?
Right.
So then if you're like, oh, now it can run. Instead of, like, I have to prompt it at every stage for the next thing to do, I can give it 10 rough things, say, "Try a bunch of combinations of these," and then it runs for 24 hours.
Yeah.
You're just like a lot.
It's literally like, in my mind, some of it is like, yeah, well, we already know fuzzers exist.
Like, we use them all the time and they're good.
It's like in some ways almost like...
Yeah, it's like fuzzer squared, right?
It's like a thing now that can like target the fuzzing at things specifically so that...
Fuzzer too.
Things that would be very hard for stochastic testing to catch.
Because when you have stochastic testing and you have to chain two things together,
you're never going to randomly pick the two things that would have to happen for them to work.
Here is a thing that can, like, target that specifically and go like, "Oh, I think combining these two things is probably it, let me fuzz that specific path."
Oh, yep, I got it.
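[Editor's sketch] The gap between blind stochastic testing and targeted fuzzing described above can be made concrete with a toy Python example. The target, its 20 made-up operations, and the "suspect pairs" list are all hypothetical; the list stands in for whatever hints a human reviewer or a model might supply.

```python
import random

OPS = [f"op{i}" for i in range(20)]  # hypothetical API surface of a toy target

def target(sequence):
    """Toy target: 'crashes' only when op3 is immediately followed by op17."""
    return any(a == "op3" and b == "op17" for a, b in zip(sequence, sequence[1:]))

def random_fuzz(budget, rng):
    """Blind stochastic testing: random two-call sequences. Returns tries used, or None."""
    for attempt in range(1, budget + 1):
        if target([rng.choice(OPS), rng.choice(OPS)]):
            return attempt
    return None

def targeted_fuzz(suspect_pairs):
    """Directed testing: only try the pairs that were flagged as suspicious."""
    for attempt, (first, second) in enumerate(suspect_pairs, start=1):
        if target([first, second]):
            return attempt
    return None

# Chance that one random two-call sequence hits the bug: 1 in 20 * 20.
chance_per_try = 1 / (len(OPS) ** 2)
print(chance_per_try)  # 0.0025

print(random_fuzz(100, random.Random(0)))  # may well be None within this budget
print(targeted_fuzz([("op5", "op9"), ("op3", "op17")]))
```

Even in this tiny example a random pair hits the bug only once in 400 tries on average, while a short list of flagged pairs finds it almost immediately. With longer chains and larger API surfaces the random odds shrink much further.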
Right.
That's what gets crazy, is, like, you just have the AI
write the fuzzer.
And then, like, if you can automate that process, you win a lot of the time.
It's pretty, pretty amazing.
I do have to go, though.
I have a meeting in three minutes.
So I gotta rip.
Oh, hopefully you get Mythos access.
Congrats.
That'd be neat.
No, it's not going to happen.
Come on, guys.
Come on.
Thanks, Ed.
Bye.
Bye.
Bye.
Have a good one, man.
I like you guys, but it looks like it's the end of our show, unfortunately.
Yeah.
So true.
All right.
Thank you, everybody.
I would just like to say that Casey and Teej,
and obviously Temu Casey that just left, commonly known as Low Level Learning,
You guys, you know, you make the show magic.
And now I'm just going to go about being lonely again.
Kind of sad.
Oh, Prime.
I did that was coming.
I thought I was going to get booed.
But I just assumed something's going to happen.
All right.
The good news is that you can enjoy full episodes of the stand-up now on YouTube.
If you go to the stand-up pod full, which
I'm going to try to rename. Hopefully at some point, we're trying to work some things out to get it a better name. But right now, YouTube. YouTube. Am I right? If you go to the website, if you go to our website, it will have links to these. Yeah, it will. Okay. It will. And it'll have it spelled out. We'll make it more clear once we figure everything out. Over the next week, maybe by the time you're listening to this on YouTube, we're going to upload all of the backlog to that channel as well. So we should have every episode
on YouTube in one spot, very easy to see, etc.
Obviously, you always can, you know, RSS download, download the audio directly.
Don't press the red button on that site, of course.
Teej, what is that web address that people should go to?
The standup pod.
Go to thestanduppod.com.
All the links will be there.
All the episodes will be there.
You want YouTube.
You want Spotify.
You want downloads.
You want RSS.
You got it.
Thestanduppod.com, ladies and gentlemen.
Check this out.
I'm just going to do something for the audience.
Look at this.
If you go here, you click Trash Made a Black Mirror app,
you can go and you can listen to it right on the website.
You can have all the nice information right here.
You can download Trash's app right there.
You can go in here.
We don't even charge you for that.
Yeah, look at this.
We don't even charge you.
You can play on Spotify.
You can download and just have personally for you to do whatever you do.
That's for you.
Now that we're, and then I'll make it,
I'll make it so links to the YouTube there later as well,
now that we're going to have a dedicated YouTube channel for that.
too. So for all of you out there. Yeah. The AI companies claim that you're going to get UBI,
but we're actually giving you universal basic podcast. You just get it. UBP. For free, UBP.
UBP. You know me. UBP. Yeah. UBP. I was going to say, well, I don't know what I was
going to say. That's fine. We should just stick a fork in it, guys. It's done.
All right. Thanks. Good seeing.
Thanks to the end of the episode.
Bye.
Bye. YouTube. Thanks again. Whatever your name is, Teej.
You're pretty neat.
