Risky Business #835 -- Why the Fast16 malware is badass
Episode Date: April 29, 2026

On this week's show, Patrick Gray and James Wilson are joined by special guest-host Dmitri Alperovitch. They discuss the week's cybersecurity news, including:

The US government is mad as hell about Chinese firms stealing American AI technology
Dmitri has an opinion or two about the US selling Nvidia chips to China
Speaking of Chinese AI, Kimi's new 2.6 is very interesting
The US sanctions a Cambodian senator for earning mega bucks through scam compounds
And a ransomware family is promoting itself as being … quantum-safe?

This week's show is sponsored by Trail of Bits. CEO and co-founder Dan Guido chats to Pat about how private inference works and Trail of Bits' audit of WhatsApp's private AI setup. This episode is also available on Youtube.

Show notes

Exclusive: US State Dept orders global warning about alleged AI thefts by DeepSeek, other Chinese firms | Reuters
moonshotai/Kimi-K2.6 · Hugging Face
Discord Sleuths Gained Unauthorized Access to Anthropic's Mythos | WIRED
Newly Deciphered Sabotage Malware May Have Targeted Iran's Nuclear Program—and Predates Stuxnet | WIRED
Hackers deployed wiper malware in destructive attacks on Venezuela's energy sector | The Record from Recorded Future News
Mystery Around Venezuelan Cyberattack Deepens, with New Discovery of "Highly Destructive" Wiper
Risky Business #819 -- Venezuela (credibly?!) blames USA for wiper attack - Risky Business Media
AI Tools Are Helping Mediocre North Korean Hackers Steal Millions | WIRED
CISA: US agency breached through Cisco vulnerability, FIRESTARTER backdoor allowed access through March | The Record from Recorded Future News
US, UK authorities warn that Firestarter backdoor malware survives patching | Cybersecurity Dive
Surveillance campaigns use commercial surveillance tools to exploit long-known telecom vulnerabilities | CyberScoop
UK regulator closes loophole that allowed rogue companies to track phone users' location | Reuters
US sanctions Cambodian senator for millions earned through scam compounds | The Record from Recorded Future News
Vercel says some of its customers' data was stolen prior to its recent hack | TechCrunch
Supply Chain Security Incident Update
Apple fixes bug that cops used to extract deleted chat messages from iPhones | TechCrunch
Kyle Daigle on X: "Wanted to provide more clarity about this. Yesterday, we had a regression in merge queue behavior where, in some cases, squash or rebase commits were generated from the wrong base state, making earlier changes appear reverted in branch history. 2,804 pull requests out of over 4M" / X
Securing the git push pipeline: Responding to a critical remote code execution vulnerability - The GitHub Blog
One ransomware crew now drives half of all cyber claims: At-Bay | Insurance Business
In a first, a ransomware family is confirmed to be quantum-safe - Ars Technica
What we learned about TEE security from auditing WhatsApp's Private Inference
Transcript
Hey everyone and welcome to risky business.
My name's Patrick Gray.
We've got a fabulous show for you this week.
Mr. Adam Boileau is still traveling and not here with us at the moment.
So we have a special guest co-host, which is Mr. Dmitri Alperovitch,
who will be joining me and James Wilson in just a moment to talk through the week's security news.
Dmitri, of course, many, many years ago was the co-founder of CrowdStrike.
But these days, he is the chairman
of the Silverado Policy Accelerator,
a Washington, D.C. based think tank.
But he is still very much someone
who pays attention to events in cyber.
So he's graciously agreed to join us
to do this week's show,
which he does sometimes,
and we sure do appreciate it.
This week's show is brought to you
by Trail of Bits,
which is a security engineering firm
based in the United States.
And Dan Guido
will be joining us this week
to talk through private inference.
So he did a talk a while ago.
And one of the interesting things he said was he wasn't really that keen on buying hardware
for Trail of Bits to use to, you know, run local models and do it privately,
when you can actually just basically rent private inference hardware, you know, as a service, right?
And I've thought, oh, that's an interesting topic for a conversation.
So he's joining us in this week's sponsor interview to talk through how that works.
Tinfoil, that's tinfoil.sh, is one of the services that they use.
But they also did some work looking at WhatsApp's private inference approach.
And yeah, it's just generally an interesting topic.
So that is coming up after this week's news, which starts now.
And Dmitri, we're going to start with you.
It's really funny, man, because we've got, you know, we've had guest co-hosts the last couple of weeks.
And the topics just keep miraculously lining up with whoever we've scheduled to be our guest co-host.
Like last week we had the Grugq.
And there was all this Grugq-y sort of stuff to talk about.
And this week, there's all this sort of very much Dmitri Alperovitch stuff.
All the Dmitri stuff.
Yeah, exactly, right?
It was meant to be.
So we're going to start with this first item here, which is the State Department kicking
up a huge stink about distillation attacks against, you know, frontier models in the United States.
So they're saying that DeepSeek are a bunch of dirty thieving scoundrels.
And it will not stand.
And yeah, what's the go here?
Well, not just the State Department.
This is really an all-of-government approach.
Michael Kratsios, who heads the Office of Science and Technology Policy at the White House,
also put out a memo to all the executive agencies that they released publicly.
Look, this is a big deal, and I'm glad that the government is now paying attention to this.
I've talked to a number of frontier companies and their researchers.
They actually believe that most of the progress that you're seeing currently from the main Chinese AI models,
Kimi, DeepSeek, Qwen, is actually coming from two things.
It's coming from distillation
of US models, and obviously smuggling of, or even legitimately buying, the
Nvidia chips that you use to do this post-training.
And by the way, post-training.
Hey, hey, hey, whoa, whoa, hey, how dare you suggest that?
I mean, the Chinese trained these things on a bunch of old like 486s
they had lying around in a garage.
They trained them on a potato, Dimitri.
What are you talking about?
That's crazy talk.
As we know, potatoes are fantastic at matrix multiplications, so they're just perfect for training
this stuff.
Yeah, no, look, I mean, as we know, even the Chinese AI researchers are saying that they're
training on Nvidia chips.
Now they're claiming that they're buying them legitimately, but this is not a secret.
But look, what's actually really important is that if you talk to frontier AI companies
here in the U.S., they will tell you that post-training is becoming really, really important
and is driving most of the improvements in models today.
So it makes sense that if you steal a bunch of reasoning traces from our models
and then use them for post-training,
that the Chinese AI models would become quite good.
Still behind us, but behind by a matter of months, which makes sense, right?
New Opus, you know, 4.7 comes out.
They do a bunch of distillation attacks on it,
and then three to six months later, oh, my God, a new Kimi model is out.
That's really good.
I wonder where that came from.
Yeah, yeah. I mean, we also saw the news, like we got a bit of insight into how some of this chip smuggling is happening, right?
Where the Super Micro co-founder was, you know, personally assisting in, like, using a hair dryer to remove labels from hardware, to stick them on other boxes and whatever, to move this stuff into China.
I feel like the restrictions on chips are only ever going to be so successful and the best you're ever going to do is slow the Chinese down.
Now, you did some testimony to the US government.
We'll get into that in a minute on this topic.
Your testimony was on this topic.
But James, you've actually recorded and will publish today a 75-minute deep-dive solo podcast on distillation attacks.
And I think the conclusion that you arrived at when going through this exercise was that really the moat, America's moat, America's edge in the AI war really does come down to compute.
It comes down to who has the most hardware.
Yeah, absolutely.
It was a really interesting deep dive to do, Pat, because it surfaced that.
It also surfaced that distillation is not just for attackers.
This is a legitimate technique that's actually used in training.
But for an adversary, it's such a, like a delicious economic sweet spot.
You can take these freely available open-weight models,
and with a relatively small dataset compared to the massive data set you'd need for an actual frontier model to be trained,
you end up with some really high quality results,
and results on smaller models as well,
so that does help with the fact that,
when you're limited in your compute and your RAM,
you can still run these things.
But yes, everything's converging,
everything's becoming commoditized,
but the thing that still remains the huge limiting factor
is getting the access to the GPUs,
and increasingly the RAM, that's needed
to host these models for the inference.
Now, you know, when we describe people
as dovish or hawkish,
I sometimes describe Dmitri as
being less of a hawk and more of a pterodactyl when it comes to topics like this.
You gave some testimony recently.
Was it a Senate or a congressional committee?
It was the House Select Committee on the Chinese Communist Party.
Yeah, okay.
So I'm guessing you went there and said, hey, it's fine.
We should sell the Chinese our hardware.
They're our friends, right?
Was that basically what you said to them?
Look, I basically said that, you know, we're in an AI race.
No one is disputing that.
The last time we were in a great power race was the space race,
and selling chips to China is akin to selling rockets to the Soviet Union to get to the moon faster, right?
It just makes no sense whatsoever.
And Chairman Moolenaar actually asked me, well, what about this argument that NVIDIA uses that they'll get addicted to our chips?
And I said, you know, Chairman, with all due respect, chips ain't cocaine.
They're not that addictive.
And look, you know, the top two frontier AI models right now, Claude and Gemini, were not even trained on Nvidia chips.
They moved off of Nvidia very, very quickly.
I'm told it was a matter of about 10 people in a couple of months to actually port,
to TPUs in the case of Google, Trainiums in the case of Claude.
So this is not something that you can actually get them to be addicted to.
And the talking points from folks like Jensen and others is, well, wait, we want China to use
the American AI stack.
And my view is American AI tech stack is not chips.
You know, if you want to talk about the tech stack, it's chips in American cloud providers
running American models.
That's the tech stack.
But selling China chips so that they can steal reasoning traces from American
AI models, and then use it to train their own models that will be running in Huawei data centers?
That's not winning.
And I actually had a little bit of a debate on X with Sriram, who is now, I guess, the acting AI czar in the White House
now that David Sacks has moved on, where I said,
imagine the situation where we actually get China addicted to Nvidia chips,
which is impossible, but let me just suppose that they would.
And they train the very, very best models.
And the entire world is running on Kimi or DeepSeek
and applications are built on top of them,
but they're all running on Nvidia chips.
Did we win?
Like, is that what victory looks like?
Of course not.
It's all about the models, not the chips.
Yeah, I think also the argument that,
well, if you restrict chips from China,
they're just going to start developing their own.
I mean, that's a heavy lift.
I think for people who are really deep in the semiconductor space, like that is a heavy lift.
Well, by the way, they're doing that anyway.
And most of them, this is a story that should be getting more publicity, but it's not.
So they have the Huawei Ascend chips, right, that are not even equivalent to the Blackwells in any shape or form.
It's similar to the H20 from Nvidia, but not even as good as that.
But most of the Ascend chips were not made in China.
They were actually made by TSMC, which was kind of conned by a shell company into building them for Huawei.
And of course, that's now hopefully been stopped.
So they can't even manufacture these chips when they design them.
By the way, one other point on distillation, to come back to that topic. Many of the listeners have probably noticed that when you go to Claude, when you go to ChatGPT, you no longer see the chain-of-thought details that you used to see, particularly on these deep research type of
experiments. This is why: because they know that their models are being distilled. If you show all that
information, it helps a lot for China to actually catch up quickly. Well, James and I were chatting about
that, by the way. And your opinion on that, James, was it helps a little rather than a lot, right?
Yeah, it's, I mean, look, it makes sense why they've done it, but there are just so many other
signals that are still going to be there to distill these models. And so, yeah, it helps a little,
but it certainly hasn't closed the door on distillation.
And just to really bring into focus that point Dmitri made about whether
the chips are the thing that people get addicted to.
One of the things I covered in the pod that's coming out today
is that the actual act of distilling a model is about 10 lines of Python code.
Like this is so well embedded into frameworks and existing tooling
that you could imagine that there is only probably one line in that 10 line
Python script that would have to change to move from an Nvidia chip to a GPU
from anywhere else.
And so it's just not like that; we're thinking about this wrong.
We're thinking about it as if there's some sort of incredible level of engineering
that directly hardwires the training process and inference process
to a particular hardware vendor.
But there's not; it's all been modularized and abstracted away.
And so I completely agree.
It's easy to move.
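To put a concrete sketch on James's ten-lines point, here's roughly what the core of a logit-distillation step looks like in PyTorch. This is illustrative only: the teacher and student models, the batch of token ids, and the optimizer are all assumed to exist, and this is nobody's actual training pipeline. Note that the device string is about the only line that ties it to a particular chip.

import torch
import torch.nn.functional as F

device = "cuda"  # roughly the one line you'd swap to target different hardware
T = 2.0          # temperature for softening the distributions

def distill_step(teacher, student, batch, optimizer):
    # Teacher produces the target distribution; no gradients needed.
    with torch.no_grad():
        teacher_logits = teacher(batch.to(device))
    student_logits = student(batch.to(device))
    # KL divergence between softened teacher and student outputs.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()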
Now, James, look, there is some validity to the fact that PyTorch,
one of the common frameworks, is more optimized for CUDA and Nvidia.
That is true.
But, you know, TPU code has been
in there for many years now.
More optimizations are being done.
And look, every single frontier company,
every single hyperscaler, is building
their own chips, right? Maia from
Microsoft, Facebook is building
their own chips. You know, OpenAI is claiming
that they're going to do their own chips. Obviously,
TPUs from Google. So it's not like
Nvidia has a monopoly on this. I mean,
this is matrix multiplication. Sure, you
do it in a very parallelized way.
You integrate high-bandwidth memory.
But, like, I actually think, you know,
you guys may correct me on this, but like,
building a TPU is actually a lot easier than building a CPU,
where you have to do all the branch prediction stuff and, like, deep cache integration,
all that stuff.
Here you're just optimizing for really, really fast, really huge matrix multiplication.
Yeah, that is exactly right.
It is simpler because of the tasks it does,
kind of like the difference between complex instruction set versus RISC, the reduced
instruction set, and this is reduced even further because it really just does come down
to matmul.
But there's other challenges around just the memory bandwidth. Parallelization in computer
science has always been one of those really difficult things, alongside naming things.
But yeah, the crux of the point is right there.
Now, just for everyone listening at home, this is what happens when you ask Dmitri Alperovitch,
hey man, what do you think about chips?
A little bit of a passion project.
Yeah, we're going to kick on to the next part of this discussion, which is you've mentioned
the Kimi model a couple times.
Now this is interesting because like James and I, we actually interviewed Nicholas Carlini from Anthropic.
He's a security researcher at Anthropic, and we published that into our Risky Business Features feed, because there's no room in the main feed for this.
So if you want to hear these sorts of podcasts, listeners, please go and subscribe to Risky Business Features.
It is a completely different podcast feed.
And this is also where James's solo pod all about the distillation stuff is going.
Now, the reason we're going to talk about Kimi just now is it's really
relevant to a bunch of stuff that we talked about with Nicholas in last week's podcast, James.
Chiefly, you know, he was talking about how Mythos might be big and scary now, but this
sort of capability will probably be in a model you can run locally on your laptop a year from
now. And I guess the reason I wanted to talk about Kimi now is because they've done some really
interesting stuff in terms of making the model very efficient at performing certain tasks by being
able to sort of selectively load parts of it, right? Which speaks to Nicholas Carlini's claim about
what these local models might be capable of, either on your own hardware or on something like
tinfoil.sh, which we're chatting about with Dan Guido later. The point is these local models are
going to get a lot more powerful. Can you just tell us a little bit about the innovation here?
Because the Chinese, we saw it with DeepSeek and now we're seeing it with Kimi. They are innovating
in their own ways in AI. And this is a good example of that. Yeah, 100%. So let's cover Kimi and
also Qwen, because these are the two sort of leading models out there in the open weight space.
Kimi's interesting because they've just come out with K2.6 and it follows in the pattern of
DeepSeek, which uses this thing called mixture of experts. So Kimi K2.6 is a one trillion
parameter model, which is vastly too big to even fit or to run efficiently on sort of a laptop
or even, you know, the most beefed up Mac Studio. But you can see the trajectory that they're heading
towards through these innovations because although it's a one trillion parameter model, only about
32 billion of those parameters are active at any one time. And so if this continues in this direction,
we're going to see this very nice balance between the model is huge in terms of the overall
capabilities it's got, but it's able to then zero in on certain parts of the model, activate
those at that point in time to keep inference far more efficient. The other end of the spectrum
that's interesting here is Qwen, which is another Chinese
model; this is from Alibaba. And they focus more on models that are actually a manageable size
for local deployments. So they've got models available at the moment that are, I think there's
one that's 27 billion parameters, one that's 35 billion parameters. And the benchmarks basically
say these are approaching those frontier model capabilities in agentic coding and a few other
tasks. And to put that in perspective, like, I've got a Mac Studio M2, 32 gig of RAM; I can run a Qwen
32 or 27 billion parameter model on that locally.
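For a rough picture of the mixture-of-experts trick described a moment ago, here's a toy router in Python with NumPy. The sizes and the routing scheme are invented for illustration, not Kimi's actual architecture; the point is just that only a small fraction of the total weights is touched for any given token.

import numpy as np

n_experts, top_k, d = 64, 2, 512
experts = [np.random.randn(d, d) * 0.02 for _ in range(n_experts)]  # expert weights
router = np.random.randn(d, n_experts) * 0.02                       # routing weights

def moe_forward(x):
    scores = x @ router                    # one routing score per expert
    chosen = np.argsort(scores)[-top_k:]   # keep only the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only the chosen experts' weight matrices are ever read.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = np.random.randn(d)
out = moe_forward(token)  # 2 of 64 experts active: "1T total, 32B active" in miniature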
The catch here is, though,
these models are still incredibly slow
compared to a hosted anthropic model.
Like you're talking about 400 to 1,000 seconds
for something that takes 20 seconds with Claude,
and they're also an order of magnitude
more inefficient with their token use.
But sort of to what Dmitri was saying there,
this is optimization, right?
We've cleared the hurdle of this being possible,
and now we're just in optimization.
And there's one thing we know about optimization.
It just keeps getting better and better
over time. So yes, locally running models with frontier capabilities are coming real soon, and are kind of
already here. Yeah, so I think, I mean, I think that was really an interesting part of that interview.
Dmitri, you know, you and I, we've known each other a long time. We're good friends, right?
Like our friendship, I'd best describe it as an argument that started in about 2019 and is still
continuing. You've been, you're really an AI pumper, right? Like you've been big on AI,
very early, way ahead of me, you know, I think largely you've been proven right.
That said, with Mythos, you think some of the hype around Mythos is not entirely justified.
So I guess, you know, the idea that we might start seeing similar capabilities in local models
might not be that frightening to you.
Just a quick take from you on Mythos and what you think it means, because it's been such a big topic
and we've got you here.
Well, look, first of all, and I think Nicholas on your podcast talked really well about this,
you know, the big difference with Mythos is the exploit generation, right?
It is better at finding vulnerabilities, but look, other models have been able to find
vulnerabilities as well, right?
It's not like we woke up today and Mythos suddenly can find all these vulnerabilities
that no one else could discover.
It's incrementally better, and, you know, my guess is probably, and I haven't played with
it, but I've talked to many people that have, maybe 30% better than, you know,
the Opus 4.7, for example.
But it is able to write exploits.
But from what I've heard, even that capability is hit or miss.
Like simpler exploits, sure, but anytime you need to sort of like, you know, do an iPhone zero-day
that chains lots of things together, that becomes much, much harder.
It is able to do it.
You know, what I hear from my friends at Anthropic is that if you're spending the equivalent
of $100,000 on tokens, sure, you can probably generate some pretty cool exploits.
I mean, very few people, very few companies, can afford that much, right?
Not at this level of experimentation, driving it that deep. So Mythos is incredibly expensive,
which is yet another reason why it won't be available publicly.
And by the way, you know, that really helps with distillation.
If the Chinese cannot get their hands on this model,
we'll see how good Kimi is in six months and whether it can actually catch up with where Mythos is today.
Yeah, I thought actually Carlini's argument that the cost thing is kind of irrelevant was the more compelling part of the interview, where he's like, look, you know, it's getting cheaper, and some of the dollar figures around the tokens required to do these things have been misreported. So I thought that part was pretty interesting. But yeah, anyway, it's an interesting thing, right, because no one quite knows where all of this is a hundred percent going. But look, speaking of Mythos, there was a big story that was breaking kind of as we were preparing last week's show. I didn't think it was that important to cover, but it's been everywhere,
which is a group of people in a Discord
figured out how to get access to Mythos
when they weren't on the approved list of people who could use it.
The reason I thought this story was funny
is completely different, I'm guessing,
to the reason most people thought this story was interesting.
The reason I thought it was interesting
is Anthropic totally hung themselves
on their own messaging here
because they come out and talk about how dangerous it is, right?
So obviously when you find out someone got access to it
who wasn't supposed to,
like people are going to think that's a big deal
when honestly it's not. Was that your take here, James?
Yeah, 100%. It's just funny reading this as well. Like, a bunch of shady Discord people,
and they basically used a data breach at Mercor to sort of rifle through some logs and sort of found
the URL patterns. Well, actually not even the URL. I think that bit's been completely
misreported. It's looking at the payload of the message transcript that goes back and forth
with Anthropic. And they just basically guessed, hey, you know, the pattern for naming a model is
pretty standard with Anthropic. It's the name, it's the version, it's the date. And they must have just put a few things together and
determined, oh, this is what Mythos is called, and the number, and had access to it.
There's also a little dot, dot, dot on the story around how the other part of this that made it
possible is apparently they had, or one of them was working with, an Anthropic contractor.
I do wonder if that's a soon-to-be-former Anthropic contractor, but they had some sort
of creds that probably gave them the ability to access models that were, you know,
hosted in Anthropic, but not yet publicly available.
But overall, yeah, I agree with you.
It's like if the model's that dangerous, you kind of think there would have been a few more safeguards or things around it.
Yeah, but it's like marketing meeting reality, right?
And it's just a funny old thing.
You know, they've built a computer god in a box but can't do ACLs.
You know, like, it is just such a classic in the genre of infosec stories.
Moving on real quick.
We're going to talk quickly about, well, I don't think this is going to be quick,
actually, I lie.
We're going to talk now about Fast 16, which is some malware.
It was, I think, SentinelOne research here.
It was Vitaly Kamluk, which is a name I haven't heard in a while.
He's an ex-Kaspersky guy.
And Jags, Juan Andrés Guerrero-Saade, which I guess is why he goes by Jags a lot of the time.
Sorry for massacring your name, dude.
They have done a bit of research into this malware that kind of connects to the Shadow Brokers
in an interesting way.
Dmitri, tell us about Fast 16.
Well, this is, I guess, a 10-year-old mystery that they were focused on solving.
So people may recall the Shadow Brokers release back 10 years ago,
allegedly from the NSA, and it contained, in addition to a bunch of malware and other things,
it contained this file that was called Territorial Dispute,
which basically had a list of various kernel drivers and other identifiers that you would check
when you land on a box to see if someone else is already there.
So it could be like other five eyes malware, other malware you know about,
and you basically wanted to make sure there's no kind of friendly fire.
If you're also on there and they get detected,
then suddenly you get detected as well.
So you can kind of decide what you want to do from a deconfliction perspective at that point.
And there was one line in that file that just said,
Fast 16 driver,
kernel driver, move on, nothing to see here.
And that was it, right?
And no one knew, you know, what APT is this?
Like, you know, why is it saying that?
That seems like a really interesting thing.
And Jags and Vitaly have been looking for this Fast 16,
this elusive Fast 16 driver, for years now.
And the way they found it is really, really interesting.
They actually used a technique from this
other Kaspersky researcher, Sergei Meneyev,
who actually passed away recently, I think last month,
who was like this unbelievable APT hunter
who had all these ideas of how you look through a huge repository of data, telemetry, or malware,
to find interesting stuff.
And what they did is they started looking at binaries that integrated a Lua interpreter.
Because they said, wait a second, we see this Duqu malware and other malware that integrates Lua.
Let us look at all binaries from, like, the 2000 to 2010 time frame that integrate Lua interpreters, right?
And then weed out all the legitimate stuff and see what else we can find.
And they stumbled on this file that eventually led them to this Fast 16 kernel driver
that basically was compiled in 2005 based on its compilation date.
And it's a really interesting malware because it spreads itself via network shares.
And then it hooks into the Windows file read operation so that it can look for new executables that get loaded,
the .EXE files,
and then it will patch specific EXE files,
and patch particularly the mathematical calculation code in those EXE files.
And this was another sleuthing exercise by Jags and Vitaly to figure out
what are the EXEs that it's actually patching.
And their top candidates that they identified were this LS-DYNA suite,
which is a powerful engineering simulation software
that's used to analyze how materials and structures behave in various extreme conditions,
like, let's say, nuclear explosions.
And lo and behold, another think tank has actually published information that Iran has used
this software in its nuclear program and to do its modeling.
And then there was another program for open source water modeling and another Chinese
construction and design software.
And basically, when these executables load, it will patch their math calculation code
to basically introduce subtle errors in it, right, so that the modeling would produce wrong
results, which would be really, really difficult to detect.
So another kind of, you know, Stuxnet-like program that, you know, supposedly was hitting
Iran back in the early 2000s.
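As a toy illustration of why that kind of sabotage is so hard to spot (this is just the arithmetic of the idea in Python, not the actual Fast16 patch): a barely visible nudge to one coefficient compounds over an iterative simulation into a wildly wrong answer.

def simulate(coeff, steps=1000):
    """Stand-in for an iterative numeric model, one multiply per step."""
    x = 1.0
    for _ in range(steps):
        x *= coeff
    return x

clean = simulate(1.001)             # ~2.72 with the true coefficient
patched = simulate(1.001 * 1.001)   # coefficient nudged by just 0.1%
print(clean, patched, patched / clean)  # final results differ by ~2.7x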
One thing I found really interesting, I'll give Jags credit for it because I was talking
to him about it, is he said, you know, this is a type of model that you can use in terms
of tactics to actually target Chinese companies, AI companies, right?
If they're doing distillations, well, if you can introduce subtle errors in math
operations when you're doing training, boy, you can really ruin those models. So I thought that
was an interesting thought experiment to see if, you know, this might be happening even today.
Well, that'd be interesting because that would be virtually impossible to detect, right? It's one thing
to have an error showing up in a spreadsheet and then cross-referencing that. But the models are so
dense and so hard to introspect. Like, how would you even know that your weights have been messed with
through this process? So, funnily enough, I remember, like, around the time that this malware was out
there doing its thing,
there was a lot of talk about how you shouldn't forget about the I in the CIA triad,
the confidentiality, integrity and availability triad.
And, you know, I sat through talks from people who were like adjacent to the intelligence
community talking about how, you know, someone could write malware that would subtly impact
like spreadsheets and things like that and cause drama.
And like, this was a big concern back then.
And I guess I'm realizing the reason it was a big concern among people in the Five Eyes Alliance
back then is because you were doing it to other people, which kind of makes sense.
There's a bit of projection going on there.
But it's a look, it's a fascinating story.
And, you know, it makes me think, is this cyber war?
You know, I just, I wonder, is that what we're calling cyber war?
Now, look, moving from the I in the CIA triad to the A in the CIA triad, we got some
analysis here, again from Kaspersky, looking at something they're calling Lotus Wiper, which
is apparently the wiper that went after the state-owned petroleum company in Venezuela.
Like back in December, there was a ransomware attack that kind of looked like a wiper attack.
The Venezuelans blamed the Americans.
We actually published a podcast in which we said that was a credible accusation.
It's looking more and more credible, James.
I think you took a look at this and by the process of elimination, you're thinking, yeah,
what else could it be but a U.S.-written wiper?
Yeah, that's right.
But between the Kaspersky write-up and Kim Zetter's write-up, which had a bit more detail, no one seems to be comfortable saying, aha, this proves that it was a US operation or a US-ally operation.
But it's very much a Sherlock Holmes,
the-dog's-not-barking kind of thing.
This is a highly targeted malware.
It had no financial incentive.
There was no ransom or extortion element to it.
It just straight up wiped things, and it destroyed data really well.
And there was a compiled version of it with the timestamps that match up, and it had a hard-coded string point.
to the PDVSA.
So, you know, like, what's your saying?
But it was Venezuelan dissidents, clearly.
Clearly, walks like a duck, talks like a dark,
must be a Venezuelan dissident.
Yeah, okay.
Yeah, yeah.
Moving on, we got one from Andy Greenberg and Matt Burgess,
just over at Wired,
just looking at how the North Koreans are using AI
to, like, automate a lot of their campaigns.
I mean, this is something actually,
James, you and I spoke about recently,
not in the podcast, which is we saw,
there was some incident,
and they talked about how it was an AI-enabled attack.
And it's like, it's not going to be too long before putting that in a
statement about some sort of data breach is going to be completely pointless because they're all
going to be AI-assisted, you know, hacks. I sort of feel like it's going to become the new
sophisticated attacker line that goes into a press release. The new sophisticated attacker is
like AI-assisted. What are the nuts and bolts here, though, real quick on this campaign that
they've written up? Yeah, look, the campaign itself is just
run-of-the-mill AI stuff, you know, and of course everyone's using it. And it did its usual thing
of leaving some database credentials exposed, et cetera, et cetera, the same old thing that always gets these
vibe-coded apps popped, and that allowed folks to go and rifle through and just see the tradecraft
and the process behind this. What I did find interesting about it is they, I think they've kind of
stumbled on an interesting vector here that allows them to get away with otherwise sloppy vibe-coded
tools to do this, and that is they go after their targets for
this campaign here by posting job ads. Now, we've all been through this. You get that job ad,
you get the call from the recruiter, you want to start going in the recruitment process. You don't do
that on your corporate laptop, right? You're doing it on your personal laptop, which doesn't have
the EDR, doesn't have all those protections. So that's how they managed to get these otherwise
pretty sloppy exploits onto a device. And then, you know, these days, those devices have access to the
things that they then connects will trade and get their crypto wallets and keys out of. So it's just
kind of interesting that that interview aspect, this contagious interviewer camp and as they
call it, hits the sweet spot of quite an unprotected device. Yeah, they're not getting snapped by a
colonel driver that Dimitri helped write 15 years ago, basically. I mean, we saw a whole thing on
this on Twitter recently where someone walked through how they nearly got done, which was a
LinkedIn account of some venture capitalists that they knew got compromised, and then they were like
reaching out to them saying, hey, we should sync up, it's been a while. And they did the whole
calendar invite thing and waited a week, and then it's like, oh yeah, you're going to need to
download this binary to join the call. Dmitri, you had something you wanted to add there.
Well, this just proves to me yet again, and I said this 15 years ago first, that North Koreans
are by far, far the most creative actor, right? They pioneer techniques that then get adopted by others.
They were the first ones back in like 2009 to do disruption at scale. They were the first ones to do
hack and leak with Sony. They did the Bangladesh bank heist and, you know,
the IT workers and everything else.
So it's not surprising to me at all that they are the first ones to really popularize
the use of AI.
And of course,
they've been doing this with the IT workers and passing interviews for a while.
Now they're using it for coding.
What was interesting,
I thought, is that the way these guys identified that they used Cursor and ChatGPT is by comments,
because they were looking at the scripts and they saw,
wait a second,
there's a ton of comments here describing every step of this operation.
That looks suspiciously like a vibe coded piece of software.
Yeah, in English too.
So, yeah, kind of obvious.
That is funny.
Now, this next one I wanted to talk about with you, Dmitri,
because I feel like I'm taking crazy pills,
because this story, it just keeps rattling around, right?
There's been some sort of bug in Cisco ASA,
and a threat actor, I think it was Chinese,
was putting some malware onto these Ciscos.
Now, CISA came along and said,
hey, you've got to patch your Ciscos.
Okay, yep, cool.
And people have patched their Ciscos.
The US government have patched their Ciscos, and now they're putting out advisories saying,
hey, they're warning you that this FIRESTARTER backdoor malware survives patching.
And I'm thinking what malware doesn't survive patching?
That's not how you do incident response.
I don't get this. It's mad.
Why was there any expectation that patching a compromised Cisco ASA device
was going to get you anywhere in terms of cleaning it up?
What is going on?
Well, look, you know, in fairness, right, there's no EDR running on these devices.
So people kind of assume that they're clean and the only thing they have to do is patch them
because usually vulnerabilities aren't actually used to exploit the device itself and land malware on the device,
but kind of use it as a pass-through to inside the network.
In this particular case, obviously malware was installed that had persistence.
So it would not be removed by a patch.
And look, you know, we were talking about this before the
show: like, you have to reflash your firmware on these devices. You have to assume compromise,
and as painful as it is to take firewalls offline, hopefully have redundancy. You absolutely
have to do this. Just patching it is not sufficient.
I just, look, I just find it a bit maddening seeing headlines like, US, UK authorities
warn that FIRESTARTER backdoor malware survives patching. I just think, what? Like, okay.
Anyway, okay, we're going to move on to the next story now. Citizen Lab has published a
report about a threat actor that is tracking its targets,
you know, physically tracking
the physical location of their targets, by exploiting
vulnerabilities in SS7 and Diameter, which is another SS7-like
protocol that isn't quite as awful.
But it's the sort of stuff that you expect to be seeing happening on
telcos.
Dmitri, we'll get you in on this in a moment, but first up, James,
you've had a look at this one as well.
This report has resulted in the telco regulator in the UK shutting down the availability of
like something called global titles which are used in these campaigns. Can you explain to us what
a global title is and how it's used to stage these sort of attacks? I can. I can use my very
fresh knowledge of this because after reading the Citizen Lab report I had to go and read up on a
whole lot of topics. The write-up is exquisite. It's so good. But a global title is essentially, it's like
a phone-number-style address with some metadata attached to it, and in the complex and convoluted ways
that these telco networks work, there's kind of a resolution process. You can almost think
of it as a form of, like, DNS resolving down to the IP address, and then, you know, the IP address has
to use ARP to resolve to the MAC address, a similar kind of multi-step resolution process for
these things. But the point of using the global title here, I think, is actually to exploit that
resolution, right? So they can essentially craft a global title coming from
I think there was only three or so candidate telco networks they were trying, but
due to the way that those global title addresses get resolved, they're able to sort of
kind of, like, poke at various points of the network and see how is this getting resolved,
and where does that traffic go?
And of course, the problem with these networks is they're insecure by default, really.
It's built for interoperability between the telcos, not a high degree of security.
And so people have bolted on firewalls to prevent a lot of this malicious traffic going
back and forth. But what's interesting in this example here is the attacker is actually using
lots of different steps where they basically say, I'll try this attack, see how it resolves,
where does it get to, who blocked it? Okay, let's try something a little bit different. How does
that get routed? Who blocked that? And it's like they're just incrementally building more and
more knowledge about where the weaknesses are, because then they can funnel all the traffic they
want to through that weak point in the network to get to the end subscriber that they're trying
to surveil.
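The enumeration pattern James is describing looks something like the Python sketch below. The probe helper is a stand-in, and the message names are just well-known MAP location queries; a real attack runs over SS7/Diameter signalling, not Python.

import random

MESSAGES = ["SRI-SM", "PSI", "ATI"]        # location-related query types
LEASED_TITLES = ["GT-A", "GT-B", "GT-C"]   # global titles the attacker controls

def probe(global_title, message):
    """Stand-in for sending one signalling message toward the target
    network and observing whether its firewall passed or blocked it."""
    return random.random() < 0.2  # simulate: most paths are filtered

# Map which (source title, message type) pairs get through, then funnel
# the real tracking traffic through whatever weak points turn up.
usable_paths = [
    (gt, msg)
    for gt in LEASED_TITLES
    for msg in MESSAGES
    if probe(gt, msg)
]
print("unfiltered paths into the network:", usable_paths)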
Now, when it comes to these global titles, who is supposed to have access to a global title? I'm guessing that's a telco.
Yeah, it's a telco, and they lease them out for other purposes as well. And that's what the UK has particularly cracked down on.
Of course, right. Like, yeah, of course, they lease them out to someone for some insane reason, because it's the telco industry. Dmitri,
you are actually an investor in Cape, which occasionally sponsors the risky business podcast.
And that's because you introduced them to us when you invested in them. I'm guessing you invested in
them for kind of this reason because they don't even allow SS7, it's diameter only.
And I'm guessing even then they're a little bit careful about the way that traffic is handled
on their network.
I mean, do you have some thoughts here?
Yeah, absolutely.
Look, location tracking, which is one of the things that they were using these SS7 and Diameter
compromises for, is a huge issue, right?
It's a safety, personal safety issue.
You know, executives care about it.
Government folks care about it.
So trying to prevent that at a telco level, really, really important.
And that's one of the reasons why Cape is becoming more and more popular across the board.
What I found interesting in this write-up, and again, kudos to the Citizen Lab guys, really phenomenal work, is there's actually two types of attackers they identified.
One that was just using SS7 and Diameter to track locations.
Another that actually had a really interesting SMS exploit, where they would send a binary SMS message to a device that contained a SIM card exploit, Simjacker, basically, that extracted
location info and would basically have the Simjacker continuously ping you with the current location.
So a lot of activity that's happening that most people aren't even aware of.
Obviously, we have the on-device compromises that people are becoming more aware of,
watering hole attacks and other things on Apples and Androids.
But this is something that doesn't even impact your phone.
It's down at the carrier level.
So, you know, without things like this report,
we would really not know the extent to which this is happening around the world today.
And just the fact that that malicious SMS can include those codes to make those actions happen,
boy, it just makes you nervous about carrying this thing in your pocket when you know that that's
actually, you know, part of the protocol that that can be done.
It's, yeah, it's an incredible read.
Turns out mixing data and code is bad in the mobile ecosystem as well.
Who would have thought?
So let's spend hundreds of billions of dollars on CAPEX doing exactly that, Dmitri,
with AI models, but anyway.
And just on what you were talking about there, you know,
feeling nervous about it being in your pocket, James.
Like I just still remember something like eight, nine years ago or something,
meeting up with Joseph Cox, now the publisher and, you know,
head of, and one of the partners in, 404 Media.
And he did not have a phone.
He used an iPod touch as his
communication device for exactly that reason.
He's like, I do not want to connect to any of these networks.
Well, by the way, if you talk to some of these citizen lab researchers
and how paranoid they are, they don't have phones.
They, you know, do a hotspot to a device via Wi-Fi; like, they go through
extraordinary efforts to try not to have one.
Yeah, yeah.
Well, I mean, that's the thing.
Like, I don't want to live that way.
I just prefer not to think about this, frankly.
Moving on, and real quick, the US has sanctioned a Cambodian senator because this senator
is essentially, you know, renting compounds to scam operators.
I mean, this is what I've been saying would be the case where you'd wind up with a form
of state capture by an illicit industry because it's, you know, something like 40% of the
Cambodian GDP or whatever.
comes from these scams. So of course the government is going to essentially wind up
operating as an enabler of these sorts of scams and that seems to be what's happening.
What else we got? We got Vercel have put out another statement talking about how some
other customer accounts got knocked over by attackers, because this is what
happens, right? You hire Mandiant. They come in, they start kicking over rocks,
and they start finding other stuff, and that's what's happened here.
Although it doesn't look like what they found here was necessarily a breach of Vercel
systems. But they have,
through the process of doing incident response on the breach, discovered that,
well, hey, here's some access to some customer accounts.
It looks kind of funny.
I mean, that's basically it, right James?
Yeah, that's exactly it.
It's the same thing.
It's these environment variables that weren't marked as sensitive, somehow being enumerated.
But again, small number of accounts.
It's not going to lead to much, I don't think.
And I stand by what I said last week.
I think Vercel did a really good job of just enacting three or four remediation corrective
steps here that, you know,
all still remain as good mitigation strategies even for the new things they've found.
And in particular, all the environment variables now are sensitive by default.
So that completely alleviates this problem.
We had a complaint, too, that we were going too easy on Vercel on last week's show,
because they were like, they didn't send us remediation things in time or whatever.
I think, look, the reason we're praising Vercel is because they didn't communicate ahead of,
like, them knowing stuff, right?
Which is how you get into trouble when you're communicating about an incident.
So I think we're going to stand by that.
I mean, Dmitri, have you even followed this one?
Not too closely, but yeah, I mean, look, major hosting provider, right?
Of course, they're going to get popped by a variety of different actors.
Of course, people want to get access to their information.
So not at all surprising.
Yeah, yeah.
Meanwhile, Checkmarx.
We spoke like a couple weeks ago about a very limited intrusion, you know,
into like a Checkmarx open source thing that not many people use,
no big deal.
There was sort of a supply chain thing.
About that, James.
Yes, about that.
Yeah, it was originally the Checkmarx KICS,
which is just basically an infrastructure-as-code scanner.
So we weren't, yeah, it was interesting,
but it was all pretty contained and wrapped up in that.
Then along comes April 22.
There's a new set of compromised packages that are released,
and this went beyond KICS.
This was some of the AST GitHub Actions,
their VS Code extension, their developer assist extension.
But the thing that makes you a little bit nervous
about the state of Checkmarx is, even in their updated advisory this week, they said
a second wave of malicious Checkmarx artifacts was published, and this was April 22nd,
indicating continued or renewed attacker access. So clearly not quite sure how the attackers
still have access, but suffice to say they do. And that resulted then in a whole bunch of
data getting exfiltrated out of their GitHub repos, and Lapsus claimed credit for that.
So it's a bad time for Checkmarx.
Yeah, it is.
Now, quickly, too, we spoke about the issue where the FBI were able to extract someone's Signal messages
out of some cache of their display notifications for those messages, for their push notifications.
And, you know, you said, having worked at Apple previously, that shouldn't have been possible.
Apple has now patched that, but you have had a look at Apple's blog here, and you think that it's not so much
that the push notifications were cached;
you think they've actually fixed a different bug?
So the notification database is part of a thing in iOS
called SpringBoard, which is actually a very, very well-engineered
and quite a hardened part of the operating system
because it does handle everything from arbitrating the apps
that are running.
It's the app launcher, it's the notification center,
control center, et cetera.
So I was a little bit surprised that a bug would creep into there
that would have been essentially something along the lines of
when iOS posts a notification saying an app is deleted,
a bug was causing the history of notifications to not get purged.
That's what we thought the bug was, but still, I was a bit like,
that's a bad bug to end up in SpringBoard or one of those components.
Then along comes the fix,
and the security bulletin from Apple has this line in it.
It says: Impact: notifications marked for deletion could be unexpectedly retained on device.
We knew that.
Description: a logging issue was addressed with improved data redaction. And this is where I can use a little bit of my
internal knowledge of Apple to tell you that Apple has systems for collecting a whole lot of diagnostics.
When I worked there, we'd ship a feature and we would make sure that it had things like message tracer keys or
AWD diagnostics in there so that once this software's out there, we could pull these metrics and we could see,
you know, how often is someone using, you know,
Mail's feature for threading, et cetera.
I think what's happened here is someone accidentally put one of those traces into a log
to say, log when message has been deleted out of the notifications database
and included the payload in there.
That gets into that logging data.
And it fits also with this reported thing around the notification logs being cached for a month.
That's generally the pull time for those
diagnostics. They hang around, they batch up, and then they get shipped off. So yeah, I think it was
actually logging and telemetry that got us here.
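A toy sketch of the bug class James is describing, with invented names, not Apple's code: a diagnostic line that captures the payload it should have redacted, which then sits in batched telemetry until it gets shipped off.

import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("notifications")

def on_notification_deleted(app_id: str, payload: str):
    # The bug: message content leaks into the diagnostics stream.
    log.debug("deleted notification from %s: %s", app_id, payload)
    # The fix ("improved data redaction"): keep only the metadata
    # a metrics pipeline actually needs.
    log.debug("deleted notification from %s: [redacted]", app_id)

on_notification_deleted("org.whispersystems.signal", "meet at 6pm")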
Yeah, you know, one thing that I found really interesting is, you know, as all of your listeners know, you have this model now where a patch comes out for a piece
of software and everyone is reverse engineering that patch to kind of figure out what the vulnerability
was, write an exploit and kind of use it as a one-day exploit to compromise anyone who hasn't
patched. Now it seems like the model is to actually reverse engineer,
or really read indictment documents to identify vulnerabilities that you can then use to go patch,
right? So it tells you that DOJ probably was a little overzealous in the way they were describing
how they got this data and they probably should have obfuscated a little bit better so that Apple
wouldn't patch it in the future. I know of stories where entire indictments have gone away
because defense counsel have been pushing the FBI to disclose how they've collected certain bits of
evidence so that they could make the indictment go away because they didn't want to expose sources.
and methods. So yeah, I actually had the same thought, which is, like, they messed up by talking
about this in a court document, because now obviously that capability is gone forever.
Now, real quick, James, because we're like going over time. GitHub's had a hell of a week.
They had a regression in merge queue behavior, which was not great, but wasn't as bad as it
was like first thought to be. They've had all sorts of availability problems. And then there's
been this security issue that Wiz disclosed, like, overnight for us here in Australia.
Can you just walk us through quickly GitHub's horrible week?
Yeah, bad week.
So the merge queue one looked pretty scary because, you know, when you say it's a regression in merging
and the regression is the code didn't merge, that's a big deal.
But it was only...
You had one job, etc.
Yeah, exactly.
Regression.
We didn't do what we were supposed to do.
But it was in this thing called a merge queue, which is only used if you're at a very high
volume of commits and PRs going through and so hence the blast radius that was small but nevertheless
dumbug and a real concern that the testing wasn't actually you know exercising the real purpose of
emerge queue there then along comes whiz with this advisory that there's a you know RCE basically
in GitHub online and as well as well as all of the self-hosted versions of this
The write-up's good, but the thing that made me really laugh out loud with this one was
towards the bottom of the article. GitHub says,
if you're looking for indicators of compromise
on your GitHub Enterprise box,
look in your git pushes
and see if any of the pushes
contained a semicolon.
And it's like, oh, come on, guys.
You're telling me that you just had basically
a straight path from
a push command to a shell,
that you could put a semicolon in there to say,
hey, start my new command here.
It's just, it's SQL injection,
but for the shell. So a really, again, dumb bug.
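For anyone who wants the shape of that bug class, here's a minimal sketch in Python, hypothetical code, not GitHub's: when attacker-controlled input is interpolated into a shell command line, a semicolon ends the intended command and starts a new one. Passing arguments as a list never hands them to a shell at all.

import subprocess

ref = "main; echo pwned"  # attacker-supplied input, e.g. from a push

# Vulnerable: the whole string is parsed by a shell, so the semicolon
# terminates the git command and runs "echo pwned" afterwards.
subprocess.run(f"git rev-parse {ref}", shell=True)

# Safe: arguments go directly to the program as a list; the semicolon
# is just bytes in an (invalid) ref name, not a command separator.
subprocess.run(["git", "rev-parse", ref])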
To their credit, they did post today a pretty good article that explains,
look, we are having a rough time, because AI agentic coding has just blown the roof off all of
our metrics of how fast commits are coming in, how many repos there are. So, yeah, you know,
but still, tough time for them. I mean, it's the vibe coding revolution, right? So, like, if you
imagine the equivalent number of, like, human developers that they're dealing with now, that actually
makes a lot of sense. Yeah. It makes a lot of sense. Now, we've got two items we're going to get
through real quick, both kind of hilarious. Catalin spotted this one and dropped it into Slack
today, which is the insurer At-Bay has said, oh, you know, one ransomware crew has, like, risen
to the top of the leaderboard. And you're thinking, oh, big and scary, it's Akira ransomware, and you're thinking,
wow, you know, these guys must have mad skills. And it turns out no, they've just got an M.O.
where they go and rinse anyone who's running a SonicWall, basically, which is how they've been
able to get to number one, which is just, I mean, what can you
even say. Yeah, specialization works. Specialization works, that's right. And, you know, and how the
mighty have fallen. Like, if that's what gets you to the top of the ransomware leaderboard these
days, like, you disappoint us with your, you know, subpar TTPs. And the last thing we want to
talk about is, you know, the cybersecurity industry's marketing dross is starting to leak out into
the ransomware ecosystem. Dan Goodin has this write-up for us at Ars Technica,
where a ransomware crew, Kyber, is going out there promoting its ransomware as being
quantum-ready, Dmitri.
Well, this is military-grade technology, military-grade encryption for ransomware, right?
Why wouldn't you use military-grade technology, Mr. Affiliate that I want to sign up?
By the way, it's interesting that it's named after Kyber, you know, the lattice-based encryption
algorithm for key encapsulation, obviously, that is quantum-resistant.
And yeah, you know, if you want to differentiate yourself, one way to do that is obviously
to target the most common vulnerabilities, like SonicWall.
If you don't have that, you know, resort to marketing.
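For the curious, the "quantum-safe" claim only concerns the key-encapsulation step of the usual ransomware hybrid-encryption pattern. Here's a sketch of that pattern using the Python cryptography library, with classical X25519 standing in purely for illustration; a post-quantum KEM like Kyber would slot into the same role.

import os
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

attacker_priv = X25519PrivateKey.generate()  # stays with the crew
attacker_pub = attacker_priv.public_key()    # shipped inside the malware

# On the victim: derive a fresh shared secret and encrypt files with it.
eph = X25519PrivateKey.generate()
shared = eph.exchange(attacker_pub)
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"fek").derive(shared)
nonce = os.urandom(12)
blob = ChaCha20Poly1305(key).encrypt(nonce, b"victim file contents", None)

# Only the ephemeral public key and blob are left behind; recovering the
# file key needs attacker_priv or, someday, a quantum attack on the KEM.
# Swapping this exchange for a lattice-based KEM is what earns the label.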
Well, guys, that's actually it for the week's news.
But before we go, Dmitri, you know, how was your weekend, man?
I heard you were off to the White House Correspondents' Dinner the other night.
Was it fun?
Well, you know, it was interesting, exciting.
I was disappointed that the food was kind of lacking.
We got salads and then not much else.
So we were stuck there for an hour and a half when they told us the president would come back,
which he, I think, wanted to do.
And then in the end, the Secret Service didn't let him.
I remember I was texting with you.
At the time, we thought that there was a dead body outside because there were rumors that the shooter was killed,
and you were marveling at the fact that in America,
dinner could continue while a dead body is outside.
But this is America.
It's almost like gun deaths have become a little bit over normalized, maybe.
Yeah, I tell you, we were right at the stage.
And one of the funniest things I saw, you know,
in the initial moments, as the Secret Service is swooping in
and taking out the people that they were protecting,
and they were grabbing Stephen Miller and Katie Miller,
who were sitting at the Fox News table right next to us.
And there's this couple, and I didn't know who they were, unfortunately,
but they're jumping over people, jumping over chairs and screaming,
Stephen, Stephen, can you please take us with you?
And I'm like thinking, this is not the rapture.
Where are you thinking he's going to take you?
Well, Dmitri, I hate to tell you, mate,
but if it were the rapture, I wouldn't want to go where Stephen Miller's going.
Basically, that is the last place I would want to go
during the rapture.
But, you know, to finish it up, you know, the evening actually ended up really great
because we had the Canadian ambassador at our table.
And at the end of it, when they finally let us out, he's like, why don't you come over
to our house, we'll order a pizza?
And it ended up being a really fun end of the night, actually.
Well, mate, all's well that ends well.
We're glad that you had a fun night and survived to tell the tale.
So you could come here and talk about the news with us.
And on that note, Dmitri Alperovitch, James Wilson,
thank you so much for joining me on the show to talk through the week's news.
It's been a lot of fun.
It's been fun, Pat, and looking forward to next week.
Thanks so much for having me.
That was Dmitri Alperovitch and James Wilson there with a look at the week's security news.
Big thanks to them for that.
It is time for this week's sponsor interview now with Dan Guido from Trail of Bits.
And Trail of Bits is a really interesting sort of security consultancy and engineering firm that does a lot of interesting work.
And, you know, Dan is big on AI at the moment, of course.
I mean, they've always been a very sort of forward-leaning company.
So when there's new tech, they dive right into it.
And one of the things that they've been both making use of and looking at on behalf of their clients is when you want to do some sort of private inference, right?
Rent some infrastructure so that you can run inference workloads without exposing them to the hosting provider.
And I thought this is going to be an interesting topic for a sponsor interview.
So that's what Dan joined me to talk about now.
So, you know, there's certain workloads where
you don't want to expose what you're working on to Anthropic, for example, right?
You just can't, whether it's through like regulatory constraints or, you know, confidentiality
constraints.
Like if you're an exploit developer working on high-end stuff, you can't just throw that into
Anthropic.
It's just not the way the world works.
So you can use local models on very expensive hardware, or you can use services like tinfoil.sh,
which will help you to actually load some local models into their, you know, powerful
hardware and you can just pay to use that.
But how do you then go about guaranteeing that that stuff is actually private?
So that is the topic of this interview.
And also we talk about the work Trail of Bits did looking at the way Meta has done private
inference for WhatsApp.
So here's that interview.
Here's Dan Guido, who is going to kick off right now explaining how private inference
works.
Enjoy.
So these things, they operate in little trusted execution environments.
They're a little virtual machine inside of either a CPU or a GPU that is completely separated from any infrastructure that the cloud provider or the vendor can access.
And that's just fundamentally different from what OpenAI, Anthropic, and the rest of the frontier labs are doing right now.
They have a lot of legal agreements that say, you should trust us.
And they have lawyers standing over their back with knives like, okay, like we get it.
But if they get hacked or if somebody goes rogue internally or if they just get curious from a business standpoint and their internal rules change, they can look into all your stuff.
So this private inference stuff, it solves a lot of major problems that, for instance, companies like Trail of Bits have.
Like a lot of people trust me with their most sensitive intellectual property and they don't want me to give my data to a frontier lab where I'm taking it on faith that they're not looking.
So I think in 2026, this is a really big topic.
And I am kind of excited to see where it goes,
but also it's been a great thing for us to be around,
since I think we have the skills to really help.
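The client-side half of that separation guarantee is remote attestation: before sending anything sensitive, the client asks the enclave for a hardware-signed statement of exactly what code it is running, and refuses to proceed unless that measurement matches a build it trusts. A rough Python sketch of the shape of that check follows; the /attestation endpoint, the field names, and verify_quote_signature are all hypothetical placeholders, since real quote formats vary by vendor (Intel TDX, AMD SEV-SNP, NVIDIA's GPU attestation, and so on):

import json
import urllib.request

# Measurement of the enclave image we have audited and chosen to trust.
# Hypothetical value; in practice it comes from a reproducible build or
# a transparency log.
EXPECTED_MEASUREMENT = "9f2a..."

def verify_quote_signature(quote: dict) -> bool:
    # Placeholder that fails closed. A real client validates the hardware
    # vendor's certificate chain over the signed quote here.
    return False

def safe_to_send_prompts(endpoint: str) -> bool:
    # Fetch the attestation quote from the (hypothetical) enclave endpoint.
    with urllib.request.urlopen(endpoint + "/attestation") as resp:
        quote = json.load(resp)
    # The quote must be signed by the vendor's hardware keys, and the
    # measured code must be the exact image we expect.
    return (verify_quote_signature(quote)
            and quote.get("measurement") == EXPECTED_MEASUREMENT)

If either check fails, the client never sends the data, which is what moves the trust from the provider's legal agreements to the hardware.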
Now, okay, so all of this makes a lot of sense, right?
So you've explained why there is a need for this private inference stuff.
But, you know, as you also pointed out,
this means that you are not using a frontier model.
How big is the gap these days
between something like the latest and greatest Anthropic model or OpenAI model, you know,
versus one of these local models.
Because as I understand it, the gap is pretty substantial.
So I'm guessing there's a whole bunch of stuff you just can't do when you're using like,
I know it sounds funny, but like a remotely hosted local model in one of these, you know,
private inference rigs, right?
So, yeah, I guess that's the question.
What's the gap like there?
Yeah.
So I think the way that we do it is we always prototype
first on a frontier lab model.
Like, I want to make sure that I'm going to go explore and figure out, can I get this problem
solved with Opus, with Codex, with, you know, Gemini's latest model.
And then once we get that done, then we can start building evaluations.
We build some data sets around like, okay, can we reliably solve this problem?
Can we swap out some of the parts and figure out that we can trust one of the open source
models to do the same?
I think a lot of people, the experience they have with open source models is they go grab
Ollama, which is doing CPU inference, not even MLX.
And they're doing it with models that fit inside their laptop GPUs or their laptop
CPUs.
And those things just don't compare to using a full, unquantized, you know, 230 billion parameter
model.
You're leaving so much performance on the table.
And the only place you're really going to be able to do that is with some inference
provider that's running it in the cloud.
Notwithstanding that, I think personally, the way that Trail of
Bits engineers use these sorts of things is that speed matters more than the model size.
Like if this thing runs at five tokens per second, you're still not going to use it even if it's
smarter. I would rather have something that does like 10,000 tokens per second and is completely
stupid because then I can scale out a huge agent workflow, run like 50,000 checks on whatever
it is that I'm working on, and get some trust out of the process. So the ability to run one of
these really high-end, latest generation, 200 billion parameter plus models in the cloud
with inference that another person can't peek into is pretty attractive.
And I think you can do it if you have the right evals.
Right.
So I guess what you're saying is the local models aren't that bad if you've got the right hardware,
if you've got hardware with the juice to run it.
Yeah, I think that's true.
You can get mileage out of these things.
I'm seeing that it's kind of like a six-month delay where Opus comes out today, six months
later the open source models catch up. But a lot of that is left to like, well, show me the proof.
And that's where Trail of Bits and other firms, we're building a lot of proprietary data sets that
give us that trust that, oh, yes, this open source model can do it. But that's work that you have to
do for yourself. You know, a lot of the open evaluation data sets aren't going to tell you that.
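The swap-out test Dan describes has a simple shape: run the same task set against the frontier baseline and the open-weights candidate, score both the same way, and only switch when the scores hold up. A toy Python sketch against an OpenAI-compatible chat endpoint, which most self-hosted inference servers also expose; the base URL, the model names, and the single eval case are all made up for illustration:

from openai import OpenAI

# Any OpenAI-compatible endpoint works here; this URL is hypothetical.
client = OpenAI(base_url="https://inference.example.com/v1", api_key="unused")

# (prompt, checker) pairs; real eval sets are proprietary and far larger.
CASES = [
    ("Does strcpy(dst, src) bounds-check the destination? Answer yes or no.",
     lambda out: "no" in out.lower()),
]

def score(model: str) -> float:
    passed = 0
    for prompt, check in CASES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        if check(resp.choices[0].message.content):
            passed += 1
    return passed / len(CASES)

# Trust the open model for a task only if it matches the frontier baseline
# on your own data, e.g. compare score("frontier-baseline") with
# score("open-weights-candidate").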
So what sort of tasks are we talking about that need to be sort of hidden away from
Anthropic, or Google, or OpenAI, or whoever,
at a customer's request, right?
So what sort of gigs are you using this sort of private inference infrastructure for?
Is it like code audits?
Is it like, what is it?
A lot of it's code audits, a lot of it's product security incident response.
I think the most sensitive engagements Trail of Bits works on
are cases where a company got hacked and they are treading carefully
because there are legal ramifications to the work that we are doing
for them. You know, there's customer liability issues. You know, they could be sued. There's lawyers
involved. So this is about what winds up in discovery, I guess. Like if someone's dropping a subpoena
on Anthropic and saying we want to see what this contractor did, blah, blah, blah, it's part of this
incident response. It's part of that too. But, you know, we audit a lot of sensitive code where,
like, we have some clients that are EDR vendors, right? And through working with those clients,
we are finding remotely exploitable bugs in these EDR products,
and we don't want information about where those bugs are
to end up in some log at a frontier lab
where any curious person inside the company
might be able to get access to that log of prompts.
Yeah.
I mean, one of the reasons I was interested in having this conversation with you
is because, well, first of all, you mentioned it in your talk at Unprompted.
And secondly, I was chatting with someone who works very deep in exploit development
who was talking about their frustration that they can't use models,
like all of the Anthropic models and whatever,
because, you know, you can't expose those sorts of exploits and vulnerabilities.
You just can't take the risk.
The stakes are too high when you're working in the national defense space at least.
I think, you know, it's a really funny mindset to have.
I remember conversations with similar people about Microsoft security research.
Like, let's just booby trap or like let's look through the logs on the documentation
for different APIs inside of Windows.
And from there, it's probably a data leak about, oh, like, this person's probably looking for bugs in this subsystem inside of Windows.
Like, there's one person in the entire universe that found this specific API.
And I think, you know, the turtles go all the way down.
Like, you would probably be motivated to get a completely local offline copy of everything that's hosted online just so that you don't expose that stuff to a third-party vendor.
But, I mean, from my perspective, we have client obligations to live up to.
I get a lot of sensitive intellectual property and I have to attest that I'm going to keep it confidential
and it's hard for me to do that with some of the ways that AI works today.
So this is attractive for me.
Let's just go into that because you mentioned before that there is this sort of, you know,
cryptographically enforced separation between these virtual machines that are doing the inference.
Like how robust is that separation?
Because, look, I understand actually from a contractual point of view,
from a not having data lying around that could be subject to discovery later point of view,
like this is going to be good enough, right?
But I just sort of, I've always wondered about that, you know,
hardware separation on a, you know, separating VMs on a single piece of hardware thing.
Like, how robust is it?
I mean, so I think it can be very robust.
The problem is that you need to actually do the work to make it robust.
And that's the process where Trail of Bits has been publishing
public reports of audits that we've done for these systems so that people gain that trust,
because they're complicated machines.
So you've had a look under the hood of, like, tinfoil.sh, for example? Are they a customer?
We've looked at tinfoil.sh. We've also looked at WhatsApp.
WhatsApp! That's the next thing I wanted to talk to you about. You did a whole gig where, you know,
the reason WhatsApp is doing private inference... it's funny, man,
it's so funny, because my wife just asked me yesterday, oh, there's, like, now an AI button in WhatsApp.
She's Brazilian.
All the Brazilians, that's their default app, right?
Is WhatsApp?
I think it was like, is this so if you get lonely, you can just talk to WhatsApp?
And I'm like, no, but, you know, a nice one.
Kind of, yeah.
But they're doing, so in order not to break the end-to-end encryption,
they're doing private inference so that, you know,
for some of the reasons that we spoke about earlier, right?
So the data isn't just leaking out into logs or whatever
or leaking out into, you know, queries and prompts of models
that can be recovered later.
So how did WhatsApp actually tackle this?
So WhatsApp is using a lot of commodity off-the-shelf systems.
Like you look at Apple's Private Cloud Compute.
They own the entire stack.
They build custom hardware, custom software.
They have a whole build chain that only exists at Apple.
And it still sucks.
But anyway.
I mean, everybody's got the same problem.
So WhatsApp is using a lot of AMD stuff.
And, you know, there's challenges.
There's challenges where a lot of the AMD hardware is not robust
against physical attacks.
It has trouble making reproducible builds.
You know, what Meta is really selling is they're saying that a company
with Meta's resources, that all the resources at Meta, will not allow us to peek into the operation
of this enclave.
And that is a really strong, really outrageous statement to make.
And it means that you actually have to think about this beyond just the secure enclave.
You have to think about physical access controls around the hardware.
You have to think through what you're actually attesting
to, how people check it, you know, what's available from the transparency
log. Like when Apple does Private Cloud Compute stuff, they don't even give you the source code
to reproduce it. They just give you a binary. And they're like, oh, you know, obviously there's
some security researcher that's looking at every single new hash that comes out, downloading the binary,
and then inspecting it fully to make sure there's no backdoors. But, you know, so there's
gaps and weaknesses about it. I mean, you keep referencing Apple, right? And I'm guessing
the reason you do that is because that is one of the better implementations of like private inference,
right? So I'm guessing that's why you keep mentioning that in a discussion about WhatsApp.
Well, they're, like, they were the first to market with one. And I think when it came out,
it was sort of a big bang. There was a lot of interest, and WhatsApp followed shortly after.
So these are the two big comparable public implementations that people have.
But I would say, you know, your question was, should we trust this? And why should we trust this?
What these systems do is beyond just having technical controls,
they force a company to think hard about how they're storing your data.
And like if you're not using a secure enclave,
if you're not doing these things in a private inference system,
then it's just sitting out there on the cloud somewhere.
And, you know, your opinions about what you should do with that data could change,
there might be thousands of DevOps engineers at your company that can access it.
But when it's inside of an enclave,
you, like, you can't just, like, change your idea about how you're accessing data.
You have to actually hack your own software.
You have to spend considerable effort in an unambiguous project to hack software that you made.
And the skills to do that are few and far between.
There are like 10 people in the whole company that could figure it out, if that.
And those 10 people probably are not motivated to do it.
They like, they know the work that went into it.
So these things, they provide like a real logical barrier inside the company as much as they provide a technical control.
Now, for anyone who's interested in looking at your full audit report for WhatsApp's private inference, you've actually published that.
Oh, yeah. I mean, these systems require people to trust them. They need to know what's actually running, and they want to...
It's a cryptographic system. You know, we publish cryptographic audits all the time, and this is one.
So we have our full report that was co-published with WhatsApp, and you can go dig through all the dirty details.
Yep, awesome. Well, look, Dan Guido, fantastic
to chat to you. I mean, you know, it's a shortish interview, right? So we can only sort of start to
scratch the surface of what is a very interesting topic. But for people who want to go deeper,
I mean, I'm looking at the audit report here and, you know, it is not short. I mean, what
have we got here? About 118 pages of audit report for those of you who are really interested
in going deep on private inference. But, mate, great to chat to you. Always good to see you.
And I'll look forward to chatting to you again soon.
Thanks again, Patrick.
That was Dan Guido there from Trail of Bits, this
week's Risky Business sponsor. Big thanks to them for that. And that is it for this week's show.
I do hope you enjoyed it. I'll be back soon with more security news and analysis. But until
then, I've been Patrick Gray. Thanks for listening.
