Risky Business #835 -- Why the Fast16 malware is badass
Episode Date: April 29, 2026

On this week's show, Patrick Gray and James Wilson are joined by special guest-host Dmitri Alperovitch. They discuss the week's cybersecurity news, including:

The US government is mad as hell about Chinese firms stealing American AI technology
Dmitri has an opinion or two about the US selling Nvidia chips to China
Speaking of Chinese AI, Kimi's new 2.6 is very interesting
The US sanctions a Cambodian senator for earning mega bucks through scam compounds
And a ransomware family is promoting itself as being … quantum-safe?

This week's show is sponsored by Trail of Bits. CEO and co-founder Dan Guido chats to Pat about how private inference works and Trail of Bits' audit of WhatsApp's private AI setup. This episode is also available on Youtube.

Show notes

Exclusive: US State Dept orders global warning about alleged AI thefts by DeepSeek, other Chinese firms | Reuters
moonshotai/Kimi-K2.6 · Hugging Face
Discord Sleuths Gained Unauthorized Access to Anthropic's Mythos | WIRED
Newly Deciphered Sabotage Malware May Have Targeted Iran's Nuclear Program—and Predates Stuxnet | WIRED
Hackers deployed wiper malware in destructive attacks on Venezuela's energy sector | The Record from Recorded Future News
Mystery Around Venezuelan Cyberattack Deepens, with New Discovery of "Highly Destructive" Wiper
Risky Business #819 -- Venezuela (credibly?!) blames USA for wiper attack - Risky Business Media
AI Tools Are Helping Mediocre North Korean Hackers Steal Millions | WIRED
CISA: US agency breached through Cisco vulnerability, FIRESTARTER backdoor allowed access through March | The Record from Recorded Future News
US, UK authorities warn that Firestarter backdoor malware survives patching | Cybersecurity Dive
Surveillance campaigns use commercial surveillance tools to exploit long-known telecom vulnerabilities | CyberScoop
UK regulator closes loophole that allowed rogue companies to track phone users' location | Reuters
US sanctions Cambodian senator for millions earned through scam compounds | The Record from Recorded Future News
Vercel says some of its customers' data was stolen prior to its recent hack | TechCrunch
Supply Chain Security Incident Update
Apple fixes bug that cops used to extract deleted chat messages from iPhones | TechCrunch
Kyle Daigle on X: "Wanted to provide more clarity about this. Yesterday, we had a regression in merge queue behavior where, in some cases, squash or rebase commits were generated from the wrong base state, making earlier changes appear reverted in branch history. 2,804 pull requests out of over 4M" / X
Securing the git push pipeline: Responding to a critical remote code execution vulnerability - The GitHub Blog
One ransomware crew now drives half of all cyber claims: At-Bay | Insurance Business
In a first, a ransomware family is confirmed to be quantum-safe - Ars Technica
What we learned about TEE security from auditing WhatsApp's Private Inference
Transcript
Hey everyone and welcome to risky business.
My name's Patrick Gray.
We've got a fabulous show for you this week.
Mr. Adam Boileau is still traveling and not here with us at the moment.
So we have a special guest co-host, which is Mr. Dmitri Alperovitch,
who will be joining me and James Wilson in just a moment to talk through the week's security news.
Dmitri, of course, many, many years ago was the co-founder of CrowdStrike.
But these days, he is the chairman
of the Silverado Policy Accelerator,
a Washington, D.C. based think tank.
But he is still very much someone
who pays attention to events in cyber.
So he's graciously agreed to join us
to do this week's show,
which he does sometimes,
and we sure do appreciate it.
This week's show is brought to you
by Trail of Bits,
which is a security engineering firm
based in the United States.
And Dan Guido
will be joining us this week
to talk through private inference.
So he did a talk a while ago.
And one of the interesting things he said was he wasn't really that keen on buying hardware
for Trail of Bits to use to, you know, run local models and do it privately,
when you can actually just basically rent private inference hardware, you know, as a service, right?
And I've thought, oh, that's an interesting topic for a conversation.
So he's joining us in this week's sponsor interview to talk through how that works.
Tinfoil, that's tinfoil.sh, is one of the services that they use.
But they also did some work looking at WhatsApp's private inference approach.
And yeah, it's just generally an interesting topic.
So that is coming up after this week's news, which starts now.
And Dmitri, we're going to start with you.
It's really funny, man, because we've got, you know, we've had guest co-hosts the last couple of weeks.
And the topics just keep miraculously lining up with whoever we've scheduled to be our guest co-host.
Like last week we had the Grugq.
And there was all this Grugq-y sort of stuff to talk about.
And this week, there's all this sort of very much Dmitri Alperovitch stuff.
All the Dmitri stuff.
Yeah, exactly, right?
It was meant to be.
So we're going to start with this first item here, which is the State Department kicking
up a huge stink about distillation attacks against, you know, frontier models in the United States.
So they're saying that DeepSeek are a bunch of dirty thieving scoundrels.
And it will not stand.
And yeah, what's the go here?
Well, not just the State Department.
This is really an all-of-government approach.
Michael Kratsios, who heads the Office of Science and Technology Policy at the White House,
also put out a memo to all the executive agencies that they released publicly.
Look, this is a big deal, and I'm glad that the government is now paying attention to this.
I've talked to a number of frontier companies and their researchers.
They actually believe that most of the progress that you're seeing currently from the main Chinese AI models,
Kimi, DeepSeek, Qwen, is actually coming from two things.
It's coming from distillation
of US models, and obviously smuggling of, or even legitimately buying, the
Nvidia chips that you use to do this post-training.
And by the way, post-training.
Hey, hey, hey, whoa, whoa, hey, how dare you suggest that?
I mean, the Chinese trained these things on a bunch of old like 486s
they had lying around in a garage.
They trained them on a potato, Dimitri.
What are you talking about?
That's crazy talk.
As we know, potatoes are fantastic at matrix multiplications, so they're just perfect for training
this stuff.
Yeah, no, look, I mean, as we know, even the Chinese AI researchers are saying that they're
training on Nvidia chips.
Now they're claiming that they're buying them legitimately, but this is not a secret.
But look, what's actually really important is that if you talk to frontier AI companies
here in the U.S., they will tell you that post-training is becoming really, really important
and is driving most of the improvements in models today.
So it makes sense that if you steal a bunch of reasoning traces from our models
and then use them for post-training,
that the Chinese AI models would become quite good.
Still behind us, but behind by a matter of months, which makes sense, right?
New Opus, you know, 4.7 comes out.
They do a bunch of distillation attacks on it,
and then three to six months later, oh, my God, a new Kimi model is out.
That's really good.
I wonder where that came from.
Yeah, yeah. I mean, we also saw the news, like we got a bit of insight into how some of this chip smuggling is happening, right?
Where the Super Micro co-founder was, you know, personally assisting in, like, using a hair dryer to remove labels from hardware, to stick them on other boxes and whatever, to move this stuff into China.
I feel like the restrictions on chips are only ever going to be so successful and the best you're ever going to do is slow the Chinese down.
Now, you did some testimony to the US government.
We'll get into that in a minute on this topic.
Your testimony was on this topic.
But James, you've actually recorded and will publish today a 75-minute deep-dive solo podcast on distillation attacks.
And I think the conclusion that you arrived at when going through this exercise was that really the moat, America's moat, America's edge in the AI war really does come down to compute.
It comes down to who has the most hardware.
Yeah, absolutely.
It was a really interesting deep dive to do, Pat, because it surfaced that.
It also surfaced that distillation is not just for attackers.
This is a legitimate technique that's actually used in training.
But for an adversary, it's such a, like a delicious economic sweet spot.
You can take these freely available open-weight models,
and with a relatively small dataset compared to the massive data set you'd need for an actual frontier model to be trained,
you end up with some really high quality results,
and results on smaller models as well,
so that does help with the fact that,
when you're limited in your compute and your RAM,
you can still run these things.
But yes, everything's converging,
everything's becoming commoditized,
but the thing that still remains the huge limiting factor
is getting the access to the GPUs,
and increasingly the RAM, that's needed
to host these models for the inference.
Now, you know, when we describe people
as dovish or hawkish,
I sometimes describe Dmitri as
being less of a hawk and more of a pterodactyl when it comes to topics like this.
You gave some testimony recently.
Was it a Senate or a congressional committee?
It was the House Select Committee on the Chinese Communist Party.
Yeah, okay.
So I'm guessing you went there and said, hey, it's fine.
We should sell the Chinese our hardware.
They're our friends, right?
Was that basically what you said to them?
Look, I basically said that, you know, we're in an AI race.
No one is disputing that.
The last time we were in a great power race was the space race,
and selling chips to China is akin to selling rockets to the Soviet Union to get to the moon faster, right?
It just makes no sense whatsoever.
And Chairman Moolenaar actually asked me, well, what about this argument that NVIDIA uses that they'll get addicted to our chips?
And I said, you know, Chairman, with all due respect, chips ain't cocaine.
They're not that addictive.
And look, you know, the top two frontier AI models right now, Claude and Gemini, were not even trained on Nvidia chips.
They moved off of Nvidia very, very quickly.
I'm told it was a matter of about 10 people in a couple of months to actually port,
to TPUs in the case of Google, Trainiums in the case of Claude.
So this is not something that you can actually get them to be addicted to.
And the talking points from folks like Jensen and others is, well, wait, we want China to use
the American AI stack.
And my view is American AI tech stack is not chips.
You know, if you want to talk about the tech stack, it's chips in American cloud providers
running American models.
That's the tech stack.
But selling China chips so that they can steal reasoning traces from American
AI models, and then use it to train their own models that will be running in Huawei data centers?
That's not winning.
And I actually had a little bit of a debate on X with Sriram, who is now, I guess, the acting AI czar in the White House
now that David Sacks has moved on, where I said,
imagine the situation where we actually get China addicted to Nvidia chips,
which is impossible, but let me just suppose that they would.
And they train the very, very best models.
And the entire world is running on Kimi or DeepSeek
and applications are built on top of them,
but they're all running on Nvidia chips.
Did we win?
Like, is that what victory looks like?
Of course not.
It's all about the models, not the chips.
Yeah, I think also the argument that,
well, if you restrict chips from China,
they're just going to start developing their own.
I mean, that's a heavy lift.
I think for people who are really deep in the semiconductor space, like that is a heavy lift.
Well, by the way, they're doing that anyway.
And most of them, this is a story that should be getting more publicity, but it's not.
So they have the Huawei Ascend chips, right, that are not even equivalent to the Blackwells in any shape or form.
It's similar to the H20 from Nvidia, but not even as good as that.
But most of the Ascend chips were not made in China.
They were actually made by TSMC, which was kind of conned by a shell company into building them for Huawei.
And of course, that's now hopefully been stopped.
So they can't even manufacture these chips when they design them.
By the way, one other point on distillation, to come back to that topic. Many of the listeners have probably noticed that when you go to Claude, when you go to ChatGPT, you no longer see the chain-of-thought details that you used to see, particularly on these deep research type of
experiments. This is why: because they know that their models are being distilled. If you show all that
information, it helps a lot for China to actually catch up quickly. Well, James and I were chatting about
that, by the way. And your opinion on that, James, was it helps a little rather than a lot, right?
Yeah, it's, I mean, look, it makes sense why they've done it, but there are just so many other
signals that are still going to be there to distill these models. And so, yeah, it helps a little,
but it certainly hasn't closed the door on distillation.
And just to really bring into focus that point Dmitri made about whether
the chips are the thing that people get addicted to.
One of the things I covered in the pod that's coming out today
is that the actual act of distilling a model is about 10 lines of Python code.
Like this is so well embedded into frameworks and existing tooling
that you could imagine that there is only probably one line in that 10 line
Python script that would have to change to move from an Nvidia chip to a GPU
from anywhere else.
And so it's just not like that; we're thinking about this wrong.
We're thinking about it as if there's some sort of incredible level of engineering
that directly hardwires the training process and inference process
to a particular hardware vendor.
But there's not; it's all been modularized and abstracted away.
And so I completely agree.
It's easy to move.
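To put a concrete sketch on James's ten-lines point, here's roughly what the core of a logit-distillation step looks like in PyTorch. This is illustrative only: the teacher and student models, the batch of token ids, and the optimizer are all assumed to exist, and this is nobody's actual training pipeline. Note that the device string is about the only line that ties it to a particular chip.

import torch
import torch.nn.functional as F

device = "cuda"  # roughly the one line you'd swap to target different hardware
T = 2.0          # temperature for softening the distributions

def distill_step(teacher, student, batch, optimizer):
    # Teacher produces the target distribution; no gradients needed.
    with torch.no_grad():
        teacher_logits = teacher(batch.to(device))
    student_logits = student(batch.to(device))
    # KL divergence between softened teacher and student outputs.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()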
Now, James, look, there is some validity to the fact that PyTorch,
one of the common frameworks, is more optimized for CUDA and Nvidia.
That is true.
But, you know, TPU code has been
in there for many years now.
More optimizations are being done.
And look, every single frontier company,
every single hyperscaler, is building
their own chips, right? Maia from
Microsoft, Facebook is building
their own chips. You know, OpenAI is claiming
that they're going to do their own chips. Obviously,
TPUs from Google. So it's not like
Nvidia has a monopoly on this. I mean,
this is matrix multiplication. Sure, you
do it in a very parallelized way.
You integrate high-bandwidth memory.
But, like, I actually think, you know,
you guys may correct me on this, but like,
building a TPU is actually a lot easier than building a CPU,
where you have to do all the branch prediction stuff and, like, deep cache integration,
all that stuff.
Here you're just optimizing for really, really fast, really huge matrix multiplication.
Yeah, that is exactly right.
It is simpler because of the tasks it does,
kind of like the difference between complex instruction set versus RISC, the reduced
instruction set, and this is reduced even further because it really just does come down
to matmul.
But there's other challenges around just the memory bandwidth. Parallelization in computer
science has always been one of those really difficult things, alongside naming things.
But yeah, the crux of the point is right there.
Now, just for everyone listening at home, this is what happens when you ask Dmitri Alperovitch,
hey man, what do you think about chips?
A little bit of a passion project.
Yeah, we're going to kick on to the next part of this discussion, which is you've mentioned
the Kimi model a couple times.
Now this is interesting because like James and I, we actually interviewed Nicholas Carlini from Anthropic.
He's a security researcher at Anthropic, and we published that into our Risky Business Features feed, because there's no room in the main feed for this.
So if you want to hear these sorts of podcasts, listeners, please go and subscribe to Risky Business Features.
It is a completely different podcast feed.
And this is also where James's solo pod all about the distillation stuff is going.
Now, the reason we're going to talk about Kimi just now is it's really
relevant to a bunch of stuff that we talked about with Nicholas in last week's podcast, James.
Chiefly, you know, he was talking about how Mythos might be big and scary now, but this
sort of capability will probably be in a model you can run locally on your laptop a year from
now. And I guess the reason I wanted to talk about Kimi now is because they've done some really
interesting stuff in terms of making the model very efficient at performing certain tasks by being
able to sort of selectively load parts of it, right? Which speaks to Nicholas Carlini's claim about
what these local models might be capable of, either on your own hardware or on something like
tinfoil.sh, which we're chatting about with Dan Guido later. The point is these local models are
going to get a lot more powerful. Can you just tell us a little bit about the innovation here?
Because the Chinese, we saw it with DeepSeek and now we're seeing it with Kimi. They are innovating
in their own ways in AI. And this is a good example of that. Yeah, 100%. So let's cover Kimi and
also Qwen, because these are the two sort of leading models out there in the open weight space.
Kimi's interesting because they've just come out with K2.6 and it follows in the pattern of
DeepSeek, which uses this thing called mixture of experts. So Kimi K2.6 is a one trillion
parameter model, which is vastly too big to even fit or to run efficiently on sort of a laptop
or even, you know, the most beefed up Mac Studio. But you can see the trajectory that they're heading
towards through these innovations because although it's a one trillion parameter model, only about
32 billion of those parameters are active at any one time. And so if this continues in this direction,
we're going to see this very nice balance between the model is huge in terms of the overall
capabilities it's got, but it's able to then zero in on certain parts of the model, activate
those at that point in time to keep inference far more efficient. The other end of the spectrum
that's interesting here is Qwen, which is another Chinese
model; this is from Alibaba. And they focus more on models that are actually a manageable size
for local deployments. So they've got models available at the moment that are, I think there's
one that's 27 billion parameters, one that's 35 billion parameters. And the benchmarks basically
say these are approaching those frontier model capabilities in agentic coding and a few other
tasks. And to put that in perspective, like, I've got a Mac Studio M2, 32 gig of RAM; I can run a Qwen
32 or 27 billion parameter model on that locally.
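For a rough picture of the mixture-of-experts trick described a moment ago, here's a toy router in Python with NumPy. The sizes and the routing scheme are invented for illustration, not Kimi's actual architecture; the point is just that only a small fraction of the total weights is touched for any given token.

import numpy as np

n_experts, top_k, d = 64, 2, 512
experts = [np.random.randn(d, d) * 0.02 for _ in range(n_experts)]  # expert weights
router = np.random.randn(d, n_experts) * 0.02                       # routing weights

def moe_forward(x):
    scores = x @ router                    # one routing score per expert
    chosen = np.argsort(scores)[-top_k:]   # keep only the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only the chosen experts' weight matrices are ever read.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = np.random.randn(d)
out = moe_forward(token)  # 2 of 64 experts active: "1T total, 32B active" in miniature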
The catch here is, though,
these models are still incredibly slow
compared to a hosted anthropic model.
Like you're talking about 400 to 1,000 seconds
for something that takes 20 seconds with Claude,
and they're also an order of magnitude
more inefficient with their token use.
But sort of to what Dmitri was saying there,
this is optimization, right?
We've cleared the hurdle of this being possible,
and now we're just in optimization.
And there's one thing we know about optimization.
It just keeps getting better and better
over time. So yes, locally running models with frontier capabilities are coming real soon, and are kind of
already here. Yeah, so I think, I mean, I think that was really an interesting part of that interview.
Dmitri, you know, you and I, we've known each other a long time. We're good friends, right?
Like our friendship, I'd best describe it as an argument that started in about 2019 and is still
continuing. You've been, you're really an AI pumper, right? Like you've been big on AI,
very early, way ahead of me, you know, I think largely you've been proven right.
That said, with Mythos, you think some of the hype around Mythos is not entirely justified.
So I guess, you know, the idea that we might start seeing similar capabilities in local models
might not be that frightening to you.
Just a quick take from you on Mythos and what you think it means, because it's been such a big topic
and we've got you here.
Well, look, first of all, and I think Nicholas on your podcast talked really well about this,
you know, the big difference with Mythos is the exploit generation, right?
It is better at finding vulnerabilities, but look, other models have been able to find
vulnerabilities as well, right?
It's not like we woke up today and Mythos suddenly can find all these vulnerabilities
that no one else could discover.
It's incrementally better, and, you know, my guess is probably, and I haven't played with
it, but I've talked to many people that have, maybe 30% better than, you know,
the Opus 4.7, for example.
But it is able to write exploits.
But from what I've heard, even that capability is hit or miss.
Like simpler exploits, sure, but anytime you need to sort of like, you know, do an iPhone zero-day
that chains lots of things together, that becomes much, much harder.
It is able to do it.
You know, what I hear from my friends at Anthropic is that if you're spending the equivalent
of $100,000 on tokens, sure, you can probably generate some pretty cool exploits.
I mean, very few people, very few companies, can afford that much, right?
Not at this level of experimentation, driving it that deep. So Mythos is incredibly expensive,
which is yet another reason why it won't be available publicly.
And by the way, you know, that really helps with distillation.
If the Chinese cannot get their hands on this model,
we'll see how good Kimi is in six months and whether it can actually catch up with where Mythos is today.
Yeah, I thought actually Carlini's argument that the cost thing is kind of irrelevant was the more compelling part of the interview, where he's like, look, you know, it's getting cheaper, and some of the dollar figures around the tokens required to do these things have been misreported. So I thought that part was pretty interesting. But yeah, anyway, it's an interesting thing, right, because no one quite knows where all of this is a hundred percent going. But look, speaking of Mythos, there was a big story that was breaking kind of as we were preparing last week's show. I didn't think it was that important to cover, but it's been everywhere,
which is a group of people in a Discord
figured out how to get access to Mythos
when they weren't on the approved list of people who could use it.
The reason I thought this story was funny
is completely different, I'm guessing,
to the reason most people thought this story was interesting.
The reason I thought it was interesting
is Anthropic totally hung themselves
on their own messaging here
because they come out and talk about how dangerous it is, right?
So obviously when you find out someone got access to it
who wasn't supposed to,
like people are going to think that's a big deal
when honestly it's not. Was that your take here, James?
Yeah, 100%. It's just funny reading this as well. Like, a bunch of shady Discord people,
and they basically used a data breach at Mercor to sort of rifle through some logs and sort of found
the URL patterns. Well, actually not even the URL. I think that bit's been completely
misreported. It's looking at the payload of the message transcript that goes back and forth
with Anthropic. And they just basically guessed, hey, you know, the pattern for naming a model is
pretty standard with Anthropic. It's the name, it's the version, it's the date. And they must have just put a few things together and
determined, oh, this is what Mythos is called, and the number, and had access to it.
There's also a little dot, dot, dot on the story around how the other part of this that made it
possible is apparently they had, or one of them was working with, an Anthropic contractor.
I do wonder if that's a soon-to-be-former Anthropic contractor, but they had some sort
of creds that probably gave them the ability to access models that were, you know,
hosted in Anthropic, but not yet publicly available.
But overall, yeah, I agree with you.
It's like if the model's that dangerous, you kind of think there would have been a few more safeguards or things around it.
Yeah, but it's like marketing meeting reality, right?
And it's just a funny old thing.
You know, they've built a computer god in a box but can't do ACLs.
You know, like, it is just such a classic in the genre of infosec stories.
Moving on real quick.
We're going to talk quickly about, well, I don't think this is going to be quick,
actually, I lie.
We're going to talk now about Fast 16, which is some malware.
It was, I think, SentinelOne research here.
It was Vitaly Kamluk, which is a name I haven't heard in a while.
He's an ex-Kaspersky guy.
And Jags, Juan Andrés Guerrero-Saade, which I guess is why he goes by Jags a lot of the time.
Sorry for massacring your name, dude.
They have done a bit of research into this malware that kind of connects to the Shadow Brokers
in an interesting way.
Dmitri, tell us about Fast 16.
Well, this is, I guess, a 10-year-old mystery that they were focused on solving.
So people may recall the Shadow Brokers release back 10 years ago,
allegedly from the NSA, and it contained, in addition to a bunch of malware and other things,
it contained this file that was called Territorial Dispute,
which basically had a list of various kernel drivers and other identifiers that you would check
when you land on a box to see if someone else is already there.
So it could be like other five eyes malware, other malware you know about,
and you basically wanted to make sure there's no kind of friendly fire.
If you're also on there and they get detected,
then suddenly you get detected as well.
So you can kind of decide what you want to do from a deconfliction perspective at that point.
And there was one line in that file that just said,
Fast 16 driver,
kernel driver, move on, nothing to see here.
And that was it, right?
And no one knew, you know, what APT is this?
Like, you know, why is it saying that?
That seems like a really interesting thing.
And Jags and Vitaly have been looking for this Fast 16,
this elusive Fast 16 driver, for years now.
And the way they found it is really, really interesting.
They actually used a technique from this
other Kaspersky researcher, Sergei Meneyev,
who actually passed away recently, I think last month,
who was like this unbelievable APT hunter
who had all these ideas of how you look through a huge repository of data, telemetry, or malware,
to find interesting stuff.
And what they did is they started looking at binaries that integrated a Lua interpreter.
Because they said, wait a second, we see this Duqu malware and other malware that integrates Lua.
Let us look at all binaries from, like, the 2000 to 2010 time frame that integrate Lua interpreters, right?
And then weed out all the legitimate stuff and see what else we can find.
And they stumbled on this file that eventually led them to this Fast 16 kernel driver
that basically was compiled in 2005 based on its compilation date.
And it's a really interesting malware because it spreads itself via network shares.
And then it hooks into the Windows file read operation so that it can look for new executables that get loaded,
the .EXE files,
and then it will patch specific EXE files,
and patch particularly the mathematical calculation code in those EXE files.
And this was another sleuthing exercise by Jags and Vitaly to figure out
what are the EXEs that it's actually patching.
And their top candidates that they identified were this LS-DYNA suite,
which is a powerful engineering simulation software
that's used to analyze how materials and structures behave in various extreme conditions,
like, let's say, nuclear explosions.
And lo and behold, another think tank has actually published information that Iran has used
this software in its nuclear program and to do its modeling.
And then there was another program for open source water modeling and another Chinese
construction and design software.
And basically, when these executables load, it will patch their math calculation code
to basically introduce subtle errors in it, right, so that the modeling would produce wrong
results, which would be really, really difficult to detect.
So another kind of, you know, Stuxnet-like program that, you know, supposedly was hitting
Iran back in the early 2000s.
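As a toy illustration of why that kind of sabotage is so hard to spot (this is just the arithmetic of the idea in Python, not the actual Fast16 patch): a barely visible nudge to one coefficient compounds over an iterative simulation into a wildly wrong answer.

def simulate(coeff, steps=1000):
    """Stand-in for an iterative numeric model, one multiply per step."""
    x = 1.0
    for _ in range(steps):
        x *= coeff
    return x

clean = simulate(1.001)             # ~2.72 with the true coefficient
patched = simulate(1.001 * 1.001)   # coefficient nudged by just 0.1%
print(clean, patched, patched / clean)  # final results differ by ~2.7x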
One thing I found really interesting, I'll give Jags credit for it because I was talking
to him about it, is he said, you know, this is a type of model that you can use in terms
of tactics to actually target Chinese companies, AI companies, right?
If they're doing distillations, well, if you can introduce subtle errors in math
operations when you're doing training, boy, you can really ruin those models. So I thought that
was an interesting thought experiment to see if, you know, this might be happening even today.
Well, that'd be interesting because that would be virtually impossible to detect, right? It's one thing
to have an error showing up in a spreadsheet and then cross-referencing that. But the models are so
dense and so hard to introspect. Like, how would you even know that your weights have been messed with
through this process? So, funnily enough, I remember, like, around the time that this malware was out
there doing its thing,
there was a lot of talk about how you shouldn't forget about the I in the CIA triad,
the confidentiality, integrity and availability triad.
And, you know, I sat through talks from people who were like adjacent to the intelligence
community talking about how, you know, someone could write malware that would subtly impact
like spreadsheets and things like that and cause drama.
And like, this was a big concern back then.
And I guess I'm realizing the reason it was a big concern among people in the Five Eyes Alliance
back then is because you were doing it to other people, which kind of makes sense.
There's a bit of projection going on there.
But it's a look, it's a fascinating story.
And, you know, it makes me think, is this cyber war?
You know, I just, I wonder, is that what we're calling cyber war?
Now, look, moving from the I in the CIA triad to the A in the CIA triad, we got some
analysis here, again from Kaspersky, looking at something they're calling Lotus Wiper, which
is apparently the wiper that went after the state-owned petroleum company in Venezuela.
Like back in December, there was a ransomware attack that kind of looked like a wiper attack.
The Venezuelans blamed the Americans.
We actually published a podcast in which we said that was a credible accusation.
It's looking more and more credible, James.
I think you took a look at this and by the process of elimination, you're thinking, yeah,
what else could it be but a U.S.-written wiper?
Yeah, that's right.
But between the Kaspersky write-up and Kim Zetter's write-up, which had a bit more detail, no one seems to be comfortable saying, aha, this proves that it was a US operation or a US-ally operation.
But it's very much a Sherlock Holmes,
the-dog's-not-barking kind of thing.
This is a highly targeted malware.
It had no financial incentive.
There was no ransom or extortion element to it.
It just straight up wiped things, and it destroyed data really well.
And there was a compiled version of it with the timestamps that match up, and it had a hard-coded string point.
to the PDVSA.
So, you know, like, what's your saying?
But it was Venezuelan dissidents, clearly.
Clearly, walks like a duck, talks like a dark,
must be a Venezuelan dissident.
Yeah, okay.
Yeah, yeah.
Moving on, we got one from Andy Greenberg and Matt Burgess,
just over at Wired,
just looking at how the North Koreans are using AI
to, like, automate a lot of their campaigns.
I mean, this is something actually,
James, you and I spoke about recently,
not in the podcast, which is we saw,
there was some incident,
and they talked about how it was an AI-enabled attack.
And it's like, it's not going to be too long before putting that in a
statement about some sort of data breach is going to be completely pointless because they're all
going to be AI-assisted, you know, hacks. I sort of feel like it's going to become the new
sophisticated attacker line that goes into a press release. The new sophisticated attacker is
like AI-assisted. What are the nuts and bolts here, though, real quick on this campaign that
they've written up? Yeah, look, the campaign itself is just
run-of-the-mill AI stuff, you know, and of course everyone's using it. And it did its usual thing
of leaving some database credentials exposed, et cetera, et cetera, the same old thing that always gets these
vibe-coded apps popped, and that allowed folks to go and rifle through and just see the tradecraft
and the process behind this. What I did find interesting about it is they, I think they've kind of
stumbled on an interesting vector here that allows them to get away with otherwise sloppy vibe-coded
tools to do this, and that is they go after their targets for
this campaign here by posting job ads. Now, we've all been through this. You get that job ad,
you get the call from the recruiter, you want to start going in the recruitment process. You don't do
that on your corporate laptop, right? You're doing it on your personal laptop, which doesn't have
the EDR, doesn't have all those protections. So that's how they managed to get these otherwise
pretty sloppy exploits onto a device. And then, you know, these days, those devices have access to the
things that they then connects will trade and get their crypto wallets and keys out of. So it's just
kind of interesting that that interview aspect, this contagious interviewer camp and as they
call it, hits the sweet spot of quite an unprotected device. Yeah, they're not getting snapped by a
colonel driver that Dimitri helped write 15 years ago, basically. I mean, we saw a whole thing on
this on Twitter recently where someone walked through how they nearly got done, which was a
LinkedIn account of some venture capitalists that they knew got compromised, and then they were like
reaching out to them saying, hey, we should sync up, it's been a while. And they did the whole
calendar invite thing and waited a week, and then it's like, oh yeah, you're going to need to
download this binary to join the call. Dmitri, you had something you wanted to add there.
Well, this just proves to me yet again, and I said this 15 years ago first, that North Koreans
are by far, far the most creative actor, right? They pioneer techniques that then get adopted by others.
They were the first ones back in like 2009 to do disruption at scale. They were the first ones to do
hack and leak with Sony. They did the Bangladesh bank heist and, you know,
the IT workers and everything else.
So it's not surprising to me at all that they are the first ones to really popularize
the use of AI.
And of course,
they've been doing this with the IT workers and passing interviews for a while.
Now they're using it for coding.
What was interesting,
I thought, is that the way these guys identified that they used Cursor and ChatGPT is by comments,
because they were looking at the scripts and they saw,
wait a second,
there's a ton of comments here describing every step of this operation.
That looks suspiciously like a vibe coded piece of software.
Yeah, in English too.
So, yeah, kind of obvious.
That is funny.
Now, this next one I wanted to talk about with you, Dmitri,
because I feel like I'm taking crazy pills,
because this story, it just keeps rattling around, right?
There's been some sort of bug in Cisco ASA,
and a threat actor, I think it was Chinese,
was putting some malware onto these Ciscos.
Now, CISA came along and said,
hey, you've got to patch your Ciscos.
Okay, yep, cool.
And people have patched their Ciscos.
The US government have patched their Ciscos, and now they're putting out advisories saying,
hey, they're warning you that this FIRESTARTER backdoor malware survives patching.
And I'm thinking what malware doesn't survive patching?
That's not how you do incident response.
I don't get this. It's mad.
Why was there any expectation that patching a compromised Cisco ASA device
was going to get you anywhere in terms of cleaning it up?
What is going on?
Well, look, you know, in fairness, right, there's no EDR running on these devices.
So people kind of assume that they're clean and the only thing they have to do is patch them
because usually vulnerabilities aren't actually used to exploit the device itself and land malware on the device,
but kind of use it as a pass-through to inside the network.
In this particular case, obviously malware was installed that had persistence.
So it would not be removed by a patch.
And look, you know, we were talking about this before the
show: like, you have to reflash your firmware on these devices. You have to assume compromise,
and as painful as it is to take firewalls offline, hopefully have redundancy. You absolutely
have to do this. Just patching it is not sufficient.
I just, look, I just find it a bit maddening seeing headlines like, US, UK authorities
warn that FIRESTARTER backdoor malware survives patching. I just think, what? Like, okay.
Anyway, okay, we're going to move on to the next story now. Citizen Lab has published a
report about a threat actor that is tracking its targets,
you know, physically tracking
the physical location of their targets, by exploiting
vulnerabilities in SS7 and Diameter, which is another SS7-like
protocol that isn't quite as awful.
But it's the sort of stuff that you expect to be seeing happening on
telcos.
Dmitri, we'll get you in on this in a moment, but first up, James,
you've had a look at this one as well.
This report has resulted in the telco regulator in the UK shutting down the availability of
like something called global titles which are used in these campaigns. Can you explain to us what
a global title is and how it's used to stage these sort of attacks? I can. I can use my very
fresh knowledge of this because after reading the Citizen Lab report I had to go and read up on a
whole lot of topics. The write-up is exquisite. It's so good. But a global title is essentially, it's like
a phone-number-style address with some metadata attached to it, and in the complex and convoluted ways
that these telco networks work, there's kind of a resolution process. You can almost think
of it as a form of, like, DNS resolving down to the IP address, and then, you know, the IP address has
to use ARP to resolve to the MAC address, a similar kind of multi-step resolution process for
these things. But the point of using the global title here, I think, is actually to exploit that
resolution, right? So they can essentially craft a global title coming from
I think there was only three or so candidate telco networks they were trying, but
due to the way that those global title addresses get resolved, they're able to sort of
kind of, like, poke at various points of the network and see how is this getting resolved,
and where does that traffic go?
And of course, the problem with these networks is they're insecure by default, really.
It's built for interoperability between the telcos, not a high degree of security.
And so people have bolted on firewalls to prevent a lot of this malicious traffic going
back and forth. But what's interesting in this example here is the attacker is actually using
lots of different steps where they basically say, I'll try this attack, see how it resolves,
where does it get to, who blocked it? Okay, let's try something a little bit different. How does
that get routed? Who blocked that? And it's like they're just incrementally building more and
more knowledge about where the weaknesses are, because then they can funnel all the traffic they
want to through that weak point in the network to get to the end subscriber that they're trying
to surveil.
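The enumeration pattern James is describing looks something like the Python sketch below. The probe helper is a stand-in, and the message names are just well-known MAP location queries; a real attack runs over SS7/Diameter signalling, not Python.

import random

MESSAGES = ["SRI-SM", "PSI", "ATI"]        # location-related query types
LEASED_TITLES = ["GT-A", "GT-B", "GT-C"]   # global titles the attacker controls

def probe(global_title, message):
    """Stand-in for sending one signalling message toward the target
    network and observing whether its firewall passed or blocked it."""
    return random.random() < 0.2  # simulate: most paths are filtered

# Map which (source title, message type) pairs get through, then funnel
# the real tracking traffic through whatever weak points turn up.
usable_paths = [
    (gt, msg)
    for gt in LEASED_TITLES
    for msg in MESSAGES
    if probe(gt, msg)
]
print("unfiltered paths into the network:", usable_paths)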
Now, when it comes to these global titles, who is supposed to have access to a global title? I'm guessing that's a telco.
Yeah, it's a telco, and they lease them out for other purposes as well. And that's what the UK has particularly cracked down on.
Of course, right. Like, yeah, of course, they lease them out to someone for some insane reason, because it's the telco industry. Dmitri,
you are actually an investor in Cape, which occasionally sponsors the risky business podcast.
And that's because you introduced them to us when you invested in them. I'm guessing you invested in
them for kind of this reason because they don't even allow SS7, it's diameter only.
And I'm guessing even then they're a little bit careful about the way that traffic is handled
on their network.
I mean, do you have some thoughts here?
Yeah, absolutely.
Look, location tracking, which is one of the things that they were using these SS7 and Diameter
compromises for, is a huge issue, right?
It's a safety, personal safety issue.
You know, executives care about it.
Government folks care about it.
So trying to prevent that at a telco level, really, really important.
And that's one of the reasons why Cape is becoming more and more popular across the board.
What I found interesting in this write-up, and again, kudos to the Citizen Lab guys, really phenomenal work, is there's actually two types of attackers they identified.
One that was just using SS7 and Diameter to track locations.
Another that actually had a really interesting SMS exploit, where they would send a binary SMS message to a device that contained a SIM card exploit, Simjacker, basically, that extracted
location info and would basically have the Simjacker continuously ping you with the current location.
So a lot of activity that's happening that most people aren't even aware of.
Obviously, we have the on-device compromises that people are becoming more aware of,
watering hole attacks and other things on Apples and Androids.
But this is something that doesn't even impact your phone.
It's down at the carrier level.
So, you know, without things like this report,
we would really not know the extent to which this is happening around the world today.
And just the fact that that malicious SMS can include those codes to make those actions happen,
boy, it just makes you nervous about carrying this thing in your pocket when you know that that's
actually, you know, part of the protocol that that can be done.
It's, yeah, it's an incredible read.
Turns out mixing data and code is bad in the mobile ecosystem as well.
Who would have thought?
So let's spend hundreds of billions of dollars on CAPEX doing exactly that, Dmitri,
with AI models, but anyway.
And just on what you were talking about there, you know,
feeling nervous about it being in your pocket, James.
Like I just still remember something like eight, nine years ago or something,
meeting up with Joseph Cox, now the publisher and, you know,
head of, and one of the partners in, 404 Media.
And he did not have a phone.
He used an iPod touch as his
communication device for exactly that reason.
He's like, I do not want to connect to any of these networks.
Well, by the way, if you talk to some of these citizen lab researchers
and how paranoid they are, they don't have phones.
They, you know, do a hotspot to a device via Wi-Fi; like, they go through
extraordinary efforts to try not to have one.
Yeah, yeah.
Well, I mean, that's the thing.
Like, I don't want to live that way.
I just prefer not to think about this, frankly.
Moving on, and real quick, the US has sanctioned a Cambodian senator because this senator
is essentially, you know, renting compounds to scam operators.
I mean, this is what I've been saying would be the case where you'd wind up with a form
of state capture by an illicit industry because it's, you know, something like 40% of the
Cambodian GDP or whatever.
comes from these scams. So of course the government is going to essentially wind up
operating as an enabler of these sorts of scams and that seems to be what's happening.
What else we got? We got Vercel have put out another statement talking about how some
other customer accounts got knocked over by attackers, because this is what
happens, right? You hire Mandiant. They come in, they start kicking over rocks,
and they start finding other stuff, and that's what's happened here.
Although it doesn't look like what they found here was necessarily a breach of Vercel
systems. But they have,
through the process of doing incident response on the breach, discovered that,
well, hey, here's some access to some customer accounts.
It looks kind of funny.
I mean, that's basically it, right James?
Yeah, that's exactly it.
It's the same thing.
It's these environment variables that weren't marked as sensitive, somehow being enumerated.
But again, small number of accounts.
It's not going to lead to much, I don't think.
And I stand by what I said last week.
I think Vercel did a really good job of just enacting three or four remediation corrective
steps here that, you know,
all still remain as good mitigation strategies even for the new things they've found.
And in particular, all the environment variables now are sensitive by default.
So that completely alleviates this problem.
We had a complaint, too, that we were going too easy on Vercel on last week's show,
because they were like, they didn't send us remediation things in time or whatever.
I think, look, the reason we're praising Vercel is because they didn't communicate ahead of,
like, them knowing stuff, right?
Which is how you get into trouble when you're communicating about an incident.
So I think we're going to stand by that.
I mean, Dmitri, have you even followed this one?
Not too closely, but yeah, I mean, look, major hosting provider, right?
Of course, they're going to get popped by a variety of different actors.
Of course, people want to get access to their information.
So not at all surprising.
Yeah, yeah.
Meanwhile, Checkmarx.
We spoke like a couple weeks ago about a very limited intrusion, you know,
into like a Checkmarx open source thing that not many people use,
no big deal.
There was sort of a supply chain thing.
About that, James.
Yes, about that.
Yeah, it was originally the Checkmarx KICS,
which is just basically an infrastructure-as-code scanner.
So we weren't, yeah, it was interesting,
but it was all pretty contained and wrapped up in that.
Then along comes April 22.
There's a new set of compromised packages that are released,
and this went beyond KICS.
This was some of the AST GitHub Actions,
their VS Code extension, their developer assist extension.
But the thing that makes you a little bit nervous
about the state of Checkmarx is, even in their updated advisory this week, they said
a second wave of malicious Checkmarx artifacts was published, and this was April 22nd,
indicating continued or renewed attacker access. So clearly not quite sure how the attackers
still have access, but suffice to say they do. And that resulted then in a whole bunch of
data getting exfiltrated out of their GitHub repos, and Lapsus claimed credit for that.
So it's a bad time for Checkmarx.
Yeah, it is.
Now, quickly, too, we spoke about the issue where the FBI were able to extract someone's Signal messages
out of some cache of their display notifications for those messages, for their push notifications.
And, you know, you said, having worked at Apple previously, that shouldn't have been possible.
Apple has now patched that, but you have had a look at Apple's blog here, and you think that it's not so much
that the push notifications were cached;
you think they've actually fixed a different bug?
So the notification database is part of a thing in iOS
called SpringBoard, which is actually a very, very well-engineered
and quite a hardened part of the operating system
because it does handle everything from arbitrating the apps
that are running.
It's the app launcher, it's the notification center,
control center, et cetera.
So I was a little bit surprised that a bug would creep into there
that would have been essentially something along the lines of
when iOS posts a notification saying an app is deleted,
a bug was causing the history of notifications to not get purged.
That's what we thought the bug was, but still, I was a bit like,
that's a bad bug to end up in SpringBoard or one of those components.
Then along comes the fix,
and the security bulletin from Apple has this line in it.
It says: Impact: notifications marked for deletion could be unexpectedly retained on device.
We knew that.
Description: a logging issue was addressed with improved data redaction. And this is where I can use a little bit of my
internal knowledge of Apple to tell you that Apple has systems for collecting a whole lot of diagnostics.
When I worked there, we'd ship a feature and we would make sure that it had things like message tracer keys or
AWD diagnostics in there so that once this software's out there, we could pull these metrics and we could see,
you know, how often is someone using, you know,
Mail's feature for threading, et cetera.
I think what's happened here is someone accidentally put one of those traces into a log
to say, log when message has been deleted out of the notifications database
and included the payload in there.
That gets into that logging data.
And it fits also with this reported thing around the notification logs being cached for a month.
That's generally the pull time for those
diagnostics. They hang around, they batch up, and then they get shipped off. So yeah, I think it was
actually logging and telemetry that got us here.
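A toy sketch of the bug class James is describing, with invented names, not Apple's code: a diagnostic line that captures the payload it should have redacted, which then sits in batched telemetry until it gets shipped off.

import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("notifications")

def on_notification_deleted(app_id: str, payload: str):
    # The bug: message content leaks into the diagnostics stream.
    log.debug("deleted notification from %s: %s", app_id, payload)
    # The fix ("improved data redaction"): keep only the metadata
    # a metrics pipeline actually needs.
    log.debug("deleted notification from %s: [redacted]", app_id)

on_notification_deleted("org.whispersystems.signal", "meet at 6pm")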
Yeah, you know, one thing that I found really interesting is, you know, as all of your listeners know, you have this model now where a patch comes out for a piece
of software and everyone is reverse engineering that patch to kind of figure out what the vulnerability
was, write an exploit and kind of use it as a one-day exploit to compromise anyone who hasn't
patched. Now it seems like the model is to actually reverse engineer,
or really read indictment documents to identify vulnerabilities that you can then use to go patch,
right? So it tells you that DOJ probably was a little overzealous in the way they were describing
how they got this data and they probably should have obfuscated a little bit better so that Apple
wouldn't patch it in the future. I know of stories where entire indictments have gone away
because defense counsel have been pushing the FBI to disclose how they've collected certain bits of
evidence so that they could make the indictment go away because they didn't want to expose sources.
and methods. So yeah, I actually had the same thought, which is, like, they messed up by talking
about this in a court document, because now obviously that capability is gone forever.
Now, real quick, James, because we're like going over time. GitHub's had a hell of a week.
They had a regression in merge queue behavior, which was not great, but wasn't as bad as it
was like first thought to be. They've had all sorts of availability problems. And then there's
been this security issue that Wiz disclosed, like, overnight for us here in Australia.
Can you just walk us through quickly GitHub's horrible week?
Yeah, bad week.
So the merge queue one looked pretty scary because, you know, when you say it's a regression in merging
and the regression is the code didn't merge, that's a big deal.
But it was only...
You had one job, etc.
Yeah, exactly.
Regression.
We didn't do what we were supposed to do.
But it was in this thing called a merge queue, which is only used if you're at a very high
volume of commits and PRs going through and so hence the blast radius that was small but nevertheless
dumbug and a real concern that the testing wasn't actually you know exercising the real purpose of
emerge queue there then along comes whiz with this advisory that there's a you know RCE basically
in GitHub online and as well as well as all of the self-hosted versions of this
The write-up's good, but the thing that made me really laugh out loud with this one was
towards the bottom of the article. GitHub says,
if you're looking for indicators of compromise
on your GitHub Enterprise box,
look in your git pushes
and see if any of the pushes
contained a semicolon.
And it's like, oh, come on, guys.
You're telling me that you just had basically
a straight path from
a push command to a shell,
that you could put a semicolon in there to say,
hey, start my new command here.
It's just, it's SQL injection,
but for the shell. So a really, again, dumb bug.
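For anyone who wants the shape of that bug class, here's a minimal sketch in Python, hypothetical code, not GitHub's: when attacker-controlled input is interpolated into a shell command line, a semicolon ends the intended command and starts a new one. Passing arguments as a list never hands them to a shell at all.

import subprocess

ref = "main; echo pwned"  # attacker-supplied input, e.g. from a push

# Vulnerable: the whole string is parsed by a shell, so the semicolon
# terminates the git command and runs "echo pwned" afterwards.
subprocess.run(f"git rev-parse {ref}", shell=True)

# Safe: arguments go directly to the program as a list; the semicolon
# is just bytes in an (invalid) ref name, not a command separator.
subprocess.run(["git", "rev-parse", ref])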
To their credit, they did post today a pretty good article that explains,
look, we are having a rough time, because AI agentic coding has just blown the roof off all of
our metrics of how fast commits are coming in, how many repos there are. So, yeah, you know,
but still, tough time for them. I mean, it's the vibe coding revolution, right? So, like, if you
imagine the equivalent number of, like, human developers that they're dealing with now, that actually
makes a lot of sense. Yeah. It makes a lot of sense. Now, we've got two items we're going to get
through real quick, both kind of hilarious. Catalin spotted this one and dropped it into Slack
today, which is the insurer At-Bay has said, oh, you know, one ransomware crew has, like, risen
to the top of the leaderboard. And you're thinking, oh, big and scary, it's Akira ransomware, and you're thinking,
wow, you know, these guys must have mad skills. And it turns out no, they've just got an M.O.
where they go and rinse anyone who's running a SonicWall, basically, which is how they've been
able to get to number one, which is just, I mean, what can you
even say. Yeah, specialization works. Specialization works, that's right. And, you know, and how the
mighty have fallen. Like, if that's what gets you to the top of the ransomware leaderboard these
days, like, you disappoint us with your, you know, subpar TTPs. And the last thing we want to
talk about is, you know, the cybersecurity industry's marketing dross is starting to leak out into
the ransomware ecosystem. Dan Goodin has this write-up for us at Ars Technica,
where a ransomware crew, Kyber, is going out there promoting its ransomware as being
quantum-ready, Dmitri.
Well, this is military-grade technology, military-grade encryption for ransomware, right?
Why wouldn't you use military-grade technology, Mr. Affiliate that I want to sign up?
By the way, it's interesting that it's named after Kyber, you know, the lattice-based encryption
algorithm for key encapsulation, obviously, that is quantum-resistant.
And yeah, you know, if you want to differentiate yourself, one way to do that is obviously
to target the most common vulnerabilities, like SonicWall.
If you don't have that, you know, resort to marketing.
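For the curious, the "quantum-safe" claim only concerns the key-encapsulation step of the usual ransomware hybrid-encryption pattern. Here's a sketch of that pattern using the Python cryptography library, with classical X25519 standing in purely for illustration; a post-quantum KEM like Kyber would slot into the same role.

import os
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

attacker_priv = X25519PrivateKey.generate()  # stays with the crew
attacker_pub = attacker_priv.public_key()    # shipped inside the malware

# On the victim: derive a fresh shared secret and encrypt files with it.
eph = X25519PrivateKey.generate()
shared = eph.exchange(attacker_pub)
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"fek").derive(shared)
nonce = os.urandom(12)
blob = ChaCha20Poly1305(key).encrypt(nonce, b"victim file contents", None)

# Only the ephemeral public key and blob are left behind; recovering the
# file key needs attacker_priv or, someday, a quantum attack on the KEM.
# Swapping this exchange for a lattice-based KEM is what earns the label.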
Well, guys, that's actually it for the week's news.
But before we go, Dmitri, you know, how was your weekend, man?
I heard you were off to the White House Correspondents' Dinner the other night.
Was it fun?
Well, you know, it was interesting, exciting.
I was disappointed that the food was kind of lacking.
We got salads and then not much else.
So we were stuck there for an hour and a half when they told us the president would come back,
which he, I think, wanted to do.
And then in the end, the Secret Service didn't let him.
I remember I was texting with you.
At the time, we thought that there was a dead body outside because there were rumors that the shooter was killed,
and you were marveling at the fact that in America,
dinner could continue while a dead body is outside.
But this is America.
It's almost like gun deaths have become a little bit over normalized, maybe.
Yeah, I tell you, we were right at the stage.
And one of the funniest things I saw, you know,
in the initial moments, as the Secret Service is swooping in
and taking out the people that they were protecting,
and they were grabbing Stephen Miller and Katie Miller,
who were sitting at the Fox News table right next to us.
And there's this couple, and I didn't know who they were, unfortunately,
but they're jumping over people, jumping over chairs and screaming,
Stephen, Stephen, can you please take us with you?
And I'm like thinking, this is not the rapture.
Where are you thinking he's going to take you?
Well, Dmitri, I hate to tell you, mate,
but if it were the rapture, I wouldn't want to go where Stephen Miller's going.
Basically, that is the last place I would want to go
during the rapture.
But, you know, to finish it up, you know, the evening actually ended up really great
because we had the Canadian ambassador at our table.
And at the end of it, when they finally let us out, he's like, why don't you come over
to our house, we'll order a pizza?
And it ended up being a really fun end of the night, actually.
Well, mate, all's well that ends well.
We're glad that you had a fun night and survived to tell the tale.
So you could come here and talk about the news with us.
And on that note, Dmitri Alperovitch, James Wilson,
thank you so much for joining me on the show to talk through the week's news.
It's been a lot of fun.
It's been fun, Pat, and looking forward to next week.
Thanks so much for having me.
That was Dmitri Alperovitch and James Wilson there with a look at the week's security news.
Big thanks to them for that.
It is time for this week's sponsor interview now with Dan Guido from Trail of Bits.
And Trail of Bits is a really interesting sort of security consultancy and engineering firm that does a lot of interesting work.
And, you know, Dan is big on AI at the moment, of course.
I mean, they've always been a very sort of forward-leaning company.
So when there's new tech, they dive right into it.
And one of the things that they've been both making use of and looking at on behalf of their clients is when you want to do some sort of private inference, right?
Rent some infrastructure so that you can run inference workloads without exposing them to the hosting provider.
And I thought this is going to be an interesting topic for a sponsor interview.
So that's what Dan joined me to talk about now.
So, you know, there's certain workloads where
you don't want to expose what you're working on to Anthropic, for example, right?
You just can't, whether it's through like regulatory constraints or, you know, confidentiality
constraints.
Like if you're an exploit developer working on high-end stuff, you can't just throw that into
Anthropic.
It's just not the way the world works.
So you can use local models on very expensive hardware, or you can use services like tinfoil.sh,
which will help you to actually load some local models into their, you know, powerful
hardware and you can just pay to use that.
But how do you then go about guaranteeing that that stuff is actually private?
So that is the topic of this interview.
And also we talk about the work Trail of Bits did looking at the way Meta has done private
inference for WhatsApp.
So here's that interview.
Here's Dan Guido, who is going to kick off right now explaining how private inference
works.
Enjoy.
So these things, they operate in little trusted execution environments.
They're a little virtual machine inside of either a CPU or a GPU that is completely separated from any infrastructure that the cloud provider or the vendor can access.
And that's just fundamentally different from what OpenAI, Anthropic, and the rest of the frontier labs are doing right now.
They have a lot of legal agreements that say, you should trust us.
And they have lawyers standing over their back with knives like, okay, like we get it.
But if they get hacked or if somebody goes rogue internally or if they just get curious from a business standpoint and their internal rules change, they can look into all your stuff.
So this private inference stuff, it solves a lot of major problems that, for instance, companies like Trail of Bits have.
Like a lot of people trust me with their most sensitive intellectual property and they don't want me to give my data to a frontier lab where I'm taking it on faith that they're not looking.
So I think in 2026, this is a really big topic.
And I am kind of excited to see where it goes,
but also it's been a great thing for us to be around,
since I think we have the skills to really help.
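The client-side half of that separation guarantee is remote attestation: before sending anything sensitive, the client asks the enclave for a hardware-signed statement of exactly what code it is running, and refuses to proceed unless that measurement matches a build it trusts. A rough Python sketch of the shape of that check follows; the /attestation endpoint, the field names, and verify_quote_signature are all hypothetical placeholders, since real quote formats vary by vendor (Intel TDX, AMD SEV-SNP, NVIDIA's GPU attestation, and so on):

import json
import urllib.request

# Measurement of the enclave image we have audited and chosen to trust.
# Hypothetical value; in practice it comes from a reproducible build or
# a transparency log.
EXPECTED_MEASUREMENT = "9f2a..."

def verify_quote_signature(quote: dict) -> bool:
    # Placeholder that fails closed. A real client validates the hardware
    # vendor's certificate chain over the signed quote here.
    return False

def safe_to_send_prompts(endpoint: str) -> bool:
    # Fetch the attestation quote from the (hypothetical) enclave endpoint.
    with urllib.request.urlopen(endpoint + "/attestation") as resp:
        quote = json.load(resp)
    # The quote must be signed by the vendor's hardware keys, and the
    # measured code must be the exact image we expect.
    return (verify_quote_signature(quote)
            and quote.get("measurement") == EXPECTED_MEASUREMENT)

If either check fails, the client never sends the data, which is what moves the trust from the provider's legal agreements to the hardware.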
Now, okay, so all of this makes a lot of sense, right?
So you've explained why there is a need for this private inference stuff.
But, you know, as you also pointed out,
this means that you are not using a frontier model.
How big is the gap these days
between something like the latest and greatest Anthropic model or OpenAI model, you know,
versus one of these local models.
Because as I understand it, the gap is pretty substantial.
So I'm guessing there's a whole bunch of stuff you just can't do when you're using like,
I know it sounds funny, but like a remotely hosted local model in one of these, you know,
private inference rigs, right?
So, yeah, I guess that's the question.
What's the gap like there?
Yeah.
So I think the way that we do it is we always prototype
first on a frontier lab model.
Like, I want to make sure that I'm going to go explore and figure out, can I get this problem
solved with Opus, with Codex, with, you know, Gemini's latest model.
And then once we get that done, then we can start building evaluations.
We build some data sets around like, okay, can we reliably solve this problem?
Can we swap out some of the parts and figure out that we can trust one of the open source
models to do the same?
I think a lot of people, the experience they have with open source models is they go grab
Ollama, which is doing CPU inference, not even MLX.
And they're doing it with models that fit inside their laptop GPUs or their laptop
CPUs.
And those things just don't compare to using a full, unquantized, you know, 230 billion parameter
model.
You're leaving so much performance on the table.
And the only place you're really going to be able to do that is with some inference
provider that's running it in the cloud.
Notwithstanding that, I think personally, the way that Trail of
Bits engineers use these sorts of things is that speed matters more than the model size.
Like if this thing runs at five tokens per second, you're still not going to use it even if it's
smarter. I would rather have something that does like 10,000 tokens per second and is completely
stupid because then I can scale out a huge agent workflow, run like 50,000 checks on whatever
it is that I'm working on, and get some trust out of the process. So the ability to run one of
these really high-end, latest generation, 200 billion parameter plus models in the cloud
with inference that another person can't peek into is pretty attractive.
And I think you can do it if you have the right evals.
Right.
So I guess what you're saying is the local models aren't that bad if you've got the right hardware,
if you've got hardware with the juice to run it.
Yeah, I think that's true.
You can get mileage out of these things.
I'm seeing that it's kind of like a six-month delay where Opus comes out today, six months
later the open source models catch up. But a lot of that is left to like, well, show me the proof.
And that's where Trail of Bits and other firms, we're building a lot of proprietary data sets that
give us that trust that, oh, yes, this open source model can do it. But that's work that you have to
do for yourself. You know, a lot of the open evaluation data sets aren't going to tell you that.
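The swap-out test Dan describes has a simple shape: run the same task set against the frontier baseline and the open-weights candidate, score both the same way, and only switch when the scores hold up. A toy Python sketch against an OpenAI-compatible chat endpoint, which most self-hosted inference servers also expose; the base URL, the model names, and the single eval case are all made up for illustration:

from openai import OpenAI

# Any OpenAI-compatible endpoint works here; this URL is hypothetical.
client = OpenAI(base_url="https://inference.example.com/v1", api_key="unused")

# (prompt, checker) pairs; real eval sets are proprietary and far larger.
CASES = [
    ("Does strcpy(dst, src) bounds-check the destination? Answer yes or no.",
     lambda out: "no" in out.lower()),
]

def score(model: str) -> float:
    passed = 0
    for prompt, check in CASES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        if check(resp.choices[0].message.content):
            passed += 1
    return passed / len(CASES)

# Trust the open model for a task only if it matches the frontier baseline
# on your own data, e.g. compare score("frontier-baseline") with
# score("open-weights-candidate").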
So what sort of tasks are we talking about that need to be sort of hidden away from
Anthropic, or Google, or OpenAI, or whoever,
at a customer's request, right?
So what sort of gigs are you using this sort of private inference infrastructure for?
Is it like code audits?
Is it like, what is it?
A lot of it's code audits, a lot of it's product security incident response.
I think the most sensitive engagements Trail of Bits works on
are cases where a company got hacked and they are treading carefully
because there are legal ramifications to the work that we are doing
for them. You know, there's customer liability issues. You know, they could be sued. There's lawyers
involved. So this is about what winds up in discovery, I guess. Like if someone's dropping a subpoena
on Anthropic and saying we want to see what this contractor did, blah, blah, blah, it's part of this
incident response. It's part of that too. But, you know, we audit a lot of sensitive code where,
like, we have some clients that are EDR vendors, right? And through working with those clients,
we are finding remotely exploitable bugs in these EDR products,
and we don't want information about where those bugs are
to end up in some log at a frontier lab
where any curious person inside the company
might be able to get access to that log of prompts.
Yeah.
I mean, one of the reasons I was interested in having this conversation with you
is because, well, first of all, you mentioned it in your talk at Unprompted.
And secondly, I was chatting with someone who works very deep in exploit development
who was talking about their frustration that they can't use models,
like all of the Anthropic models and whatever,
because, you know, you can't expose those sorts of exploits and vulnerabilities.
You just can't take the risk.
The stakes are too high when you're working in the national defense space at least.
I think, you know, it's a really funny mindset to have.
I remember conversations with similar people about Microsoft security research.
Like, let's just booby trap or like let's look through the logs on the documentation
for different APIs inside of Windows.
And from there, it's probably a data leak about, oh, like, this person's probably looking for bugs in this subsystem inside of Windows.
Like, there's one person in the entire universe that found this specific API.
And I think, you know, the turtles go all the way down.
Like, you would probably be motivated to get a completely local offline copy of everything that's hosted online just so that you don't expose that stuff to a third-party vendor.
But, I mean, from my perspective, we have client obligations to live up to.
I get a lot of sensitive intellectual property and I have to attest that I'm going to keep it confidential
and it's hard for me to do that with some of the ways that AI works today.
So this is attractive for me.
Let's just go into that because you mentioned before that there is this sort of, you know,
cryptographically enforced separation between these virtual machines that are doing the inference.
Like how robust is that separation?
Because, look, I understand actually from a contractual point of view,
from a not having data lying around that could be subject to discovery later point of view,
like this is going to be good enough, right?
But I just sort of, I've always wondered about that, you know,
hardware separation on a, you know, separating VMs on a single piece of hardware thing.
Like, how robust is it?
I mean, so I think it can be very robust.
The problem is that you need to actually do the work to make it robust.
And that's the process where Trail of Bits has been publishing
public reports of audits that we've done for these systems so that people gain that trust,
because they're complicated machines.
So you've had a look under the hood of, like, tinfoil.sh, for example? Are they a customer?
We've looked at tinfoil.sh. We've also looked at WhatsApp.
WhatsApp! That's the next thing I wanted to talk to you about. You did a whole gig where, you know,
the reason WhatsApp is doing private inference... it's funny, man,
it's so funny, because my wife just asked me yesterday, oh, there's, like, now an AI button in WhatsApp.
She's Brazilian.
All the Brazilians, that's their default app, right?
Is WhatsApp?
I think it was like, is this so if you get lonely, you can just talk to WhatsApp?
And I'm like, no, but, you know, a nice one.
Kind of, yeah.
But they're doing, so in order not to break the end-to-end encryption,
they're doing private inference so that, you know,
for some of the reasons that we spoke about earlier, right?
So the data isn't just leaking out into logs or whatever
or leaking out into, you know, queries and prompts of models
that can be recovered later.
So how did WhatsApp actually tackle this?
So WhatsApp is using a lot of commodity off-the-shelf systems.
Like you look at Apple's Private Cloud Compute.
They own the entire stack.
They build custom hardware, custom software.
They have a whole build chain that only exists at Apple.
And it still sucks.
But anyway.
I mean, everybody's got the same problem.
So WhatsApp is using a lot of AMD stuff.
And, you know, there's challenges.
There's challenges where a lot of the AMD hardware is not robust
against physical attacks.
It has trouble making reproducible builds.
You know, what Meta is really selling is they're saying that a company
with Meta's resources, that all the resources at Meta, will not allow us to peek into the operation
of this enclave.
And that is a really strong, really outrageous statement to make.
And it means that you actually have to think about this beyond just the secure enclave.
You have to think about physical access controls around the hardware.
You have to think through what you're actually attesting
to, how people check it, you know, what's available from the transparency
log. Like when Apple does Private Cloud Compute stuff, they don't even give you the source code
to reproduce it. They just give you a binary. And they're like, oh, you know, obviously there's
some security researcher that's looking at every single new hash that comes out, downloading the binary,
and then inspecting it fully to make sure there's no backdoors. But, you know, so there's
gaps and weaknesses about it. I mean, you keep referencing Apple, right? And I'm guessing
the reason you do that is because that is one of the better implementations of like private inference,
right? So I'm guessing that's why you keep mentioning that in a discussion about WhatsApp.
Well, they're, like, they were the first to market with one. And I think when it came out,
it was sort of a big bang. There was a lot of interest, and WhatsApp followed shortly after.
So these are the two big comparable public implementations that people have.
But I would say, you know, your question was, should we trust this? And why should we trust this?
What these systems do is beyond just having technical controls,
they force a company to think hard about how they're storing your data.
And like if you're not using a secure enclave,
if you're not doing these things in a private inference system,
then it's just sitting out there on the cloud somewhere.
And, you know, your opinions about what you should do with that data could change,
there might be thousands of DevOps engineers at your company that can access it.
But when it's inside of an enclave,
you, like, you can't just, like, change your idea about how you're accessing data.
You have to actually hack your own software.
You have to spend considerable effort in an unambiguous project to hack software that you made.
And the skills to do that are few and far between.
There are like 10 people in the whole company that could figure it out, if that.
And those 10 people probably are not motivated to do it.
They like, they know the work that went into it.
So these things, they provide like a real logical barrier inside the company as much as they provide a technical control.
Now, for anyone who's interested in looking at your full audit report for WhatsApp's private inference, you've actually published that.
Oh, yeah. I mean, these systems require people to trust them. They need to know what's actually running, and they want to...
It's a cryptographic system. You know, we publish cryptographic audits all the time, and this is one.
So we have our full report that was co-published with WhatsApp, and you can go dig through all the dirty details.
Yep, awesome. Well, look, Dan Guido, fantastic
to chat to you. I mean, you know, it's a shortish interview, right? So we can only sort of start to
scratch the surface of what is a very interesting topic. But for people who want to go deeper,
I mean, I'm looking at the audit report here and, you know, it is not short. I mean, what
have we got here? About 118 pages of audit report for those of you who are really interested
in going deep on private inference. But, mate, great to chat to you. Always good to see you.
And I'll look forward to chatting to you again soon.
Thanks again, Patrick.
That was Dan Guido there from Trail of Bits, this
week's Risky Business sponsor. Big thanks to them for that. And that is it for this week's show.
I do hope you enjoyed it. I'll be back soon with more security news and analysis. But until
then, I've been Patrick Gray. Thanks for listening.
