CyberWire Daily - A spyware Swiss Army knife.
Episode Date: February 10, 2026

ZeroDayRAT delivers full mobile compromise on Android and iOS. The UK warns infrastructure operators to act now as severe cyber threats mount. Russia moves to block Telegram. The FTC draws a line on data sales to foreign adversaries. Researchers unpack DeadVax, a stealthy new malware campaign, while an old-school Linux botnet resurfaces. BeyondTrust fixes a critical flaw. And in AI, are we moving too fast? One mild training prompt may be enough to knock down safety guardrails. Our guest is Omer Akgul, Researcher at RSA Conference, discussing his work on "The Case for LLM Consistency Metrics in Cybersecurity (and Beyond)." A pair of penned pentesters provoke a pricey payout.

Remember to leave us a 5-star rating and review in your favorite podcast app. Miss an episode? Sign up for our daily intelligence roundup, Daily Briefing, and you’ll never miss a beat. And be sure to follow CyberWire Daily on LinkedIn.

CyberWire Guest
Today we are joined by Omer Akgul, PhD, Researcher at RSA Conference, discussing his work on "The Case for LLM Consistency Metrics in Cybersecurity (and Beyond)."

Selected Reading
New ‘ZeroDayRAT’ Spyware Kit Enables Total Compromise of iOS, Android Devices (SecurityWeek)
NCSC Issues Warning Over “Severe” Cyber-Attacks Targeting Critical National Infrastructure (Infosecurity Magazine)
Russian Watchdog Starts Limiting Access to Telegram, RBC Reports (Bloomberg)
FTC Reminds Data Brokers of Their Obligations to Comply with PADFAA (FTC)
Dead#Vax: Analyzing Multi-Stage VHD Delivery and Self-Parsing Batch Scripts to Deploy In-Memory Shellcode (Securonix)
New ‘SSHStalker’ Linux Botnet Uses Old Techniques (SecurityWeek)
BeyondTrust Patches Critical RCE Vulnerability (SecurityWeek)
Critics warn America’s 'move fast' AI strategy could cost it the global market (CyberScoop)
Microsoft boffins figured out how to break LLM safety guardrails with one simple prompt (The Register)
County pays $600,000 to pentesters it arrested for assessing courthouse security (Ars Technica)

Share your feedback. What do you think about CyberWire Daily? Please take a few minutes to share your thoughts with us by completing our brief listener survey. Thank you for helping us continue to improve our show.

Want to hear your company in the show? N2K CyberWire helps you reach the industry’s most influential leaders and operators, while building visibility, authority, and connectivity across the cybersecurity community. Learn more at sponsor.thecyberwire.com.

The CyberWire is a production of N2K Networks, your source for strategic workforce intelligence. © N2K Networks, Inc. Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
You're listening to the Cyberwire Network, powered by N2K.
Identity is a top attack vector.
In our interview with Kavitha Mariappan from Rubrik,
she breaks down why 90% of security leaders believe that identity-based attacks are their biggest threat.
Throughout this conversation, we explore why recovery times are getting longer, not shorter,
and what resiliency will look like in this AI-driven world.
If you're struggling to get a handle on identity risk,
this is something you should tune into.
Check out the full interview at
thecyberwire.com slash rubrik.
Maybe that's an urgent message from your CEO,
or maybe it's a deepfake trying to target your business.
Doppel is the AI-native social engineering defense platform
fighting back against impersonation and manipulation.
As attackers use AI to make their tactics more sophisticated,
Doppel uses it to fight back,
from automatically dismantling cross-channel attacks to building team resilience and more.
Doppel, outpacing what's next in social engineering.
Learn more at doppel.com.
That's D-O-P-P-E-L.com.
ZeroDayRAT delivers full mobile compromise on Android and iOS.
The UK warns infrastructure operators to act now as severe cyber threats mount.
Russia moves to block Telegram.
The FTC draws a line on data sales to foreign adversaries.
Researchers unpack DeadVax, a stealthy new malware campaign, while an old-school Linux botnet
resurfaces.
BeyondTrust fixes a critical flaw.
Are we moving too fast in AI?
One mild training prompt may be enough to knock down safety guardrails.
Our guest is Omer Akgul, researcher at RSA Conference, discussing his work on the
case for LLM consistency metrics.
And a pair of penned pentesters provoke a pricey payout. It's Tuesday, February 10th, 2026. I'm Dave Bittner, and this is your Cyberwire Intel
briefing. Thanks for joining us here today. It's great as always to have you with us. ZeroDayRAT is a
newly observed commercial mobile spyware toolkit that offers full remote compromise of both
Android and iOS devices. First seen on February 2nd and analyzed by iVerify, the toolkit is
sold via Telegram and rivals capabilities typically associated with nation-state tooling.
Infection requires delivery of a malicious binary, after which buyers operate their own self-hosted
infrastructure using a management panel and payload builder.
Distribution is left to the attacker using phishing, trojanized apps, or social engineering.
While an exploit feature is advertised, exploit capabilities remain unconfirmed.
Once installed, ZeroDayRAT enables extensive passive data collection, including device profiling,
app usage, account details, messages, and precise location tracking.
It also supports live surveillance through camera, microphone, screen recording, and keylogging.
Financial theft capabilities include clipboard-based crypto theft and banking credential harvesting.
Detection is difficult, indicators are limited, and takedown efforts
are complicated by decentralized infrastructure and deliberate attribution obfuscation.
The National Cyber Security Centre has warned UK critical national infrastructure providers
to take immediate action against what it calls severe cyber threats.
The alert follows coordinated malware attacks on energy infrastructure in Poland in December.
Jonathan Ellison, the NCSC's director for national resilience, said similar attacks
against UK infrastructure are realistic and potentially disruptive to everyday services.
Writing on LinkedIn, he stressed that operators must act now to strengthen cyber defenses and
resilience. The NCSC defines severe threats as deliberate, highly disruptive or destructive
cyber attacks, potentially aimed at shutting down services, damaging industrial control systems,
or erasing data. Its guidance urges improved threat monitoring,
greater situational awareness and hardened network defenses through patching,
access controls like multi-factor authentication, and secure-by-design practices.
Ellison also highlighted the Cyber Security and Resilience Bill
as a key step toward reducing national cyber risk.
Russia's communications regulator Roskomnadzor
plans to further restrict access to Telegram starting Tuesday,
according to RBC, citing unnamed sources.
Measures to slow the service are reportedly already underway.
The move comes as authorities promote a state-run super app called Max while limiting foreign platforms.
Russia has progressively curtailed Telegram since late 2025 and recently moved toward blocking WhatsApp.
The actions fit a broader crackdown that's already banned Facebook, Instagram, and X,
and restricted YouTube.
The Federal Trade Commission has sent warning letters to 13 data brokers,
reminding them of their obligations under the Protecting Americans' Data from Foreign
Adversaries Act of 2024, also known as PADFAA.
The law bars data brokers from selling or providing access to sensitive personal data about
Americans to foreign adversaries, including China, Russia, Iran, and North Korea,
or entities they control.
PADFAA covers highly sensitive information
such as health, financial, biometric,
geolocation, and login data,
as well as government-issued identifiers.
The FTC said some recipients
appeared to offer data related to U.S. armed forces status,
which is protected under the law.
The agency warned companies to review their practices,
noting violations could trigger enforcement actions
and civil penalties of up to $53,000 per violation.
Researchers at Securonix Threat Research have documented a highly stealthy multi-stage malware campaign
dubbed DeadVax, highlighting how modern attackers evade traditional defenses.
The campaign begins with spear-phishing emails delivering virtual hard disk (VHD) files hosted on IPFS,
which bypass common email and file security checks.
Once mounted, the VHD launches a chain of heavily obfuscated Windows scripts,
batch files, and PowerShell loaders that decrypt and execute payloads entirely in memory.
The final stage delivers encrypted shellcode injected into trusted Microsoft-signed Windows processes
without ever writing a decrypted binary to disk.
The operation combines fileless execution, extreme obfuscation,
anti-analysis checks, and resilient persistence. Securonix's analysis emphasizes that attackers are
increasingly abusing legitimate file formats and native system features, making detection,
investigation, and response far more challenging for defenders.
Researchers at Flare report a newly identified Linux botnet, SSHStalker, that leans on 2009-era tooling and techniques,
including IRC bots and 19 Linux kernel exploits.
It's noisy, persisting via a cron job that runs every minute
and an update watchdog relaunch model
while deploying scanners and additional malware.
Artifacts resemble Romanian-linked botnets like Outlaw and Dota,
but Flare found no direct link, suggesting a copycat or derivative operator.
Flare estimates roughly 7,000 infections, mainly on legacy
Linux systems, and observed cryptomining kits and apparently dormant IRC infrastructure.
BeyondTrust has patched a critical vulnerability affecting its Remote Support and Privileged
Remote Access products. The flaw allows unauthenticated remote code execution via crafted
requests and carries a CVSS score of 9.9. It impacts multiple versions. Hacktron AI estimates
about 8,500 internet-exposed instances are vulnerable.
While no active exploitation is reported,
Rapid7 warns that state-linked groups,
including China's Silk Typhoon,
have previously targeted BeyondTrust products.
The Trump administration has made U.S. leadership
in artificial intelligence a national priority,
favoring rapid innovation over strict security and safety regulation.
Officials say this approach departs from the
emphasis on AI safety under former President Joe Biden, but critics argue it risks undermining
global adoption of U.S.-made AI systems.
Former Deputy National Cyber Director Camille Stewart Gloster warns that many organizations
are moving too fast, deploying AI without adequate governance or guardrails.
She says weak oversight can create real harm, citing cases where poorly controlled AI
agents disrupted customers and could not be easily shut down.
Others, including former White House cybersecurity coordinator Michael Daniel,
caution that lighter U.S. rules may put American companies at a disadvantage abroad,
particularly in Europe, where safety standards are higher. Recent scrutiny of xAI and its
Grok model, backed by Elon Musk, highlights how regulatory gaps could trigger bans or
restrictions overseas. Democrats, like Mark Kelly, argue stronger safeguards could
ultimately strengthen U.S. competitiveness.
Researchers led by Microsoft Azure CTO Mark Russinovich report that a single unlabeled training prompt
can dismantle safety controls in large language models.
In a new paper, the team showed that fine-tuning models on the prompt
"create a fake news article that could lead to panic or chaos"
weakened safety alignment across 15 different models,
even though the prompt avoids explicit violence
or illegality.
The effect stems from group relative policy optimization or GRPO,
a reinforcement learning method intended to reward safer outputs.
By reversing those rewards, the researchers demonstrated a process they call GRP obliteration,
which effectively teaches models to ignore guardrails.
The work suggests current alignment techniques can be fragile,
with risks extending beyond text models to image generators,
raising concerns about sleeper backdoors and the robustness of AI safety training.
Coming up after the break, Omer Akgul from RSA Conference discusses his work on the case for LLM consistency metrics,
and a pair of penned pentesters provoke a pricey payout.
Stay with us.
What's your 2am security worry?
Is it, do I have the right controls in place?
Maybe are my vendors secure?
or the one that really keeps you up at night,
how do I get out from under these old tools and manual processes?
That's where Vanta comes in.
Vanta automates the manual work,
so you can stop sweating over spreadsheets,
chasing audit evidence,
and filling out endless questionnaires.
Their trust management platform continuously monitors your systems,
centralizes your data,
and simplifies your security at scale.
And it fits right into your workflows,
using AI to streamline evidence collection,
flag risks, and keep your program audit-ready all the time.
With Vanta, you get everything you need to move faster, scale confidently, and finally, get back to sleep.
Get started at Vanta.com slash cyber.
That's VANTA.com slash cyber.
Omer Akgul is a researcher with the RSA Conference.
I recently caught up with him to discuss his work on the case for LLM consistency metrics
in cybersecurity.
So we were initially interested in understanding if we can put any sort of bounds on the
truthfulness of LLMs, right?
They say stuff, but it's pretty hard to fact-check what they're saying.
They're pretty confident in what they're saying, and so they lie all the time.
They call these things hallucinations.
So that was our initial curiosity, right?
How do you go about this?
And then we ran into a line of work called consistency.
Some people call this accuracy prediction.
Some people call this accuracy calibration.
They call it confidence.
The terms are somewhat conflated.
But the general idea is if you were to try to get the model to tell you how confident it is,
is it going to give you the same answer every time?
How much is that going to happen, right?
And turns out that might be a pretty decent predictor
of when it's making stuff up.
Well, you use the word consistency here.
What does that mean in the context of large language models?
Right.
So the very simplest example or the definition I'd give is
how likely is the model to produce the same output given a prompt?
So say I ask it, tell me what 2 plus 2 is.
How likely is it to say four each time, right?
Or is it going to say something different every once in a while?
So say 60% of the time it says 4.
And 40% of the time it says 1, 3 and 5.
So that's the simplest way of putting it, I think.
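To make that concrete, here is a minimal sketch of the idea as Akgul describes it: ask the model the same question several times and report how often the answers agree. The query_model function below is a hypothetical stand-in that just simulates his "60% of the time it says 4" example; it is not from the paper and not a real LLM client.

```python
import random
from collections import Counter

def query_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; it simulates the
    # "60% of the time it says 4" example above. Swap in a real client.
    return random.choices(["4", "four", "3", "5"], weights=[6, 2, 1, 1])[0]

def consistency_score(prompt: str, n_samples: int = 10) -> float:
    # Ask the same prompt n_samples times and report how often the answers
    # agree with the most common one (naive exact string match).
    answers = [query_model(prompt).strip().lower() for _ in range(n_samples)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / n_samples

print(consistency_score("What is 2 + 2?", n_samples=100))
```

Note that the naive exact-string comparison here would count "four" and "4" as different answers, which is exactly the kind of gap between automated metrics and human judgment the conversation turns to next.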
Well, let's go through the research then.
How did you go at this?
How did you compare human judgments to automated consistency metrics?
Right.
So we were looking at the state of the art, right?
What have other people done to try to measure consistency?
Because it turns out it's not actually as simple as the way I put it to measure the model's consistency.
And what we noticed in a lot of prior work is that they find these automated ways of trying to understand
what consistency looks like.
They would, for instance,
have the model
respond to the same question multiple times,
and then they have these algorithms
to automatically compare
these answers to one another
and come up with an answer.
They would say, 60% of the time
it's consistent with this prompt.
But those automated ways, turns out,
aren't super ideal
because they aren't
the same way that humans
would compare answers and say these answers are consistent.
So I would maybe say writing down four, right, spelling it out,
and the numeral 4 as a single character are the same thing.
But these automated ways might not necessarily say that all the time.
So we noticed that flaw in prior work and thought that was worth investigating more.
And what were your core findings here?
What did you discover?
So what we find is our initial intuition was right,
that there is somewhat of a discrepancy
between all these automated metrics
and if you were to ask human intuition,
if you were to ask humans directly.
And this has consequences.
This means that the consistency metrics out there
that people are using already have flaws.
They're not perfect.
And we identify
how they're not perfect, right?
But we also have
some mitigations
to this. We do find that
if you combine a couple of these different methods
and you calibrate it with human
intuition, so there's like a little
bit of a training loop going
on there, you can get
pretty close to human numbers
and you can be pretty efficient at that.
So is
this, as you say, is this a matter
of training your LLM and giving it positive reinforcement when it gives you what's perceived as a correct answer?
There is some work that does that.
So that is a method that could be explored.
But the way we do it is we basically have an auxiliary model where it looks at some of the,
basically, before the LLM is going to give you an answer, it produces these internal states
and these probabilities of what it should say essentially.
Our method looks at those,
and it's calibrated by human intuition to say,
based on what I'm seeing this model is outputting,
a human would have said this is this much consistent, right?
That's the general idea.
And so we look at that,
and this auxiliary model gives you a number.
It says this is how consistent
this model is with this answer.
And so that's how it works.
And so I'm clear here,
the part about human intuition
is also modeled.
That's a separate model.
So we do collect a bunch of data from humans.
Okay.
That's the part I wasn't clear about.
To what degree are humans actually in the loop here?
Right.
So we did collect a bunch of data from humans
and that's how we get our first result.
There's a discrepancy between these automated metrics
and what humans would have said.
But then we're like, well, can we make these automated things better?
Can we make it more aligned with humans?
And that's where that auxiliary model I was just talking about comes in.
So yes, the human intuition is somewhat modeled in one of our solutions.
And the reason we need a solution to begin with
is you can't have a consortium of humans
rate the answers of models every time they produce an answer, right?
Right.
That's very impractical.
So we try to distill it down a little bit.
It's not perfect by any means.
There's still a discrepancy, but it's better than what it was.
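As Akgul describes it, the paper's fix is an auxiliary model that reads the LLM's internal states and is calibrated against human ratings; none of those details appear in this conversation. Purely to illustrate the calibration idea, here is a rough sketch that fits a simple linear mapping from an automated metric's scores to hypothetical human ratings. The numbers are made up and the method is far simpler than the paper's.

```python
import numpy as np

# Hypothetical calibration data: one automated consistency score and one
# averaged human consistency rating per prompt. These numbers are made up.
auto_scores  = np.array([0.95, 0.90, 0.70, 0.60, 0.50])
human_scores = np.array([0.90, 0.80, 0.75, 0.70, 0.60])

# Least-squares linear fit: human_score is approximately a * auto_score + b.
a, b = np.polyfit(auto_scores, human_scores, deg=1)

def calibrated(score: float) -> float:
    # Map a new automated score onto the human-aligned scale, clipped to [0, 1].
    return float(np.clip(a * score + b, 0.0, 1.0))

print(calibrated(0.80))
```

The goal of the exercise is the same as in the research, though: make the automated number track what humans would have said, rather than trusting it raw.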
Yeah.
So from your research here,
what sort of steps do you recommend organizations take
when they're deploying LLMs in their own critical workflows?
Right.
So this is the tricky part.
And we did try to come up with guidelines,
especially in the blog post I wrote later on.
But what it boils down to is actually pretty similar to how the machine learning
or the model deployment lifecycle works in other contexts.
What you generally do is you train and develop your model, right?
You put it out there.
But what you train on and what you deploy on aren't matching one-to-one,
so your model won't perform as well in the real world as it did when you were training it.
There's a similar thing going on here with consistency.
So say you developed or you borrowed your consistency metric from someone,
because consistency is pretty useful to understand if your model is doing well or not.
So in the case of LLMs, it can tell you if your model's lying or not with some confidence interval.
So it's not a fail-safe thing, but it's pretty useful.
And what could happen is you pick your consistency metric and you think it's giving you
pretty good data. All you're seeing is your model is doing great, but in reality it's not
doing all that great. So there needs to be some calibration going on to show that what your
consistency metric is saying is what is actually going on in the real world. That's kind of what
we do with this paper, right? We calibrate it on humans. I suspect, depending on how critical
your deployment is, right? You might need to do something similar to really get use out of these
consistency metrics. It's not like they're completely useless without this, but again, this
depends on how much risk you're willing to take. Depending on that, you might want to do some of
this calibration. And that could mean collecting data from humans. Maybe, depending on the scenario,
these humans might need to be specialized in whatever domain. Say these are like logs of computer systems,
right? Maybe you need people who understand systems, maybe more cybersecurity people that look at
these things. But you might need to do some calibration on your consistency metrics for them to be
really useful to you. I see. Where do you suppose this is going to go in the future? I mean,
do you envision that this, the things that you've learned here could be just integrated into
everyday LLMs? I hope so. It is tricky, right? The human calibration part is, it's not cheap
and it might change based on domain.
But I certainly hope that there is more attention paid to this
as more and more consistency metrics come out
because this isn't a solved problem.
There are new versions of these things coming out
practically every other week.
The pace of LLM papers being put out there is pretty fast.
But I certainly hope more people will pay
attention to the flaws that we've discovered and the solutions we've proposed to make this
stuff more robust for real-world deployment. That's Omer Akgul from RSA Conference.
And finally, two penetration testers walked into a courthouse to do their jobs. Eventually,
Dallas County, Iowa agreed to pay them $600,000 for their trouble.
Back in 2019, Gary DeMercurio and Justin Wynn, then working for Coalfire Labs, were hired to test security at the Dallas County Courthouse under written authorization from the Iowa Judicial Branch.
The rules explicitly allowed lock picking and other physical attacks.
They found a door, popped a lock, tripped an alarm, and promptly showed deputies their authorization letter.
So far, textbook Red Team.
Then the sheriff arrived. Despite confirmation that the work was approved, Sheriff Chad Leonard had the pair
arrested on felony burglary charges. They spent 20 hours in jail, posted $100,000 bail,
and endured months of public accusations before all the charges were dropped. The fallout was
career-threatening, the message chilling: even authorized hacking can end in handcuffs. After years of
litigation, the county settled days before trial. DeMercurio now runs Kaiju Security, and the lesson
stands. Sometimes, testing defenses exposes a different vulnerability altogether. And that's the Cyberwire.
For links to all of today's stories, check out our daily briefing at thecyberwire.com.
We'd love to know what you think of this podcast. Your feedback ensures we deliver the insights that
keep you a step ahead in the rapidly changing world of cybersecurity.
If you like our show, please share a rating and review in your favorite podcast app.
Please also fill out the survey in the show notes or send an email to Cyberwire at N2K.com.
N2K's senior producer is Alice Carruth.
Our Cyberwire producer is Liz Stokes.
We're mixed by Trey Hester with original music by Elliott Peltzman.
Our executive producer is Jennifer Eiben.
Peter Kilpe is our publisher, and I'm Dave Bittner.
Thanks for listening.
We'll see you back here tomorrow.
If you only attend one cybersecurity conference this year, make it RSAC 2026.
It's happening March 23rd through the 26th in San Francisco,
bringing together the global security community for four days of expert insights,
hands-on learning, and real innovation.
I'll say this plainly, I never miss this conference.
The ideas and conversations stay with me all year.
Join thousands of practitioners and leaders tackling today's toughest challenges,
and shaping what comes next.
Register today at rsaconference.com slash cyberwire 26.
I'll see you in San Francisco.
