CyberWire Daily - IoT security and the need for randomness. [Research Saturday]
Episode Date: October 2, 2021
Dan Petro, Lead Researcher, and Allan Cecil, Security Consultant, from Bishop Fox join Dave to share their research "You're Doing IoT RNG," which they presented at DEF CON 29. There's a crack in the foundation of Internet of Things (IoT) security, one that affects 35 billion devices worldwide. Basically, every IoT device with a hardware random number generator (RNG) contains a serious vulnerability whereby it fails to properly generate random numbers, which undermines security for any upstream use. In order to perform most security-relevant operations, computers need to generate secrets via an RNG. These secrets then form the basis of cryptography, access controls, authentication, and more. The details of exactly how and why these secrets are generated vary for each use. The research can be found here: You're Doing IoT RNG
Transcript
You're listening to the CyberWire Network, powered by N2K.
Hello everyone and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner and this is our weekly conversation with researchers and analysts
tracking down threats and vulnerabilities,
solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
I guess, to start, we do just pronounce it "You're doing it wrong." At least when spoken, that seems to be the best way to say it.
That's Dan Petro. He's joined by his colleague, Allan Cecil, from Bishop Fox, on their research, You're Doing IoT RNG.
And now, a message from our sponsor, Zscaler, the leader in cloud security. Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue to rise, with an 18% year-over-year
increase in ransomware attacks and a $75 million record payout in 2024. These traditional security
tools expand your attack surface with public-facing IPs that are exploited by bad actors more easily
than ever with AI tools. It's time to rethink your security. Zscaler Zero Trust plus AI stops
attackers by hiding your attack surface, making apps and IPs invisible, eliminating lateral
movement, connecting users only to specific apps, not the entire network, continuously
verifying every request based on identity and context, simplifying security management
with AI-powered automation, and detecting threats using AI to analyze over 500 billion
daily transactions.
Hackers can't attack what they can't see.
Protect your organization with Zscaler Zero Trust and AI.
Learn more at zscaler.com slash security.
This whole thing came about from an engagement that we were once on.
So we do a bunch of what we refer to as product security reviews at Bishop Fox.
We'd like to say that if it breaks when you drop it, then it's a product security review.
A lot of hardware sort of engagements there.
And so there was an IoT engagement from one of our clients that was making a kind of like a security device
that did a lot of cryptography as a part of its normal operations.
And so when doing that engagement, we sort of asked the client,
like, hey, what are you using as a random number generator
since you're doing all this crypto?
How do you generate keys? Things like that.
And they replied that, like, oh, well, the SOC that we're using,
the system on a chip, has a built-in hardware random number generator
on the board.
That sounds great.
Just kind of on a lark,
I asked them, like,
do you mind giving us a bunch of the output
from the random number generator?
Just tell it to produce a gigabyte of data
and just send it over to us.
Figured it'd be fine.
Like, a hardware random number generator,
surely that's the gold standard
for generating random numbers.
It's a peripheral that does nothing but this one thing, right?
So I got this file back, and to our horror, large swaths of it were just zero.
We thought that something surely must be terribly wrong here.
There must be some buggy code or, like, what happened here that would cause this?
And so the research was kind of a follow-on from that, to look into whether this was just a one-off thing or whether it was wider than that. It turned out that it was a much wider issue than just one chip.
Yeah, it really is fascinating the way you all dig into this and unpack the mystery of how something like this could happen. And I think to your point here, it's sort of a head-scratcher. Allan, can you give us some of the background here? How do we get to the point where a piece of hardware that's dedicated to generating random numbers is not doing that?
That's a really good question. And what's interesting is that it's not just the hardware where things are going askew, but we'll get into that in a bit. The hardware itself generally works in a couple of specific ways, and computers are really,
really good at deterministic behavior.
Because if you did your taxes and every time you ran the program you got a different result,
you'd be pretty upset.
So you want computers to be extremely deterministic, which is fantastic, but on the other hand,
it's a real pain in the rear when you need things to not be deterministic, when you need
randomness. So there's a variety of ways of getting randomness. One of them is to use a pseudo-random
number generator seeded from something, either the time of day, input from a user, or other factors.
But ultimately, a pseudo-random number generator is not going to be perfect. So a lot of times you
want to have some other source of entropy that's a little more random.
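To make the point about seeding concrete, here is a minimal Python sketch, assuming a device that seeds an ordinary (non-cryptographic) PRNG with a boot-time Unix timestamp. The timestamp, the search window, and the helper names `draw` and `recover_seed` are illustrative assumptions, not details from the research. Because the output is a pure function of the seed, anyone who can narrow down the seed can replay the whole stream.

```python
import random
from typing import List, Optional


def draw(seed: int, n: int = 8) -> List[int]:
    """Return n byte values from Python's Mersenne Twister PRNG seeded with `seed`."""
    rng = random.Random(seed)  # fine for simulations, NOT suitable for keys
    return [rng.randrange(256) for _ in range(n)]


def recover_seed(observed: List[int], start: int, end: int) -> Optional[int]:
    """Try every seed in [start, end) until one reproduces `observed`."""
    for candidate in range(start, end):
        if draw(candidate, len(observed)) == observed:
            return candidate
    return None


# Hypothetical: the device seeded its PRNG with a boot-time Unix timestamp.
boot_time = 1633132800
leaked = draw(boot_time)

# Same seed, same "random" bytes, every time.
assert draw(boot_time) == leaked

# An attacker who knows the boot time to within ~15 minutes only has ~1,000 seeds to try.
print(recover_seed(leaked, boot_time - 800, boot_time + 200))  # 1633132800
```

This is why the guests want an independent entropy source: a PRNG can stretch randomness, but it cannot create more unpredictability than its seed contains.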
And the hardware devices that do this,
especially in the IoT world,
are relatively simplistic to understand.
For instance, you might have one that relies on
an analog NOT gate,
one that's unclocked,
and at any time you're going to ask it,
are you a 1 or a 0,
and it'll respond relatively randomly. The other way to do it is to have two clocks running at different speeds,
or slightly different speeds, and sample the delta between them. They'll never necessarily
be exactly the same, although they could be. But for the most part, you can be reasonably
confident that you're going to get a random zero or a random one out of that. The big issue with
both of these designs is they can only give you so many numbers
at a time. If you're asking for a lot of randomness at once, you might have to wait a little bit.
There's only so many random numbers it can give you at once before it exhausts itself and you have
to refill that pool. And that's where a lot of things start to go kind of sideways.
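A toy model of the pool exhaustion described above. Everything here is hypothetical: `ToyHardwareRng`, its four-word pool, its refill rate, and the choice to hand back 0 on an error are illustrative assumptions, and Python's `secrets` module stands in for physical noise.

```python
import secrets
from typing import Tuple

RNG_OK = 0          # hypothetical status codes for a toy peripheral
RNG_NOT_READY = 1


class ToyHardwareRng:
    """Toy model of a hardware RNG with a small entropy pool.

    Hypothetical: capacity, refill rate, and "return 0 on error" are
    illustrative assumptions, not any particular vendor's peripheral.
    """

    def __init__(self, capacity_words: int = 4, ticks_per_word: int = 10) -> None:
        self.capacity_words = capacity_words
        self.ticks_per_word = ticks_per_word
        self.pool = capacity_words  # start with a full pool
        self._ticks = 0

    def tick(self) -> None:
        """Let time pass; the physical noise source slowly refills the pool."""
        self._ticks += 1
        if self._ticks % self.ticks_per_word == 0 and self.pool < self.capacity_words:
            self.pool += 1

    def read(self) -> Tuple[int, int]:
        """Return (status, value); an empty pool yields an error status and a zero value."""
        if self.pool == 0:
            return RNG_NOT_READY, 0
        self.pool -= 1
        return RNG_OK, secrets.randbits(32)  # stand-in for real physical noise


rng = ToyHardwareRng()
burst = [rng.read() for _ in range(8)]      # ask for lots of randomness at once
print([status for status, _ in burst])      # [0, 0, 0, 0, 1, 1, 1, 1]
print([value for _, value in burst[4:]])    # [0, 0, 0, 0]
```

Asking for eight words in one burst drains the pool halfway through, and every later read comes back as an error with a zero value, which is one plausible way to end up with long stretches of zeros in a dump like the one Dan received.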
Well, take us through what's going on here, then. I mean,
it's a combination of things. As you say, it's not just the hardware, it's the way people are
using it. Can you walk us through the problem?
So the problem is at multiple levels. One of the
things that we discovered in our research is, first of all, the hardware random number generators
themselves, even when you use them properly,
sometimes aren't giving you a truly random distribution of numbers.
So there's that issue.
Next step up, you're probably not going to be writing everything from scratch.
You're probably going to be using a library.
And what we discovered is that some of these libraries
had some pretty serious issues, which we'll get into in a second.
The next level up is, if you're a user,
you're probably going to try to use example code.
And in some cases, the example code itself
didn't work correctly or used bad paradigms.
For instance, not checking error codes.
Or in some cases, misleading the user drastically.
Or in other cases, you'd have to read a 1,000-page manual
in order to know exactly how to properly call the hardware RNG.
So there are all kinds of ways
that someone trying to use a random number generator
on an IoT device can go terribly wrong.
Yeah, it is fascinating,
and it seems like almost a cascading set of possibilities here,
where you would think that something that on its surface is
as simple as requesting a random number from your system wouldn't have
many things along the way that could go wrong. Can we dig into this thing about not checking
error codes? I mean, what exactly is that about?
Yeah, I can field that one. So this kind of gets
to the heart of the name of
our presentation. We called it You're Doing It Wrong, specifically because there's maybe not
one right way of doing it, but there are definitely many wrong ways of doing it. And this is sort of
our way of standing up in front of the IoT industry as best we could and telling
an entire industry that they're doing it wrong. And so the way that this has been solved in basically every other field
is through a cryptographically secure pseudo-random number generator subsystem.
So if you are on a server farm and you need a random number
in your Linux process, there's an API for this.
You can ask Linux, hey Linux, please give me a random number.
I need it to make an encryption key.
And it can do that for you,
securely. We've spent lots of time, we've had lots of smart people look at these algorithms
and this whole process, and we've managed to figure this out.
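On a general-purpose OS, the request described above really is a single call. This Python sketch uses the standard `secrets` and `os` modules, which draw from the operating system's CSPRNG (on Linux, via the `getrandom(2)` system call where available). The key and nonce sizes and the helper name `make_aes256_key` are just examples.

```python
import os
import secrets


def make_aes256_key() -> bytes:
    """Ask the operating system's CSPRNG for 32 bytes (256 bits) of key material."""
    return secrets.token_bytes(32)  # the secrets module draws on os.urandom()


key = make_aes256_key()
nonce = os.urandom(12)  # e.g. a 96-bit nonce; same kernel CSPRNG underneath

# On Linux, Python also exposes the getrandom(2) system call directly.
if hasattr(os, "getrandom"):
    extra = os.getrandom(16)
```

The application never touches the hardware: the kernel gathers, mixes, and stretches entropy, which is exactly the layer the guests say IoT platforms are missing.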
But this unfortunately is not how things work in the IoT space.
When you're in an IoT device, you basically just call the hardware random number generator itself.
Like you talk to the peripheral.
And just like any other piece of hardware, it can potentially fail.
Because you're not interfacing with a software subsystem.
It's an actual piece of hardware, right?
Any number of things could have gone wrong.
It might be overheating.
Maybe the bus got scratched.
Jupiter and Saturn just weren't aligned at the time, for all we know.
Hardware devices can return error codes.
So the very first thing we looked at was how many people in the real world
are checking the error codes of these hardware random number generators.
And it turns out that almost nobody actually checks these error codes in the wild,
at least from a cursory glance at code available on GitHub. And this doesn't come as any great surprise, given that, you know,
it's actually really hard to do this properly. This turns out to be a whole can of worms by itself.
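The difference between ignoring and checking the status can be sketched with a hypothetical peripheral, `FlakyRng`, that succeeds a fixed number of times and then reports an error with a zero value. The status codes and the `read_word` method are invented for illustration; real vendor HALs each have their own API.

```python
import secrets
from typing import Tuple

RNG_OK, RNG_ERROR = 0, 1  # hypothetical status codes from a vendor HAL


class FlakyRng:
    """Hypothetical peripheral: `good_reads` successful reads, then errors with value 0."""

    def __init__(self, good_reads: int) -> None:
        self.remaining = good_reads

    def read_word(self) -> Tuple[int, int]:
        if self.remaining == 0:
            return RNG_ERROR, 0
        self.remaining -= 1
        return RNG_OK, secrets.randbits(32)  # stand-in for physical noise


def key_ignoring_status(rng: FlakyRng, words: int = 8) -> bytes:
    """The common pattern: use the value, never look at the status."""
    return b"".join(rng.read_word()[1].to_bytes(4, "little") for _ in range(words))


def key_checking_status(rng: FlakyRng, words: int = 8) -> bytes:
    """Check every status and refuse to build a key from a failed read."""
    out = bytearray()
    for _ in range(words):
        status, value = rng.read_word()
        if status != RNG_OK:
            raise RuntimeError("hardware RNG reported an error; refusing to build a key")
        out += value.to_bytes(4, "little")
    return bytes(out)


weak_key = key_ignoring_status(FlakyRng(good_reads=2))
print(weak_key[8:] == bytes(24))  # True: the last 24 of 32 "key" bytes are zero
```

With `FlakyRng(2)`, the unchecked version quietly returns a 32-byte "key" whose last 24 bytes are zero; the checked version raises instead. Raising is correct, but it hands the caller a hard decision about what to do next.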
So there's the kind of level one understanding of this, which is that you're sort of left to
your own devices in the IoT space in terms of
talking to the hardware and doing things properly. And most developers wind up doing the easy thing
and not the secure thing. But that's not where the story ends. It's really just where the story
begins.
Before we go any further, just for my own understanding here, to make sure I'm following
along properly: is part of the issue here that on an IoT device,
basically you don't have an operating system
as an intermediary level to get in between you and the hardware
to make sure you're getting what you need?
Yeah, basically there's middleware,
or you could call them IoT operating system type things
like Contiki or FreeRTOS or other things like that,
but they don't currently have a subsystem for CSPRNG. They don't have a cryptographically
secure pseudo-random number generator setup where you as a user can call that, and instead you as
a user are forced to directly call the hardware, which often goes quite sideways.
So if I'm calling out to the hardware and I'm requesting random numbers and the hardware is failing, what happens next?
What's coming back to my IoT device?
One of the things that's interesting about calling for random numbers
is a lot of times when you need a random number, you don't need just one.
You're generally going to be asking for a lot of randomness all at once. For instance, you might
be generating a 2048-bit RSA key for SSH. So you're going to need a lot of
entropy, a lot of random numbers all at once to make that key. So you're going to call it,
and then call it again, because you're only going to get one bit at a time. So you're going to have
to make a lot of calls to get the number of bits you need. The challenge is, if you call too frequently, you will
exhaust the hardware random number generator device's pool of entropy, or pool of random
numbers it can hand you, and it will error. Now, there is a way to ask the hardware, hey, have you
errored because of whatever circumstance?
The problem that we ran into repeatedly is no one checks the error codes.
And when you dive into why, it gets kind of interesting.
I'm going to let Dan talk about
why people can't check the error codes.
Yeah, so the trouble with placing the blame squarely
on the user here is that they're just placed
into an impossible scenario.
So imagine trying to write a
networking stack on an IoT device using one of these hardware random number
generators. So you're in the TLS stack, and
you need a crypto key. You need to generate some numbers
to make an encryption key to talk to somebody externally.
And you call the hardware RNG function,
and it comes back with an error code.
What are you supposed to do with that?
One of the things about random numbers is that when you need them,
they're sort of critical to the core concept of the thing you need to do here.
You can't just simply handle the error in some abstract way
and then move forward without that random number.
What does it mean to do TLS without a random number? Like you just kind of can't.
So you're left with really two possibilities here. One is to block, that is to just halt the entire
machine. And most manufacturers will instruct you to just call the RNG again a second time,
basically spin looping at 100% CPU, waiting over and over again for the
RNG to be ready, which in the case of a broken device or like a damaged device, it might never
return. You might just spin loop forever. That's not a great option. But the second option would
be just to quit out. That is, to kill the process, stop the networking process.
And that's not acceptable either.
Are you really supposed to just
kill the entire process every time
you run out of entropy, which
happens quite frequently?
That's not a workable scenario either.
So you're sort of left between these two
terrible choices, and it's no
surprise that developers wind up going with
just, well, let's just ignore the error code because things work when I do that.
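One possible middle ground between spinning forever and killing the process is a bounded wait that fails closed. This is a sketch, not something the guests prescribe (their recommendation is a CSPRNG subsystem); the `read_word` callable, the status codes, and the retry limit are assumptions.

```python
import time
from typing import Callable, Tuple

RNG_OK, RNG_NOT_READY = 0, 1  # hypothetical status codes


def read_with_timeout(
    read_word: Callable[[], Tuple[int, int]],
    max_attempts: int = 1000,
    delay_s: float = 0.0,
) -> int:
    """Retry a hardware RNG read a bounded number of times instead of spinning forever."""
    for _ in range(max_attempts):
        status, value = read_word()
        if status == RNG_OK:
            return value
        if delay_s:
            time.sleep(delay_s)  # back off instead of burning 100% CPU
    # Fail closed: never hand back a value the hardware didn't vouch for.
    raise TimeoutError(f"hardware RNG not ready after {max_attempts} attempts")


# A peripheral that is busy twice, then delivers a value.
responses = iter([(RNG_NOT_READY, 0), (RNG_NOT_READY, 0), (RNG_OK, 0xDEADBEEF)])
print(hex(read_with_timeout(lambda: next(responses))))  # 0xdeadbeef
```

The caller still has to decide what a `TimeoutError` means for, say, a TLS handshake, but a dead peripheral now produces a visible failure instead of a hang or a silently weak key.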
Now, if I'm a developer and I'm not checking for the error code, is it likely that what's
coming back to me looks random enough that I won't necessarily know that it's not truly random?
That's one of the problems. A bunch of zeros in a row can be a
legitimate answer. It is random
after all, so you could randomly get a whole lot of zeros
in a row. So when you glance at it,
sometimes it's not immediately obvious.
Now, one of the things that was
obvious in the data set that Dan looked at
was there was a large swath of zeros
all in a row. But it's not necessarily
always that obvious. Sometimes
what you'll get is repeating
patterns. Maybe you'll get three zeros every 50 bytes, which is what one of the devices we looked
at did consistently for some unknown reason. We would love to know why it did that, but if you
were very casually looking at the data, you wouldn't necessarily notice it unless you were
analyzing it very carefully. Even when we went to the trouble of doing statistical analysis,
sometimes you had to use
the right statistical analysis test
to find the problem.
So even if you were glancing at it
with statistical analysis tools,
sometimes even that wasn't enough to detect,
oh, wait a minute, there's actually a problem here.
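Two very simple checks for the kinds of patterns described here, sketched in Python: the longest run of zero bytes, and offsets within a fixed period where zeros show up far more often than the 1-in-256 you'd expect from uniform bytes. The 50-byte period mirrors the example above; the threshold, the helper names, and the synthetic sample are assumptions.

```python
import random
from collections import Counter
from typing import List


def longest_zero_run(data: bytes) -> int:
    """Length of the longest run of consecutive 0x00 bytes."""
    best = run = 0
    for b in data:
        run = run + 1 if b == 0 else 0
        best = max(best, run)
    return best


def zero_hotspots(data: bytes, period: int, threshold: float = 0.5) -> List[int]:
    """Offsets (mod `period`) where 0x00 appears in at least `threshold` of the rows.

    For uniformly random bytes, each offset should be zero only about 1/256 of the time.
    """
    rows = len(data) // period
    if rows == 0:
        return []
    zeros = Counter(i % period for i, b in enumerate(data[: rows * period]) if b == 0)
    return sorted(off for off, n in zeros.items() if n / rows >= threshold)


# Synthetic sample: pseudo-random bytes with three zeros injected every 50 bytes.
rng = random.Random(0)
sample = bytearray(rng.randrange(256) for _ in range(10_000))
for row in range(0, len(sample), 50):
    sample[row + 10 : row + 13] = bytes(3)

print(zero_hotspots(bytes(sample), period=50))  # [10, 11, 12]
```

A run of k zero bytes has probability 256^-k in uniform data, so short runs are expected and prove nothing on their own; a periodic hotspot like this one is the kind of thing a casual glance misses but a targeted check catches.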
Yeah, and that kind of gets into
how do we actually evaluate the hardware RNGs?
Because that kind of comes down
to a fundamental problem of how do you know randomness when
you see it?
At risk of making this overly philosophical, you have a really hard issue: you expect
a random number.
But then if you try to really dig deep down into that question, like what on earth do
you mean by a random number?
You start to realize that you're asking for your security device to break the laws of physics.
You want it to come up with some number
that's not based on anything.
You want it to break the laws of causality
to create some number that isn't based on anything else.
And then you start talking about quantum mechanics
and the whole thing becomes garbage.
Right, right.
The important thing here is actually to not
be so overly concerned about
whether the number is random or not in some abstract sense, whatever that even means,
but rather whether it's predictable or not. And now we can actually talk about it in terms of
an adversary who does or doesn't have certain information, does or doesn't have certain levels of access, and
is or is not able to predict certain numbers with a given accuracy.
Now that's actually like a problem that we can wrap our heads around.
So there are lots of good statistical analysis tools that you can use to see how predictable certain numbers are,
and they do rather interesting things.
One of the common ones out there is Dieharder.
It's a series of tools that have been out for, I think, a couple decades or something like that now,
that you give a long string of numbers, and it'll do things like play games of poker and craps with the numbers, basically transforming them into die rolls and
poker cards, and then seeing if they line up with the distributions that you'd expect from
those particular games, in addition to lots of other kinds of statistical randomness tests.
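Dieharder's batteries are far more thorough, but the simplest statistical test, the frequency (monobit) test from NIST SP 800-22, shows the general shape: map bits to +1/-1, sum them, and turn the normalized sum into a p-value with the complementary error function. This is a sketch; `monobit_p_value` is an illustrative name, and the 0.01 significance level is the conventional choice, not something from the episode.

```python
import math


def monobit_p_value(data: bytes) -> float:
    """NIST SP 800-22 frequency (monobit) test: p-value for the balance of 1s and 0s."""
    n = len(data) * 8
    ones = sum(bin(b).count("1") for b in data)
    s_n = ones - (n - ones)            # each 1 bit counts +1, each 0 bit counts -1
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


ALPHA = 0.01  # conventional significance level; p < ALPHA means "looks non-random"

all_zero = bytes(1000)           # a stretch like the zeros in the first dump
alternating = b"\x55" * 1000     # 01010101 repeated: perfectly balanced, clearly not random

print(monobit_p_value(all_zero) < ALPHA)      # True  (fails)
print(monobit_p_value(alternating) < ALPHA)   # False (passes, despite the obvious pattern)
```

The second case also illustrates the earlier point about using the right test: 0x55 repeated forever is obviously not random, yet it balances ones and zeros perfectly and sails through the monobit test.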
Would many people who are putting together systems like this find that even in its imperfection,
that the numbers they're getting back are random enough for their use case?
It drastically depends on the use case. So the thing about symmetric crypto keys:
say you're trying to communicate with somebody and you have an AES key,
a 256-bit AES key, right?
And it turns out that you're using a really trash RNG and half the numbers are zero.
You're still left with effectively a 128-bit AES key,
and that's still strong.
You're not going to crack a 128-bit AES key.
They're very resilient to those kinds of things,
the loss of entropy.
That's not necessarily
true of other operations. Many different kinds of encryption, in particular asymmetric cryptography
like RSA, are built on number theory rather than simple bit-mixing operations. And so they can be much
more susceptible to low-entropy states and to certain sections of numbers being zero.
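The arithmetic behind the AES point, assuming the attacker knows which 128 of the 256 key bits are stuck at zero and, purely for illustration, can test 2^40 (about a trillion) keys per second:

```latex
\begin{aligned}
\text{full-entropy AES-256 key:}\quad & 2^{256}\ \text{candidate keys} \\
\text{128 of 256 bits known to be } 0\text{:}\quad & 2^{256-128} = 2^{128}\ \text{candidate keys} \\
\text{search time at } 2^{40}\ \text{guesses/s:}\quad & 2^{128}/2^{40} = 2^{88}\ \text{s} \approx 3.1\times 10^{26}\ \text{s} \approx 10^{19}\ \text{years}
\end{aligned}
```

Even under these generous assumptions, halving the key's entropy still leaves an infeasible search, which is why symmetric keys degrade gracefully.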
In fact, there was another talk at DEF CON this year called
The Mechanics of Compromising Low Entropy RSA Keys.
So this is a thing that's a real-world threat.
What we saw in our data matched what researchers in 2019 saw,
that there were millions of low entropy keys on the internet
that they found in their research.
And they couldn't exactly determine where they came from,
but they theorized that maybe it was coming from IoT devices
that had very poor hardware RNG devices,
low entropy key generation.
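One well-known way low-entropy RSA keys break, sketched in Python: if two devices start from similar low-entropy states, they can end up sharing one prime factor, and then a single greatest-common-divisor computation factors both moduli. The episode doesn't say which method the 2019 researchers used; `shared_factor` is an illustrative name, and the tiny primes are stand-ins for real 1,024-bit primes.

```python
from math import gcd
from typing import Optional


def shared_factor(n1: int, n2: int) -> Optional[int]:
    """Return a nontrivial common factor of two RSA moduli, if they share one."""
    g = gcd(n1, n2)
    return g if 1 < g < min(n1, n2) else None


# Toy moduli built from small, well-known primes (real RSA primes are ~1,024 bits each).
p_shared = 2_147_483_647            # 2**31 - 1, prime; imagine two low-entropy devices both picked it
n_a = p_shared * 65_537             # device A's modulus
n_b = p_shared * 104_729            # device B's modulus

p = shared_factor(n_a, n_b)
print(p, n_a // p, n_b // p)        # 2147483647 65537 104729: both keys fully factored
```

At internet scale, researchers use batch-GCD algorithms rather than comparing every pair; this pairwise version only shows why a single repeated prime is fatal.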
So what are some of the possible solutions here? I mean,
this is a broad issue, right? We're talking about something affecting
many, many IoT devices, millions, potentially billions of devices out there. Is there a
solution?
I'm going to state my own law, as it were.
My law is: don't attempt to write RNG code on IoT devices on your own.
It's as difficult as trying to write crypto code on your own.
No one goes out and tries to write crypto code
and gets it right the first time.
And you're going to have the same problem
with RNG code on IoT devices.
Don't try to do it on your own.
Instead, you should be using some kind of CSPRNG subsystem.
Unfortunately for end users,
that doesn't readily exist right now.
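Since that subsystem doesn't readily exist, it may help to see its rough shape. The following is a hypothetical Python sketch, not a vetted design: it only mixes in hardware samples whose status is OK, refuses to produce output until it has seeded, expands the seed with HMAC-SHA256 in counter mode, and ratchets its key after each request. `ToyCsprng`, `RNG_OK`, and the 32-byte seeding threshold are assumptions; a real product should use a standardized DRBG such as NIST SP 800-90A's HMAC_DRBG with proper entropy estimation.

```python
import hashlib
import hmac
import secrets

RNG_OK = 0  # hypothetical status code from the hardware RNG driver


class ToyCsprng:
    """Hypothetical sketch of a CSPRNG subsystem; NOT a vetted DRBG."""

    # Assumption: treats every accepted byte as full entropy, which the research shows is optimistic.
    MIN_SEED_BYTES = 32

    def __init__(self) -> None:
        self._pool = hashlib.sha256()
        self._collected = 0
        self._key = None
        self._counter = 0

    def add_hardware_sample(self, status: int, value: bytes) -> None:
        """Mix in a hardware sample, but only if the driver reported success."""
        if status != RNG_OK:
            return
        self._pool.update(value)
        self._collected += len(value)
        if self._key is None and self._collected >= self.MIN_SEED_BYTES:
            self._key = self._pool.digest()  # seed once; this sketch never reseeds

    def random_bytes(self, n: int) -> bytes:
        """Expand the seed with HMAC-SHA256 in counter mode, then ratchet the key."""
        if self._key is None:
            raise RuntimeError("CSPRNG not seeded yet: refuse rather than return weak output")
        out = bytearray()
        while len(out) < n:
            self._counter += 1
            out += hmac.new(self._key, self._counter.to_bytes(8, "big"), hashlib.sha256).digest()
        # Ratchet forward so that a later key compromise doesn't reveal earlier outputs.
        self._key = hmac.new(self._key, b"ratchet", hashlib.sha256).digest()
        return bytes(out[:n])


csprng = ToyCsprng()
for _ in range(8):  # 8 samples x 4 bytes = 32 bytes of seed material
    csprng.add_hardware_sample(RNG_OK, secrets.token_bytes(4))  # secrets stands in for the hardware
key = csprng.random_bytes(32)
```

The point of the design is that application code calls `random_bytes` and never sees a hardware status code: error handling and seeding decisions live in one place instead of being repeated, and skipped, at every call site.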
So for an end user,
someone who maybe has a smart door lock
that perhaps is using the same password
as a lot of other smart door locks,
I recommend updating quickly.
Whenever you see your vendor supplying an update,
you should probably be applying that.
For developers, you're going to have to put pressure
on the hardware manufacturers
and the people who are making these operating systems
for these IoT devices to implement a CSPRNG subsystem.
Really, right now, if you have to do it on your own,
read the manual extraordinarily
carefully. You're going to find weird cases where you might have to call 32 times in a row,
get a number, throw out the next 32 calls, get a number, and repeat. So you're going to have to be
very, very diligent and double-check all of your work. It's very difficult right now.
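A discard rule like this looks like a bug when you meet it in code, so here is a hedged sketch of a wrapper that makes the intent explicit, following the "read one, then throw out the next 32" description. `read_raw`, `read_with_discard`, and the default discard count are placeholders for whatever a specific manual prescribes; the exact sequence differs from device to device.

```python
import itertools
from typing import Callable


def read_with_discard(read_raw: Callable[[], int], discard: int = 32) -> int:
    """Keep one raw reading, then throw away the next `discard` readings.

    Hypothetical: `read_raw` and the discard count stand in for whatever the
    specific device manual prescribes.
    """
    value = read_raw()
    for _ in range(discard):
        read_raw()  # deliberately thrown away, as the (hypothetical) manual requires
    return value


# Demo with a fake register that just counts up, to show which readings survive.
fake_register = itertools.count().__next__
print([read_with_discard(fake_register) for _ in range(3)])  # [0, 33, 66]
```

Isolating the rule in one named function, with a comment pointing at the manual, makes it much less likely that the next developer "fixes" it.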
Yeah, that was a fascinating part of your research here, what you were
just describing: the instructions in the manual are not intuitive. And you could see how
many people could overlook that and think that they're getting random numbers when they're not.
If I saw that example in code I was trying to work with, I would think it was a bug. I'd think
it was a completely mistaken bit of implementation,
because who would do that?
Who would read and then throw out the next 32 numbers?
But that's what the manual tells you to do.
That was the LPC device, correct?
Yeah, yeah.
There's another example here that was fascinating.
You were looking at the MediaTek 7697,
which is, I believe, a system on a chip, and you did some statistical
analysis of the random number generator, and you ended up with a sawtooth pattern.
Yeah, the interesting sawtooth pattern on the MediaTek device was always very curious.
That was one of our initial devices that we looked into, in fact.
And so that one started leading us down a path of wanting to make our own statistical tests as well, since the very first thing we did was take these numbers and put
them into the existing statistical tests like Dieharder.
There's also a tool called Byte Circle, and the output fails all these tests.
But they just kind of tell you pass-fail based off of existing information.
They'll say, well, we tried playing a thousand games of craps with these numbers, and it didn't work.
They came out with bad results.
But that doesn't actually tell you what the heck is going wrong, right?
And so we tried as much as possible to make pretty charts and graphs and things. And so we postulated, well, what if we graph every byte,
like 0 to 255, and see if every byte happens the same?
Because depending on how the actual hardware works,
they might create random bits at the byte level, at the bit level,
or even at the word level, like 32-bit words.
And so that's all just
dependent on the hardware. The hardware might make a single bit and then concatenate it
with another single bit, so that basically every bit is independent of the others, or it should be.
And some devices create 32 bits all at the same time. Basically, you don't have one random bit
generator, you have 32 bit generators all kind of concatenated together. And so who knows, maybe
there are some correlations there.
So we basically plotted it out on a histogram,
and lo and behold, you get this interesting pattern of bytes
where some were clearly happening more and less often than others.
And it seemed like that was likely due to a bit bias,
where zero was more likely to occur than one was across the distribution.
And it kind of creates that pattern
when looking at it in terms of bytes.
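A small simulation of how a per-bit bias turns into a lumpy byte histogram. It assumes each bit is independent and equals 1 with probability 0.45, an arbitrary value chosen for illustration, not a measurement of the MediaTek device. Under that model a byte with k one-bits has probability p^k (1 - p)^(8 - k), so when p < 0.5, bytes with fewer 1s are over-represented. The helper names are illustrative.

```python
import random
from collections import Counter
from typing import List


def biased_bytes(n: int, p_one: float, seed: int = 0) -> bytes:
    """Simulate a generator whose bits are independent but equal 1 with probability p_one."""
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(n):
        byte = 0
        for bit in range(8):
            if rng.random() < p_one:
                byte |= 1 << bit
        out.append(byte)
    return bytes(out)


def expected_byte_prob(value: int, p_one: float) -> float:
    """P(byte == value) = p^k * (1 - p)^(8 - k), where k is the number of 1 bits in value."""
    k = bin(value).count("1")
    return p_one**k * (1 - p_one) ** (8 - k)


def byte_histogram(data: bytes) -> List[int]:
    """Count how often each byte value 0..255 occurs."""
    counts = Counter(data)
    return [counts.get(v, 0) for v in range(256)]


hist = byte_histogram(biased_bytes(100_000, p_one=0.45))
print(hist[0x00], hist[0xFF])  # 0x00 shows up roughly 5x as often as 0xFF
```

Plotted in order from 0 to 255, the bit-count-driven probabilities give a jagged, repeating shape, which is one plausible route to a histogram pattern like the one described above.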
And so that kind of bias is exactly the sort of thing
that you would not like to see
from a thing that you're basing your cryptography on.
I pose it to the audience there.
You look at that graph, you say,
how confident do you feel using this directly
for your crypto keys?
Even if you did all the software correctly, even if you checked the error codes and you went through all that process,
you're still kind of getting this number, this pattern that would keep cryptographers up at night.
To what degree, I mean, at what volume are we ringing the alarm here? How seriously, in the
practical real world, is this potentially going to affect things?
I would say that this potentially affects
35 billion IoT devices out there today. On the one hand, that's an alarming
number. On the other hand, this isn't a Heartbleed-type attack
where every device is immediately at risk.
It's much more device-specific and application-specific,
how you're using those random numbers.
So on the one hand, it's going to take a bit of,
how do I say it, tinkering?
It's going to take a little bit of specialized work to pull off a particular attack
against a specific device. But on the other hand, there's a lot of devices out there that are
vulnerable. I'll let Dan expand on that a little bit too.
Yeah, we're not used to seeing attacks
or vulnerabilities that affect an entire industry in security.
Generally, we'll see somebody wrote some buggy code,
there's some library, there's some software.
And sometimes it's particularly bad because a lot of people depend on this piece of software
and then people will fix it
or maybe individual users have to patch it on their own.
And we're kind of used to this rinse and repeat process in security.
What we're not used to seeing is something come out where, for an entire industry, the problem is the status quo. It's a
programming pattern. It's the way that an entire industry does things. Can you imagine if the
automotive industry just fundamentally didn't do bounds checking? And then every time you asked them why they do it this way,
they gave you some excuse about how they have strict overhead requirements
and don't have the time to bounds-check.
No, those are all bad arguments.
You absolutely have the time and overhead to do this properly.
These are important devices.
IoT devices are not toys anymore.
They're home security devices.
They're things you put your body into.
There are things that you put into your body
that are IoT devices.
This stuff is important.
We can definitely do it the right way.
So that's kind of the first level of that.
Because we're talking about an entire industry here,
remediation is tricky.
The IoT industry is not one thing.
It doesn't use one library. It doesn't
use one piece of hardware. It's very heterogeneous. The good news is that this can be patched in
software. The bad news is that it must be patched in software, and IoT devices are notoriously
difficult to patch. Many of them have firmware burnt onto the device and have no update capability.
Some of them have the capability of updating, but, you know, it's not simple to do.
And many of them have pretty low computing power, so
you have to at least give some design consideration to how to solve this sort of problem.
I'm curious, you know, as the two of you were making your way through this research,
what was it like to realize the scope of what was going on here? I mean, did you have aha moments along the way where you kind of looked at each other and said,
holy smokes, this just keeps getting bigger?
For me, it was more along the lines of,
am I doing this wrong? Like, really, seriously, am I doing this properly?
I think I'm following all the steps correctly.
I took on tackling the STM32
and spent a considerable amount of time
doing it incorrectly, not intentionally.
I spent a lot of effort trying to make sure
I was spin looping properly
to make sure I was getting proper random numbers,
implemented it incorrectly by accident,
didn't realize it.
I went down a track of thinking,
wow, this device is really producing
absolutely garbage random numbers,
not understanding what could possibly be going wrong,
discovered a flaw, tried it again,
found a different issue, tried it again.
We were actively verifying it every step of the way
with really thorough, hardline statistical tests in Dieharder, to confirm that we were doing it properly.
No intern at a third-party firm who's been brought on to help with a late project is going to take that effort.
So the fact that we spent measured effort at getting this right and couldn't do it is terrifying.
It's just really hard to get right.
Yeah, they say that things only occur
in increments and amounts of zero, one, or infinity.
So it's possible that we would look into this problem
and there'd be no instances of this vulnerability.
Or it's possible that we could look into it and there'd be one buggy device out there, right? Or it's possible
that they're all buggy. But having exactly two buggy devices would be weird and
unheard of. So going into it, we knew that there was one instance of this already,
because we had kind of already found that. So once we took out a second IoT device and looked at it, and the exact same problem was
there too, that was the major aha moment.
That was the major breakthrough of like, oh crap, I think this is everything.
Because two devices by completely different manufacturers that use entirely different
software stacks that have completely different devices built on them,
have exactly the same issues,
now we suddenly realize this is actually everywhere,
that this is because of how the industry uses them.
Our thanks to Dan Petro and Allan Cecil from Bishop Fox for joining us. The research is titled You're Doing IoT RNG.
We'll have a link in the show notes.
And now a message from Black Cloak.
Did you know the easiest way for cyber criminals to bypass your company's defenses
is by targeting your executives and their families at home?
Black Cloak's award-winning digital executive protection platform
secures their personal devices, home networks, and connected lives.
Because when executives are compromised at home, your company is at risk.
In fact, over one-third of new members discover they've already been breached. Protect your executives and their families 24-7, 365, with Black Cloak. Learn more at blackcloak.io.
CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe,
where they're co-building the next generation of cybersecurity teams and technologies.
Our amazing CyberWire team is Elliot Peltzman, Trey Hester, Brandon Karp, Puru Prakash, Justin Sabey,
Tim Nodar, Joe Kerrigan, Carol Terrio, Ben Yellen, Nick Vilecki, Gina Johnson, Bennett Moe,
Chris Russell, John Petrick, Jennifer Iben, Rick Howard, Peter Kilpie, and I'm Dave Bittner. Thanks for listening. We'll see you back here next week.