CyberWire Daily - IoT security and the need for randomness. [Research Saturday]

Starting point is 00:00:00 You're listening to the CyberWire Network, powered by N2K. I was concerned about my data being sold by data brokers, so I decided to try Delete.me. I have to say, Delete.me is a game changer. Within days of signing up, they started removing my personal information from hundreds of data brokers. I finally have peace of mind knowing my data privacy is protected. Delete.me's team does all the work for you, with detailed reports so you know exactly what's been done. Take control of your data and keep your private life private. Hello everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities, solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us. I guess, to start, we do just pronounce it

Starting point is 00:01:56 you're doing it wrong. At least when spoken, that seems to be the best way to say it. That's Dan Petro. He's joined by his colleague, Alan Cecil, from Bishop Fox, on their research, You're Doing IoT RNG. And now, a message from our sponsor, Zscaler, the leader in cloud security. Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue to rise, with an 18% year-over-year increase in ransomware attacks and a $75 million record payout in 2024. These traditional security tools expand your attack surface with public-facing IPs that are exploited by bad actors more easily than ever with AI tools. It's time to rethink your security. Zscaler Zero Trust plus AI stops attackers by hiding your attack surface, making apps and IPs invisible, eliminating lateral

Starting point is 00:03:00 movement, connecting users only to specific apps, not the entire network, continuously verifying every request based on identity and context, simplifying security management with AI-powered automation, and detecting threats using AI to analyze over 500 billion daily transactions. Hackers can't attack what they can't see. Protect your organization with Zscaler Zero Trust and AI. Learn more at zscaler.com slash security. This whole thing came about from an engagement that we were once on.

Starting point is 00:03:42 So we do a bunch of what we refer to as product security reviews at Bishop Fox. We like to say that if it breaks when you drop it, then it's a product security review. A lot of hardware engagements there. And so there was an IoT engagement for one of our clients that was making a kind of security device that did a lot of cryptography as part of its normal operations. And so when doing that engagement, we asked the client, hey, what are you using as a random number generator, since you're doing all this crypto?

Starting point is 00:04:13 How do you generate keys? Things like that. And they replied, oh, well, the SoC that we're using, the system on a chip, has a built-in hardware random number generator on the board. That sounds great. So, just kind of on a lark, I asked them, do you mind giving us a bunch of the output

Starting point is 00:04:29 from the random number generator? Just tell it to produce a gigabyte of data and just send it over to us. Figured it'd be fine. Like, a hardware random number generator, surely that's the gold standard for generating random numbers. It's a peripheral that does nothing but this one thing, right?

Starting point is 00:04:43 So I got this file back, and to our horror, large swaths of it were just zero. We thought that surely something must be terribly wrong here. There must be some buggy code or something; what happened here that would cause this? And so the research was kind of a follow-on from that, to find out whether this was just a one-off thing or something wider. It turned out that it was a much wider issue than just one chip. Yeah, it really is fascinating the way you all dig into this and unpack the mystery of how something like this could happen. And I think to your point here, it's sort of a head scratcher that something like

Starting point is 00:05:22 this could happen. Alan, can you give us some of the background here? How do we get to the point where a piece of hardware that's dedicated to generating random numbers is not doing that? That's a really good question. And what's interesting is that it's not just the hardware where things are going askew, but we'll get into that in a bit. The hardware itself generally works in a couple of specific ways, and computers are really, really good at deterministic behavior. Because if you did your taxes and got a different result every time you ran the program, you'd be pretty upset.

Starting point is 00:05:55 So you want computers to be extremely deterministic, which is fantastic, but on the other hand, it's a real pain in the rear when you need things to not be deterministic, when you need randomness. So there's a variety of ways of getting randomness. One of them is to use a pseudo-random number generator seeded from something, either the time of day, input from a user, or other factors. But ultimately, a pseudo-random number generator is not going to be perfect. So a lot of times you want to have some other source of entropy that's a little more random. And the hardware devices that do this, especially in the IoT world,

Starting point is 00:06:30 are relatively simple to understand. For instance, you might have one that relies on an analog NOT gate, one that's unclocked, and any time you ask it, are you a 1 or a 0, it'll respond relatively randomly. The other way to do it is to have two clocks running at different speeds, or slightly different speeds, and sample the delta between them. And they'll never necessarily

Starting point is 00:06:53 be exactly the same, although they could be. But for the most part, you can be fairly sure you're going to get a random zero or a random one out of that. The big issue with both of these designs is that they can only give you so many numbers at a time. If you're asking for a lot of randomness at once, you might have to wait a little bit. There are only so many random numbers it can give you at once before it exhausts itself and you have to refill that pool. And that's where a lot of things start to go kind of sideways. Well, take us through what's going on here, then. I mean, it's a combination of things. As you say, it's not just the hardware, it's the way people are

Starting point is 00:07:31 using it. Can you walk us through the problem? So the problem is at multiple levels. One of the things that we discovered in our research is, first of all, that the hardware random number generators themselves, even when you use them properly, sometimes aren't giving you a truly random distribution of numbers. So there's that issue. Next step up, you're probably not going to be writing everything from scratch. You're probably going to be using a library. And what we discovered is that some of these libraries

Starting point is 00:07:58 had some pretty serious issues, which we'll get into in a second. The next level up is, if you're a user, you're probably going to try to use example code. And in some cases, the example code itself didn't work correctly or used bad paradigms, for instance, not checking error codes. In other cases, it misled the user drastically. And in still other cases, you'd have to read a 1,000-page manual

Starting point is 00:08:22 in order to know exactly how to properly call the hardware RNG. So there are all kinds of ways that someone trying to use a random number generator on an IoT device can go terribly wrong. Yeah, it is fascinating, and it seems like almost a cascading set of possibilities here, where you would think that for something that on its surface is as simple as requesting a random number from your system, there wouldn't be as

Starting point is 00:08:52 many things along the way that could go wrong. Can we dig into this thing about not checking error codes? I mean, what exactly is that about? Yeah, I can field that one. So this kind of gets to the heart of the name of our presentation. We called it You're Doing It Wrong, specifically because there's maybe not one right way of doing it, but there are definitely many wrong ways of doing it. And this is sort of our way of standing up in front of the IoT industry as best we could and telling an entire industry of technology that they're doing it wrong. And so the way that this has been solved in basically every other field is through a cryptographically secure pseudo-random number generator subsystem.
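For readers who want the shape of that idea: a CSPRNG subsystem seeds a deterministic generator once from a real entropy source, then stretches that seed into as many bytes as callers ask for, so callers never touch the hardware directly. The episode doesn't name a construction; the minimal Python sketch below is modeled on HMAC_DRBG (SHA-256) from NIST SP 800-90A, simplified (no reseed interval, prediction resistance, or health tests) and not a validated implementation. The class name `HmacDrbg` is our own, and `os.urandom` merely stands in for a health-checked hardware seed.

```python
import hashlib
import hmac
import os


class HmacDrbg:
    """Simplified HMAC_DRBG (SHA-256), modeled on NIST SP 800-90A.

    Illustration only: no reseed counter limit, no prediction resistance,
    no health tests. Not a validated implementation.
    """

    def __init__(self, entropy: bytes, nonce: bytes = b"", personalization: bytes = b""):
        if len(entropy) < 32:
            raise ValueError("seed with at least 256 bits of entropy")
        self.key = b"\x00" * 32
        self.v = b"\x01" * 32
        self._update(entropy + nonce + personalization)

    def _hmac(self, data: bytes) -> bytes:
        return hmac.new(self.key, data, hashlib.sha256).digest()

    def _update(self, provided: bytes = b"") -> None:
        # Mix new material into the internal state (K, V).
        self.key = self._hmac(self.v + b"\x00" + provided)
        self.v = self._hmac(self.v)
        if provided:
            self.key = self._hmac(self.v + b"\x01" + provided)
            self.v = self._hmac(self.v)

    def reseed(self, entropy: bytes) -> None:
        self._update(entropy)

    def generate(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            self.v = self._hmac(self.v)
            out += self.v
        self._update()  # move the state forward so earlier outputs can't be recomputed
        return out[:n]


# Seed once (os.urandom stands in for a checked hardware read), then draw freely.
drbg = HmacDrbg(entropy=os.urandom(32), nonce=os.urandom(16))
session_key = drbg.generate(32)  # 256-bit key; no further hardware reads needed
```

The point of the design is that the hardware is read only when seeding or reseeding, where a failure can be handled in one place, instead of on every request.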

Starting point is 00:09:33 So if you are on a server farm and you need a random number in your Linux process, there's an API for this. You can ask Linux, hey Linux, please give me a random number, I need it to make an encryption key. And it can do that for you securely. We've spent lots of time, we've had lots of smart people look at these algorithms and this whole process, and we've managed to figure this out. But this unfortunately is not how things work in the IoT space.
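On a general-purpose OS, that API is the kernel CSPRNG (getrandom(2) or /dev/urandom on Linux), and Python's `secrets` module and `os.urandom` draw from it. The sketch below contrasts it with the weaker option Alan mentioned earlier, a general-purpose PRNG seeded from the time of day: anyone who can guess roughly when a key was made can replay the seed. The helper names and the one-minute search window are illustrative assumptions.

```python
import random
import secrets
import time


def weak_key(seed: int) -> bytes:
    """Anti-pattern: a 128-bit 'key' from a general-purpose PRNG seeded with a timestamp."""
    return random.Random(seed).getrandbits(128).to_bytes(16, "big")


def recover_seed(observed: bytes, window_start: int, window_len: int):
    """Attacker's view: try every second in a window around the suspected creation time."""
    for guess in range(window_start, window_start + window_len):
        if weak_key(guess) == observed:
            return guess
    return None


now = int(time.time())
victim_key = weak_key(now)
print("seed recovered:", recover_seed(victim_key, now - 60, 120) == now)

# The better pattern on a full OS: ask the kernel CSPRNG (what the secrets module uses).
strong_key = secrets.token_bytes(16)
```

The brute-force loop here only has 120 candidates to try; there is no comparable shortcut against `secrets.token_bytes`, which is why the OS subsystem is the default answer everywhere except, as Dan notes, IoT.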

Starting point is 00:10:00 When you're on an IoT device, you basically just call the hardware random number generator itself. You talk to the peripheral directly. And just like any other piece of hardware, it can potentially fail, because you're not interfacing with a software subsystem. It's an actual piece of hardware, right? Any number of things could have gone wrong. It might be overheating. Maybe the bus got scratched.
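A toy model of that failure mode: the Python sketch below simulates a hypothetical RNG peripheral whose small entropy pool refills slowly. The pool size, the refill rate, and the choice to return 0 with a failure status when the pool is empty are all assumptions made for illustration (consistent with the long runs of zeros Dan saw, but real parts differ), and Python's `random` stands in for the physical noise source. The `naive_key` loop is the pattern the guests describe finding in the wild: it never looks at the status.

```python
import random


class ToyHwRng:
    """Hypothetical HW RNG peripheral with a small entropy pool that refills slowly.

    Assumptions for illustration only: the pool holds `pool_size` 32-bit words,
    one word is added back every `refill_every` reads, and reading an empty pool
    returns a data word of 0 together with a failure status.
    """

    def __init__(self, pool_size: int = 4, refill_every: int = 8, seed: int = 1234):
        self.pool_size = pool_size
        self.refill_every = refill_every
        self.pool = pool_size
        self.reads = 0
        self.noise = random.Random(seed)  # stand-in for the physical noise source

    def read(self) -> tuple[bool, int]:
        """Return (status_ok, data_word), like a status register plus a data register."""
        self.reads += 1
        if self.reads % self.refill_every == 0:
            self.pool = min(self.pool_size, self.pool + 1)
        if self.pool == 0:
            return False, 0  # drained: failure status, no entropy in the data word
        self.pool -= 1
        return True, self.noise.getrandbits(32)


def naive_key(rng: ToyHwRng, n_words: int) -> list[int]:
    """The common pattern: read in a tight loop and never look at the status."""
    return [rng.read()[1] for _ in range(n_words)]


words = naive_key(ToyHwRng(), 64)  # 64 x 32 bits, a 2048-bit request
print(sum(1 for w in words if w == 0), "of", len(words), "words are zero")
```

With these toy parameters, most of a large request arrives faster than the pool refills, so most of the "key" is zeros, and nothing in the calling code notices.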

Starting point is 00:10:21 Jupiter and Saturn just weren't aligned at the time, for all we know. Hardware devices can return error codes. So the very first thing we looked at was how many people in the real world are checking the error codes of these hardware random number generators. And it turns out almost nobody actually checks these error codes in the wild, just from a cursory glance at code available on GitHub. And this doesn't come as any great surprise, given that it's actually really hard to do this properly. This turns out to be a whole can of worms by itself. So there's the level-one understanding of this, which is that you're sort of left to

Starting point is 00:11:02 your own devices in the IoT space in terms of talking to the hardware and doing things properly. And most developers wind up doing the easy thing and not the secure thing. But that's not where the story ends. It's really just where the story begins. Before we go any further, just for my own understanding here to make sure I'm following along properly. So is part of the issue here that on an IoT device, basically you don't have an operating system as an intermediary level to get in between you and the hardware to make sure you're getting what you need?

Starting point is 00:11:36 Yeah, basically there's middleware, or you could call them IoT operating system type things, like Contiki or FreeRTOS or others like that, but they don't currently have a CSPRNG subsystem. They don't have a cryptographically secure pseudo-random number generator setup that you as a user can call; instead, you as a user are forced to call the hardware directly, which often goes quite sideways. So if I'm calling out to the hardware and I'm requesting random numbers and the hardware is failing, what happens next? What's coming back to my IoT device?

Starting point is 00:12:17 One of the things that's interesting about calling for random numbers is that a lot of times when you need a random number, you don't need just one. You're generally going to be asking for a lot of randomness all at once. For instance, you might be generating a 2048-bit RSA key for SSH. So you're going to need a lot of entropy, a lot of random numbers all at once to make that key. So you're going to call it, and then call it again, because you're only going to get one bit at a time. So you're going to have to make a lot of calls to get the number of bits you need. The challenge is, if you call too frequently, you will exhaust the hardware random number generator device's pool of entropy, or pool of random

Starting point is 00:12:55 numbers it can hand you, and it will error. Now, there is a way to ask the hardware, hey, have you errored because of whatever circumstance? The problem that we ran into repeatedly is no one checks the error codes. And when you dive into why, it gets kind of interesting. I'm going to let Dan talk about why people can't check the error codes. Yeah, so the trouble with placing the blame squarely on the user here is that they're just placed

Starting point is 00:13:22 into an impossible scenario. Imagine trying to write a networking stack on an IoT device using one of these hardware random number generators. You're in the TLS stack, and you need a crypto key. You need to generate some numbers to make an encryption key to talk to somebody externally. And you call the hardware RNG function, and it comes back with an error code.
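What checking the status can look like in practice: the sketch below assumes a hypothetical driver call `read()` that returns `(status_ok, word)`. It retries a bounded number of times, which avoids spinning forever on a dead device, and then fails closed by raising instead of handing back a bad word. The retry budget of 1000 and the names `RngUnavailable` and `read_word_checked` are arbitrary assumptions. This doesn't remove the dilemma Dan describes; it only makes the failure explicit rather than silent.

```python
import itertools
from typing import Callable, Tuple

ReadFn = Callable[[], Tuple[bool, int]]  # hypothetical driver call: (status_ok, word)


class RngUnavailable(Exception):
    """Raised when the hardware RNG cannot supply entropy: fail closed."""


def read_word_checked(read: ReadFn, max_retries: int = 1000) -> int:
    """Return one word, retrying a bounded number of times instead of spinning forever."""
    for _ in range(max_retries):
        ok, word = read()
        if ok:
            return word
    raise RngUnavailable(f"no entropy after {max_retries} attempts")


def read_words_checked(read: ReadFn, n_words: int, max_retries: int = 1000) -> list[int]:
    return [read_word_checked(read, max_retries) for _ in range(n_words)]


# A slow but healthy fake device, then a dead one.
slow = itertools.cycle([(False, 0), (False, 0), (True, 0xDEADBEEF)])
print(read_words_checked(lambda: next(slow), 3))

try:
    read_word_checked(lambda: (False, 0), max_retries=10)
except RngUnavailable as exc:
    print("fail closed:", exc)
```

A caller such as a TLS stack then has one well-defined exception to handle, for example by aborting that handshake, rather than silently building a key from whatever the data register held.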

Starting point is 00:13:50 What are you supposed to do with that? One of the things about random numbers is that when you need them, they're sort of critical to the core concept of the thing you need to do here. You can't just simply handle the error in some abstract way and then move forward without that random number. What does it mean to do TLS without a random number? Like you just kind of can't. So you're left with really two possibilities here. One is to block, that is to just halt the entire machine. And most manufacturers will instruct you to just call the RNG again a second time,

Starting point is 00:14:22 basically spin looping at 100% CPU, waiting over and over again for the RNG to be ready, which, in the case of a broken or damaged device, might never return. You might just spin loop forever. That's not a great option. The second option would be to just quit out: kill the process, stop the networking process. And that's not acceptable either. Are you really supposed to kill the entire process every time you run out of entropy, which

Starting point is 00:14:53 happens quite frequently? That's not a workable scenario either. So you're sort of left between these two terrible choices, and it's no surprise that developers wind up going with just, well, let's just ignore the error code because things work when I do that. Now, if I'm a developer and I'm not checking for the error code, is it likely that what's coming back to me looks random enough that I won't necessarily know that it's not truly random?

Starting point is 00:15:22 That's one of the problems. A bunch of zeros in a row can be a legitimate answer. It is random after all, so you could randomly get a whole lot of zeros in a row. So when you glance at it, sometimes it's not immediately obvious. Now, one of the things that was obvious in the data set that Dan looked at was there was a large swath of zeros

Starting point is 00:15:40 all in a row. But it's not necessarily always that obvious. Sometimes what you'll get is repeating patterns. Maybe you'll get three zeros every 50 bytes, which is what one of the devices we looked at did consistently for some unknown reason. We would love to know why it did that, but if you were very casually looking at the data, you wouldn't necessarily notice it unless you were analyzing it very carefully. Even when we went to the trouble of doing statistical analysis, sometimes you had to use

Starting point is 00:16:06 the right statistical analysis test to find the problem. So even if you were glancing at it with statistical analysis tools, sometimes even that wasn't enough to detect, oh, wait a minute, there's actually a problem here. Yeah, and that kind of gets into how do we actually evaluate the hardware RNGs?

Starting point is 00:16:23 Because that kind of comes down to a fundamental problem of how do you know randomness when you see it? At risk of making this overly philosophical, you do have a really hard issue that you expect a random number. But then if you try to really dig deep down into that question, like what on earth do you mean by a random number? You start to realize that you're asking for your security device to break the laws of physics.

Starting point is 00:16:48 You want it to come up with some number that's not based on anything. You want it to break the laws of causality to create some number that isn't based on anything else. And then you start talking about quantum mechanics and the whole thing becomes garbage. Right, right. The important thing here is actually to not

Starting point is 00:17:03 be so overly concerned about whether the number is random or not in some abstract sense, whatever that even means, but rather whether it's predictable or not. And now we can actually talk about it in terms of an adversary who does or doesn't have certain information and certain levels of access, and is or is not able to predict certain numbers with a given accuracy. Now that's actually a problem that we can wrap our heads around. So there are lots of good statistical analysis tools that you can use to see how predictable certain numbers are, and they do rather interesting things.
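One way to look for the kind of repeating pattern Alan mentioned (three zeros every 50 bytes) is to ask, for each lag, how often two zero bytes appear exactly that far apart. The Python sketch below does that on synthetic data with that pattern planted in it; it is not the device's actual output, and the filler bytes are drawn from 1 to 255 only so that every zero comes from the planted pattern and the example stays deterministic. Full test batteries run many statistical tests; this is one narrowly targeted check, and the function names are our own.

```python
import random


def zero_pair_rate(data: bytes, lag: int) -> float:
    """Fraction of positions i where data[i] and data[i + lag] are both zero."""
    n = len(data) - lag
    return sum(1 for i in range(n) if data[i] == 0 and data[i + lag] == 0) / n


def likely_period(data: bytes, max_lag: int = 200) -> int:
    """Smallest lag at which zero bytes line up most often."""
    rates = {lag: zero_pair_rate(data, lag) for lag in range(1, max_lag + 1)}
    best = max(rates.values())
    return min(lag for lag, rate in rates.items() if rate >= best - 1e-12)


# Synthetic sample: three zero bytes at the start of every 50-byte block.
rng = random.Random(3)
sample = bytearray(rng.randint(1, 255) for _ in range(5000))
for start in range(0, len(sample), 50):
    sample[start:start + 3] = b"\x00\x00\x00"

print(likely_period(bytes(sample)))
```

Every multiple of 50 ties for the highest rate, which is why the function reports the smallest one. For an unbiased source, the rate at every lag would hover around 1/65536.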

Starting point is 00:17:36 One of the common ones out there is Dieharder. It's a series of tools that have been out for, I think, a couple decades or something like that now. You give it a long string of numbers, and it'll do things like play games of poker and craps with the numbers, basically transforming them into dice rolls and poker cards, and then seeing if they line up with the distributions that you'd expect from those particular games, in addition to lots of other kinds of statistical randomness tests. Would many people who are putting together systems like this find that even in its imperfection,

Starting point is 00:18:14 that the numbers they're getting back are random enough for their use case? It drastically depends on the use case. Take symmetric crypto keys: if you're trying to communicate with somebody and you have an AES key, say a 256-bit AES key, and it turns out that you're using a really trash RNG and half the numbers are zero, you're still left with a 128-bit AES key, and that's still strong. You're not going to crack a 128-bit AES key.
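The arithmetic behind that claim, assuming exactly 128 of the 256 key bits carry real entropy and the attacker knows which bits are stuck at zero, and taking a hypothetical rate of $10^{12}$ guesses per second for an exhaustive search:

$$
2^{256-128} = 2^{128} \approx 3.4 \times 10^{38}\ \text{keys}, \qquad
\frac{3.4 \times 10^{38}}{10^{12}\ \text{s}^{-1}} \approx 3.4 \times 10^{26}\ \text{s} \approx 1.1 \times 10^{19}\ \text{years}
$$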

Starting point is 00:18:40 They're very resilient to those kinds of things, the loss of entropy. That's not necessarily true of other operations. Many other kinds of encryption, in particular asymmetric cryptography like RSA, use math as their base operations, not just simple algorithms. And so they can be much more susceptible to low-entropy states and certain sections of numbers being zero. In fact, there was another talk at DEF CON this year called The Mechanics of Compromising Low Entropy RSA Keys.
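One well-documented way low-entropy RSA key generation gets exploited is when two devices end up with a prime factor in common: anyone holding both public moduli can then recover that prime with a single GCD and factor both keys. This is a general illustration, not necessarily the technique the DEF CON talk used. The Python sketch uses toy 7-digit primes (real RSA primes are around 1024 bits each) and a plain pairwise loop; internet-scale surveys use a faster batch-GCD, but the idea is the same.

```python
from itertools import combinations
from math import gcd


def find_shared_factors(moduli: list[int]) -> dict[int, tuple[int, int]]:
    """Return {modulus: (p, q)} for every modulus that shares a prime with another one."""
    broken: dict[int, tuple[int, int]] = {}
    for a, b in combinations(moduli, 2):
        g = gcd(a, b)  # a shared prime shows up as a nontrivial common divisor
        for n in (a, b):
            if 1 < g < n:
                broken[n] = (g, n // g)
    return broken


# Toy primes; two "devices" drew the same first prime because of low entropy.
p, q1, q2 = 1_000_003, 1_000_033, 1_000_037
r, s = 1_000_039, 1_000_081  # an unrelated, healthy key
moduli = [p * q1, p * q2, r * s]

for n, (f1, f2) in find_shared_factors(moduli).items():
    print(f"{n} = {f1} * {f2}")
```

Note that neither "device" did anything detectably wrong on its own; the weakness only shows up when many public keys are compared, which is how large-scale scans find low-entropy keys.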

Starting point is 00:19:13 So this is a thing that's a real-world threat. What we saw in our data matched what researchers in 2019 saw, that there were millions of low entropy keys on the internet that they found in their research. And they couldn't exactly determine where they came from, but they theorized that maybe it was coming from IoT devices that had very poor hardware RNG devices, low entropy key generation.

Starting point is 00:19:43 So what are some of the possible solutions here? I mean, this is a broad issue, right? We're talking about something affecting many, many IoT devices, millions, potentially billions of devices out there. Is there a solution? I'm going to state my own law, as it were. My law is: don't attempt to write RNG code on IoT devices on your own. It's as difficult as trying to write crypto code on your own. No one goes out and tries to write crypto code and gets it right the first time.

Starting point is 00:20:17 And you're going to have the same problem with RNG code on IoT devices. Don't try to do it on your own. Instead, you should be using some kind of CSPRNG subsystem. Unfortunately for end users, that doesn't readily exist right now. So for an end user, someone who maybe has a smart door lock

Starting point is 00:20:36 that perhaps is using the same password as a lot of other smart door locks, I recommend updating quickly. Whenever you see your vendor supplying an update, you should probably be applying that. For developers, you're going to have to put pressure on the hardware manufacturers and the people who are making these operating systems

Starting point is 00:20:57 for these IoT devices to implement a CSPRNG subsystem. Really, right now, if you have to do it on your own, read the manual extraordinarily carefully. You're going to find weird cases where you might have to call 32 times in a row, get a number, throw out the next 32 calls, get a number, and repeat. So you're going to have to be very, very diligent and double-check all of your work. It's very difficult right now. Yeah, a fascinating part of your research here was what you were just describing, that the instructions in the manual are not intuitive. And you could see how

Starting point is 00:21:32 many people could overlook that and think that they're getting random numbers when they're not. If I saw that example in code I was trying to work with, I would think it was a bug, a completely mistaken bit of implementation, because who would do that? Who would read and then throw out the next 32 numbers? But that's what the manual tells you to do. That was the LPC device, correct? Yeah, yeah.
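Written out, the pattern described here does look like a bug, which is exactly the point. The Python sketch below assumes a hypothetical driver call `read_word()` that already checks the status flag, and takes the spacing of 32 discarded reads from the conversation, not from the vendor manual itself; check your part's reference manual for the exact rule. The fake reader in the usage lines just returns its own call count, so you can see which reads are kept.

```python
import itertools
from typing import Callable, Iterator

# Spacing taken from the episode's description, not from the vendor manual.
DISCARD_BETWEEN_READS = 32


def spaced_words(read_word: Callable[[], int], discard: int = DISCARD_BETWEEN_READS) -> Iterator[int]:
    """Yield one word, then throw away `discard` reads before yielding the next."""
    while True:
        yield read_word()
        for _ in range(discard):
            read_word()  # deliberately discarded


# Usage with a fake reader that returns its own call count.
calls = itertools.count()
kept = list(itertools.islice(spaced_words(lambda: next(calls)), 4))
print(kept)
```

Each kept 32-bit word costs 33 hardware reads in this scheme, which is part of why seeding a CSPRNG once is attractive on parts like this.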

Starting point is 00:21:56 There's another example here that was fascinating. You were looking at the MediaTek 7697, which is, I believe, a system on a chip, and you did some statistical analysis of the random number generator, and you ended up with a sawtooth pattern. Yeah, the interesting sawtooth pattern on the MediaTek device was always very curious. That was one of the initial devices that we looked into, in fact. And so that one started leading us down a path of wanting to make our own statistical tests as well, since the very first thing we noticed was we took these numbers, put

Starting point is 00:22:34 them into the existing statistical tests like Dieharder. There's a tool called Byte Circle, and it fails all these tests. But they just kind of tell you pass or fail based on existing information. They'll say, well, we tried playing a thousand games of craps with these numbers, and it didn't work; they came out with bad results. But that doesn't actually tell you what the heck is going wrong, right? And so we tried as much as possible to make pretty charts and graphs and things. And so we postulated, well, what if we graph every byte, 0 to 255, and see if every byte occurs equally often?

Starting point is 00:23:14 Because depending on how the actual hardware works, it might create random bits at the byte level, at the bit level, or even at the word level, like 32-bit words. That's all dependent on the hardware. The hardware might make a single bit and then concatenate it with another single bit, so every bit is independent of each other, or it should be. And some devices create 32 bits all at the same time. Basically, you don't have one random bit generator, you have 32 one-bit generators all concatenated together. And so who knows, maybe

Starting point is 00:23:44 there's some correlations there. So we basically plotted out on a histogram, and lo and behold, you get this interesting pattern of bytes where some were clearly happening more and less often than others. And it seemed like that was likely due to a bit bias, where zero was more likely to occur than one was across the distribution. And it kind of creates that pattern when looking at it in terms of bytes.
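The analysis Dan describes, per-byte counts plus a check for per-bit bias, can be sketched in a few lines of Python. The data here is synthetic: a source whose bits are each 1 with probability 0.45, an assumed bias chosen to mimic the "zero more likely than one" effect, not the MediaTek device's real output. The chi-square statistic compares the byte histogram with a flat one (around 255 is typical for 256 bins of good data). For hardware that builds 32-bit words from parallel bit generators, running `bit_ones_fraction` over 32-bit words would show whether particular bit positions are skewed. All function names are our own.

```python
import random
from collections import Counter


def byte_histogram(data: bytes) -> list[int]:
    """Count how often each byte value 0..255 appears."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def chi_square_uniform(data: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram against a flat one (255 d.o.f.)."""
    expected = len(data) / 256
    return sum((count - expected) ** 2 / expected for count in byte_histogram(data))


def bit_ones_fraction(words: list[int], width: int = 32) -> list[float]:
    """Fraction of 1s at each bit position; close to 0.5 everywhere for an unbiased source."""
    return [sum((w >> bit) & 1 for w in words) / len(words) for bit in range(width)]


def biased_byte(rng: random.Random, p_one: float) -> int:
    """Synthetic byte whose bits are independently 1 with probability p_one."""
    return sum((rng.random() < p_one) << bit for bit in range(8))


rng = random.Random(7)
fair = bytes(rng.getrandbits(8) for _ in range(100_000))
skewed = bytes(biased_byte(rng, 0.45) for _ in range(100_000))

print("chi-square, fair:  ", round(chi_square_uniform(fair)))
print("chi-square, skewed:", round(chi_square_uniform(skewed)))
print("P(bit=1), skewed:  ", [round(f, 3) for f in bit_ones_fraction(list(skewed), width=8)])
```

A small per-bit bias compounds at the byte level: bytes with many zero bits become noticeably more common than bytes with many one bits, which is the kind of uneven histogram the guests describe.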

Starting point is 00:24:10 And so that kind of bias is exactly the sort of thing that you would not like to see from a thing that you're basing your cryptography on. I pose it to the audience there. You look at that graph and ask, how confident do you feel using this directly for your crypto keys? Even if you did all the software correctly, even if you checked the error codes and went through all that process,

Starting point is 00:24:33 you're still getting this number, this pattern that would keep cryptographers up at night. To what degree, I mean, at what volume are we ringing the alarm here? How seriously is this potentially going to affect things in the practical, real world? I would say that this potentially affects 35 billion IoT devices out there today. On the one hand, that's an alarming number. On the other hand, this isn't a Heartbleed-type attack where every device is immediately at risk. It's much more device-specific and application-specific, depending on how you're using those random numbers.

Starting point is 00:25:17 So on the one hand, it's going to take a bit of, how do I say it, tinkering? It's going to take a little bit of specialized work to pull off a particular attack against a specific device. But on the other hand, there's a lot of devices out there that are vulnerable. I'll let Dan expand on that a little bit too. Yeah, we're not used to seeing attacks or vulnerabilities that affect an entire industry in security. Generally, we'll see somebody wrote some buggy code, there's some library, there's some software.

Starting point is 00:25:52 And sometimes it's particularly bad because a lot of people depend on this piece of software, and then people will fix it, or maybe individual users have to patch it on their own. And we're kind of used to this rinse-and-repeat process in security. What we're not used to seeing is something affecting an entire industry, where the problem is the status quo. It's a programming pattern. It's the way that an entire industry does things. Can you imagine if the automotive industry just fundamentally didn't do bounds checking? And then every time you asked them why they do it this way, they gave you some excuse that they have strict overhead requirements

Starting point is 00:26:30 and they don't have the time to bounds check. No, those are all bad arguments. You absolutely have the time and overhead to do this properly. These are important devices. IoT devices are not toys anymore. They're home security devices. They're things you put your body into. There are things that you put into your body

Starting point is 00:26:48 that are IoT devices. This stuff is important. We can definitely do it the right way. So that's kind of the first level of that. Because we're talking about an entire industry here, remediation is tricky. The IoT industry is not one thing. It doesn't use one library. It doesn't

Starting point is 00:27:05 use one piece of hardware. It's very heterogeneous. The good news is that this can be patched in software. The bad news is that it must be patched in software, and IoT devices are notoriously difficult to patch. Many of them have firmware burnt onto the device and no update capability. Some of them have the capability of updating, but, you know, it's not simple to do. And many of them do, in fact, have pretty low computing power, so you have to at least give some design consideration to how to solve this sort of problem. I'm curious, you know, as the two of you were making your way through this research,

Starting point is 00:27:51 what was it like to realize the scope of what was going on here? I mean, did you have aha moments along the way where you kind of looked at each other and said, holy smokes, this just keeps getting bigger? For me, it was more along the lines of, am I doing this wrong? Like, really, seriously, am I properly, I think I'm following all the steps correctly. I took on tackling the STM32 and spent a considerable amount of time doing it incorrectly, not intentionally. I spent a lot of effort trying to make sure

Starting point is 00:28:18 I was spin looping properly to make sure I was getting proper random numbers, implemented it incorrectly by accident, and didn't realize it. I went down a track of thinking, wow, this device is really producing absolute garbage random numbers, not understanding what could possibly be going wrong,

Starting point is 00:28:35 discovered a flaw, tried it again, found a different issue, tried it again. We were actively verifying every step of the way, with really thorough statistical tests in Dieharder, that we were doing it properly. No intern at a third-party firm who's been brought on to help with a late project is going to put in that effort. So the fact that we spent measured effort at getting this right and couldn't do it is terrifying. It's just really hard to get right. Yeah, they say that things only occur

Starting point is 00:29:11 in increments and amounts of zero, one, or infinity. So it's possible that we would look into this problem and there'd be no instances of this vulnerability. Or it's possible that we could look into it and there'd be one buggy device out there, right? Or it's possible that they're all buggy. But having exactly two buggy devices, that'd be weird and unheard of. So we knew that there was one instance of this already going into it, because we'd already found that. So once we took out a second IoT device and looked at it, and the exact same problem was there too, that was the major aha moment.

Starting point is 00:29:51 That was the major breakthrough of, oh crap, I think this is everything. When two devices by completely different manufacturers, using entirely different software stacks, with completely different products built on them, have exactly the same issues, we suddenly realize this is actually everywhere, and that it's because of how the industry uses them. Our thanks to Dan Petro and Alan Cecil from Bishop Fox for joining us. The research is titled You're Doing IoT RNG. We'll have a link in the show notes.

Starting point is 00:30:38 And now a message from Black Cloak. Did you know the easiest way for cyber criminals to bypass your company's defenses is by targeting your executives and their families at home? Black Cloak's award-winning digital executive protection platform secures their personal devices, home networks, and connected lives. Because when executives are compromised at home, your company is at risk. In fact, over one-third of new members discover they've already been breached. Protect your executives and their families 24-7, 365, with Black Cloak. Learn more at blackcloak.io. CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe,

Starting point is 00:31:29 where they're co-building the next generation of cybersecurity teams and technologies. Our amazing CyberWire team is Elliot Peltzman, Trey Hester, Brandon Karp, Puru Prakash, Justin Sabey, Tim Nodar, Joe Kerrigan, Carol Terrio, Ben Yellen, Nick Vilecki, Gina Johnson, Bennett Moe, Chris Russell, John Petrick, Jennifer Iben, Rick Howard, Peter Kilpie, and I'm Dave Bittner. Thanks for listening. We'll see you back here next week.
