Embedded - 414: Puff, the Magically Secure Dragon

Starting point is 00:00:00 Welcome to Embedded. I am Elecia White, alongside Christopher White. When people tell me about chip bugs in non-alpha silicon, I usually nod quietly and wonder what bug they have in their code that makes them think they found something so rare. Not today. Today we're going to talk about a bug in the silicon that can be used to hack a system. And I'm happy to talk to Laura Abbott today. Hi, Laura. Welcome. Hi, thanks for having me. Could you tell us about yourself as if we met at the hardwear.io

Starting point is 00:00:37 conference next week? Sure. I'm a firmware engineer at Oxide Computer. I've been there since January 2020. For those who haven't heard of Oxide, Oxide is rethinking the server from the ground up. Servers haven't really changed in a number of years, and Oxide is hoping to build [...] you know, oh, a Dell computer next to my desk that some people get files from. But these are big things that go in racks, for data centers and stuff like that, right? That's right. But that's actually a good comparison, because the servers you can buy from Dell that go in racks actually look a lot like your Dell PC. And it turns out that's a pretty difficult thing to work with, especially for new hardware. Big companies like Google and Facebook are designing their own hardware these days that's much nicer to use. But if you're not one of these

Starting point is 00:01:38 big companies, of course, you don't have the chance to buy this, because you can't afford to design your own hardware. So that's sort of what Oxide is going for: building really nice hardware to deliver a great experience. We want to do a lightning round where we ask you short questions, and if we're behaving ourselves, we won't ask how and why and all of that. Okay. Do you like to complete one project or start a dozen? I honestly have a tendency to do both, depending on where I am

Starting point is 00:02:06 and what type of thing I'm looking at. I definitely have a whole bunch of sort of electronics projects that are half completed that I need to actually finish sometime. But I think other things I tend to do one at a time. Hubris or humility? Oh, I'm going to have to go with hubris. Favorite Cortex-M?

Starting point is 00:02:28 Ooh, that's a tricky one. I'm going to have to go with the Cortex-M33, because I do have a soft spot for TrustZone. But no floating point? Less of a floating point.

Starting point is 00:02:45 Urgency or rigor? Did you go through an Oxide application? I'm going to have to go with rigor, I think. That question is from Oxide's interview questions. What do you hear most often? Oxide has a number of questions people answer on the application. One of the questions Oxide likes to ask is, you know, talk about two values in tension and how you resolve them. And people do talk a lot about urgency versus rigor, because that's a fairly common tension in engineering: how much work do I need to do to make this correct, versus can we get it done a little bit faster?

Starting point is 00:03:27 Favorite fictional robot? I love WALL-E. That was a great movie. pyOCD or OpenOCD? Ooh, I'm definitely going to have to go with pyOCD. OpenOCD has a soft spot in my heart, but, you know, it's kind of a pain. And my colleague, Cliff,

Starting point is 00:03:47 if you happen to listen to this, you know, I respect your choice to use OpenOCD. What's pyOCD? I haven't heard of this. I haven't either. pyOCD is a Python library that provides debugging support.

Starting point is 00:04:02 It has support for the CMSIS-DAP standard, which lets it connect to a lot of microcontrollers. And that's what we end up using at Oxide, or at least it's one of the toolchains we end up using. We should check that out. Yeah, sounds cool. Do you have a tip everyone should know?

Starting point is 00:04:18 Always read the documentation and don't be afraid to try things. Okay, so before we get into the silicon issue you found, which you're going to be presenting at hardwear.io soon, tell me about the chip in general. It was the NXP LPC55? That's right. The NXP LPC55 is a Cortex-M33 that we evaluated.

Starting point is 00:04:48 It has a number of nice features that we found when Oxide was evaluating chips for our root of trust. It has things like a strong identity, to give a cryptographically secure, unique identity. It has some hardware accelerators, but those are less important to us. It has secure boot. And we chose these features in particular because they let us build up what we're doing for the root of trust. So the root of trust is that I have to be able to trust everything in the whole chain in order to trust that what I'm doing

Starting point is 00:05:28 at the end is useful. And this is often used in firmware update. Is that what you're talking about? That's part of it. Root of trust is a term that can mean a lot of different things to different people. When I say root of trust, we're talking about answering the question of what software is running on the system. So the idea is that we're going to trust what's running on the root of trust, and then that can be used to build up other parts of the system. We know exactly what's running on there, and we'll be able to compare that against an expected set of hashes. And then, say, when it's time to do something like a system update, you'll be able to

Starting point is 00:06:08 not just install the update, but also get another set of expected measurements of what should be running on the system. So this is like if I jailbreak my iPhone: I've lost the root of trust for Apple, and so their apps won't work on it. That's a good example there. And the LPC55 is an M33, which you said has TrustZone. What is that? So TrustZone is another way to provide isolation for code. On a lot of chips, you'll have privileged versus unprivileged mode. TrustZone provides another axis: secure versus non-secure. So you could have secure-privileged,

Starting point is 00:06:52 secure-unprivileged, non-secure-unprivileged, non-secure-privileged, and so on. Every permutation. Yes. So, I remember using a memory protection unit on the Cortex-M4. Not an MMU, but a memory protection unit that would allow you to mark various regions as privileged versus unprivileged. Is this part of that? It's a similar concept. The MPU is still definitely there, and it provides isolation to choose what things are accessible. TrustZone uses a controller called the SAU, the Security Attribution Unit, to specify which

Starting point is 00:07:32 regions are secure or non-secure. In our system, for example, we end up having both of those pieces configured. So when you're running in the non-secure world, the regions that are secure are specified, but you also have the MPU set up to specify what regions of memory you're allowed to touch. Okay. And if I didn't have TrustZone, what would be different? How would I treat my... Is it only for IoT systems? If you didn't have TrustZone, you could still build a secure system, but it's helpful to think about TrustZone as just another layer of protection that provides another way to isolate things. So the goal is that if you're

Starting point is 00:08:19 running code in the non-secure world, you should only be able to get into TrustZone through very specific paths. So if for some reason you didn't have TrustZone, you would be left with coming up with another way to fully protect that. And if you have a properly working system, that should always be fine. But the idea with something like TrustZone

Starting point is 00:08:38 is that if you did end up having a bug in your code that, say, might expose secrets, then the other layer of security protection will make it even harder to get those secrets. Okay. And this is part of the ARM chip, not part of the NXP chip. That's correct.

Starting point is 00:08:53 TrustZone itself is part of a specification defined by ARM. And when you start looking at microcontrollers, it turns out that implementing various things like TrustZone can be optional. So depending on what version of the specification a chip vendor chose to implement, there may or may not be TrustZone.
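To make the two layers concrete, here is a minimal Python sketch, not real ARMv8-M behavior: it models an SAU-like security attribution and an MPU-like privilege check as two independent tests that must both pass. The region table, the addresses, and the names `Region` and `can_access` are hypothetical, not the LPC55S69 memory map, and real hardware has separate secure and non-secure MPUs plus entry-point rules that this sketch ignores.

```python
from dataclasses import dataclass
from enum import Enum

class Security(Enum):
    SECURE = "secure"
    NON_SECURE = "non-secure"

@dataclass(frozen=True)
class Region:
    start: int             # inclusive
    end: int               # exclusive
    security: Security     # SAU-like attribution
    unprivileged_ok: bool  # MPU-like permission for unprivileged code

# Hypothetical memory map for illustration only.
REGIONS = [
    Region(0x1000_0000, 0x1001_0000, Security.SECURE, unprivileged_ok=False),
    Region(0x2000_0000, 0x2001_0000, Security.NON_SECURE, unprivileged_ok=True),
    Region(0x2001_0000, 0x2002_0000, Security.NON_SECURE, unprivileged_ok=False),
]

def can_access(addr: int, state: Security, privileged: bool) -> bool:
    """Return True only if both the security layer and the privilege layer allow it."""
    for region in REGIONS:
        if region.start <= addr < region.end:
            # Layer 1 (TrustZone-like): non-secure code never reaches secure
            # memory, no matter how privileged it is.
            if region.security is Security.SECURE and state is Security.NON_SECURE:
                return False
            # Layer 2 (MPU-like): unprivileged code needs explicit permission.
            return privileged or region.unprivileged_ok
    return False  # unmapped address: treat as a fault
```

Trying all four security/privilege combinations against `0x1000_0100` shows the point of the extra axis: only secure-privileged code gets in, and even privileged non-secure code is stopped by the first layer before the MPU is consulted.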

Starting point is 00:09:16 But the LPC55 has one. Correct. Do you use it? I guess, you know, it's okay. I understand security. You need to have something on there that is the base of security. And you mentioned an ID. And then I can, in manufacturing, use that ID to assign a particular cryptographic key and put that in it. And then I use that for communications, including firmware update. What am I missing about the specialness of TrustZone? I think a good example is if you really wanted to make sure nobody could read out the secret used for your firmware update. So, for example, if you had a key that you really wanted to keep private, you could put that in TrustZone so that when you are,

Starting point is 00:10:10 say, using a common design where your bootloader is secure and then jumps into non-secure, there would be no way to read that key out once you're in non-secure mode. Another thing we were looking at doing with our chips is using TrustZone to provide even more hardware isolation. So we would take certain hardware blocks that we were

Starting point is 00:10:32 really not sure about and put them only in TrustZone. That way the non-secure world couldn't potentially access them. So the non-secure world wouldn't be able to access the power-off button. That's a good example, yeah. Or the rewrite-flash code. Yes, and that's part of what we were looking at when we were evaluating: what exactly can we do to make sure that our product is secure? And being able to rewrite flash is definitely one of those dangerous scenarios. And there are fuses you can change in manufacturing so that people can't read out your code over a JTAG link or CMSIS-DAP. This isn't really JTAG. And there are fuses you can blow so that you can't ever change what's on the board. But those are small things that cover large pieces of the system. TrustZone lets you change some parts but not others, is that right?

Starting point is 00:11:33 Yes. I think your example of fuses is a good one to compare and contrast, because it's also worth noting that TrustZone is ultimately just another hardware configuration, while things like fuses tend to be a permanent, one-time thing. So with TrustZone, once you have your finalized settings, you may choose to blow your fuses to make sure you can't make further changes to your flash, for example. Okay. Do you understand TrustZone, Christopher? I do understand it better, yes. Okay.

Starting point is 00:12:12 Laura, you're giving a talk called "Unwanted Features: Finding and Exploiting an In-ROM Buffer Overflow on the LPC55S69." That's correct. So what's that about? This is a bug I stumbled on somewhat accidentally. I mentioned that I'm a firmware engineer at Oxide Computer; my job is not actually vulnerability hunting. But while trying to work with a feature of the chip related to software updates, I stumbled across a buffer overflow that could be used to break some security boundaries in the chip and violate some pretty fundamental assumptions. So, "stumbled across": is this you had a horrible bug and then noticed it did something?

Starting point is 00:13:07 Uh, no. The Fifth Amendment doesn't apply to this podcast. So, honestly, I ended up finding this bug because I was a little bit lazy. I didn't want to write a parser for the update format NXP uses, or at least I started to work on one and realized, huh, this format is kind of complicated. There are a lot of fields in this header. The update format NXP uses is called SB2, and it starts out with an unencrypted header before getting to the keys and then the commands that actually do things like erase the flash. And it's all transmitted sequentially. So I started thinking, there are a lot of fields in this header. How well does the ROM actually validate all parts of this

Starting point is 00:13:58 header? I had a ROM dump lying around, so I started looking a little closer, and I happened to find one of these fields that wasn't being validated correctly and gave me a buffer overflow. So I'm clear: what ROM is on the Cortex? This is a ROM that's specifically designed by NXP. The Cortex itself doesn't actually mandate any of this; it's a design choice NXP made. And when we were first choosing a chip, several of my colleagues pointed out that this might be a disadvantage, because they had had bad experiences with ROMs. And so far, that wisdom has turned out to be very correct.
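As an illustration of this bug class, and explicitly not NXP's ROM code or the real SB2 header layout (both are assumptions here), this Python sketch simulates a global buffer sitting directly before a pointer-sized variable, like the heap allocator state described later in the conversation. The hypothetical `parse_unchecked` trusts a length field from the header; the hypothetical `parse_checked` validates it against the destination size before copying.

```python
import struct

BUF_SIZE = 16  # hypothetical size of a fixed global buffer

def make_globals() -> bytearray:
    # Simulated .bss: a 16-byte buffer followed immediately by a 4-byte
    # pointer (think "heap allocator state"). The layout is hypothetical.
    mem = bytearray(BUF_SIZE + 4)
    struct.pack_into("<I", mem, BUF_SIZE, 0x2000_4000)
    return mem

def heap_ptr(mem: bytearray) -> int:
    # Read back the simulated pointer that lives right after the buffer.
    return struct.unpack_from("<I", mem, BUF_SIZE)[0]

def parse_unchecked(mem: bytearray, header: bytes) -> None:
    # Hypothetical header: 2-byte little-endian length, then that many bytes.
    (length,) = struct.unpack_from("<H", header, 0)
    payload = header[2:2 + length]
    for i, b in enumerate(payload):
        mem[i] = b  # BUG: never compares `length` with BUF_SIZE

def parse_checked(mem: bytearray, header: bytes) -> None:
    if len(header) < 2:
        raise ValueError("truncated header")
    (length,) = struct.unpack_from("<H", header, 0)
    # Validate the untrusted field against both the destination and the input.
    if length > BUF_SIZE or 2 + length > len(header):
        raise ValueError(f"length field {length} out of range")
    mem[0:length] = header[2:2 + length]
```

With a header whose length field says 20 instead of at most 16, `parse_unchecked` overwrites the simulated pointer with attacker-chosen bytes, while `parse_checked` raises `ValueError` and leaves the pointer untouched.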

Starting point is 00:14:46 Interesting. But this is flash that is on the chip. Is it flash? It's probably flash. It may be mask ROM, but it's probably flash. Okay. And it's flash that we can't get to and can't modify, because its fuse probably was blown when it left manufacturing. Okay. And this is like that memory map talk I gave, where the unused address spaces were the ocean, and you just... we don't know what all the registers are.

Starting point is 00:15:20 They don't tell us what all the registers are in the manual. They tell us the registers they want us to use. Yeah. And this ROM code is kind of like that: they tell us how to use it, but they don't tell us what it is. You said you had coworkers who were distrustful of the ROM code. What made you look around in it? And how did you dump it? Of my coworkers who were distrustful, I think my colleague Cliff in particular

Starting point is 00:15:52 just pointed out that, especially for what we're trying to build with the root of trust, we really want to know exactly what is running on it. And, as you gave a great example, the manufacturer isn't telling us everything that's in the ROM. For building a root of trust, this is kind of

Starting point is 00:16:09 terrifying, because it's hard to know whether there's something in there that may break our assumptions about the chain of links we use to do our measurements. So we initially were a little bit worried about this. And it turns out in this case that NXP did not actually add any sort of read protection on the ROM. So dumping the ROM was a very simple matter of literally just reading it out with a debugger and saving it. Okay, that was mistake number one. I don't know, I kind of disagree there. I actually think that the ROM should be available, but I mean,

Starting point is 00:16:45 really, if I have a complaint, it's that they should just be giving us the source code to the ROM. Yes, either it should be transparent, or you should make it totally opaque if that's the path you're going. Yeah, but totally opaque... I was going to ask this later, but now I'm going to ask it right now. I worked with an authentication chip many years ago, when this stuff was still pretty much in its infancy, and it had a lot of problems. But the question I kept getting from other technical people in the company when we chose this chip, or any chip that did authentication, was: what's preventing somebody with a million dollars of equipment from, you know, acid-etching down the chip and reading out the secret key from the flash or wherever it's programmed?

Starting point is 00:17:31 And I said, pretty much nothing except a million dollars at that point. Now it's much less than a million dollars. So when Elecia says they should block off the ROM, does that actually prevent anything if somebody wants to go look at it visually? Or are flashes harder to do that with these days? I think it's still possible to do that to some extent with the ROM. On the LPC55, I think we did a little bit of investigation into that.

Starting point is 00:18:00 But to your question about reading out the secret key, one of the features that was appealing to us about the LPC55 was its PUF, a physically unclonable function. It can't be cloned, because it's tied to the actual chip itself, which is a great way to get that strong, unique ID. And you can use it to encode other secrets, so that even if you did happen to, say, get a copy of some of the flash, if it's been encoded by the PUF, you can't actually decode it. Interesting. Okay.
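A toy Python sketch of the idea behind PUF-based key encoding, under loud assumptions: the PUF is simulated by hashing a per-chip fingerprint, and the stored "key code" is a simple XOR with an HMAC-derived pad. This is not NXP's key-code format (which also protects integrity), and `puf_key`, `encode_key`, and `decode_key` are hypothetical names. It only shows why a key code copied out of one chip's flash is useless on a chip with a different PUF.

```python
import hashlib
import hmac

def puf_key(chip_fingerprint: bytes) -> bytes:
    # Simulated PUF output. A real PUF regenerates this from the silicon
    # itself each boot; it is never stored in flash.
    return hashlib.sha256(b"toy-puf" + chip_fingerprint).digest()

def _pad(puf: bytes, n: int) -> bytes:
    # Derive an n-byte pad from the PUF key (n <= 32 for SHA-256).
    if n > 32:
        raise ValueError("toy scheme supports keys of at most 32 bytes")
    return hmac.new(puf, b"key-code", hashlib.sha256).digest()[:n]

def encode_key(puf: bytes, user_key: bytes) -> bytes:
    # The resulting "key code" is what would be stored in flash.
    pad = _pad(puf, len(user_key))
    return bytes(k ^ p for k, p in zip(user_key, pad))

def decode_key(puf: bytes, key_code: bytes) -> bytes:
    # XOR with the same pad undoes the encoding, but only on the same chip.
    return encode_key(puf, key_code)
```

Encoding a key with chip A's simulated PUF and decoding it with chip B's yields different bytes, which is the property that makes a dumped flash image insufficient on its own.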

Starting point is 00:18:42 Puff, the magically secure dragon? Yes. I like that. Okay. So you found a buffer overflow, and buffer overflows are the sort of thing that lead to security issues: if you have, say, a buffer overflow in a function, you may be able to overwrite the stack and change the code to run what you want instead of what it was supposed to run, which is bad. I mean, that's why we get denial-of-service attacks and all kinds of things that lead us to say, always check your inputs for malicious actions. So what does the ROM overflow do? You gave a great example of what people usually think of when they think of a buffer overflow, which is a classic stack overflow.

Starting point is 00:19:40 In this case, because of how the ROM was parsing the update, it was an overflow in global space. So you would just continue writing past the end of a global buffer. It's somewhat closer to what I'd call a heap overflow, but that's not quite right, because it wasn't actually overflowing in the heap. It was overflowing into the equivalent of, say, the BSS section, which starts out as all zeros and gets set up later. So I found an overflow there, and then it turned out right next to

Starting point is 00:20:12 this global variable I was able to overflow was the heap address, or heap allocator, and I was able to use that to get code execution. And I will be talking more about all the gory details of how I did that in my talk. Okay. And just to make it clear,

Starting point is 00:20:33 the talk is in early June, the second week of June. That's correct. In Santa Clara, California. And it's physical. You're going to a real, actual, in-person conference? I am. I will admit I'm a little bit nervous, but also excited to potentially do a meetup and see people in person. And have you been to a hardwear.io event before? I have not. I heard about the conference from my colleague Rick,

Starting point is 00:21:10 who suggested after we found this bug that it might be an interesting place to submit a talk. They tend to do security talks more than any other kind of embedded talk, but it's all very embedded. Yeah, it's all definitely fairly low level. A lot of things at the hardware layer. I'm excited to see some of the other talks that are going

Starting point is 00:21:33 and really get to learn about more aspects of hardware security. And usually their talks are put online afterwards, and I'm hoping yours will be too. Do you know? I don't know right now, but again, I'm hoping the talk will be online. If not, I will definitely be releasing more of my slides and details once everything is over. Cool. Have you written your talk yet? You can tell me. I know how this goes.

Starting point is 00:22:00 I'm in the process of doing that. I am not one of those people who can write the talk and slides on the plane. I need to be practiced. I definitely have the slides going, and I'm beginning to practice to make sure I have everything. Especially for a talk like this, I really want to make sure I'm explaining everything correctly, because I did a test run of the talk and realized there are a few more diagrams I need to add to explain things, like what exactly the PUF does. Yeah, okay. And how you found it, and the tools you used, and really, how can we use this? How can a malicious actor use this?

Starting point is 00:22:40 How can we use this to do things? Actually, how a malicious actor could use this to do bad things is probably the most interesting of the questions. What do you have for that? That is a very interesting question. Part of it goes back to the system configuration of the chip. When I say I found a buffer overflow in the software update code, that sounds pretty bad, and people might initially say, wow, this thing is completely broken. But if the chip is properly configured to prevent modifications to certain configuration areas, it's not completely broken. I mentioned the chip has Secure Boot.

Starting point is 00:23:23 You can't change Secure Boot keys. You can't change various other configuration settings. But what is available for you to do is perhaps write to unwritten flash pages that aren't covered by a Secure Boot image. If you have another image that's already been signed, you could do a rollback attack to boot an older version

Starting point is 00:23:42 that might have a bug. One of the most serious issues we found is that, typically, the way the chip is set up, it has a feature called DICE that's designed to compute an identity. And part of the way DICE works relies on keeping a particular PUF-encoded secret restricted. Once the ROM calculates the DICE value using the existing image, it restricts access by making a change to a register that prevents you from accessing that same PUF-encoded value.
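A minimal Python sketch of the DICE pattern as described here, assuming HMAC-SHA256 as the key-derivation function; implementations differ in the details, and NXP's exact scheme is not described in this conversation. The boot code measures the next image, mixes that measurement with the device secret to get a compound device identifier (CDI), then locks the secret. `DiceEngine` and its methods are hypothetical names.

```python
import hashlib
import hmac

class DiceEngine:
    """Toy model of one DICE boot step (assumption: HMAC-SHA256 as the KDF)."""

    def __init__(self, uds: bytes) -> None:
        self._uds = uds        # Unique Device Secret; PUF-protected on real parts
        self._locked = False   # models the register write that restricts access

    def read_uds(self) -> bytes:
        if self._locked:
            raise PermissionError("UDS access is locked until the next reset")
        return self._uds

    def derive_cdi(self, next_image: bytes) -> bytes:
        # 1. Measure the image that is about to run.
        measurement = hashlib.sha256(next_image).digest()
        # 2. Mix the measurement with the device secret to get the CDI.
        cdi = hmac.new(self.read_uds(), measurement, hashlib.sha256).digest()
        # 3. Lock the secret so later code can only see the CDI.
        self._locked = True
        return cdi
```

The CDI changes whenever the image changes, and after `derive_cdi` runs, `read_uds()` raises. The weakness Laura describes is about timing: code that gets to run before the lock step, as injected code can via this buffer overflow, can still read the secret and therefore clone the identity.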

Starting point is 00:24:21 It turns out that at the point this buffer overflow happens, you can read that value out, which means it's possible to write some code to read out this PUF-encoded value, which you really should not be able to do, and then, say, clone an identity. Yeah, clone the identity. That's where being able to read the identity comes in. Yeah. And so you could make another system that pretended to be yours and then use that to practice other attacks. Yeah. And especially for what we're trying to do, having the root of trust measure other parts of the system, that's pretty bad: you could have another part of the system pretend to be the measurement and say, oh, here are some measurements, yeah, they're definitely coming from me. Wait, quick question.

Starting point is 00:25:08 Do I need physical access if I'm a malicious actor? That ends up depending on how exactly the chip itself is set up. I'd say you do, but it's also important to think about what physical access actually means. Say you have this chip deployed out there and it's getting updates over the network; then maybe an attacker could do things that way. It requires data to be physically sent over a hardware interface, but depending on the other software you've written on your system, you can also invoke it that way. So it's not one versus the other. It depends on a whole bunch of other parts of your system. Because it's part of the in-system programming, which can happen over

Starting point is 00:25:55 UART, SPI, I2C, CAN. And so you need to send things over one of those. Of course, one of those might be attached to a Wi-Fi chip or something like that. You can in-system program this over CAN? Yes. Isn't that awesome? I guess it makes sense. Yeah, no, you'd have to. Yeah, there's a version of this chip that does in fact have CAN support.

Starting point is 00:26:22 For those who aren't familiar with CAN, it's often used in automotive. So the idea is that if they're trying to update something in your car, you could potentially send things that way. So you've mentioned that as long as things are set up correctly, this probably doesn't affect most people. How do I avoid letting people clone my device? How do I avoid having the PUF be puffed? Sorry, I can't get past "puff." It's a great name. But yes, I should clarify: when I said that it doesn't affect certain things, I meant this isn't as completely bad as it could be.

Starting point is 00:27:10 The PUF-encoded value can always be read out as long as you're using this buggy code. And the real concern is that if for some reason you didn't fully seal the CMPA programming area, it's possible to change a whole bunch of things there by rewriting the flash. As for preventing this, we did a lot of evaluation when we found this issue at Oxide, and we came to the conclusion that the best way to avoid this issue is just to not use this ROM update code at all. That's the safest path until you get a fixed chip. Oh, and so you can write your own flash programming and have your own flash-based update, like everybody else does.

Starting point is 00:28:02 Yeah, that's correct. And I mean, it's definitely considered a rite of passage, I think, to write a software update on microcontrollers at some point. It's a rite of passage that gets repeated over and over again sometimes. Serial drivers, bootloaders, and... They all work differently, so... Yeah.
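If you take the route of writing your own flash-based update, the acceptance check is where the validation and anti-rollback rules from this discussion live. Below is a minimal Python sketch under stated assumptions: the manifest layout is hypothetical, and an HMAC stands in for the asymmetric signature (for example ECDSA) that a real device would verify with only a public key on board. `Manifest`, `sign_manifest`, and `accept_update` are illustrative names, not any vendor's API.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class Manifest:
    version: int         # monotonically increasing firmware version
    image_sha256: bytes  # digest of the image this manifest covers
    tag: bytes           # HMAC stand-in for a real signature

def _signed_bytes(version: int, digest: bytes) -> bytes:
    # Hypothetical canonical encoding of the fields covered by the tag.
    return version.to_bytes(4, "little") + digest

def sign_manifest(key: bytes, version: int, image: bytes) -> Manifest:
    # Build-server side: hash the image and authenticate (version, digest).
    digest = hashlib.sha256(image).digest()
    tag = hmac.new(key, _signed_bytes(version, digest), hashlib.sha256).digest()
    return Manifest(version, digest, tag)

def accept_update(key: bytes, installed_version: int, m: Manifest, image: bytes) -> bool:
    # Device side: every check must pass before anything is written to flash.
    expected = hmac.new(key, _signed_bytes(m.version, m.image_sha256), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, m.tag):
        return False  # manifest was not produced by the key holder
    if not hmac.compare_digest(hashlib.sha256(image).digest(), m.image_sha256):
        return False  # image does not match what the manifest describes
    if m.version <= installed_version:
        return False  # anti-rollback: refuse older (or same-version) images
    return True
```

Refusing any version that is not strictly newer than the installed one is what blocks the rollback attack mentioned earlier, where an older image that is correctly signed but buggy gets booted.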

Starting point is 00:28:20 So has NXP had a reaction? Yes, they have. I think we were generally pleased with how NXP responded to this. We sent them the proof of concept, they definitely accepted it, and we were able to get a fix out. We had previously had interactions with NXP's product response that were less than satisfactory, and I think this time they definitely took it more seriously. They certainly made the announcement. And I was actually pleased to find, I think it was last week,

Starting point is 00:28:52 when I stumbled upon it, that the security vulnerability was publicly available on their knowledge base. And this was something I don't think we had actually seen before. I think it's really good to see hardware vendors like NXP making these things public, because all of these things should in fact be public: if there's a chip vulnerability, you should know, and it should be freely available to everybody. Certainly, if you're advertising that your chips are secure and that you have all these features,

Starting point is 00:29:20 then being open about when they fall down is probably a prerequisite of being trusted. Yeah, and I mean, it's not that you expect chips to be completely bug-free and never have errata, but it's a matter of making sure everybody can actually be aware of it. And are they changing... Is it in the errata?

Starting point is 00:29:39 Are they changing their ROM for new versions of this chip? Or are they coughing up that programming code so that you could fix it, compile it, and make it part of your program instead of their ROM? I think they do plan to issue fixed chips. And I was thankful I got an engineering sample to test and verify that it was fixed. So the hope is we'll be able to get some fixed chips.

Starting point is 00:30:09 But of course, trying to get your hands on any kind of silicon these days is difficult. And even in the best of times, trying to [...] seen for, I want to say, big bugs. But I guess I only see it when I hear about something so catastrophically bad that I go look at the NIST database. What made you do that, and how did you figure out that nobody else had found it before you? NIST and the CVE database are an interesting discussion. CVE assignment is ultimately left up to both the reporter who finds a bug and the receiving end of the bug. Sometimes companies may handle the receiving end themselves. But ultimately, Oxide decided to report the CVE to NIST to have an easy way

Starting point is 00:31:17 to track the vulnerability. That's really what I see this as being about: being able to say, okay, we need a way to identify this and point to specific details about it. And you're right that you often see CVEs highlighted for big issues, but technically anyone can request a CVE for any kind of issue. I think it's important to always read the details on the NIST database to see what's actually there and what the issue actually means. And then to your question about how we knew whether anyone else had already found this issue: that's an interesting question, one I actually thought about a lot. And honestly, I don't think we had a good answer. I think there wasn't a good way to know until

Starting point is 00:31:58 we actually tried to publish this and saw whether anybody came forward. If someone had come forward and said they had already found this issue, honestly, I would have been really excited to see what exactly they were doing to find it. And you'd hope that NXP would tell you, thank you, we've already started fixing this, and here's your $10,000 reward.

Starting point is 00:32:19 Yeah. Yeah, I mean, when we reported it to NXP, they'd immediately come back and say, oh yeah, we're getting ready to fix this. But, you know, sometimes these bugs aren't found, and we are looking forward to getting fixed chips and being able to deliver them. Finding this bug, writing it up in such a way that NXP can take action on it, and writing it up for the NIST vulnerability database: this all took time that you didn't have to spend. That is correct. Oxide definitely supports my work finding issues like this. But at the same time, we're kind of tired of finding these bugs, and we hope that this is the last one that we find.

Starting point is 00:33:16 Kind of hope that this is the last one you find. But do you think it is? I think it is. I mean, at least for now. But who knows what will end up happening. [...] good for the community, but Oxide's building servers. So it is nice of them to give you the opportunity to talk more about it, kind of on their time. I imagine some of the preparation is on your time. Yeah, and I'm grateful to Oxide, but it definitely does take time to do this. And internally, we did have some back and forth about what exactly we should do.

Starting point is 00:34:08 A lot of people who work in security have many opinions about how exactly the disclosure process should work. And I think in some respects, Oxide's way, the way we do disclosures, is how we would like to be disclosed to: when, you know, someone inevitably finds a bug in Oxide's products, we'd like to believe that if someone came to us with a proof of concept, we'd, you know, take it seriously and be able to give an estimate about when things would be fixed. But it definitely does take a lot of time to do all that. But ultimately, I think it's good, you know, and not just for the community; I think it's the right thing to do. But I mean, there's lots of debate out there about how you do disclosure. Beyond the right thing to do, I think for a company with a product like Oxide's, which is

Starting point is 00:34:52 a server, which has certain security requirements, there's demonstrating competence: oh, we at Oxide find these kinds of issues and are really good at it. And that should give you more confidence in our ability to make solid hardware. I mean, that's not nothing. Yeah. Absolutely. Being able to say we found these vulnerabilities, so we're not passing them along to you, is definitely confidence-building for the server. When is the server coming out? We're still definitely working on building it,

Starting point is 00:35:29 iterating on hardware and trying to be able to do things. What's the space for when we have an oxide space for when we're actually able to deliver it? But it's definitely coming, and people were out doing bring-up about two weeks ago. It was very exciting to see another iteration of hardware come out and make, you know, a lot of lights blink and fans turn on. Your website says late 2022 is when you're going to start shipping racks, so I won't ask you beyond that, because I know very well that if you answer, it's probably bad.

Starting point is 00:36:04 The website needs to be updated as well. Oh, okay. Well, we'll just leave that as it is then. You said you were looking forward to some of the other talks. Do you have anything in mind? Yeah, I'm excited to see things related to glitching and side channels. There's a talk about breaking SoC security by glitching data transfers. I'd love to learn more in the future

Starting point is 00:36:27 about how to do physical glitching attacks. I bought a ChipWhisperer, which is designed to do glitch attacks. I haven't had a chance to actually sit down and play with it yet, but I hope to sometime and try it on the LPC55. And find something else? I don't want to find something else, but, you know, there are certain

Starting point is 00:36:46 things where I'm like, I wonder, if I glitch this here, could I actually break it? And it's sort of, you know, tempting me to find something else. Yeah, so tempting. It's a different attitude toward development, because my attitude toward development is

Starting point is 00:37:01 I want to get this code done and never see it again. And I certainly don't want to find out there's something wrong with the chip. I think your attitude, being deeply curious about the things you're using, is much better. And I think it's cool that the company supports that, because, as I think you mentioned, other companies might not be so supportive: what are you doing finding a bug in this thing that may or may not apply to us?

Starting point is 00:37:35 Just get this done. Yeah, I'm really lucky to have a lot of support from everyone at Oxide. And I was also joking to some coworkers before that, in some respects, the fact that I had to do some reverse engineering to figure this out actually made it more tempting

Starting point is 00:37:52 to try and figure out what's going on. If NXP had actually just put up all the C code for the ROM, I may have been less tempted to dig into it and just do a whole bunch of reading of the code to find out what was there. Because it would be transparent. And if it was transparent, someone else probably looked at it. And reading code is not as fun. Reading code is not as fun. Reading code is important, but that doesn't mean it's always fun.

Starting point is 00:38:29 It's more fun to pit yourself against the puzzle of what they've done. Yes, and reverse engineering is definitely a fun puzzle, because Ghidra is a great tool for reverse engineering, but it doesn't always do everything perfectly. It's still up to you to figure out exactly what this code is doing and what it's calling. Ghidra, that's the one where you put in ARM machine code, and then it makes assembly code, and then it makes C code. Is that right? Yeah, that's correct. You give it some code, and it will disassemble it into assembly, and then also attempt to put it back into something C-like.

Starting point is 00:39:03 How well does that work? I mean, how does it decide what to use for variable names? It tends to assign them sequentially. So you're absolutely right that figuring out what the variables do is one of the first things you do when you're looking at a disassembly. What you end up with is essentially a C function with all the nice names taken away, where everything is just, you know, variable one, variable two, variable three.

Starting point is 00:39:28 So it does a lot of complicated algorithms behind the scenes to generate this. And then, you know, you're left trying to pick up patterns. But it's pretty easy to start guessing what things are, for example, oh, this looks like a loop. And especially when you're reverse engineering a ROM, I spent a lot of time comparing what the ROM was

Starting point is 00:39:48 actually accessing against the physical hardware blocks in the memory map. So I was able to say, okay, this function is touching the GPIO block; this function is touching the clock configuration block, which gave me a good idea about what things were doing. How much code was this in the ROM?

Starting point is 00:40:06 Ah, there's a pretty big chunk of stack there. I mean, it includes stuff to do ISP. There's a USB stack. Oh my God. It gives you an idea of how much you have to support in ROM. That's a lot. Wait a minute. Ghidra is from the National Security Agency?

Starting point is 00:40:28 Yes. Okay, I had no idea. But they have a Git repo, and it says NSA, which, you know, that's interesting. Too many secrets. Sorry. Yeah, but that sometimes makes some people nervous.

Starting point is 00:40:44 But, I mean, I found it to be a great tool. I'm not actually an expert in reverse engineering. This was really one of the first serious projects I've ever taken up. But I found Ghidra to be a nice tool that's available. Does it put the C library all together so that you can identify what the C functions are? Like string copy? I think it might.

Starting point is 00:41:12 Again, I'm learning a lot about Ghidra every time I use it. It has some things built in to identify things like common formats, but it did not have a way to automatically detect things like strcpy and memcpy. So that was actually one of the things I ended up spending some time on, staring at some of these functions and realizing, huh, okay, this is actually just memcpy, written out as a bunch of assembly because it's a well-optimized memcpy.
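As a rough illustration of why a well-optimized memcpy is hard to recognize in a disassembly, here is a minimal Rust sketch. It is not the LPC55 ROM's routine, and the function name `copy_words_then_tail` is hypothetical; it only shows one common shape such code takes: a bulk loop that moves four bytes per iteration, followed by a byte-by-byte tail. Real implementations often also handle alignment and unroll further, which makes their compiled form look even less like a simple copy.

```rust
/// Hypothetical example: copy `src` into `dst` four bytes at a time,
/// then finish any leftover bytes one at a time. Illustrative only;
/// production memcpy routines also deal with alignment.
fn copy_words_then_tail(dst: &mut [u8], src: &[u8]) {
    assert!(dst.len() >= src.len(), "destination too small");
    let n = src.len();
    let words = n / 4;

    // Bulk phase: one 32-bit value moved per iteration.
    for i in 0..words {
        let off = i * 4;
        let word = u32::from_ne_bytes([src[off], src[off + 1], src[off + 2], src[off + 3]]);
        dst[off..off + 4].copy_from_slice(&word.to_ne_bytes());
    }

    // Tail phase: copy the remaining 0 to 3 bytes individually.
    for i in words * 4..n {
        dst[i] = src[i];
    }
}

fn main() {
    let src = *b"puff dragon";
    let mut dst = [0u8; 11];
    copy_words_then_tail(&mut dst, &src);
    assert_eq!(dst, src);
    println!("{}", std::str::from_utf8(&dst).unwrap());
}
```

Once a compiler has scheduled, unrolled, and register-allocated a loop like this, the decompiled output is mostly loads, stores, and pointer arithmetic, which is why spotting "this is just memcpy" takes some staring.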

Starting point is 00:41:38 Yeah. Optimized code is very hard to understand. Yes. Wow. Now I kind of want to play with it. What do structures look like in Ghidra? There is a structure editor, so you can define your own structures to say what things are. So the idea is that you can edit it field by field

Starting point is 00:41:57 and specify the layout of things. Yeah, so I think as you go along, you kind of deduce what things are and then rename them and sort of make the automatically generated stuff more readable. I hope you're right. Kind of like a crossword puzzle where things have to line up. The crossword puzzle is actually a great example, because I think sometimes what I ended up doing was saying, okay, this is definitely a structure. You can tell by what it's accessing.

Starting point is 00:42:21 And it's also got some other nested structures. So I sort of end up with, you know, structure one has structure two, which has structure three. And I could guess how big things were. And then they all have fields with, you know, assorted names, and I'd try to guess what these were. And as I looked at the code, I would go back and be able to say,

Starting point is 00:42:38 okay, this looks like it's calling a function that's for initialization; this looks like a teardown function, and change the names to better match. I'm just now thinking of terrible interview questions, like just hand somebody this and a pile of machine code and say, okay, tell me what this does.

Starting point is 00:42:53 You have a couple hours. I don't know. I swear I feel like I've seen that as an interview question before about, you know, tell me what this code does. I've seen that, but not from... Yeah, I've been given that question. I got very angry.

Starting point is 00:43:08 I've been given that question twice. It was because the question was, what does this code do? And the code was plus plus, plus plus, plus plus, plus plus. There was minus minus. No, no, there was minuses, minuses. Minus minus, plus plus, I. And the answer is it gets somebody fired. Well, there was stuff afterward, too.

Starting point is 00:43:23 Because if it's just on one side, it's fine. But there was stuff. The only answer is it should get somebody fired. And if that's how you write your code, please let me know so I can leave the interview right now. The other "what is this" question I got asked was Duff's device, which is a loop-unrolling thing. Oh, my God, that's so hard to identify. Yeah. That's an annoying question.
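Duff's device interleaves a C `switch` with a `do`/`while` loop so that the leftover `count % 8` elements are handled by jumping into the middle of an eight-way unrolled loop body. Rust has no `switch` fallthrough, so the sketch below is not Duff's device itself; it is a minimal illustration, with a hypothetical function name, of the same unrolling idea: handle the remainder first, then run a loop body that does eight copies per iteration. (Tom Duff's original wrote every element to a single memory-mapped output register; this version copies array to array.)

```rust
/// Hypothetical example: copy `src` into `dst` with the loop body
/// unrolled eight times. As in Duff's device, the leftover
/// `len % 8` elements are handled first, so the unrolled loop only
/// ever sees whole groups of eight.
fn unrolled_copy(dst: &mut [u32], src: &[u32]) {
    assert_eq!(dst.len(), src.len());
    let rem = src.len() % 8;

    // Remainder first (Duff's device does this by jumping into the switch).
    for i in 0..rem {
        dst[i] = src[i];
    }

    // Then full groups of eight: eight copies per loop iteration.
    let mut i = rem;
    while i < src.len() {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];
        i += 8;
    }
}

fn main() {
    let src: Vec<u32> = (0..21).collect();
    let mut dst = vec![0u32; src.len()];
    unrolled_copy(&mut dst, &src);
    assert_eq!(dst, src);
    println!("copied {} words", dst.len());
}
```

Because `len - rem` is always a multiple of eight, `i + 7` stays in bounds on every pass of the `while` loop. Seen only as assembly, this is exactly the kind of code that is easy to misread in an interview.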

Starting point is 00:43:43 And yeah, I hate that interview question just because it involves actually knowing the answer beforehand. It's not something you can actually solve in an interview. I mean, it's very cool to learn about, but that's not a great interview question. I agree completely. You do a lot of interviewing for Oxide, don't you?

Starting point is 00:44:01 You know, I do quite a bit of interviewing with Oxide. I've gotten a chance to meet a lot of candidates, and I help with application review, too. What do you look for? I don't think there's necessarily one right answer about what Oxide is looking for. I think it is a combination of some level of experience, but also an interest in what we're building. I think, you know, when I say interest, some people think, oh, yeah, so you need to be completely passionate. But I think sometimes when people say, you know, passion, they assume that

Starting point is 00:44:34 means it must be, you know, all-consuming, the only thing you ever do. But I think it's more about, you know, can you demonstrate that you're able to get the job done? And I think there are a lot of different ways you can show that you have relevant skills to do what you want. I mean, can you talk about what you've built before? I always like to ask people about past problems they've solved, because I think that shows a lot about the types of things they've solved and exactly what problems they actually overcame. And I think Oxide has definitely tried a unique approach with its materials question, giving people a chance to show exactly what they want, because I think the materials are an interesting way to show off a different background, for example.

Starting point is 00:45:19 Can you tell us about the materials question? Sure. So Oxide has everyone submit written materials. I'd say one thing that Oxide definitely values is being able to write well. Oxide asks for a work sample, which can be left open-ended, so it's a way for you to talk about what you've done. If you've done open-source work, that's a good thing to submit there. And I mean, a lot of times what I'm looking for there is how exactly, you know, that relates to what Oxide is doing. I mean, what exactly are you showing me about why that would make you a good person to work with? Oxide also asks for an analysis question.

Starting point is 00:45:58 I honestly love reading the analysis questions just because I love seeing what kind of problems people have worked through in the past and getting to work through the nitty-gritty details about these weird bugs and seeing what sort of things people have done. And then there are also some questions related to, you know, your happiest times, your unhappiest times. And some people may think these questions are a little bit cheesy, but I think they also are a good way to get people to really reflect on, you know, what exactly they've learned and maybe even things they wish they had, you know, done better or might do differently today. I find those kinds of questions much, much better than solve this problem or... Duff's device. Or, yeah, or, you know, here's a high-pressure situation that you'll never, ever actually encounter. You have 30 minutes to do some code thing that, you know, wouldn't be a problem if you had three or four hours. I like engaging with the candidates and figuring out, okay, do they have a history of solving things? Do they have a history of delivering? And, you know, are they somebody I want to work with? And I like what you said about engagement instead of passion, because

Starting point is 00:47:05 I can be really engaged with work, but also not very passionate about it. So I think that's a distinction. I mean, that's why we're consultants, because... I don't want to be. We don't want to be passionate about companies anymore. We want to be passionate about our lives. And I'm happy to be engaged. But at the end of the day, I'm not going to be dreaming about your product. It reminds me of the time I was asked

Starting point is 00:47:32 if I had a passion for iPod and then I knew I was doomed. You don't have a passion to, you know, listen to, you know, 4,000 songs in your pocket? Well, no, that part was cool. It was, well, interviewing there. They wanted to know if I was super into iPod. I didn't have one.

Starting point is 00:47:49 And music. Does that count? So what do you do? What is your day-to-day work like, aside from finding holes in the LPC55? Yeah, so I do a lot of firmware work in Rust, so I spend a lot of my time doing that, and I'm writing code that goes on the root of trust and sometimes related to the service processor.

Starting point is 00:48:15 I also help with code reviews, and I'm lucky to have a lot of fantastic colleagues as well, so I'll talk to them if I have questions about what I'm doing, especially Rust. I didn't really know Rust before I joined Oxide, and I've definitely gotten much better at it. But I mean, there's certainly a lot to learn there to pick up on everything and do a lot of things correctly. I know that the Oxide folks like Rust. I mean, they named their company after the language.

Starting point is 00:48:44 How do you like Rust? You can tell me. Be serious. I do like Rust. I promise there's not a Rust crab sitting next to me here, pinching my leg, telling me to say this. Mostly, what I like to say is that Rust is a powerful language. And I think it also makes it a lot easier for me to write C-like code

Starting point is 00:49:09 because of what the language offers. The fact that I don't have to think about array-index-out-of-bounds errors, or that it will give me an error in a way I can actually parse, is much nicer. Some time ago, I remember I was working on making some change to the Hubris and Humility stuff, and I ended up hitting a bug that I think probably would have taken me

Starting point is 00:49:34 significantly longer to figure out if it had been done in C, simply because it would have been some sort of silent array index out-of-bounds error as opposed to giving me a nice error message. And I think it's things like that that are really great to work with. Has it been difficult? Is the language changing at a rate that's somewhat difficult to keep up with? That was one of the issues I had with iOS development with Swift. It was like, every six months, like, oh, here's Swift 5.5, and look at

Starting point is 00:50:00 these eight things you can do that are really complicated now but are probably cool and you should learn about them. And it got really in the way of writing code sometimes because I was like, well, I got to keep up with the latest thing. As Rust is also a new language, has that been an issue? I think the Rust community has tried to minimize that by splitting things out and having a well-defined process for a stable toolchain and an unstable toolchain. So I think if you're working with the stable things, you should mostly find that things stay roughly the same.

Starting point is 00:50:35 Now, I think, especially for what Oxide is doing, we are definitely close to the leading edge of things. So there are certain features we're keeping an eye on that we're hoping to see go stable. But I think the language has definitely come a long way, and it should be pretty stable for a lot of things.
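Laura's earlier point about array index out-of-bounds errors can be shown with a small Rust sketch meant to run on a desktop host; the function names are hypothetical and not taken from Hubris or Humility. `slice::get` returns `None` for an index past the end, and plain indexing is still bounds-checked, panicking with a message that names the index and the length rather than silently reading whatever memory follows the array, as an unchecked C access might.

```rust
use std::panic;

/// Hypothetical checked lookup: returns None when `index` is past the end.
fn reading_at(readings: &[u32], index: usize) -> Option<u32> {
    readings.get(index).copied()
}

/// Hypothetical direct lookup: plain indexing is still bounds-checked,
/// so an out-of-range index panics with the index and the length.
fn reading_or_panic(readings: &[u32], index: usize) -> u32 {
    readings[index]
}

fn main() {
    let readings = [10u32, 20, 30];
    assert_eq!(reading_at(&readings, 1), Some(20));
    assert_eq!(reading_at(&readings, 5), None);

    // On a desktop host (default panic = "unwind"), catch_unwind lets
    // this demo observe the panic instead of exiting.
    let result = panic::catch_unwind(|| reading_or_panic(&readings, 5));
    assert!(result.is_err());
    println!("out-of-bounds index caught instead of silently read");
}
```

In firmware, where panics typically halt execution rather than unwind, the same bad index would stop the program instead of being caught here; either way the failure is loud and points at the offending index.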

Starting point is 00:50:53 Going back to the bug you found and are going to be talking about, this wasn't the only one, was it? No. So actually last year, I ended up finding another bug, a different kind of bug. That was actually why I originally had the ROM dump around: while taking a look at the ROM dump, I discovered there was an undocumented hardware block

Starting point is 00:51:18 that could be used to patch the ROM and make changes to it. And something like this definitely does have its use cases, but it wasn't locked out, so it was possible to reprogram it, which could be used to break isolation between the secure and non-secure worlds for TrustZone. That sounds kind of important. Yeah, and I mean, I think this was our first experience with NXP,

Starting point is 00:51:43 and that was the one where I think we were less than satisfied. I think it took a little bit of convincing to have NXP believe that this was an issue. And then I think more than anything, at Oxide, we really just wanted NXP to give us the documentation for what this thing was doing and make sure everybody knew it was available, just because there are good reasons

Starting point is 00:52:01 to want to be able to patch your ROM. I mean, ultimately your ROM is just code, and you're probably going to have a few bugs in your ROM. This is understandable, and you need to have a way to fix that up. But I think what's also important is to make sure that you can't reprogram it to do other things you weren't expecting. So let's go back.

Starting point is 00:52:20 The RAM patcher, ROM patcher, sorry. Let's go back. Yeah, the ROM patcher would let you modify the ROM, including how you program TrustZone, the PUF generator system, and the firmware update. And, okay, so why didn't you ditch NXP at that point? That seems really important. How did they, what? Yeah, this is a question we get a lot. And again, we spent a lot of time trying to figure out exactly what we should do. And it sort of comes down to a couple of factors.

Starting point is 00:53:07 One was that, as I mentioned, we had some specific requirements for what we were looking for in a chip. And it turned out that there weren't a lot of chips out there that met our requirements. We still have the write-up from when we selected this chip back in spring 2020. And even back then, there were some chips we had to rule out simply because we couldn't actually get our hands on silicon.

Starting point is 00:53:29 All we could get were data sheets. Probably still can only get data sheets. Yeah, and so trying to find that. And then there's also the factor that, you know, we're pretty far into our product. We're getting boards and being able to do things like that. So trying to find another chip, put that in, and then having to do even more silicon, you know, takes

Starting point is 00:53:49 more time. And we've all spent a lot of time evaluating the chip. So I think we know far too much about the chip by now. So in some respects, we are reasonably confident we know exactly how this thing works. So we've decided to go with it. I do think this is a great lesson for everyone to think about, and I'm definitely going to be talking about making choices like this in my talk. I don't wish silicon bugs on anyone, but sometimes you end up having to make these hard choices. ARM itself has a module for ROM patching. It used to, actually: there used to be the Flash Patch and Breakpoint unit, which was for ARMv7

Starting point is 00:54:27 and earlier, but its patching function was explicitly removed in ARMv8-M, I think because of TrustZone, because they realized you could actually, you know, use this to do bad things. So, have you used it to do bad things to show the vulnerability? Yes. When we found this issue, I think we shared it with NXP, and I don't think they were fully convinced. So I worked with my colleagues to do a full proof of concept. And my colleague, Rick, I think was the one who really helped to dig in and figure out how to turn this into something that was pretty impressive.

Starting point is 00:55:01 And I think we joked about doing assembly code golf, you know, finding the smallest number of instructions we could use to do something interesting to reprogram the ROM. And what Rick and the rest of us eventually came up with was a way to take, you know, what is essentially reference code out there and demonstrate that, using the expected APIs, we could have the non-secure world read out stuff from the secure world, which was definitely not supposed to happen. So how did you fix that?

Starting point is 00:55:35 That one was actually somewhat easier to fix. It is possible to lock out changes and access to the ROM patcher via another security mechanism on the NXP part, or at least restrict it so that only certain levels are able to make modifications. Is it something you have to do on boot each time? Or is it more like a fuse where you say, okay, never again can the ROM be patched? No, it's not a fuse, unfortunately. You have to do it on each boot. So if somebody could hijack the boot, they could read out your secrets? Yeah, if you managed to hijack the boot and disable that check, you know, you were probably

Starting point is 00:56:27 running into some problems, assuming you could take something else to do this. I mean, a lot of times this is what security looks like: well, if you could do this, you could do this, you could do that. So it's all a matter of finding, you know, that one little inch and turning it into a mile. Well, this has been really interesting, and I look forward to your talk, hopefully being available online after the conference. Laura, is there anything you'd like to leave us with? Stay curious, everyone, and don't be afraid to break things. You never know what you might find.

Starting point is 00:57:03 Our guest has been Laura Abbott, an engineer at Oxide Computer working on Rust software for microcontrollers. She'll be speaking at hardwear.io in early June 2022 in Santa Clara, California. Thanks, Laura.

Starting point is 00:57:19 Thanks. Thank you to Christopher for producing and co-hosting. Thank you to Andrea at hardwear.io for the introduction and Rick Arthur for his Patreon support and his lightning round questions. And of course, thank you for listening. You can always contact us at show at embedded.fm or hit the contact link on embedded.fm. And now a quote to leave you with from Audrey Hepburn. Nothing is impossible. The word itself says I'm possible.
