Embedded - 34: Really Big Cabbage

Starting point is 00:00:00 Welcome to Making Embedded Systems, the show for people who love gadgets. This week we're going to talk about updating device software. It'll be exciting. Really! Or double your money back. I didn't intro a guest, but that doesn't mean this is all me monologuing. Chris White, my producer, is going to be standing in for you, the listeners. Hopefully asking the questions you have and preventing me from going off into the deep end before we get the water wings on. Hi Chris, thank you for joining me.

Starting point is 00:00:36 Hello. So I want to talk about bootloaders, firmware update. Over-the-air updates. Yeah, there are a lot of names. OTA for over the air, and OTAP for over-the-air programming. My favorite is "fwop", which I don't think is how you're supposed to pronounce it, but how else are you going to pronounce it if you're going to try to pronounce it? So updating, uploading, whatever you want to call it, it's the programming of your device with new code after it has gone into the wild.

Starting point is 00:01:09 So not JTAG. Fraught with danger. Not about manufacturing and how code gets in at that point. This is all... Not burning your EPROM with the UV light and then... Those were the days. You never did that. I really did do that. What do you mean, I never did that? I really didn't? I never did that? Did you ever have a project where you... I must have. Yeah, I don't know. Uh, okay, so you haven't done

Starting point is 00:01:35 firmware update. I have not done firmware update in a small device. But you have done it as a user, as a consumer. Yes, and I've done it for big systems. Yeah, medical devices. You've had some Linux-based systems, and you just replace the executable, and poof, you're done? Well, there's a little more than that. You have to check digital signatures and make sure everything's okay and make sure you don't blow away the good stuff

Starting point is 00:01:58 before the new stuff is validated. But, yeah. Well, it's all the same, except smaller and more likely that you're going to blow away all the good stuff before it's done. Okay, so I have to admit that this is probably the podcast I prepared the least for. So we're winging it. And if you hate it, please... Write someone else. I was going to say, please send email to Chris.

Starting point is 00:02:30 Great. So one of the things I am going to do is I have my book here. If you don't know, I wrote a book. It's called Making Embedded Systems. It's by O'Reilly. No, it's by me. The publisher is O'Reilly.

Starting point is 00:02:45 Strange that the podcast is the same name. Wow. It's like we planned this. Although I've been thinking about turning the podcast to Embedded.fm and just calling it that all the time. Eh, we'll decide later. There's somebody else to decide for us.

Starting point is 00:02:58 Oh, yes, right. I feel like a total idiot when I refer to people, when I refer people to the book because I had a client who said, I need to make my stuff faster, and can you come in and talk me through it? And I said, sure, or you could just read my book.

Starting point is 00:03:17 There's a whole chapter about it. It tells you exactly what you want to do, and all I'm going to do is... Charge a lot more money. That's true. I guess my hourly rates are more than what I get for the book.

Starting point is 00:03:30 But a lot of this show is going to come from chapter 7 in the book. So if you've already read it, I hope you don't get bored. But you know, it'll be fun. Chris will have questions, right? I always have questions. Just maybe not about what you want.

Starting point is 00:03:45 That's fine. That's fine. Do you have any questions to start with? No. Okay. So when we're updating code, the idea from the user's perspective is you push a button or maybe it happens automatically and the device just updates and suddenly has new features. Or maybe it fixes the bug that everybody hated.

Starting point is 00:04:08 Okay. And it's funny. I went to see the optometrist this year, and he asked me what I did. And at one point I was working on motorcycle stuff. And so now I think there's a note in my file that says, ask her what she's working on. It may be interesting.

Starting point is 00:04:25 And I said I was working on Internet of Things widgets. And he said, oh, I got an internet enabled thermostat. And I said, the Nest? And he said, no. And so we talked about it. It's the Honeywell one. He said, I asked, since I'm planning this rant for the Embedded Systems Conference about the Internet of Things and how it really fails consumers, I asked him how setting it up went. And he said it was okay.

Starting point is 00:04:50 I mean, it had a little, it did its own network and he logged in with his phone and gave it his password, blah, blah, blah. And that was okay. But three days later, it lost its mind. It couldn't connect to the internet anymore. And he was baffled. But do you know what happened? Well, either it was very badly programmed and it just lost its mind, or it tried to do a firmware update. No, he said it hadn't done it since.

Starting point is 00:05:18 It did a firmware update and either... And erased its information. On purpose? Well, probably they released the device with one set of firmware and said, oh, we're going to fix it, but we won't change the flash area or the PROM or wherever they're keeping it. We won't change the structure of that. And then they updated the firmware and something changed, and the EEPROM or the flash or whatever they were storing the password in

Starting point is 00:05:45 no longer passed the checksum routine. Given that entering all of your data in those little devices somehow to get your Wi-Fi network connected is such a pain in the neck, that's probably something as a developer you'd like to avoid blowing away that information. I would think so, but... I mean, how does that pass testing? I guess they didn't test it. Well, they had to test that version zero went to version 12 because, you know, they probably spent some time in the box and instead QA has been testing that version 11 goes to version 12. Sure. I mean, I have to say that I have had one device that did that, and I was unhappy when I found out. Although the version zero for us had only gone to beta users.
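The thermostat story above comes down to a stored settings record whose layout changed between firmware versions. A minimal sketch of one way to survive that, assuming a version byte at the front of the record and a CRC-32 at the end; the field sizes, the `flags` field, and every function name here are hypothetical, not anything the device in the story is known to use:

```python
import struct
import zlib

# Hypothetical layouts. v0 stores SSID and password; v1 appends a flags byte.
V0 = struct.Struct("<B32s64s")    # version, ssid, password
V1 = struct.Struct("<B32s64sB")   # version, ssid, password, flags


def with_crc(payload: bytes) -> bytes:
    """Append a little-endian CRC-32 of the payload."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def save_v0(ssid: bytes, password: bytes) -> bytes:
    """What the version-zero firmware would have written."""
    return with_crc(V0.pack(0, ssid, password))


def load_settings(blob: bytes):
    """Return a settings dict, migrating old layouts. None means corrupt."""
    if len(blob) < 5:
        return None
    payload, stored = blob[:-4], struct.unpack("<I", blob[-4:])[0]
    if zlib.crc32(payload) != stored:
        return None                        # genuinely corrupt
    version = payload[0]
    if version == 0 and len(payload) == V0.size:
        _, ssid, password = V0.unpack(payload)
        flags = 0                          # default for the field v0 lacked
    elif version == 1 and len(payload) == V1.size:
        _, ssid, password, flags = V1.unpack(payload)
    else:
        return None                        # unknown layout
    return {"ssid": ssid.rstrip(b"\0"),
            "password": password.rstrip(b"\0"),
            "flags": flags}
```

The point of the sketch is that the new firmware checks the old record against the old layout before deciding it is invalid, so a version-zero device keeps its Wi-Fi credentials across the update.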

Starting point is 00:06:31 And so that was how we justified it. But to this day, I'm like, couldn't you have kept it in two formats or something? But a lot of firmware update has to do with what you have and what... See, the reason this is hard for small devices is not... I mean, you said it was the same for big devices, but it's not, because you have no room sometimes in the small devices to store the new thing temporarily,

Starting point is 00:06:55 or you don't have the ability to run the update code and the normal code at the same time, right? Exactly. This is all about which resources are constrained and which ones aren't and how you can do the best you can with what you got, which is kind of the story for embedded systems. So there are a couple of things that we need to figure out regarding which features you have and which ones you can use more of.

Starting point is 00:07:27 The first thing is how you're going to store the code, the actual final image, the what you're running from. And we say that this runs from ROM. Of course, the acronym means read-only memory. And since we're about to change it, that's not really read-only memory. But it may be onboard flash. Read-often memory. Read-often memory, right, changing it.

Starting point is 00:07:51 The other thing that you need to know is where are you storing the code, the new code? Not the code you're currently running, but the new code. So the temporary, I have to put this somewhere because I can't overwrite the thing I'm running.

Starting point is 00:08:07 Not even that, not yet. We're going to get there. But the thumb drive, if you're updating via thumb drive, the cloud server. So where the original source of the new stuff is. Where the source of the new stuff is. And the communication method. And those two things can be, I mean,

Starting point is 00:08:22 a thumb drive indicates that this is going to be over USB, but cloud indicates over Wi-Fi. Cloud. So you have to know what those things are. Right. And I don't think I can help you with that. That's kind of how your systems decide. But when you're building all this, you need to know what it is. At some point you boil down to a function that says get new image.
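The "function that says get new image" idea can be sketched as a small interface with interchangeable sources. This is only an illustration: `ImageSource`, `ThumbDriveSource`, `CloudSource`, and `check_for_update` are made-up names, and both sources are in-memory stand-ins rather than real USB or network code.

```python
from typing import Callable, Dict, Optional, Protocol


class ImageSource(Protocol):
    """Anything that can hand the updater a candidate image, or None."""
    def get_new_image(self) -> Optional[bytes]: ...


class ThumbDriveSource:
    """Stand-in for a USB thumb drive; `files` fakes the drive contents."""
    def __init__(self, files: Dict[str, bytes], name: str = "firmware.bin"):
        self.files = files
        self.name = name

    def get_new_image(self) -> Optional[bytes]:
        return self.files.get(self.name)


class CloudSource:
    """Stand-in for a server over Wi-Fi; `fetch` is any download callable."""
    def __init__(self, fetch: Callable[[], Optional[bytes]]):
        self.fetch = fetch

    def get_new_image(self) -> Optional[bytes]:
        try:
            return self.fetch()
        except OSError:
            return None                    # treat a dead link as "no update"


def check_for_update(source: ImageSource) -> str:
    """The rest of the updater only ever sees this one call."""
    image = source.get_new_image()
    return "no update" if image is None else f"got {len(image)} bytes"
```

The updater logic stays the same whichever source is plugged in; only the source and its communication method change.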

Starting point is 00:08:46 Right. And however that works, who cares? Not always. Sometimes some of these chips, like the STM F152 line, and I bet... Seriously, you had that just sitting on the top of your head? I used it recently.

Starting point is 00:09:03 Okay, so the Piccolo C2000 line also has one. You're just going to list product names that have weird numbers to demonstrate your eidetic memory? Uh, no, because my memory for these numbers is... man, I have to see them about a thousand times. You know, we watched Top Gear and they had those Chinese cars. Right. With the really, really long, long... Do you remember any of the Chinese car names? Of course not. It was like the Ford F1 4000... They weren't Fords because they were the Chinese companies.

Starting point is 00:09:36 It wasn't... BQR... Yes. It was like... We're going to offend people if I continue, so... Yeah, well, Top Gear. So anyway, the source of your data. So the source of your data and communication method

Starting point is 00:09:49 are the important parts that you need to remember. Okay. And so that's going to be like a constant as we talk through is this, where does this stuff come from? Some processors have onboard bootloaders. Whoa, whoa, whoa. A what now? An onboard bootloader.

Starting point is 00:10:05 And what this does is it's a part of the chip. It may take up some code space, but probably it's just part of the chip as it's shipped to you. And you put some IO lines in some configuration. You pull them up, you pull them down, whatever. And then it goes into this mode where when you apply power or reset, come out of reset, you can just upload the firmware.

Starting point is 00:10:30 And you do it like over a UART or a SPI terminal or whatever. It's like the processor can update the firmware for you. Okay, and there's no way to accidentally destroy this piece of code. You have to work really hard. Well, no, that's good. I mean, I did do it once, yeah. But that usually involves trying to change things deep inside the processor

Starting point is 00:10:55 that you weren't supposed to touch in the first place. That was one of my questions, is why do you need a bootloader at all? And I know we're not staying on track here very well, but this is a fundamental piece, right? So why do you need a bootloader? Eventually, to start a program on your chip, chips are, you know, they read from address zero

Starting point is 00:11:15 and they just go or something like that, right? Oh, but this bootloader isn't between the code. It doesn't always run. It kind of does because it checks those little... Checks the pins that may be pulled up or down to indicate you should go into bootloading mode. So it is the first thing that runs. But it may be the first thing that runs

Starting point is 00:11:39 before the processor really gets into running code from address space. Oh, okay. So it may actually run outside normal bounds. Although sometimes if you're willing to walk through the reset vector and all the things that come between when you reset and when you hit main, sometimes you can find the bootloader call in there. And it calls, it checks these GPIOs, and then if they're set in the way that says

Starting point is 00:12:04 go into bootloading mode, it goes into bootloading mode. And it sets up the UART to the data sheet agreed upon standard, and poof, now you can update your firmware. That only works if your firmware is going to onboard flash, because the chip vendor knows how to deal with onboard flash. And it may work if your code is going to onboard RAM. Right. It's not going to help you at all if you need to get your firmware update from a network. Right. Because the bootloader is not going to have anything really except I can check my pins, I can read stuff from some sort of serial interface

Starting point is 00:12:46 and put it into RAM or to internal flash, but that's it. Right, but serial interface is pretty useful because it might come from an SD card if you can get a SPI interface to an MMC or a small thing that basically is a flash. And the bootloader is smart enough to talk to an SD card usually? Some, some not. Okay, but that's a feature of the bootloader.
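The onboard bootloader behavior described above (check the strap pins at reset, then either take an image over a serial link into internal flash or jump to the application) can be modeled in a few lines. This is a simulation under stated assumptions: the pin pattern, the function name, and the return strings are invented, and a real part's pins and protocol come from its data sheet.

```python
# Hypothetical strap setting: first pin high, second pin low selects the loader.
BOOT_PATTERN = (1, 0)


def on_reset(boot_pins, serial_bytes: bytes, flash: bytearray) -> str:
    """Model of what runs first after power-up or reset."""
    if tuple(boot_pins) == BOOT_PATTERN:
        data = bytes(serial_bytes)         # image received over UART or SPI
        if len(data) > len(flash):
            return "error: image too large"
        flash[:len(data)] = data           # program internal flash
        return "bootloader: programmed %d bytes" % len(data)
    return "jump to application"           # normal boot, loader stays out of the way
```

Note what the model leaves out, which matches the conversation: there is no network stack here, only pins, a serial stream, and internal memory.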

Starting point is 00:13:09 Well, and you can make your own bootloader. I mean, if you know you're going to be updating this way, you can make your own little bootloader that does the same thing. So if you're crazy enough, you could make a really complicated bootloader that could handle sophisticated things, like even talk over a network if you had enough code space for it or something. Yeah, you could. But if you're talking over a network,

Starting point is 00:13:32 you're probably talking about an operating system, and that would mean having two copies of the operating system. Well, we'll go back to two copies. But you could definitely have a bootloader that could talk to any flash that you ship your boards with or any of a small set of SD cards that you have qualified. Okay. And then you put your new code onto the SD card, you plug it in,

Starting point is 00:13:57 and now your communication method is SPI and you just update your code space. Your bootloader doesn't change, which is kind of the downside to this process. Why? Well, what if... I mean, if you could change your bootloader, or you don't necessarily want to, right? Because there's always a danger with doing an update. Yes.

Starting point is 00:14:21 And you try to protect, you try to keep all those danger zones as small and short as possible. And if you're overwriting the bootloader, which is the last line of defense for bringing, you know, for starting the system, then you could potentially turn it into a brick. And that's... Yes, we're going to be talking about bricks. Bricks are what happen when you have a device that formerly worked fine and then you were updating it and it lost power or it lost its mind or cosmic rays interfered or whatever. And suddenly you no longer have a device that does jack. And often you no longer have a device that can ever do anything again if you do this wrong.

Starting point is 00:15:01 I bet it could do something. Well, a lot of the new processors will lock out JTAG because that's a security thing. And the only way to update the firmware now is to actually do this firmware update process. And if you crash in the middle of updating your bootloader... I always think it would make a nice bookend, maybe. Oh, I see. Cat toy, bookend, landfill, taker-upper of space, yeah.

Starting point is 00:15:29 Okay, but we're going with the simple method still. We have the code. We have a communication method. We have the bootloader, which we're not going to update ever. And we have the code. And that all makes sense. And now if you're going along and you power on, you check the bootloader, you check to see if new code is available. How do I do that?

Starting point is 00:15:51 What does that mean? Well, the bootloader has to have a communication method. It has to know how to talk SPI or UART or whatever. That's what the bootloader has. Sure. So the bootloader checks to see if new code is available on its communication method. Maybe it's a USB thumb drive. It goes out to the thumb drive and says, do you have code? And if the thumb drive says yes, then it says, is this code new? Yes. Does this code checksum?
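Those three questions (is there code, is it new, does it checksum) can be sketched as one function. The header layout below is an assumption made up for illustration: a version number, a payload length, and a CRC-32, each a little-endian 32-bit value. A CRC only catches corruption; it does not authenticate the image the way the digital signatures mentioned earlier would.

```python
import struct
import zlib
from typing import Optional

# Hypothetical image header: version, payload length, CRC-32 of the payload.
HEADER = struct.Struct("<III")


def make_image(version: int, payload: bytes) -> bytes:
    """Build an image the way a hypothetical release tool might."""
    return HEADER.pack(version, len(payload), zlib.crc32(payload)) + payload


def should_install(candidate: Optional[bytes], running_version: int) -> bool:
    # 1. Do you have code?
    if candidate is None or len(candidate) < HEADER.size:
        return False
    version, length, crc = HEADER.unpack_from(candidate)
    payload = candidate[HEADER.size:]
    # 2. Is this code new?
    if version <= running_version:
        return False
    # 3. Does this code checksum? (a truncated image fails the length test)
    if len(payload) != length:
        return False
    return zlib.crc32(payload) == crc
```

Only if all three checks pass does the loader go on to erase anything.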

Starting point is 00:16:18 And if you are writing a bootloader and you don't have a checksum, man, really, you're totally going to make... Has anybody ever done that? Well, yeah, because the checksum is one more piece of code, and if you're trying to keep a bootloader to... I did it once in 256 bytes. That was more impressive before I saw the entire PC emulator in 4,000 bytes the other day. Oh, yeah. What was that called? Do you remember? I'll look it up. Yeah, that was pretty impressive. Well, yeah.

Starting point is 00:16:51 I don't know that I believe you. Anyway. So the bootloader runs and it looks for a new code update and then it loads the new code to scratch space, ideally. That would be RAM or maybe it's off-board RAM, or maybe it's even internal flash. But you're assuming that that thumb drive or SD card may get pulled away, and you want the bootloader

Starting point is 00:17:15 to be able to continue as much as possible. It goes back to keeping the amount of time that you're vulnerable to power off causing brain damage short. And you can shorten that time to zero by having lots of space available, but sometimes you may not. I don't think so. Really?

Starting point is 00:17:36 I think we're going to get to the end of this and we're going to say no. There is no way that is foolproof all the time. Okay. I mean, except maybe having two full images. Yeah. Oh, all right. Well, then I'll give that to you.

Starting point is 00:17:56 I mean, unless something goes wrong when you're, okay. We'll get there. Okay, so your bootloader has detected there's new and valid code available. You've loaded it to someplace local, into scratch space. If you don't have enough scratch space to load it all, you're just loading little tiny bits: erasing a sector, programming the new code, and then you just keep doing that until you finish.

Starting point is 00:18:17 And then once it's done, you can run your new code. Now, scratch space could be enough space to hold your whole code, and then if somebody pulls your SD card, you're safe. Okay. Or it may be whatever your sector size is, or whatever your minimum amount that can be programmed.

Starting point is 00:18:39 Okay. So it may be that I have less scratch space than the size of my image. Right, right. But you can't have less scratch space... Well, you really shouldn't have less scratch space than the size of your sector because you have to program a whole sector at once. So you have to put it in place.
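The sector-at-a-time update with only one sector of scratch space can be sketched as a simulation. The 4096-byte sector size, the `Flash` class, and the `read_chunk` callback are assumptions for illustration; real flash parts have their own sector sizes and programming rules.

```python
SECTOR_SIZE = 4096  # hypothetical; use the part's real erase/program size


class Flash:
    """Toy model of code flash: erased bytes read back as 0xFF."""
    def __init__(self, sectors: int):
        self.mem = bytearray(b"\xff" * (sectors * SECTOR_SIZE))

    def erase_sector(self, index: int) -> None:
        start = index * SECTOR_SIZE
        self.mem[start:start + SECTOR_SIZE] = b"\xff" * SECTOR_SIZE

    def program_sector(self, index: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("must program a whole sector at once")
        start = index * SECTOR_SIZE
        self.mem[start:start + SECTOR_SIZE] = data


def update_by_sector(read_chunk, image_len: int, flash: Flash) -> int:
    """Copy an image into flash using one sector of scratch space.

    read_chunk(offset, size) fetches bytes from the source (SD card, etc.).
    Returns the number of sectors written.
    """
    scratch = bytearray(SECTOR_SIZE)
    sector = 0
    for offset in range(0, image_len, SECTOR_SIZE):
        chunk = read_chunk(offset, min(SECTOR_SIZE, image_len - offset))
        scratch[:] = b"\xff" * SECTOR_SIZE     # pad a short final chunk
        scratch[:len(chunk)] = chunk
        flash.erase_sector(sector)             # old code in this sector is gone now
        flash.program_sector(sector, bytes(scratch))
        sector += 1
    return sector
```

With this little scratch space, pulling the source mid-update leaves flash half old and half new, which is why the whole-image scratch buffer is described as the safer option.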

Starting point is 00:18:55 Okay. So yeah, you have to, the goal here is to kind of be fast too. Okay, so if you need to update that bootloader, you're right. You're going to have this period where you could totally lose your mind permanently. The advantage to having a bootloader that's resident,

Starting point is 00:19:18 that always is there, is that if you get your power yanked in the middle of your erase and program cycle, at least you still have your bootloader. You just have to put in the SD card or the thumb drive or whatever again, and the bootloader will say, oh, okay, I've got new code. I'll program it. But if you need to update your bootloader, then that's where things can go bad. So why would I need to update my bootloader?

Starting point is 00:19:45 Well, let's say your... Seems like a huge mistake was made somewhere. SD cards that you were using are no longer available, and the new SD cards have different timing. Or your code now needs to take up one more sector, and the only sector available was in the bootloader sector. And so now you need to make a smaller bootloader. I liked that wince.

Starting point is 00:20:12 That was a great wince. You couldn't see that. It all sounds very painful. It is painful. But on the other hand, you really kind of need to plan for this because if you try to do the bootloader at the very end... It seems like a lot of these things are a result of not planning.

Starting point is 00:20:27 No, they're the result of squeezing the pennies out. When you need to make a million of something, then getting from 256k to 128k is worth a lot of money, actually. It's certainly worth a few engineer hours. Okay, I can see that. Okay, so now we're updating the bootloader. And it's a

Starting point is 00:20:51 shell game. You update the new code to have a new bootloader loader. And then you load the new bootloader, and then you erase the bootloader loader with the new code, and then you run. Yeah, it's all very...
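The shell game just described can be modeled as three writes, with a check of which interruptions are survivable. This is a toy model with invented names: a device is a dict with a `boot` region and an `app` region, `None` means erased or half-written, and `fail_during` simulates losing power in the middle of a step.

```python
def update_bootloader(device, new_boot, new_app, fail_during=None):
    """Replace a resident bootloader in three steps.

    1. The old bootloader installs a "bootloader loader" as the application.
    2. The bootloader loader rewrites the bootloader (the dangerous step).
    3. The new bootloader installs the real new application.
    """
    steps = [
        ("app", "bootloader loader carrying " + new_boot),
        ("boot", new_boot),
        ("app", new_app),
    ]
    for number, (region, contents) in enumerate(steps, start=1):
        device[region] = None          # erased: invalid while being rewritten
        if fail_during == number:
            return device              # power lost mid-write
        device[region] = contents
    return device


def is_brick(device) -> bool:
    """Without a valid bootloader, nothing can reload the application."""
    return device["boot"] is None
```

In this model a failure in step 1 or step 3 leaves a valid bootloader that can retry, while a failure in step 2 is the window that produces a brick.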

Starting point is 00:21:07 The bootloader loader? Yeah, the bootloader loader. I think you're just making this up now. No, no, no, I'm not. But it's one of these things where, well, you know, there's the goat and the cabbage and the wolf. And you are going home from a fair, wherever you take a goat, a cabbage, and a wolf.

Starting point is 00:21:34 Why? I don't know. And then you need to get them across the river. And it's really similar because you have to take... Go on, I'm listening. Okay, so with this puzzle, which is this stupid ass interview question. No, it's an important interview question. This will answer whether or not the candidate

Starting point is 00:21:59 is a qualified person for this job, any job involving technology. If you ask this one question, you will be able to determine instantly. You need to turn the sarcasm up a little bit because I know there's sarcasm, but I'm not sure it's really coming across here. All right, well.

Starting point is 00:22:19 Okay, so the goat question. You have a goat, you have a cabbage, and you have a wolf. You need to get them to the other side of the river. And you have one boat and only one thing fits in at a time. Which makes no sense because they're all different sizes. I know. I mean, it's not like the cabbage is going to take all that much space. Big cabbage. It's a really big cabbage. And the goat will eat the cabbage

Starting point is 00:22:48 and the wolf will eat the goat. And so how do you get them all to the other side? What does the cabbage eat? The cabbage eats nothing. It's cabbage. Okay. You have to get them all over to the other side. And really the whole problem

Starting point is 00:23:04 is you have to keep the goat away from the other two. And so you take the goat over. No, no, no. I have no idea how to do this. Well, you're not qualified. I know. But the bootloader problem is strangely similar to this in that you have to do one thing at a time.

Starting point is 00:23:27 In the right order. And you keep it in the right order. And you really have to think it through before you start. Because if you start with the wrong thing, it just goes bad. Okay, so you cross to the other side with the goat. I'm probably going to say this a lot. You go back to the original side. You take the wolf or the cabbage, it doesn't matter at this point, and you take that over to your house. And now you take the goat back with you because the goat's not

Starting point is 00:23:56 allowed to be alone with the cabbage. And then you take the wolf over to your house, and now the wolf and cabbage are alone, but that's okay. Then you go back and you get the goat. So you've taken an extra trip, but you get all of your creatures home safely. Congratulations, you can work for Microsoft. You know, I think the interview processes are different than when we went to college. But yeah, when you're working with bootloaders,

Starting point is 00:24:23 when you need to update a resident bootloader, you start out with the bootloader in place and some form of code in place. And you replace that code with something that can then update the bootloader. Okay. And then you... You run that. You run that, and now you've updated the bootloader, so you've got bootloader prime or new bootloader,

Starting point is 00:24:48 whatever you're calling it. And now you don't have any good code, so now you have to update the code. But that last step is the normal code update that you would have done anyway. But updating the bootloader step is not strictly necessary every time and should be minimized, right? It really should be minimized

Starting point is 00:25:06 because if you fail to update the bootloader, you have created a brick. On the other hand, sometimes that's necessary. Creating a brick or updating the bootloader? Updating the bootloader. Okay. No, creating a brick and making a system unrecoverable is bad, and one of the goals of good loader design is to make that period as short as possible. Okay. Okay, so that's what happens if you are doing that. So now let's move on to a different type, a totally different style of loading. Our new code now is going to contain a loader. And so we're always going to update the bootloader.

Starting point is 00:25:55 Except we're not going to call it the bootloader anyway. We're just going to call it the loader. Okay. We just said that you don't want to do that unless you really, really have to. Well, now it's more like a ping pong buffer. You have two things that are always valid and they can update each other. And if one of them becomes invalid, the other one is still okay.

Starting point is 00:26:16 And as part of the whole thing, you know which one is valid and which one is okay. How does that work when you come out of reset? How does it know which one to go to? Because secretly there are three parts to this. If I ask you another question, are there going to be secretly four parts? No, what you have is some little, little, tiny, tiny... A little, not quite a bootloader, but almost a bootloader. A micro-bootloader.

Starting point is 00:26:41 A micro-bootloader. I knew this was going to happen. And it can... It's like Zeno's paradox. You load the bootloader and then the bootloader loader and you never actually load that code because you always have to do a smaller loader.

Starting point is 00:26:54 Exactly. No. No, not at all. But you have one little tiny part of code that I swear is resident and will never, ever, ever change. Really, this time. Not like the bootloader.

Starting point is 00:27:05 Jump zero. Yes, that. And so this little tiny piece of code, which I hesitate to even call it a bootloader, but it's the only name that it seems consistent. Bootstrapper. Bootstrap, yeah. That's actually a better name for it.

Starting point is 00:27:23 So you bootstrap and you check to see if the loader code is valid. And you check to see if the runtime code is valid. And if the loader code is valid, you run it, because then it can check and see if there's anything waiting for new code. If it isn't valid, then you go ahead to the regular runtime code and you run that. The runtime code can update the loader and the loader can update the runtime code. It's a ping pong thing. You're going back and forth. One can always work with the other. And because you're only doing one of these at a time, because you're finishing one before you start the other, you're finishing the loader or you're finishing the

Starting point is 00:28:07 code update before you swap and do the other, that means that something is always valid. You can always run the system. You can't make a brick anymore. So that's the advantage of this one:

Starting point is 00:28:23 it's a little bit safer. Yes. But presumably there must be a downside because not everybody does this. Well, now you have two sets of code that can communicate to your communication method, the loader and regular code. And both of them have to be able to talk whatever protocol you're talking. And if that's Wi-Fi, that's a pretty big protocol. If it's SPI, then that's not so big.
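The ping-pong arrangement (a tiny fixed bootstrap plus a loader and a runtime image that update each other) can be modeled the same way as a dict of regions. All names are invented for the sketch, `None` stands for an image that fails its validity check, and `power_fails` simulates losing power after the erase.

```python
def bootstrap(device) -> str:
    """The small piece that never changes: pick something valid to run."""
    if device["loader"] is not None:
        return "loader"                # loader runs first so it can look for new code
    if device["runtime"] is not None:
        return "runtime"
    return "brick"


def update(device, running: str, new_contents: str, power_fails: bool = False):
    """The running image rewrites the other image, never itself."""
    target = "runtime" if running == "loader" else "loader"
    device[target] = None              # erased: fails its validity check
    if not power_fails:
        device[target] = new_contents
    return device
```

Because only one image is ever being rewritten, and never by itself, a power loss at any point in this model still leaves the bootstrap something valid to run. The cost, as the conversation says, is that both images have to carry the communication code.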

Starting point is 00:28:52 If you're talking serial to a Bluetooth chip, that's not so big. But you're making it so you can update the code more easily and you're preventing the brick situation, but you're paying for it in code space. So if you're trying to minimize your code space, this might not be the method for you. Okay, that makes sense. Is this totally secure?

Starting point is 00:29:19 I mean, is there still a way to brick this? The only way really to make a brick at this point is to fail your checksums, and that's a hardware error. So you can lose power at any time, and it's okay. Okay. And mostly with bricks, you just assume that you're going to lose power at the worst possible time,

Starting point is 00:29:41 and when you boot up, can you still run? You just need to put a big capacitor on everything. Three minute long capacitor. Three minutes? Well, I mean, if you're talking to a server. What decade is this? I know. But if you have, you know, a serial interface or a really, really small communication protocol, even for HTTP over serial, so you

Starting point is 00:30:11 have data, but you don't have a lot of communication overhead, then you might be able to do that. You might be able to put that in a loader, but you can run into problems. I'm telling you, power goes out at the worst possible times. Okay, so let's say you don't have the code space. But you do have RAM. Not all of these. This whole microcontroller thing where you get onboard code space is really spiffy. I have one processor I'm working with right now

Starting point is 00:30:50 that's all RAM space. And so its code lives off the processor and I have to update it from the flash over there. And then I reboot and it loads into RAM. So it runs out of RAM. It runs out of RAM. Running out of RAM is kind of cool. It's kind of fast.

Starting point is 00:31:07 It's faster than running out of flash, for sure. Well, it's the way most people think computers work. Well, that's for computers. Those are boring. I'm using computers in a very general term here. You may be surprised, but the things we actually work with are computers. Yes.

Starting point is 00:31:24 They certainly would have been considered large computers only two or three decades ago. I've heard that the difference between a microprocessor and a microcontroller is whether or not it's got onboard flash. I still don't use those words properly, but we're going to go with that being the actual difference. Well, they have peripherals attached and things that microprocessors don't. Yeah, I don't know if there's anything that has onboard flash but doesn't have peripherals. Okay, so

Starting point is 00:31:54 but I'm not going to worry too much about the things that run out of RAM because those well, you can kind of figure those out. You have to update whatever your code space storage area is. It's just one extra step, right? Eventually, you have to move the stuff to RAM when you first start up. Yeah.

Starting point is 00:32:13 But otherwise, the process should be the same. Right. So let's say onboard code space is too expensive. So onboard Flash is too expensive. But you have some RAM, which you're using because your algorithm takes a whole bunch. A lot of signal processing falls into this realm. And so what you do is you put a little RAM chip

Starting point is 00:32:38 on the outside of your processor. And you load the new code into the external RAM. Who does that? The bootloader? Your code does that because you don't have your regular runtime code. Okay, so during the update, the code loads the new code.

Starting point is 00:32:58 The existing code loads the new code to the external RAM. Yes. Got it. And then the existing code adds a little piece, puts a little code into run space RAM. So in the processor's RAM. We need a diagram.

Starting point is 00:33:19 I know, I've got one in front of me. It's kind of bad. I think it's making it worse, actually. Okay, you have runtime code. You've got runtime pieces. You have runtime flash code. Yeah. You have the RAM on the processor, which is run space RAM. That's what we're going to call that. I'm not sure your terms are distinct enough to not be confusing. Oh, all right. So that's the RAM on the processor that everything executes from? No, no, no.

Starting point is 00:33:45 Most of the time on this processor that we're talking about here, this is a made-up processor, although there are many like it. It runs from Flash. Oh. On board Flash. I thought we were still talking about ones that run from RAM. Oh, no. Because those are kind of an easier case.

Starting point is 00:33:59 Sorry. They can fall under this, but they can... Got it, got it. Okay, so... This is why people listen, right? It makes them feel better about themselves. Yes. When you get into the office today,

Starting point is 00:34:17 you're driving along, listening to this podcast, thinking, I can totally explain this better than they can. Find a nice junior engineer and explain it to them. We're unexplaining it. Yeah, okay. So I don't really care about the processors that run from RAM all the time. Boring, easy, done, fine.

Starting point is 00:34:36 Those are boring. We're not going to worry about those. Okay. So now this phantom processor has some onboard flash, but it doesn't have enough to do that ping pong thing we just discussed. Okay. It's got a little bit and it's got some onboard RAM as well. Okay. Now we want to update it, but we don't really have enough resources to do this because we've only got some

Starting point is 00:34:57 RAM. We've only got some code space and not enough for a duplicate copy of our code. Yep. So we're going to take the new code and we're going to put it into an external RAM. Now you have to have thought about this ahead of time because you have to have the external RAM. This is hardware. You're not just faking this piece. Okay, so now the code, the regular runtime code, puts the new code into the external RAM,

Starting point is 00:35:24 scratch space RAM. And then it takes some little piece, a bootstrap sort of thing, and it puts it into the run space RAM on the processor. Run space RAM. So we have some RAM on the processor we can run code from. But we don't usually do this because there's not a whole lot of it. That was the confusing piece. Okay, so you can run from RAM on this processor,

Starting point is 00:35:50 but you generally don't. No, because we usually run from flash because there's enough flash and there's not a whole lot of RAM. Got it. But there's enough RAM to run something small. Okay. And should I digress here or should I continue?

Starting point is 00:36:05 I thought we already had digressed. So a lot of times this happens when you have, as I said, signal processing applications. Normally, you would be using all of your RAM in the signal processing application to buffer data, or whatever you're doing with your actual algorithm. You'd be using this RAM. But since you've started this download process, that RAM now is suddenly available to do more things with. And so what we're going to do after we've loaded all of our code into our external scratch space RAM,

Starting point is 00:36:37 we're going to take a little tiny bit of ourselves, a little tiny program of ourselves that is the code that can program the onboard flash. Got it. And we're going to put that in RAM. And then we're going to take that RAM and run from it and read the external RAM into the onboard flash. So how does that work?

Starting point is 00:37:03 So presumably you're not doing a reset between those two things. Oh God, no. So you have to... Because that would be a brick. So you have to copy that and then jump to it. Well, it's even worse than copying it and jumping to it. I mean, you could do that

Starting point is 00:37:19 if you didn't have any function calls, but you have to link specially for running from RAM because the addresses are all different. It's weird. You have to deal with the linker file. So this is my question. You've got a situation that's sort of dynamic that you don't usually do.

Starting point is 00:37:41 So how does that, how do you actually make that work? Be more specific, please. So you have to set this up in the linker file presumably ahead of time of when you build your whole system. Yeah. Right? Yeah. So do you have to have a special symbol then

Starting point is 00:37:54 that says this is... Right. What's the mechanism through which I start executing the code that's in the RAM, the special little run space RAM? Jump to. That's what I just asked, and you said no. Oh, I'm sorry. It was more complicated. No, no.

Starting point is 00:38:15 You can jump to or you can make a function pointer to where you have now copied it to RAM. Okay, but what's the point of the linker file? But you can't do it... You can jump to this address and run from this address. But this address has to be self-consistent, has to be compiled to run from RAM. So the code has to be position independent or position dependent the right way. Right.

Starting point is 00:38:41 That you put into the RAM. Right. So it's less about the action, it's more about how you prepare the small piece of code. Yes. Okay. Your little bootstrapper that's now running from RAM has to be designed to run from RAM.
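The sequence described above (stage the new image in external RAM, then hand control to a small routine that copies it into onboard flash) can be sketched on a host machine. This is a simulation under stated assumptions, not code from the episode: the arrays stand in for real memories, and every name and size here (`onboard_flash`, `external_ram`, `FLASH_SIZE`, `apply_update`) is hypothetical. On a real part the programmer routine would be linked to run from on-chip RAM through the linker file, and flash would be written through the vendor's erase and program sequence rather than a plain store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side stand-ins for the memories discussed above (hypothetical sizes). */
#define FLASH_SIZE 64u
static uint8_t onboard_flash[FLASH_SIZE]; /* where code normally runs from   */
static uint8_t external_ram[FLASH_SIZE];  /* scratch space for the new image */

/* Signature of the small routine that rewrites flash. On a real target this
 * routine is built to execute from on-chip RAM and must not call anything
 * that still lives in the flash it is overwriting. */
typedef void (*flash_programmer_fn)(const uint8_t *src, size_t len);

static void program_flash_from_ram(const uint8_t *src, size_t len)
{
    /* A plain loop instead of memcpy(): on a real target memcpy() might
     * itself live in the flash being rewritten. */
    for (size_t i = 0; i < len && i < FLASH_SIZE; i++) {
        onboard_flash[i] = src[i];
    }
}

/* Regular runtime code: stage the image, then jump to the programmer
 * through a function pointer. Returns 0 on success, -1 if it cannot fit. */
static int apply_update(const uint8_t *image, size_t len)
{
    if (len > FLASH_SIZE) {
        return -1; /* refuse before touching anything */
    }
    memcpy(external_ram, image, len); /* still running from flash: safe */

    flash_programmer_fn run_from_ram = program_flash_from_ram;
    run_from_ram(external_ram, len);  /* on hardware: the jump into RAM */
    return 0;
}
```

The function pointer is the "jump to" from the conversation. The part the simulation cannot show is the preparation: the routine has to be compiled and linked so that its addresses are valid where it ends up in RAM.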

Starting point is 00:38:52 And realistically, if you're doing this type of heavy algorithm data sort of thing, you probably have something else running from RAM for speed. Your FFT algorithm probably is already compiled to run from RAM. So you've already had to go play in the linker hell that is the linker file. Does that make sense now? Yeah, it makes more sense. Okay. Is that the most complicated scheme?

Starting point is 00:39:17 Please say yes. That's the most complicated scheme I was going to talk about today. That's funny. I neglected to figure out what we were starting. I was talking about holes and time, space, displacement. Well, I mean, that is because external RAM

Starting point is 00:39:37 is cheap, particularly if it's slow. And internal code space is expensive. And that's why you're doing it that way. And basically the game is to find someplace you can stash a piece or a copy of the entire new thing and to spend as little time as possible in the danger zone, where you have a mix of new and old code or you've overwritten something that you need to execute

Starting point is 00:40:08 and if you lose power, you can never get back. Those are the two big fundamental pieces. And so you end up with these schemes depending on the architecture of your system. Yes. But it's probably helpful to think about this early because it would influence the architecture of your system. Well, it may influence your hardware if you need an external hardware piece.
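One way to make the danger zone concrete is to write the update stages down as data and flag the ones where a power loss cannot be recovered from. The stage list below is an assumption based on the external RAM scheme described earlier, not something spelled out in the episode, and the names (`update_stage`, `count_danger_stages`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One step of the update, and whether losing power during it bricks the unit. */
typedef struct {
    const char *name;
    bool power_loss_bricks;
} update_stage;

/* Assumed stage list for the external RAM scheme. External RAM is volatile,
 * so once the old flash contents are erased there is no copy to fall back on
 * until programming completes. */
static const update_stage stages[] = {
    { "download image to external RAM", false }, /* old code still intact */
    { "verify image in external RAM",   false },
    { "copy programmer to on-chip RAM", false },
    { "erase onboard flash",            true  }, /* danger zone begins    */
    { "program onboard flash",          true  },
    { "verify flash and reset",         false }, /* new code is complete  */
};

#define STAGE_COUNT (sizeof stages / sizeof stages[0])

/* Counts the stages that would be highlighted in red on the flowchart. */
static size_t count_danger_stages(void)
{
    size_t n = 0;
    for (size_t i = 0; i < STAGE_COUNT; i++) {
        if (stages[i].power_loss_bricks) {
            n++;
        }
    }
    return n;
}
```

Keeping the table next to the code makes it harder to add a stage without deciding which side of the line it falls on.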

Starting point is 00:40:27 Yeah. And it is, if you wait until the last minute to do your firmware update, you will end up with things like that thermostat did, where, oh, I had this plan. It was all going to work just fine. Oh, crap, we've already shipped two and it doesn't work the way we really need it to work for the future. Well, they won't complain much. You disconvenience your users and then... Disconvenience? Inconvenience. Inconvenient. I never said disconvenience. But you end up with Amazon reviews that say you suck, which

Starting point is 00:41:02 you know is always a great way for you to feel better about yourself. Not. So I already said checksums, and we talked about danger zones, and when you're doing this plan for your system... Danger zone. Sorry. We're writing to the danger zone.

Starting point is 00:41:23 You're going to need a flowchart. Just trust me on that. Make the flowchart, and then buy yourself a nice set of colored pencils and highlight in red... When I was 10, I got this template and I made my parents buy it for me

Starting point is 00:41:36 at some store and it had little flow chart symbols and I had convinced myself it was going to lead me to do all kinds of really cool projects because I could now draw the little symbols for all the flowcharts. I never used it. You really thought the symbols were the important part, whether it was a rectangle or a diamond? It's very important.

Starting point is 00:41:58 Yeah. All right. Well, draw your flowcharts. Mark the areas that are dangerous and try to keep them short. If you can, convince somebody that it's worth the extra 15 cents never to have your unit turn into a brick. Well, there's a cost-benefit to that. There is, absolutely.

Starting point is 00:42:19 If the odds are that one in every thousand units is going to turn into a brick and get returned, and that costs you $100 versus $1,000 to use a more expensive chip in your product, you might err on the side of a couple of unhappy customers. I neglected to write down what time we started. Do we have time for one more? We have time for anything you like. All right. As long as it's not another bootloader scheme.
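The cost-benefit argument can be worked as arithmetic. The sketch below uses three figures mentioned in the conversation (a brick rate of one in a thousand, $100 per returned unit, 15 cents for the safer part) and combines them in a way the speakers did not spell out, so treat the pairing as an assumption. The $1,000 figure is left out because the transcript does not make clear what it refers to. Function names are hypothetical, and amounts are kept in integer cents to avoid floating point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Expected brick cost per shipped unit, in thousandths of a cent.
 * bricks_per_thousand * cents_per_brick is "cents per 1000 units", which is
 * the same number as "millicents per unit". */
static uint32_t expected_brick_cost_millicents(uint32_t bricks_per_thousand,
                                               uint32_t cost_per_brick_cents)
{
    return bricks_per_thousand * cost_per_brick_cents;
}

/* True when the safer part costs less per unit than the bricks it prevents,
 * assuming the part prevents all of them. */
static bool extra_part_pays_off(uint32_t part_cost_cents,
                                uint32_t bricks_per_thousand,
                                uint32_t cost_per_brick_cents)
{
    uint32_t expected = expected_brick_cost_millicents(bricks_per_thousand,
                                                       cost_per_brick_cents);
    return expected > part_cost_cents * 1000u;
}
```

With those figures the expected brick cost is 10 cents per unit against 15 cents for the part, so on money alone the part does not pay off. That matches the remark about erring on the side of a couple of unhappy customers, and it ignores the cost of bad reviews.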

Starting point is 00:42:45 No, no. This is the way that makes the bootloader schemes all look simple. It's when you add security to it. I was going to say you're going to add a hard drive. So I worked on a system that had security keys in it and there were consumables as part of the system. And so when you updated the firmware,

Starting point is 00:43:13 you could update the security keys and thereby invalidate previous consumables. Yeah. So security keys are wandering around loose around the world. That's not good. If your consumables are worth enough, people are going to hack your system. The easiest way to hack it is through this firmware update scheme. So if you have a consumable or something like this or some secret key, then you either can't put it in your bootloader, or you can't put it in your new code. You can't send it plain text through the world. Or you need to encrypt your code,

Starting point is 00:43:55 in which case your bootloader... But even if you encrypt your code, you still have to decrypt your code, which means you have to have a key somewhere. Yeah. The best way to do this is to have a crypto chip that you never actually read the key from and that does the hash calculation locally and then tells you if it was okay or not.

Starting point is 00:44:10 You had one of those in one of your bigger systems, didn't you? I've never had one. I have had to do encryption in my loaders, and in one system that was super space constrained, we pretty much made our code plain text or hex plain text. And then we had a secret area that was encrypted, and if you had to update the encryption keys, it was a much longer load process. And that was just for authentication. If you're going to actually encrypt your code, that's a whole other thing. I think you need

Starting point is 00:44:45 hardware support for that. In which case, you're probably storing the key somewhere that can't be read out. Well, I meant encrypt as you're doing this loading process. Right. Not encrypt when you're running from it. It's only during this load and unload. I mean, if you're doing

Starting point is 00:45:00 that ping pong bootloader where you have a loader and you have runtime code and you can go to either one and secretly there's a third bootstrap piece, then your loader and your code both could have encryption pieces in them. I mean. Sure, sure, sure. But you don't want to ever have anything in a piece of something that can be, if somebody has physical access to it and a little bit of know-how, you don't want to have, like you were saying. This is why they disable JTAG. The family jewels available for reading, you know, for your company, just in plain text. Yeah.

Starting point is 00:45:35 So be careful. Security: when you make your flowchart and you mark all the red bits where you can destroy your unit, you also need to mark the areas where you have exposed your family jewels. Thank you for that. That was a great image. That image brought to you by Christopher White. I didn't mean it that way. I know you didn't, but I really enjoyed it. So, some questions to ask yourself about your design.

Starting point is 00:46:09 How often will the new code be loaded and by whom? Are they trusted people like technicians or unknowledgeable end users? The way you organize your system really may depend on who's uploading your code. And then there's the fact that low-cost parts can make loading code safer. So, you know, it's definitely cost-benefit analysis. High-cost parts. Not really. That external adding a little piece.

Starting point is 00:46:50 Another piece that might not cost much. It certainly costs more than zero. Yes. Oh, you're definitely adding cost to the system, but you may not be adding a lot of cost. It's a risk. It's a cost-risk benefit analysis

Starting point is 00:47:04 thingy, Bob. Bob. That's what economists call it. What other pieces of the system might change? We didn't really talk about, you know, if you, the SD cards, I mentioned that what if the SD card itself changes? Well, that's kind of important,

Starting point is 00:47:21 but what if you have other things that can change in your system? Your communication mechanism gets updated someday. If that is important to your system, then you have to deal with it at a bootloader level because that's where things may seriously go wrong. And then always, where can your new code be corrupted and how can you figure it out? What is the soonest possible area you can figure out your code's toast? Do you have another question? What is the worst thing that can happen at each stage of your flowchart?
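One way to catch a toasted image as early as possible is to check it while it is still in scratch space, before anything in flash is erased. The episode mentions checksums without naming one, so the sketch below swaps in CRC-32 (the reflected form with polynomial 0xEDB88320) as a common choice. `image_is_intact` is a hypothetical helper, and a CRC only detects accidental corruption. It is not a substitute for the authentication discussed earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Slower than a table
 * lookup but small, which suits a space-constrained loader. */
static uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Compares the staged image against the CRC that was sent with it. */
static bool image_is_intact(const uint8_t *image, size_t len,
                            uint32_t expected_crc)
{
    return crc32_compute(image, len) == expected_crc;
}
```

Running the same check again on the flash contents after programming covers the other place the image can go wrong.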

Starting point is 00:47:58 I was going to add, what happens when your code gets bigger than your scheme? Then you go to a new scheme. Right, but that can be difficult. Well, and then the next chapter in the book, I think, is, yes, yes, chapter 8, optimizing your code for space cycles or ROM. So RAM, ROM, or cycles. So, yeah, when it gets too big, you should optimize it. My point was, you might have this great scheme, and you might have the right hardware for that

Starting point is 00:48:31 scheme. And then somebody says, we need this new feature. And it can fit in your system, but it no longer maps well to your firmware update scheme. That's a good point. Because that off-board RAM version means all of your on-board flash is available for your algorithms and for running. And so it's dangerous, but you get 100% of that code space for doing what you need to do on a normal daily basis. But the ping pong

Starting point is 00:49:08 loader method, where you always have essentially two images (three, kind of, once you count the bootstrap) in your code, means that you've got a bunch of wasted crap in there. And I mentioned that I had done a loader in what seemed like an extremely small space. You can do that, but then you are a little constrained. Life gets tough when you have to deal with that. And you have to decide in the beginning. You can't usually switch midstream. Or you can't switch after

Starting point is 00:49:45 units have left the factory. Even though that's guaranteed to happen every single time. Something will change outside your control every single time. Well, actually, that wasn't true for LeapFrog. That was one of the best things about working for LeapFrog, was there was no firmware update.

Starting point is 00:50:04 You released the code. They masked the ROMs. You prayed. You prayed. There were no bugs like that time when nobody noticed that N said nibbling for nuts, which finally somebody realized was not a good thing. And so those ROMs went into the sea

Starting point is 00:50:25 or wherever ROMs go whenever somebody says, yeah, let's make a new masked ROM. But you could never update the firmware. It was part of the system. It was real ROM. It was real ROM. Not this other ROM we're talking about. Not this fake flash ROM,

Starting point is 00:50:39 which isn't really read-only at all. So what else? Did I answer all of your questions about bootloading? I mean, I have a better understanding of it than I did an hour ago. That's good. I hope to never have to do it. You'll have to someday. Nah, I'll let somebody else do it.

Starting point is 00:51:02 Ah, yeah. No, I think... Nobody likes that sort of thing. I think it falls under one of those things that people who have done it use as a benchmark for other people. It's kind of like, sometimes in interviews I'm trying to figure out what this person knows and what they don't know. And so you ask them the go question. Oh God, I hate the go question. No, no. I ask them if they've ever started a system from scratch. Have they ever done Hello World?

Starting point is 00:51:32 Have they installed the compilers by themselves? Have they chosen all of this stuff on their own? Chosen a processor? Because there's a certain, I'm going to say innocence lost when you actually have to go and choose all this stuff for yourself. And bootloaders are kind of like that. It's a coming of age benchmark to me. Wow. Can we cue the sappy music? Do you have, we have sappy music? Maybe we need sappy music.

Starting point is 00:52:01 No. Where's your, didn't you get a new instrument for Christmas? Yes. I'm going to play the ukulele on the podcast. That's going to happen right now. I doubt that. I'd continue talking about this probably, you know, for at least another half an hour, two or three hours. But I'd want diagrams and a whiteboard.

Starting point is 00:52:23 Chris has a meeting to attend soon. I guess I have to have somebody ask me about optimizing graphics. Hey, that's not in the book. It's because you don't believe graphics are a part of an embedded system. No, no, they are. Your narrow definition of an embedded system only includes

Starting point is 00:52:39 things that have bootloaders. No, no. Yeah, graphics. No floating point. Oh, floating point's for wimps. It's IQ math. It's so much better. Never mind the cache.

Starting point is 00:52:53 Cash? I like cash. Are you going to give me cash? Not that kind of cache. I'm not going to give you cash. You're not a spider. I don't get it. Sorry.

Starting point is 00:53:10 I'm going to need help. You remember the big spider? I put the quarter up for reference. Oh, right. And he said, does that make them go away? Yeah, he sent me a picture of a gigantic spider in our kitchen with a quarter in front of it, and I asked if he was paying it to go away.

Starting point is 00:53:26 I think we've got off topic. All right. So if you have questions or comments, please hit the contact link on embedded.fm or email show at embedded.fm. I'd like to thank last week's guest, Allison Chaikin, for suggesting this topic. I've heard some of you are interested in embedded systems vision projects. And I have to say, that's kind of cool, but I can't spout any of that off the top of my head. So be patient while I find a guest.

Starting point is 00:53:57 Or if you are that person and want to be a guest on the show, let me know. Hit the contact link on embedded.fm. Let's see, One final thought. Oh, well, here you go. Get 40% off the print book of Making Embedded Systems and 50% off the ebook version of this book by entering the discount code authd at o'reilly.com. That is auth as in author, d as in discount, all capital letters. Or if you're into this sort of thing, it's Alpha Uniform Tango Hotel Delta. I think Plato said that first, right? Oh, yeah.

Embedded - 34: Really Big Cabbage

Elecia describes to Christopher (@stoneymonster) how to design and create a firmware update mechanism. Hilarity ensues. 4k PC emulator Making Embedded Systems, the book, on O'Reilly (coupon in las...t 2 minutes of the show) or on Amazon.
