Embedded - 290: Rule of Thumbs
Episode Date: June 6, 2019
We spoke with Phillip Johnston (@mbeddedartistry) of Embedded Artistry about embedded consulting, writing about software, and ways to improve development. On the Embedded Artistry welcome page, there is a list of Phillip's favorite articles as well as his most popular articles. Some of Phillip's favorites include: Embedded Rules of Thumb, Improving SW with 5 LW Processes, and Learning from the Boeing 737 MAX saga. We also talked about code reviews and some best practices. The Embedded Artistry newsletter is a good way to keep up with embedded topics. You can subscribe to it at embeddedartistry.com/newsletter. What are condition variables?
Transcript
Welcome to Embedded.
I'm Alicia White.
My co-host is Christopher White.
And our guest this week is Philip Johnston of Embedded Artistry.
Hi, Philip. Welcome.
Hi, guys. How are you?
Good, good.
Could you tell us about yourself?
As Alicia mentioned, my name's Philip, and I'm the founder of Embedded Artistry, an embedded
systems consulting firm based in San Francisco, California.
I run the company with my wife, Rosie, and we also run a website by the same name, which
is dedicated to embedded systems content.
On that site, you can find hundreds of articles and other resources targeted for embedded
systems engineers
of all varieties and skill levels. When I'm not working, I enjoy volunteering as a gardener at
the San Francisco Japanese Tea Garden, playing music, cooking good food for my friends and
family, and reading Latin to my four-month-old son.
Buried the lead there. The last part there.
The Latin or the gardening?
Yeah, reading Latin. What sort of things do you read in Latin?
Poetry, or a lot of Cicero. Philosophy kind of material.
Yeah, because there's a lot of Latin poetry that probably shouldn't be read to kids.
He's four months old.
Certainly true.
Yeah, he doesn't understand. He likes the rhythm.
All right, we're mostly going to be talking about that website, and Latin, with lots of embedded material, and what's on it and why and how, you know, all the usual questions.
But before we get into that: lightning round, where we ask you short questions and we want short answers, and we might say how and why, but we're not supposed to. Are you ready, Philip?
I'm ready.
Favorite processor?
Family or architecture? Dealer's choice?
Dealer's choice.
I would have to go with the nRF52.
Favorite artist?
Can I pass?
Favorite plant?
Favorite plant would be the black pine.
Black pine?
It's a Japanese pine tree.
All right. Least favorite compiler?
IAR.
It's such an easy target.
Yeah, this week we are sponsored by IAR.
What's the most important point when looking at contracts for consulting?
i really look for equitability in the relationship and in the contract.
So I tend to not trust things that are one-sided on either my end or their end.
Because I think that sets up how the relationship is going to go.
What is the most useful part of Newlib?
I actually don't use Newlib very much.
All right.
What's the least useful part of Newlib?
I've done some explorations with Newlib, but I actually use my own libc, so I tend to avoid most of Newlib.
Your own libc?
Well, we'll have to talk about that.
For my least favorite part of Newlib, it would be the memory allocation scheme, which I think is just too heavy-handed for most embedded systems.
Yes. Yes. So much yes.
Favorite Roman philosopher?
I'll take the easy target and go with Seneca.
All right.
Tip everyone should know.
Get enough sleep, and it will really improve every aspect of your productivity and quality of your work output.
I totally agree with that.
Do you want to do one more, Christopher?
I thought you were looking at me to disagree.
No, I like sleep.
I'm just bad at it.
No, I think we should move on to the actual podcast.
Okay, let's do the contracting part. You have a company, kind of like I do, but you actually do things a little more formally than I do. So if I was coming to you with a project, a napkin sketch of an idea, let's say a light that is also a temperature sensor that hooks into my Alexa, how would the process work with you?
Usually there's an initial conversation where, you know, we're discussing the goals of the project,
not necessarily the requirements, but what's the value you're looking to provide?
How are you differentiating yourself from other products?
What's your timeline?
What are your constraints?
Things of that nature, trying to get a good overall view of what the project would entail.
And that informs a lot of the follow-up work.
There, you know, as always, the open-ended, we want to build this thing and we have no idea how
to do it and we have no idea what chips we need or parts we need. And that ends up leading more
towards, I guess, what you would call formally a discovery type arrangement where we're working perhaps on an
hourly basis or even a fixed fee if the work is bounded to try to scope out how the system will
behave and what the different requirements are and what parts we might want to use.
If all of those questions are answered, which is pretty rare when a startup comes to us for help,
then we can outline a development plan
for how we think we should best tackle the project.
And a lot of that work is actually done
by my business partner, Rosie,
who spent 12 years doing project management at Apple.
So she's very good about getting engineering
to come up with a plan
and sort of wrangling the various ideas
and figuring out how you're going
to answer the open questions that are most pressing and how that informs your schedule and
sort of ordering those events so it's most sane. And usually, depending on the contract,
that's something that we can do over two or three phone calls with the client or,
you know, we need to have an actual month-long engagement to sort of architect a solution and see how we're going to proceed. Do you do the hardware as well as the
software? I personally don't do the hardware, but I've got a few business relationships with
various design houses that can do hardware. But I find that most of the companies that come to us,
and I don't know why this is the case, but they almost always have an in-house electrical engineer and have just really struggled to hire firmware engineers. And so firmware tends to be
the piece that's stalled out. Firmware also does tend to be the piece that ends up doing the project management. Why is that?
I think it's because it's very nebulous to most of the rest of the organization.
And if you don't have somebody who's experienced with firmware requirements in-house,
you tend to have a lot of assumptions that are made by the rest of the project team on what is or isn't possible in the firmware side of it. And because firmware starts last a lot of times, I think it's
the big driver for the completion of the project. And I don't know if that really answers the
question. I was just rambling there. Oh, it's fine. Okay, so do you work with little companies
or big companies or medium companies? Do you have a preference? My preference is the medium company
who perhaps has shipped one or two projects and has discovered that the way they've been working
isn't sustainable and needs to revisit their approach. I would say that's about 25% of our clients. Most of our clients are very, very early
stage startups in the two to six employee range who just have an idea or just created their
initial prototype. When you say not sustainable, what kinds of things, what kinds of realizations are they having or what kinds of things are you
finding for them and saying, well, you can't do it this way?
The usual realizations, I guess, are they've released their product and then their customers
are using the product and there's all these bugs. Well, they're also trying to work on a second
version of the product or get a second product in their suite to market.
And they don't know how to effectively manage those different requirements and the different priority levels.
And a lot of times we find that it could be traced back to software design flaws or process steps that are just being skipped that could
be added and even automated in a lot of ways to sort of squash some of the easy bugs before they
get into the field. Do you have a lot of clients, you mentioned that a lot of them seem to have
double E's on staff, but not firmware engineers. Do you have a lot of clients coming to you with a prototype or something that they think is product quality, but it's based on an Arduino, or something modular based on an Arduino? Is that a big part of your work? Like, oh yeah, we need to take the step from prototype to real hardware, and I need to help your EE do that? Or is it mostly, okay, they have the hardware and it's just a matter of getting the firmware to work?
It's a good mix of the two.
I would say it's probably 50-50 split.
You have the companies who have a prototype,
they've shown the prototype to investors.
It's hacked together as an Arduino
with the off-the-shelf camera module and something else. And then they want to transform
that into product, like into the final product type thing. And they may not know how to do that.
And that's something that we can help guide them for. The other 50% is like you said, they've,
they've got a double E, they've got their form factor decided on, they've got their first version of the circuit board, and their electrical engineer or their software team doesn't know how to write the firmware for it.
And so then we're just handed a hardware design, and usually we'll perform the initial bring up and get them started on features or help them explore what those features should be.
This sounds not very agile.
Do you have feelings about the agile development process?
Do you mean that my process doesn't sound very agile, or the customer's project process doesn't sound very agile?
Well, you mentioned architecting and processes and design. That doesn't sound very agile-y.
This is not an accusation. This is not bad.
Yeah. So I started out as a defense contractor, so I come from a formal background that was very, very stifling, but it also did open my eyes to some more of the upfront design aspects. And then I went to Apple, and I've worked at various startups as the first firmware hire where there was no upfront design and no process. So my goal
is blending of the two. I think that there is some amount of upfront design that can be done
to answer big questions on paper before you even start coding. You know, drawing things with
a pen or a whiteboard marker and connecting little boxes can unveil a surprising number
of problems before you get into weeks of coding and find out you've made
a solution that's untenable. On the flip side, you could spend all your time in design and
justify that as not starting. And I definitely see teams do that, and that's not good.
Where I get the most phone calls from existing companies is usually along these lines. It's,
hi, Philip, we had an initial design, whether it was a prototype or version one of our product,
and we decided we need to go to a new processor for some reason, power, better radio, different
radio, whatever it might be. And then they have the uncomfortable realization that
they have to rewrite all of their software because they tied everything to the SDK and the RTOS that
vendor A was using. And their new chipset is a totally different SDK with a totally different
RTOS, and none of their code can be moved over easily. And usually that's coupled with,
we have a build in a month and we need all of our software up and running by then. And we have no tests and we have no way to determine whether it's successful or not. And I get that phone call
at least twice a month. So when I'm doing upfront design, I'm really trying to make it so we can
understand what parts of the system are going to change and how can we design the software in a way
that isolates most of the system from that change.
So if you need to make a big critical change, such as even with your processor, that should
be two to four weeks rather than now you need a six-month rewrite of everything.
So that's sort of where I try to blend agility into the design.
I think some design can inform future agility, but you have to invest in that.
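The kind of isolation Phillip describes usually comes down to a thin driver interface that the application depends on, with one adapter per vendor SDK. A minimal sketch in C (all names here are invented for illustration, not from any actual Embedded Artistry code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic UART interface. Application code depends only on
 * this struct, never on a vendor SDK header. */
typedef struct {
    int (*write)(const uint8_t *buf, size_t len);
    int (*read)(uint8_t *buf, size_t len);
} uart_driver_t;

/* One implementation per chip vendor. Swapping processors means writing
 * a new small adapter, not rewriting the application. This one is a fake
 * for host-side testing. */
static int fake_write(const uint8_t *buf, size_t len) { (void)buf; return (int)len; }
static int fake_read(uint8_t *buf, size_t len) { (void)buf; (void)len; return 0; }

static const uart_driver_t test_uart = { fake_write, fake_read };

/* Application code is written against the interface only. */
int send_hello(const uart_driver_t *uart) {
    static const uint8_t msg[] = "hello";
    return uart->write(msg, sizeof msg - 1);
}
```

With this shape, the "six-month rewrite" becomes "write a new adapter struct": the application never sees the vendor's types.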
I think that's an angle and a question that I see so rarely asked at the start of projects.
What things could we potentially change in the future or swap out? And how does that affect
how we architect it? What modules we have? What are the API layers?
Where do we want to abstract things?
And so a lot of times people do abstraction just for abstraction's sake, which can lead to some really bad software.
But doing it that way and saying, wait, you know, we might change the OS.
Okay, then everything that talks to the OS, you know, we might need a translation layer or we might change out the hardware. Then
we should have, you know, a good driver model that allows us to swap that out. A lot of times
people just don't bother to ask that question. Then you end up like with what you're talking
about, a very brittle thing that you have to start over on. And you mentioned testing. And I think you and I have similar feelings on testing. If it is something that is fixed, something that is important beyond the scope of the processor or the RTOS, if it's the secret sauce, the goo.
The goo! It needs to be tested. It needs to have a way to test so that when you change the other things, it still works.
Do you do a lot of that testing?
I implement sandboxes for people so that their code will work on a PC or a Linux box so they can test their algorithms.
Do you do much of that?
I do exactly that. I actually try as much as
possible to be able to test my embedded code on the PC. And I really try to avoid on-target testing
as much as I can, although it happens and it's valuable and you check different things.
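Host-based testing like this usually means keeping the algorithms free of hardware dependencies, so they compile and run on a PC exactly as they would on the target. A tiny illustrative sketch (the function is invented here):

```c
#include <stddef.h>
#include <stdint.h>

/* Pure algorithm: no registers, no RTOS calls, no vendor headers.
 * It builds on the host as easily as on the target, so it can be
 * unit tested on a PC or a Linux box. */
int32_t average(const int32_t *samples, size_t count) {
    if (count == 0) {
        return 0;  /* defined behavior for the empty case */
    }
    int64_t sum = 0;  /* 64-bit accumulator avoids overflow on 32-bit sums */
    for (size_t i = 0; i < count; i++) {
        sum += samples[i];
    }
    return (int32_t)(sum / (int64_t)count);
}
```

The hardware-touching code stays thin and is exercised in on-target testing; the "goo" gets fast host-side tests.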
And the hard part is always where's the dividing line? Because I don't necessarily think
that test everything is the right answer. But certainly test your algorithms, test the things
that make your software special and where you're providing value, especially. And the goo, the goo,
as you say, the magic goo. Okay, so your website.
You have, like, my business website is like, yeah, we do stuff.
Contact us if you want to.
And occasionally, if I don't have work, I will make it nicer. But since I'm fully booked for quite a while, I don't really care.
It's a placeholder.
It's a spam sock. But your website goes way beyond that.
You have a blog that you work on often. You have different parts of the site that go to resources for beginners to embedded systems. It's a lot. Where should I get started if I am just coming
to your website? If you're just coming to the website, we have a welcome page, which is up at
the top on our menu bar. And that provides some orientation to the site with some of the articles
that I'm particularly proud of and some that are popular, as well as explaining some of the different areas of the site.
Because as you say, there's the blog and there's different resources
and we have a glossary of strange embedded terms.
So it's definitely a lot.
And that page is meant to help orient people to the website.
You have a list of development kits you like.
That too.
And I need to keep adding to that.
I've used a lot of embedded development kits now that I haven't written about.
Wow, this glossary. Okay, Christopher, I'm going to quiz you now.
Keep in mind that I do not care.
What is a critical section?
Uh, it is a section that is very important.
I'll allow it.
It's a section you don't want to be interrupted while it's running
because there's a resource or something that might be shared
and you don't want something else to run that could interrupt it
or change a resource out from under you.
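As a concrete example of the definition Christopher gives: a shared counter whose read-modify-write must not be interleaved. This sketch uses a pthread mutex for host-side illustration; on a bare-metal target the same section is often protected by briefly disabling interrupts instead:

```c
#include <pthread.h>

static int shared_counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void increment(void) {
    pthread_mutex_lock(&lock);   /* enter critical section */
    shared_counter++;            /* read-modify-write: must not be interleaved */
    pthread_mutex_unlock(&lock); /* leave critical section */
}

int get_counter(void) {
    pthread_mutex_lock(&lock);   /* reads of shared state get the same protection */
    int value = shared_counter;
    pthread_mutex_unlock(&lock);
    return value;
}
```

Without the lock, two contexts can both read the old value, both increment, and one update is lost — the "resource changed out from under you" case.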
What is COGS?
C-O-G-S.
All capital.
Cost of goods sold.
Yeah.
See, I mean, there's a lot of stuff like that here where, yeah, you might know it,
but every once in a while you come up with something.
People come up with terms and you're, this is nice.
This is nice.
Okay.
One of the things that you mentioned on your welcome page is you have a list of your favorite
articles and then you have a list of your most popular articles.
Right now, those do not overlap at all.
Does that irritate you?
It doesn't irritate me, but it makes me laugh.
I've definitely, I've realized that. And it's a little disappointing because the articles that
I'm proud of are the ones that I spent the most time on or the, their problems that I really
grappled with for a long time. And some of those in the most popular articles section, I put
together in 15 minutes based on a problem that was really annoying me with the client project or a
personal project. And I just published it because I needed something
to publish that week. And then for two years, it's been in the top 10 articles every month.
So that definitely is a surprise. And I just learned that I can't predict what people are
going to find useful. And it changes over time as well, what's popular and what's not. It's
interesting to watch that sort of evolve month to month and see something as popular
for a year and then nobody cares about it for six months.
And then six months after that, it's suddenly the number one article again.
I think that's true of all of us who put things out creatively.
It's like, why do you like that one?
Okay, I guess I don't understand.
But, you know, at least you like something.
Three of the 10 most popular articles are about Jenkins on an embedded blog.
And I do a lot of build server work, but I just wouldn't have guessed that,
you know, 30% of my top articles are going to be on Jenkins.
Oh, and one of them is installing LLVM slash Clang on OSX.
Yeah, that's so embedded.
Or creating a circular buffer in C slash C++. That looks like it's your most popular article.
It is absolutely the most popular article. It is the most commented on. I get a lot of emails about improving it, and how to do things differently, and why did I design it that way. I get a lot of engagement out of that article.
And yet it's something, I mean, it's important and yet it's not something,
it's one, it's something you think about once. And then once you have a really good version,
you just stop thinking about it. Circular buffers are circular buffers. You don't need to,
you don't need engagement on that. I agree with you, but I just can't question it.
That's just the way that one goes.
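For reference, a minimal circular buffer of the kind the article covers — a sketch, not Phillip's actual implementation; the overflow policy here (drop on full) is one of several reasonable choices:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CB_CAPACITY 8  /* one slot is sacrificed to distinguish full from empty */

typedef struct {
    uint8_t data[CB_CAPACITY];
    size_t head;  /* next write position */
    size_t tail;  /* next read position */
} circular_buffer_t;

static bool cb_empty(const circular_buffer_t *cb) { return cb->head == cb->tail; }

static bool cb_full(const circular_buffer_t *cb) {
    return ((cb->head + 1) % CB_CAPACITY) == cb->tail;
}

static bool cb_put(circular_buffer_t *cb, uint8_t byte) {
    if (cb_full(cb)) {
        return false;  /* drop on overflow; overwrite-oldest is another policy */
    }
    cb->data[cb->head] = byte;
    cb->head = (cb->head + 1) % CB_CAPACITY;
    return true;
}

static bool cb_get(circular_buffer_t *cb, uint8_t *byte) {
    if (cb_empty(cb)) {
        return false;
    }
    *byte = cb->data[cb->tail];
    cb->tail = (cb->tail + 1) % CB_CAPACITY;
    return true;
}
```

Much of the engagement Phillip mentions comes from exactly these design choices: full-versus-empty detection, overflow policy, and whether the index math is safe with a producer and consumer in different contexts.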
Okay.
So let's go back to your favorite articles instead of the popular ones.
You have number one as embedded rules of thumb.
Is that like, you know, do stuff? Never start a land war in Asia?
Yeah. What is the most important embedded rule of thumb?
I would say it changes depending on what problem I'm facing, but my current favorite is a rule of thumb from Jack Ganssle, which is: it's easier and cheaper for you to completely throw away and rewrite the 5% of problematic functions that you're always trying to fix the bugs in than it is to try to go back and incrementally fix all the bugs as they come up.
And I think that's an underestimated and underused tactic for cleaning up code that's bad,
is just throw it away and start over if it's bad.
You'll do it better the second time because you have the problem fixed in your mind and you have the flaws fixed in your
mind as well. And, you know, you're just going to come up with a better design than the initial one,
which you weren't necessarily aware of all of the things you were trying to solve and account for.
I think that's great advice and also really tough to manage in large companies. What are you going to do this week? Well, I'm rewriting this section. We need to... No, you need to fix these bugs. Yeah, it can be an uphill climb to refactor.
But yeah, it's a sunk cost fallacy thing, right? It's like, oh, we spent all this time on this code. We have to keep it. But it's also taking all our time.
And I've been through it. I've spent two months trying to clean up a bad module of code
where the entire team acknowledged it was bad
and we knew it was the source of all of our problems,
but we just wouldn't commit to redoing it,
which would have taken a week instead of the two months
it took us to squash all the gopher bugs that started popping up.
In this Rules of thumb post,
you have lots of references to other people.
How do you find this information?
How do you find people who give you good rules of thumbs?
Thumbs?
Thumbs?
Rule of thumbs.
Yeah.
Is that where the strongest thumb wins?
Yes.
Never start a thumb war.
That's the iron law of thumbs.
Oh, okay.
I get them all mixed up.
Well, I can answer your question in a broader scope,
which is when I was trying to become an embedded systems engineer, I learned on the job, essentially.
And I didn't know where to look.
So I was reading different papers, different articles, looking for other embedded engineers who were putting out quality content, which is hard to find.
And I've read thousands of articles over the past 10 years.
And some pieces of information keep coming up over and over and over again.
And a simple rule of thumb that you'll hear repeated everywhere might be something like: comments should not talk about what the code's doing; they should talk about why you're doing it that way.
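An illustration of that rule, with an invented example (the sensor and the delay helper are made up for this sketch):

```c
/* Stand-in for a real tick delay; records how long we waited so the
 * example is testable on the host. */
static int delayed_ticks;
static void delay_ticks(int ticks) { delayed_ticks += ticks; }

void sensor_power_on(void) {
    /* A "what" comment would restate the code:  "delay 3 ticks".         */
    /* A "why" comment carries real information: this hypothetical sensor */
    /* needs a few ticks after power-on before its registers read back    */
    /* valid values, so we wait before touching it.                       */
    delay_ticks(3);
}
```

The "what" version rots silently when the code changes; the "why" version tells the next reader whether the delay is still needed.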
So some of those ideas start to stand out over time. And as I've gotten more experience, other ideas that I'm
encountering just ring true, such as the throw out your bad functions and start over. I've spent the
two months dealing with that. And so as I encounter these situations, I just started noting things
down. And mostly they were personal notes. But for the rules of thumb article in particular,
when I had like, I don't
know, 50, 100 rules of thumb in here, I was like, okay, maybe other people would find them either
useful or amusing. Applying them is hard because a lot of them are derived from experience and
you'll probably only agree with them if you've had the experience that gave rise to that particular
rule of thumb. So, whether they're actually useful to people is,
I think, somewhat questionable, but they're certainly very amusing to those of us who
have been in the trenches. You don't have one of my favorites on here.
Every sensor is a temperature sensor. Some sensors measure other things as well.
That's a great rule of thumb. I'm going to add
that. Okay. So you have your rules of thumb, which I agree, a lot of these things you don't
get until you have experienced it. And so it's only in retrospect that you can
use that information. You can't pre-apply it. On the other hand, one of your other favorite articles is improving software with five processes you can adopt this month.
And you go through Jenkins and Builds and...
Let's see. Let me actually do all five.
Fix all your warnings. Yes.
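As an illustration of why this matters: compiling with something like `-Wall -Wextra -Werror` catches whole classes of real bugs. A classic example is assignment-in-condition (the function here is invented):

```c
#include <stdbool.h>

/* With warnings enabled, writing `if (status = 0)` by mistake would be
 * flagged (assignment used as a condition), catching a bug where the
 * success branch silently becomes unreachable. */
bool check_status(int status) {
    if (status == 0) {   /* correct: comparison, not assignment */
        return true;     /* success */
    }
    return false;        /* error path is actually reachable */
}
```

Treating warnings as errors (`-Werror`) keeps the count at zero, so new warnings are impossible to ignore.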
Set up static analysis support, like Clang or PCLint. Measure and tackle
complexity. That's hard. You're going to make me do that in one month?
It's hard. You could start doing that in a week. You could find the problems within a day. You can
get complexity analysis set up in a day. Are you going to fix
it all in a day? No. But you also won't know there's a problem unless you're looking.
Create auto-formatting rules. Why is formatting such a big deal?
For me, I propose auto-formatting because I don't think it's a big deal and I wish teams
would not talk about it as much. And so if you have an auto-formatting system, it's a way to avoid the formatting
arguments. Because I've been a part of too many code reviews that don't focus on the actual
implementation at hand, but only focus on the tabs versus white space argument or the way the braces
are indented. And so for me, this is a strategy to eliminate the
bike shedding that happens with formatting and to help teams focus on the stuff that actually
matters. And you also suggest doing code reviews, which is another thing that I think probably takes
longer than a month to even start. How do you do code reviews given that you kind of work by yourself?
Code reviews are the thing that I miss the most about working in a company. I would say that
that's really has been a major boost to just exposure to new ideas and learning different
approaches to solving problems. So now that I'm a contractor, I try as much as possible to
work with my clients on reviewing the code that I write, which is a surprisingly difficult task.
It's like they don't care.
It's like they don't care. And I've even had plenty of deliverables where I know they don't
care because then three months later, they told me they didn't use it, which is always a sad thing.
But people pay a lot of money for code they won't use, and I will never understand that.
Yeah, I've been paid a lot for code that people resented.
Like, oh, I could have done that. Like, okay, you should have done it.
You know, I'm just a consultant. I did what you told me.
Right.
I have thought sometimes that the consultants that I know should band together to do code reviews exchange, but then you have NDA problems and all of that.
You can't do that, but I do, I miss it. And it is so often that I will do a code review for a client and at the
end, they'll be just like, why did you do this? We could have done a PowerPoint in five minutes.
And I'm like, but you needed to see the code. I have to say in my current situation,
you have like 25 people. You might not miss it.
If you are going through what I go through every week. There is such a thing as too much code review. There's, yeah, there's a sweet spot and
finding that sweet spot I think is the challenge that takes more than a month.
Yeah. Especially. And it takes, it takes discussion and understanding of what a code
review means, what's expected of the reviewer, what's valuable and what's not valuable, what kinds of comments are going to hold up your code or what kinds of comments are things that should be tackled later, and being really clear about all of that stuff.
Because when that's not clear, it's just a free-for-all and people are just throwing food at your code and you don't even know necessarily what things to address
and what things not to and where the important parts are.
So I don't have any experience recently with that.
You have some best practices for code review
and some discussion of the social aspects of code review,
which is often an adventure.
My code is not me. My code is not me.
Your code is not you.
I'm not criticizing you when I criticize your code.
I'm not saying you're stupid.
I'm not saying any of those things.
I'm just saying this code could have been better.
Why are you pointing at me?
I'm just having flashbacks.
Yeah, there's ways of being neutral and still giving comments.
I actually started going to a writing group about a year ago, not necessarily because I really need
help with my writing, but because I wanted to learn how to give criticism better. And that was
what the group focused on. and so I could watch people
give better updates better feedback and that has helped me with code reviews for other people or
design reviews usually and it's not just that I am naturally sort of blunt or naturally sound angry to people, even when I'm not.
It's just the, we aren't cogs.
We aren't replaceable with each other.
We do have feelings and code reviews are hard on feelings.
What are some of the things that you learned from your writing groups about giving
feedback? You can't give too much. I mean, you can't, no, I phrased that weirdly.
I can't. There is too much feedback. If you give too much, none of it will be taken. So you don't necessarily want to give only three pieces of feedback, which is
advice I've heard from other people. Because in code, you can't do that. I mean, more than three
pieces per line is too much. But sometimes you have a big file, you have a bunch of comments.
But you do have to figure out if you can,
instead of pointing out each error, point out the thought process error. Instead of circling
where all of the commas are wrong, you tell the person, this is how you use commas, and this is how I would like you to use commas. And so you give them the tools to improve
for the future. That may mean this piece, this code, this writing doesn't get the benefit,
but you're helping them become better as a whole instead of making them feel criticized for
circling all these commas and they don't even know why. They don't have a tool to use in the future.
And so that's the thing that I have been a lot more focused on with code reviews is,
okay, there are things wrong here, but what can I say that will allow them in the future
to not make these mistakes as opposed to just fix these right now?
And I think another way to complement that approach is, and this I think
applies more toward code reviews than writing, is I don't think people ask enough why the person
implemented it in that way. So if I think that there's a problem with the implementation,
it would help me significantly to understand why it was implemented a certain way, because I may be criticizing something that I also don't understand, which isn't productive at all.
And if I'm a senior engineer talking to a junior engineer, you know, there's an inherent power and
respect differential that's going to play into that. And I think that we're not always as aware
of that. And asking for a thought process can help slow us down
and prevent some of these easy-to-make mistakes from actually happening. And if somebody figures out their own issue before you have to point it out, then the criticism doesn't come from you, and so they don't have to get defensive at you.
Oh, yeah.
Like, "Could you explain your thought process?" is great.
"I'm confused why you did X" is not great.
Yeah, that's the famous consultant's line.
We receive a lot.
And actually, that does apply to the writing group.
We do that a lot.
I don't understand why your character does this now.
Could you walk me through your thought process of making this stream of consciousness instead of some reasonable writing style?
Of course, you only say the first half of that sentence.
And do you prioritize your code comments? That's always been the big one for
me that I used to do for the writing group and they don't care anymore. What do you mean by
prioritize my code comments? Well, I separate them into nits, which are things that I need to tell
you about, but you don't need to fix. Maybe, which is things that I think may be wrong, but I'm not sure.
So if you agree, please fix it.
Ought to, which is this is wrong, and if you're not going to fix it, tell me.
Bug, which is this is very wrong.
Stop right now.
Don't proceed here.
Either we need to talk about this or we need to file a
ticket or something. And then I also have kudos, which is I really liked this. And I really try to
use kudos because we forget to say, I mean, it's a review. You do want to encourage the good
behaviors too. I absolutely do that. I absolutely do that. And
I learned that from my first manager at Apple, who I went on to work with at a startup following
that. And he was the most excellent reviewer, particularly because he would give you the kudos
and particularly because he would ask you why you did something before he explained another approach.
And usually it was, I think, his style was, I think there would be a different way to consider
this problem that you might try, such as X. And then that's much less personal. And every team
I've worked on sort of has different categorizations of comments, but I think everything you listed is certainly,
you know, I tend to bucket my review comments in that way.
And I usually summarize at the end of a code review,
just to be clear,
the ones that I expect to actually be fixed before something should land.
Yeah, that helps.
Because it is so easy to just have
little nits all over the place that nobody
really cares about but need to be documented because it isn't following the style guide
i don't want to talk about the style guide but we have one so use it that sort of thing
usually it's we have a style guide that nobody looks at, but you should use it anyway.
Yes.
Two style guides.
Okay, so what are some of your other favorite posts?
The other posts I enjoy a lot are the series on implementing dispatch queues with RTOSs because I think it's a really interesting concept that can clean up a lot of
threading problems that I see across embedded systems. And I also am really proud of the Boeing
article, which started as a one-paragraph newsletter update and ended up as a 25-page-long harangue about that whole saga. Okay, this is the Boeing 737 Max thing,
where there was a sensor,
and you had to have software to turn off the alarm,
you had to pay extra to turn off the alarm,
and it crashed, and it was bad.
Most of that's wrong, but okay.
Okay.
So tell me what I should have said about it.
What is it about?
There were two crashes involving the 737 MAX with, as you said, seemed to be related to a software system that behaved in a way that the humans didn't understand and prevented them from controlling the plane in
a way that they didn't understand is the base problem. As the situations unfolded, there's a
lot of other contributing factors that have come to light, such as Boeing and Airbus are competitors,
and Airbus announced a new airframe where they were going for improved fuel efficiency,
and Boeing needed to respond to that so they didn't lose business to Airbus.
This competition leads to specific timelines and cost targets and certification requirements
to minimize the cost and training time for airlines,
and a lot of corners appear to have been cut through that process. So Boeing could
maintain their type certification and not have to have pilots go through expensive training to fly
this airplane. The goal was, this is your grandfather's 737. You'll be able to fly it the
same way that you've flown every other 737. The way they achieved fuel efficiency increases was with bigger engines,
which they had to mount differently on the airframe, which caused different aerodynamic
behaviors, especially with regards to stall angles. And so the software system was implemented
to prevent pilots from entering into an angle which could cause the airplane to stall.
That seems to be a big point of contention on whether that should have been allowed,
whether there should have been a type rating change, and how Boeing implemented that specific software also appears to be problematic. There are two angle of attack sensors on the plane,
one on each side.
And the flight computer, which has this new software that prevents the pilot from entering too high of an angle, only reads from the currently active side of the plane.
So if the co-pilot's flying the plane, the co-pilot's sensor is read.
If the pilot's flying the plane, the pilot's sensor is read. And as you mentioned, there was an optional upgrade that would check both and let you know
if the sensors disagreed. For both of the crashes, it appears that there was a faulty
angle of attack sensor reading off by dozens of degrees, which caused the plane to...
Yeah, dozens of degrees. And they're known to be off or to be easily damaged, which is
another ding against the implementation, I think. If you have a sensor that is notorious for
being off, you should probably check both. Or if you're a pilot,
you know, if my AOA sensor is saying, is telling me it's 20 degrees off and I look out
the window and I see that that's not true, that I'm not going to change how the plane's flying.
But a computer getting one data point doesn't necessarily know what's happening and is just
going to make the decisions it's programmed to make. Okay. I don't understand because when I
worked on aircraft stuff, it was just for little planes and we often had to
have three sensors agree or they would vote for things like horizon sensing which is very similar
to angle of attack.
You're not Boeing.
This should have been... they should never have been
allowed to certify anything with a single sensor.
Well, yeah, the certification process is under the microscope too.
Oh, yeah. Yeah.
and they appear to have been able to take this approach because of a specific
Um, I don't know the correct aviation term, so I'm probably going to get this wrong, but say the failure rating. So it wasn't rated as a catastrophic failure if the sensor is wrong, because the pilot could still theoretically control the plane.
Theoretically.
And they were allowed to self-certify some of this, if I recall correctly. Yeah, for the past 20, 30 years, the FAA has been increasingly outsourcing much of the certification work to the different manufacturers.
And so Boeing is responsible for a lot of it. The FAA wasn't aware of some of
the changes that Boeing made to the MCAS, this is the software under scrutiny, the MCAS software.
It was changed after the certification process happened. And so when Boeing released data after
the second crash, the FAA was like, hey, we didn't actually know that the plane could
be controlled to this degree by the system. So that's also another eye-opening problem that's,
I'm sure, being looked at by multiple parties. One of the things I liked about your article,
in contrast to another article that was going around the Internet around the same time, from IEEE Spectrum, was that you seem to correctly characterize it as a multi-system, multi-organizational failure and not just the software. Part of that, I think, is we have this idea that Boeing can say, we corrected the software, we fixed
the problem. And a lot of the other systemic problems still exist, and the factors that led
us to this still exist. Right. So it almost seems like the software is being used as a
distraction from the rest of the problems that have happened under the hood. Yeah, they can fix this and then they can go on to the next
bad decision that does something similar.
Right.
How do we, as a software person,
even if we don't blame the software,
because I agree that this is a systemic business
and environment and quality control and management problem.
But as a software engineer, I do feel like at some point you have to stand up and say,
wait a minute, if this, this, and this happens, we could crash planes.
And I'm not having that happen on my watch.
Is it realistic that you would even know
when a system is complex? Would you be responsible for such a small area that you might not know how
it's all interacting? It's true because it had to do with both the autopilot and the sensor system.
And if I was the sensor system engineer, I would figure that the autopilot people
had their crap together. And if I was an autopilot engineer, I might not know how broken the sensor system is. I see that a lot, even outside of Boeing and organizations I've been a part of.
It doesn't seem like in most organizations, there's a steward of the system as a whole.
And the difficult thing about complex systems is if you take all of their pieces and you
just look at your piece, you can't really extract the behavior of the system out from all of the
pieces. You have to be thinking about the entire thing as one thing that behaves on its own and
probably is going to behave in ways that you're not expecting. And in the case of the MCAS software, I do think
that, yeah, if you're on that team, then it probably would have been the right idea to say,
okay, well, I need more than one reading to really verify this. On the other hand, maybe as the
software developer, you don't actually know the inherent flaws in that particular AOA sensor or
the problems that are seen in the field.
I've certainly seen a lot of disconnect between what the engineering team believes about a product and what the support team believes about a product, how customers use it and what the bugs are.
That is so true. And it isn't even just the support team. Sometimes it's the customer. You're
like, oh no, we never meant for it to do that. I'm glad it was working for you and now it's broken, but that was never the goal.
What is the most important thing to learn from the saga? I mean, is there like one thing you
can take away? The architecture, there should be a system architect or a systems engineer
who has a view into the whole system.
Are there other things you think we should take away from that?
I don't know that I have answers that are satisfactory, but I think there are lessons
that we can learn that should drive ways that we handle this situation in the future. We're starting to build systems that are so
complex that whole organizations cannot accurately grapple with the consequences of the behavior of
the system or their decisions with relation to the system. And that's not going to change.
Everything we're building is getting more and more complicated. Somehow we're going to have to get a handle on that.
And I don't necessarily know the right way to do that.
And I think the other key takeaway is it's not like Boeing's unique in the decisions
that they made that led them to the outcome.
They're unique in that the decisions that were made led to two crashes, which resulted
in the deaths of 346 people.
And I work with dozens of organizations that make very similar compromises. They make very
similar mistakes. And sure, you might ship your widget and there's no real critical customer
impact, but it's not like Boeing's doing something that every other company isn't doing in
some way. And so it's easy to point fingers and say, oh, look at Boeing and their bad behavior.
But I think we should also take a look within our own organizations and see how we're cutting
corners and what are the consequences of the things that we're focusing on? Because you're
probably not thinking about that. And somebody probably should think about that.
Yeah, risk and hazard analysis are two things that most people don't do.
And just in terms of security with IoT devices and things,
that should be something that every company does.
Because every time you build something like that,
there's a potential for something catastrophic.
You're not going to crash a plane, but you might expose hundreds of thousands of people's private data or whatever.
Or even like some of the alleged Nest hacking attempts where you have a baby in a room and the heater's just turned on full blast.
Whether or not that one is actually substantiated, it's definitely the kind of possibility when you skip out on security for connected devices. You can come up with questions to put on your risk list by saying, okay, let's pretend it's two years later and
whatever the worst possible thing could happen has happened. How did it happen? And for Boeing,
that actually should be part of their process of let's pretend that a plane crashed because of the software. How did this happen?
And work backwards and then try to minimize the risk at each stage.
That is not that hard of a process to do when you're working with software. I mean,
it doesn't have to be that critical. It can just be the worst thing that happened was we lost all our client data or the worst thing that happened was
someone got shocked. And so you work backwards. How would you get shocked from
the light switch? Okay, well, maybe we shouldn't make it out of metal and electrify it.
You know, you're saying we need black hats for trying to crash planes. You need black
hats for the risk assessment. So many of us go forward. I didn't think about reverse
engineering for a long time. It was something somebody else did, but then I got into a BBA
and I realized it is the same thing I'm doing usually.
It's just debugging something I didn't write, which I do a lot of anyway.
But it's the same process of, it's backwards.
It feels backwards.
And yet it is just as important as what feels natural.
And it should, with practice, it doesn't feel backwards anymore.
Did that make any sense?
Yeah.
Yeah?
Philip didn't say yeah, but I think we're going to go with it.
My question for you is whether you think we're running up against, just call it a fundamental limit of human psychology,
and how do you train teams to think in a way that's sort of antithetical to the way that we all think by default? For the most part, we're all thinking of the happy case, right? We're all going to make it
rich and we're only going to pick the things that we need to pick to get to the part where we're
going to be rich. And all the things that are going to cause us to not be rich, we'll deal
with them when they come up. I think that's the human tendency. So I wonder if there's a way to
work around that and instill that kind of culture where you are looking at the risks, even though it's not natural and it's not fun and it feels like a step back from getting you towards your end goal.
Yeah, because sometimes that step back, you realize, oh my God, this path doesn't work.
I can't do security on this chip.
And it just becomes untenable and terrifying.
And so you have to give people time.
You can't just ask for the answer.
You can't just ask for the happy path.
You can't just ask for people to fill out all of the features.
You also have to ask them to think about the system.
And that is not something we get a lot of time to do as developers. And so it's sort of time we
have to take and say, look, this is part of my job. This is part of what I do to be a developer,
to be an engineer, to be a programmer, whatever
word you want to use. Even as a hacker, I would like people to consider, okay, I have hacked the
nest to do bad things to people's houses. What's the worst thing that can happen to this? For me,
it was just a fun thing. But if I put it online, what's the worst thing that happens?
And you do take some responsibility for the consequences of your actions.
Christopher's making faces.
I don't know.
No, no, because I'm thinking about QA, right?
And engineers don't test their code well.
No, because we think of how it should work.
Well, we think of how it should work, but also we know how it works.
And so there's an inherent blind spot there. Yes. And so whether or not you're consciously
avoiding doing things that you know might break it because you know how it works or whether that's
unconscious or whether you're just not creative enough at creating something and breaking it.
Yeah. I think that's probably not going to be, it's not going to be super effective to have the engineers doing that.
Because this is why we have QA teams and dedicated testers, because they're better at coming at something with a fresh mind and finding ways to break it.
So I think the same thing applies to risk analysis.
You kind of need somebody who does that, at least for a significant portion of their time, and isn't just an engineer trying to turn their brain upside down for a day a week.
I mean, this is the security model, right?
There's people who are penetration testers and people who, you know, consult on here's what's wrong with here.
Here's what I'm finding are holes with your system.
And they're not the people necessarily writing the code.
And maybe that's what's needed in some organizations where the risk is high is to have, you know, dedicated adversarial people not testing necessarily, but doing what we're talking about, you know, imagining the worst case scenarios and trying to construct a way to get there.
Okay, I have a new idea for business.
We're going to call it Agents of Chaos.
And it's going to be all risk assessment.
Well, wasn't that Netflix's Chaos Monkey thing
where they just deliberately injected automated breakages?
And then they made sure that their system was robust.
But they did that on production.
Chaos Monkey still runs.
Yeah, I can see that sometimes.
Oh, Philip, I'm sorry.
We do sometimes end up talking to ourselves.
It's okay.
It's fun to listen to.
Let me get back to one of the articles.
Actually, we have a question from a listener,
Krusty Auklet.
Krusty Auklet, really?
I've met him. Is he Krusty
or an Auklet? No, no.
Okay.
There are a few articles on Embedded
Artistry, your site, about
retargeting modern C++
using FreeRTOS and
ThreadX.
Krusty Auklet is curious
what you think the pain points are
and/or the limits of GCC and newlib.
For example, it seems like thread,
standard thread,
is not great since you can't tweak
the stack size of threads.
What? You can't?
That's really important.
You cannot.
Okay, that was a pretty long question.
I will just leave you to answer it. Go ahead.
So I'm going to start my answer by prefacing that all of my retargeting C++ experience is using libc++, which is Clang's implementation.
So I don't actually have any comments on the GCC C++ library. But overall, it's very clear to me that the C++
standards committee implemented the threading library support simply based on how Pthread works.
And there's almost like a direct one-to-one mapping for the API set, including like, you don't need
to set thread stack sizes in Pthread because there's a pretty good default thread stack size. And like, you really need to know what you're doing if
you're modifying that. So as the reader mentioned, like, that tends to be a problem. The other types
actually work pretty well. Mutex is relatively straightforward to implement. Using it makes
sense. There's a lock, there's an unlock, there's all sorts of useful
helper types that you can lock a lock when you enter a function, and no matter where you return
from the function, it will automatically unlock. Using that is wonderful. Implementing it,
the only difficulty is because of this pthread C++ standard library dependency,
the standard mutex constructor has to be a constant expression,
which I've never actually seen apply for any RTOS or other OS that I've used.
And so that tends to be like the one tricky thing.
You have to check before you lock your lock whether you've initialized this mutex
and if not, call some initialization code.
That's some unneeded overhead.
Yeah.
But, you know, using it once that's done and doing that is straightforward.
Condition variable, also easy to use once it's implemented,
but implementing a condition variable can be a tricky problem space
if your OS doesn't have a primitive already for that
because the obvious approaches to handling it
lead to logical flaws.
So there's a great Microsoft Research white paper,
which I'll send you the link to,
and you can add to the show notes,
which goes through an algorithm
for actually implementing that,
where you need a semaphore and a queue
where you keep track of threads
waiting on a condition variable and the
ability to directly suspend and resume threads. And so if you have those things, you can implement
the condition variable in a straightforward way. Wait a minute, let's go back. A condition
variable tells you which threads are running, the state of the thread? No. A condition variable
is a concept where I want to sleep my thread until a specific condition is true.
And that might be waiting for a variable value or some functor to return true or any number of things.
But essentially, when whatever my predicate is returns true, I'm going to wake the thread and resume running. Okay, so a condition variable, for me, when I think about not using this sort of thing,
it's the global variable I set in the interrupt
that tells the loop to go deal with the data I just collected.
That's more of a single flag.
So a condition variable could be a set of flags
or a value that you're waiting for, right?
Yeah, I think it's more in line with like an event group
where my thread is actually sleeping until some condition which I specified in the flag get call
is true. Okay, so it could be a series of flags, or it could be a series of states happening,
things being ready. Okay. Let's see.
So that's condition variables.
So then for thread, like the reader mentioned,
implementing the threading support
is relatively straightforward,
but the APIs are very limited.
So if you need to create a thread
and you have a good system default stack size
and good system default priority,
and you need some generic concept
where just an average old thread will do, then it works well. But, you know, on embedded systems you
often want to control your thread priority, because I want my thread that's handling my interrupt
callbacks to run before anything else, and you might have different priorities for different
hardware-component-managing threads.
And stack sizes are definitely variable.
My LED blinking thread doesn't need a 128 kilobyte thread stack.
So we often want to tune those values.
And my particular solution has been to, I have a, I guess, a separate set of C++ threading concept APIs, which lets you tweak
all of those things. And I'm currently working on validating those APIs by using them for Pthread
and ThreadX and FreeRTOS. And I'll probably pick two other RTOSs to make sure the abstractions
work. And I'll release those as open source interfaces if you wanted a good
C++ RTOS abstraction layer. And then you can always go the route that I went, which is
I have STL support for their types based on these APIs I wrote. And so you sort of get the best of
both worlds. If you have a C++ app developer who wants to use standard mutex in the app layer of your firmware and
doesn't actually care about setting all the other little OS bits that might come with
that, like priority inheritance settings, that really works well and is a huge time
saver and sort of opens up the possibilities of who can work on your system.
Do you have examples of where this might be used?
As you write tests, are you
writing little demo programs too? I have, I have a general, I guess, framework demo thing that I'm
working on, which is sort of nebulous at the moment. I don't actually know how to talk about
that to answer your question in good details. Oh, that's fine. I'm just, some of the concepts you've talked about,
I know what they are, and yet I'm still a little on the fence of, okay, so if I have FreeRTOS
running and have C++, how do I glue these Lego blocks together with your code?
The Clang libc++ has a nice external threading abstraction layer. So there's
a set of functions that if you wanted to use the standard C++ types, you just need to supply an
implementation for those particular functions. And so if you wanted to use my code, I have an
external threading header that maps directly to that. And I will be, in the future, releasing some demo apps that do show how to use that
and how to hook the different pieces together.
But that is under development.
Okay, so I have more questions about various blog posts and going more in depth.
But we don't have that much time.
Okay.
So I want to go back to the question of,
you have this website full of pretty useful information.
Why?
I mean, you're a consultant.
You get paid by the hour, I assume, unless you do a lot of fixed bids,
but most of us don't.
So, yeah, why? I mean, you do a lot of fixed bids, but most of us don't. So, yeah, why?
I mean, you could be making money.
I actually started the website before I started my business.
And I alluded to this earlier in the episode.
When I was learning about embedded systems, there wasn't really, there wasn't a class that I could take,
there wasn't a good book to turn to that I could find, there weren't really good websites on it.
And so most of the aspects of how to write firmware and how to debug embedded systems and
do all of the various things I needed to do, I had to do a lot of research for. I had to bug senior engineers and try to figure
out how they did it. I had to, you know, read through thousands of pages of data sheets to figure
out how something worked, read through the ARM architecture documentation to figure
out how arguments were passed, all sorts of things like that. And you slowly piece together
various aspects. And I'm a
pretty prolific note taker. So I ended up with thousands of pages of notes. And one day it just
hit me that I'm probably not the only one in the world struggling with all of this stuff. And
the internet today is a bit better in that you can Google things and get a lot more answers. And
there tends to be more embedded resources. But even those, I think, are geared heavily toward
the hobbyist and maker space. And so I just started going through all of my old notes and
cleaning them up and adding some examples and having my wife edit them. She worked in publishing
for four years. So that's a handy person to have on your team. And just slowly releasing those.
I mean, over time, I think the first year, 200 people read my website for the entire year.
And now we're hitting 50,000 people a month.
So it really seems like there's a need.
And I really derive a lot of enjoyment from taking what I learn and the things that I'm struggling with and that I'm researching and
just writing it down and publishing them and then seeing that other people find that useful and
helpful. I understand that feeling. And I still get a little confused about it.
I mean, it is time. It's time we could be doing other things. Why are we spending our time helping other people for free? Do you ever, do you worry about that? Or do you just enjoy the sharing enough? I would like to find a way to make money doing that because I enjoy that. I enjoy
publishing the information and helping other developers learn a lot more than I enjoy doing
consulting contracts, which quite honestly end up being the same thing with a different
set of requirements, maybe. But most of that work is the same. And that can get old
after a while. And the exploration of the unknown is certainly more exciting and helping people is
certainly more exciting. So I have struggled on how to find a way to have more of my income
come in from doing the website without relying on ads. Because, I mean, you look at the big embedded in
electrical engineering news sites, they're mostly product placement ads. And it's really disappointing
to me. Even the articles, it's just super disappointing to see. And I don't want to
go down that route. But I understand why they do because you get offers, you can make money, but it's just that part would be sad to go
that route. So it bugs me a lot to answer your question. And something I think about a lot how I
could make more income from that and be able to focus my, you know, my daily work activities on
it. You already spend quite a bit of each day working on the posts.
About how long does it take per week? I spend, I would say, an hour to
an hour and a half every morning researching or writing. Depending on the length of the post,
most of them take five to ten days, plus four hours of editing. So let's call that 20 hours,
I think, on a post of any significant length. So that, you know, it certainly adds up,
like you were saying, time is money. It is tough. It's tough to have that mental
shift of giving things away. But it is sort of advertising, isn't it?
Do you consider it advertising for your consulting services?
I do.
I've certainly received work from my website.
And I would actually say the most rewarding projects
that I've worked on came from my website
because they were from clients
who shared the same values that I shared
and who cared about what I had to
say rather than just thinking that I was some generic firmware developer contractor who would
sit in a desk for them and type code at a specific number of hours a week. So it certainly has had
it, it's paid its dividends beyond just, you know, getting enjoyment from it. And it does land me
work and introductions,
and I meet a lot of great engineers who reach out to me about the site.
There's all sorts of nice network effects that come from that.
Yeah, I understand that.
Have you considered writing a book?
I would love to write a book,
but I'm not sure yet what I need to put in a book that I can't put into an essay.
And for me, that's the bar.
And I've had some publishers reach out to me about publishing a book on embedded systems or embedded C++,
but they always seem to have strange, almost startup-esque deadlines where it's like,
we need you to write this book in three months and we're going to pay you $5,000 to do that,
which, you know, is a full-time job. And who knows if that could actually be done in three months. And $5,000
certainly isn't enough to live on for three months while you're doing that work. So, that's also
been presented, but I haven't found the right thing yet.
We should talk more offline.
Would love to. I do think I should
let you go. We are about
out of time. And now
I get to ask you this
final thoughts, questions, or
last thoughts. People didn't like...
Last thoughts? That's even worse.
On my way to the grave?
Philip, before you
go,
do you have any thoughts you'd like to... Do you have any last words? Do you have thoughts you'd like to leave us with?
Yes, I would like to say that I publish a monthly newsletter. It comes out on the first Monday of every month. If you're interested in keeping up with
ongoing developments in the embedded systems industry, you can sign up on our website at
embeddedartistry.com slash newsletter. A modified version of our Boeing essay is going to be
published in the June edition of the Software Quality Professional Journal, so keep an eye out
for that. And I want to end on a serious note, so, you know, I'm going to get real for a second, related to the Boeing discussion we had,
which is an idea that's just been sitting in my mind since I've been researching the Boeing 737
MAX crashes. And it's rooted in my studies in philosophy and psychology. And it's a simple
idea that I think all of us really understand deep in our bones, which is where we're aiming at is the most important factor
in determining where we're going to end up.
And I'm not going to say what the aim should be.
It's certainly up for debate.
But the goals and the values that we select
and that we focus on change how our brain sees the world
and how we make decisions.
So I just want to ask all the listeners to take a few moments and think about where you're aiming
in your life and where your organizations are aiming. And are you happy with where that's
going to end up? And if you're not happy with the consequences, or even if you are,
I hope that you'll take the time to pick a new aim, raise your aim a little
bit higher. And I think we'll all be pleasantly surprised at the positive outcomes that that
will have on our own lives and the lives of the people around us. Very nice. Our guest has been
Philip Johnston, founder and principal at Embedded Artistry, an embedded consulting firm in San Francisco, California. You really should check
out his blog, embeddedartistry.com slash blog. Thank you for being with us, Philip.
Thanks for having me. It was really a pleasure and I hope we get to do this again sometime.
Thanks, Philip.
Thank you also to Christopher for producing and co-hosting. And thank you all for your
patience over the last few weeks as we missed a few shows.
Also, thank you for listening.
You can contact us once again
at show at embedded.fm
or hit the contact link on embedded.fm.
Now a quote to leave you with.
This I am getting from
Philip's Rules of Thumb post,
and it's from Jacky Cancel.
It's one of my favorite quotes from him.
Study after study shows that commercial code,
in all of the realities of its undocumented chaos,
costs $15 to $30 per line.
A lousy thousand lines of code,
and it's hard to do much in a thousand lines of code,
has a very real cost of perhaps $30,000.
The old saw, it's only a software change,
is equivalent to, it's only a brick of gold bullion.
Embedded is an independently produced radio show that focuses on the many aspects of engineering. Thank you.