Embedded - 290: Rule of Thumbs
Episode Date: June 6, 2019
We spoke with Phillip Johnston (@mbeddedartistry) of Embedded Artistry about embedded consulting, writing about software, and ways to improve development. On the Embedded Artistry welcome page, there is a list of Phillip's favorite articles as well as his most popular articles. Some of Phillip's favorites include: Embedded Rules of Thumb, Improving SW with 5 LW Processes, and Learning from the Boeing 737 MAX saga. We also talked about code reviews and some best practices. The Embedded Artistry newsletter is a good way to keep up with embedded topics. You can subscribe to it at embeddedartistry.com/newsletter. What are condition variables?
Transcript
Welcome to Embedded.
I'm Alicia White.
My co-host is Christopher White.
And our guest this week is Philip Johnston of Embedded Artistry.
Hi, Philip. Welcome.
Hi, guys. How are you?
Good, good.
Could you tell us about yourself?
As Alicia mentioned, my name's Philip, and I'm the founder of Embedded Artistry, an embedded
systems consulting firm based in San Francisco, California.
I run the company with my wife, Rosie, and we also run a website by the same name, which
is dedicated to embedded systems content.
On that site, you can find hundreds of articles and other resources targeted for embedded
systems engineers
of all varieties and skill levels. When I'm not working, I enjoy volunteering as a gardener at
the San Francisco Japanese Tea Garden, playing music, cooking good food for my friends and
family, and reading Latin to my four-month-old son.
Buried the lead there. The last part there.
The Latin or the gardening?
Yeah, reading Latin. What sort of things do you read in Latin?
Poetry, or a lot of Cicero. Philosophy kind of material.
Yeah, because there's a lot of Latin poetry that probably shouldn't be read to kids.
He's four months old.
Certainly true.
Yeah, he doesn't understand. He likes the rhythm.
All right, we're mostly going to be talking about that website, and Latin, with lots of embedded material, and what's on it and why and how, you know, all the usual questions.
But before we get into that: lightning round, where we ask you short questions and we want short answers, and we might say how and why, but we're not supposed to. Are you ready, Philip?
I'm ready.
Favorite processor?
Family or architecture? Dealer's choice?
Dealer's choice.
I would have to go with the nRF52.
Favorite artist?
Can I pass?
Favorite plant?
Favorite plant would be the black pine.
Black pine?
It's a Japanese pine tree.
All right. Least favorite compiler?
IAR.
It's such an easy target.
Yeah, this week we are sponsored by IAR.
What's the most important point when looking at contracts for consulting?
i really look for equitability in the relationship and in the contract.
So I tend to not trust things that are one-sided on either my end or their end.
Because I think that sets up how the relationship is going to go.
What is the most useful part of Newlib?
I actually don't use Newlib very much.
All right.
What's the least useful part of Newlib?
I've done some explorations with Newlib, but I actually use my own libc, so I tend to avoid most of Newlib.
Your own libc?
Well, we'll have to talk about that.
For my least favorite part of Newlib, it would be the memory allocation scheme, which I think is just too heavy-handed for most embedded systems.
Yes. Yes. So much yes.
Favorite Roman philosopher?
I'll take the easy target and go with Seneca.
All right.
Tip everyone should know.
Get enough sleep, and it will really improve every aspect of your productivity and quality of your work output.
I totally agree with that.
Do you want to do one more, Christopher?
I thought you were looking at me to disagree.
No, I like sleep.
I'm just bad at it.
No, I think we should move on to the actual podcast.
Okay, let's do the contracting part. You have a company, kind of like I do, but you actually do things a little more formally than I do. So if I was coming to you with a project, a napkin sketch of an idea, let's say a light that is also a temperature sensor that hooks into my Alexa, how would the process work with you?
Usually there's an initial conversation where, you know, we're discussing the goals of the project,
not necessarily the requirements, but what's the value you're looking to provide?
How are you differentiating yourself from other products?
What's your timeline?
What are your constraints?
Things of that nature, trying to get a good overall view of what the project would entail.
And that informs a lot of the follow-up work.
There, you know, as always, the open-ended, we want to build this thing and we have no idea how
to do it and we have no idea what chips we need or parts we need. And that ends up leading more
towards, I guess, what you would call formally a discovery type arrangement where we're working perhaps on an
hourly basis or even a fixed fee if the work is bounded to try to scope out how the system will
behave and what the different requirements are and what parts we might want to use.
If all of those questions are answered, which is pretty rare when a startup comes to us for help,
then we can outline a development plan
for how we think we should best tackle the project.
And a lot of that work is actually done
by my business partner, Rosie,
who spent 12 years doing project management at Apple.
So she's very good about getting engineering
to come up with a plan
and sort of wrangling the various ideas
and figuring out how you're going
to answer the open questions that are most pressing and how that informs your schedule and
sort of ordering those events so it's most sane. And usually, depending on the contract,
that's something that we can do over two or three phone calls with the client or,
you know, we need to have an actual month-long engagement to sort of architect a solution and see how we're going to proceed. Do you do the hardware as well as the
software? I personally don't do the hardware, but I've got a few business relationships with
various design houses that can do hardware. But I find that most of the companies that come to us,
and I don't know why this is the case, but they almost always have an in-house electrical engineer and have just really struggled to hire firmware engineers. And so firmware tends to be
the piece that's stalled out. Firmware also does tend to be the piece that ends up doing the project management. Why is that?
I think it's because it's very nebulous to most of the rest of the organization.
And if you don't have somebody who's experienced with firmware requirements in-house,
you tend to have a lot of assumptions that are made by the rest of the project team on what is or isn't possible in the firmware side of it. And because firmware starts last a lot of times, I think it's
the big driver for the completion of the project. And I don't know if that really answers the
question. I was just rambling there. Oh, it's fine. Okay, so do you work with little companies
or big companies or medium companies? Do you have a preference? My preference is the medium company
who perhaps has shipped one or two projects and has discovered that the way they've been working
isn't sustainable and needs to revisit their approach. I would say that's about 25% of our clients. Most of our clients are very, very early
stage startups in the two to six employee range who just have an idea or just created their
initial prototype. When you say not sustainable, what kinds of things, what kinds of realizations are they having or what kinds of things are you
finding for them and saying, well, you can't do it this way?
The usual realizations, I guess, are they've released their product and then their customers
are using the product and there's all these bugs. Well, they're also trying to work on a second
version of the product or get a second product in their suite to market.
And they don't know how to effectively manage those different requirements and the different priority levels.
And a lot of times we find that it could be traced back to software design flaws or process steps that are just being skipped that could
be added and even automated in a lot of ways to sort of squash some of the easy bugs before they
get into the field. Do you have a lot of clients, you mentioned that a lot of them seem to have
double E's on staff, but not firmware engineers. Do you have a lot of clients coming to you with a prototype or something that they think is product quality, but it's based on an Arduino, or something modular based on an Arduino? Is that a big part of your work? Like, oh yeah, we need to take the step from prototype to real hardware, and I need to help your EE do that? Or is it mostly, okay, they have the hardware and it's just a matter of getting the firmware to work?
It's a good mix of the two.
I would say it's probably 50-50 split.
You have the companies who have a prototype,
they've shown the prototype to investors.
It's hacked together as an Arduino
with the off-the-shelf camera module and something else. And then they want to transform
that into product, like into the final product type thing. And they may not know how to do that.
And that's something that we can help guide them for. The other 50% is like you said, they've,
they've got a double E, they've got their form factor decided on, they've got their first version of the circuit board, and their electrical engineer or their software team doesn't know how to write the firmware for it.
And so then we're just handed a hardware design, and usually we'll perform the initial bring up and get them started on features or help them explore what those features should be.
This sounds not very agile.
Do you have feelings about the agile development process?
Do you mean that my process doesn't sound very agile, or the customer's project process doesn't sound very agile?
Well, you mentioned architecting and processes and design. That doesn't sound very agile-y.
This is not an accusation. This is not bad.
Yeah. So I started out as a defense contractor, so I come from a formal background that was very, very stifling, but it also did open my eyes to some more of the upfront design aspects. And then I went to Apple, and I've worked at various startups as the first firmware hire where there was no upfront design and no process. So my goal
is blending of the two. I think that there is some amount of upfront design that can be done
to answer big questions on paper before you even start coding. You know, drawing things with
a pen or a whiteboard marker and connecting little boxes can unveil a surprising number
of problems before you get into weeks of coding and find out you've made
a solution that's untenable. On the flip side, you could spend all your time in design and
justify that as not starting. And I definitely see teams do that, and that's not good.
Where I get the most phone calls from existing companies is usually along these lines. It's,
hi, Philip, we had an initial design, whether it was a prototype or version one of our product,
and we decided we need to go to a new processor for some reason, power, better radio, different
radio, whatever it might be. And then they have the uncomfortable realization that
they have to rewrite all of their software because they tied everything to the SDK and the RTOS that
vendor A was using. And their new chipset is a totally different SDK with a totally different
RTOS, and none of their code can be moved over easily. And usually that's coupled with,
we have a build in a month and we need all of our software up and running by then. And we have no tests and we have no way to determine whether it's successful or not. And I get that phone call
at least twice a month. So when I'm doing upfront design, I'm really trying to make it so we can
understand what parts of the system are going to change and how can we design the software in a way
that isolates most of the system from that change.
So if you need to make a big critical change, such as even with your processor, that should
be two to four weeks rather than now you need a six-month rewrite of everything.
So that's sort of where I try to blend agility into the design.
I think some design can inform future agility, but you have to invest in that.
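The kind of isolation Phillip describes usually comes down to a thin driver interface that the application depends on, with one adapter per vendor SDK. A minimal sketch in C (all names here are invented for illustration, not from any actual Embedded Artistry code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic UART interface. Application code depends only on
 * this struct, never on a vendor SDK header. */
typedef struct {
    int (*write)(const uint8_t *buf, size_t len);
    int (*read)(uint8_t *buf, size_t len);
} uart_driver_t;

/* One implementation per chip vendor. Swapping processors means writing
 * a new small adapter, not rewriting the application. This one is a fake
 * for host-side testing. */
static int fake_write(const uint8_t *buf, size_t len) { (void)buf; return (int)len; }
static int fake_read(uint8_t *buf, size_t len) { (void)buf; (void)len; return 0; }

static const uart_driver_t test_uart = { fake_write, fake_read };

/* Application code is written against the interface only. */
int send_hello(const uart_driver_t *uart) {
    static const uint8_t msg[] = "hello";
    return uart->write(msg, sizeof msg - 1);
}
```

With this shape, the "six-month rewrite" becomes "write a new adapter struct": the application never sees the vendor's types.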
I think that's an angle and a question that I see so rarely asked at the start of projects.
What things could we potentially change in the future or swap out? And how does that affect
how we architect it? What modules we have? What are the API layers?
Where do we want to abstract things?
And so a lot of times people do abstraction just for abstraction's sake, which can lead to some really bad software.
But doing it that way and saying, wait, you know, we might change the OS.
Okay, then everything that talks to the OS, you know, we might need a translation layer or we might change out the hardware. Then
we should have, you know, a good driver model that allows us to swap that out. A lot of times
people just don't bother to ask that question. Then you end up like with what you're talking
about, a very brittle thing that you have to start over on. And you mentioned testing. And I think you and I have similar feelings on testing. If it is something that is fixed, something that is important beyond the scope of the processor or the RTOS, if it's the secret sauce, the goo.
The goo! It needs to be tested. It needs to have a way to test so that when you change the other things, it still works.
Do you do a lot of that testing?
I implement sandboxes for people so that their code will work on a PC or a Linux box so they can test their algorithms.
Do you do much of that?
I do exactly that. I actually try as much as
possible to be able to test my embedded code on the PC. And I really try to avoid on-target testing
as much as I can, although it happens and it's valuable and you check different things.
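Host-based testing like this usually means keeping the algorithms free of hardware dependencies, so they compile and run on a PC exactly as they would on the target. A tiny illustrative sketch (the function is invented here):

```c
#include <stddef.h>
#include <stdint.h>

/* Pure algorithm: no registers, no RTOS calls, no vendor headers.
 * It builds on the host as easily as on the target, so it can be
 * unit tested on a PC or a Linux box. */
int32_t average(const int32_t *samples, size_t count) {
    if (count == 0) {
        return 0;  /* defined behavior for the empty case */
    }
    int64_t sum = 0;  /* 64-bit accumulator avoids overflow on 32-bit sums */
    for (size_t i = 0; i < count; i++) {
        sum += samples[i];
    }
    return (int32_t)(sum / (int64_t)count);
}
```

The hardware-touching code stays thin and is exercised in on-target testing; the "goo" gets fast host-side tests.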
And the hard part is always where's the dividing line? Because I don't necessarily think
that test everything is the right answer. But certainly test your algorithms, test the things
that make your software special and where you're providing value, especially. And the goo, the goo,
as you say, the magic goo. Okay, so your website.
You have, like, my business website is like, yeah, we do stuff.
Contact us if you want to.
And occasionally, if I don't have work, I will make it nicer. But since I'm fully booked for quite a while, I don't really care.
It's a placeholder.
It's a spam sock. But your website goes way beyond that.
You have a blog that you work on often. You have different parts of the site that go to resources for beginners to embedded systems. It's a lot. Where should I get started if I am just coming
to your website? If you're just coming to the website, we have a welcome page, which is up at
the top on our menu bar. And that provides some orientation to the site with some of the articles
that I'm particularly proud of and some that are popular, as well as explaining some of the different areas of the site.
Because as you say, there's the blog and there's different resources
and we have a glossary of strange embedded terms.
So it's definitely a lot.
And that page is meant to help orient people to the website.
You have a list of development kits you like.
That too.
And I need to keep adding to that.
I've used a lot of embedded development kits now that I haven't written about.
Wow, this glossary. Okay, Christopher, I'm going to quiz you now.
Keep in mind that I do not care.
What is a critical section?
Uh, it is a section that is very important.
I'll allow it.
It's a section you don't want to be interrupted while it's running
because there's a resource or something that might be shared
and you don't want something else to run that could interrupt it
or change a resource out from under you.
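As a concrete example of the definition Christopher gives: a shared counter whose read-modify-write must not be interleaved. This sketch uses a pthread mutex for host-side illustration; on a bare-metal target the same section is often protected by briefly disabling interrupts instead:

```c
#include <pthread.h>

static int shared_counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void increment(void) {
    pthread_mutex_lock(&lock);   /* enter critical section */
    shared_counter++;            /* read-modify-write: must not be interleaved */
    pthread_mutex_unlock(&lock); /* leave critical section */
}

int get_counter(void) {
    pthread_mutex_lock(&lock);   /* reads of shared state get the same protection */
    int value = shared_counter;
    pthread_mutex_unlock(&lock);
    return value;
}
```

Without the lock, two contexts can both read the old value, both increment, and one update is lost — the "resource changed out from under you" case.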
What is COGS?
C-O-G-S.
All capital.
Cost of goods sold.
Yeah.
See, I mean, there's a lot of stuff like that here where, yeah, you might know it,
but every once in a while you come up with something.
People come up with terms and you're, this is nice.
This is nice.
Okay.
One of the things that you mentioned on your welcome page is you have a list of your favorite
articles and then you have a list of your most popular articles.
Right now, those do not overlap at all.
Does that irritate you?
It doesn't irritate me, but it makes me laugh.
I've definitely, I've realized that. And it's a little disappointing because the articles that
I'm proud of are the ones that I spent the most time on or the, their problems that I really
grappled with for a long time. And some of those in the most popular articles section, I put
together in 15 minutes based on a problem that was really annoying me with the client project or a
personal project. And I just published it because I needed something
to publish that week. And then for two years, it's been in the top 10 articles every month.
So that definitely is a surprise. And I just learned that I can't predict what people are
going to find useful. And it changes over time as well, what's popular and what's not. It's
interesting to watch that sort of evolve month to month and see something as popular
for a year and then nobody cares about it for six months.
And then six months after that, it's suddenly the number one article again.
I think that's true of all of us who put things out creatively.
It's like, why do you like that one?
Okay, I guess I don't understand.
But, you know, at least you like something.
Three of the 10 most popular articles are about Jenkins on an embedded blog.
And I do a lot of build server work, but I just wouldn't have guessed that,
you know, 30% of my top articles are going to be on Jenkins.
Oh, and one of them is installing LLVM slash Clang on OSX.
Yeah, that's so embedded.
Or creating a circular buffer in C slash C++. That looks like it's your most popular article.
It is absolutely the most popular article. It is the most commented on. I get a lot of emails about improving it, and how to do things differently, and why did I design it that way. I get a lot of engagement out of that article.
And yet it's something, I mean, it's important and yet it's not something,
it's one, it's something you think about once. And then once you have a really good version,
you just stop thinking about it. Circular buffers are circular buffers. You don't need to,
you don't need engagement on that. I agree with you, but I just can't question it.
That's just the way that one goes.
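For reference, a minimal circular buffer of the kind the article covers — a sketch, not Phillip's actual implementation; the overflow policy here (drop on full) is one of several reasonable choices:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CB_CAPACITY 8  /* one slot is sacrificed to distinguish full from empty */

typedef struct {
    uint8_t data[CB_CAPACITY];
    size_t head;  /* next write position */
    size_t tail;  /* next read position */
} circular_buffer_t;

static bool cb_empty(const circular_buffer_t *cb) { return cb->head == cb->tail; }

static bool cb_full(const circular_buffer_t *cb) {
    return ((cb->head + 1) % CB_CAPACITY) == cb->tail;
}

static bool cb_put(circular_buffer_t *cb, uint8_t byte) {
    if (cb_full(cb)) {
        return false;  /* drop on overflow; overwrite-oldest is another policy */
    }
    cb->data[cb->head] = byte;
    cb->head = (cb->head + 1) % CB_CAPACITY;
    return true;
}

static bool cb_get(circular_buffer_t *cb, uint8_t *byte) {
    if (cb_empty(cb)) {
        return false;
    }
    *byte = cb->data[cb->tail];
    cb->tail = (cb->tail + 1) % CB_CAPACITY;
    return true;
}
```

Much of the engagement Phillip mentions comes from exactly these design choices: full-versus-empty detection, overflow policy, and whether the index math is safe with a producer and consumer in different contexts.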
Okay.
So let's go back to your favorite articles instead of the popular ones.
You have number one as embedded rules of thumb.
Is that like, you know, do stuff? Never start a land war in Asia?
Yeah. What is the most important embedded rule of thumb?
I would say it changes depending on what problem I'm facing, but my current favorite is a rule of thumb from Jack Ganssle, which is: it's easier and cheaper for you to completely throw away and rewrite the 5% of problematic functions that you're always trying to fix the bugs in than it is to try to go back and incrementally fix all the bugs as they come up.
And I think that's an underestimated and underused tactic for cleaning up code that's bad,
is just throw it away and start over if it's bad.
You'll do it better the second time because you have the problem fixed in your mind and you have the flaws fixed in your
mind as well. And, you know, you're just going to come up with a better design than the initial one,
which you weren't necessarily aware of all of the things you were trying to solve and account for.
I think that's great advice and also really tough to manage in large companies. What are you going to do this week? Well, I'm rewriting this section. We need to... No, you need to fix these bugs. Yeah, it can be an uphill climb to refactor.
But yeah, it's a sunk cost fallacy thing, right? It's like, oh, we spent all this time on this code. We have to keep it. But it's also taking all our time.
And I've been through it. I've spent two months trying to clean up a bad module of code
where the entire team acknowledged it was bad
and we knew it was the source of all of our problems,
but we just wouldn't commit to redoing it,
which would have taken a week instead of the two months
it took us to squash all the gopher bugs that started popping up.
In this Rules of thumb post,
you have lots of references to other people.
How do you find this information?
How do you find people who give you good rules of thumbs?
Thumbs?
Thumbs?
Rule of thumbs.
Yeah.
Is that where the strongest thumb wins?
Yes.
Never start a thumb war.
That's the iron law of thumbs.
Oh, okay.
I get them all mixed up.
Well, I can answer your question in a broader scope,
which is when I was trying to become an embedded systems engineer, I learned on the job, essentially.
And I didn't know where to look.
So I was reading different papers, different articles, looking for other embedded engineers who were putting out quality content, which is hard to find.
And I've read thousands of articles over the past 10 years.
And some pieces of information keep coming up over and over and over again.
And a simple rule of thumb that you'll hear repeated everywhere might be something like: comments should not talk about what the code's doing; they should talk about why you're doing it that way.
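An illustration of that rule, with an invented example (the sensor and the delay helper are made up for this sketch):

```c
/* Stand-in for a real tick delay; records how long we waited so the
 * example is testable on the host. */
static int delayed_ticks;
static void delay_ticks(int ticks) { delayed_ticks += ticks; }

void sensor_power_on(void) {
    /* A "what" comment would restate the code:  "delay 3 ticks".         */
    /* A "why" comment carries real information: this hypothetical sensor */
    /* needs a few ticks after power-on before its registers read back    */
    /* valid values, so we wait before touching it.                       */
    delay_ticks(3);
}
```

The "what" version rots silently when the code changes; the "why" version tells the next reader whether the delay is still needed.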
So some of those ideas start to stand out over time. And as I've gotten more experience, other ideas that I'm
encountering just ring true, such as the throw out your bad functions and start over. I've spent the
two months dealing with that. And so as I encounter these situations, I just started noting things
down. And mostly they were personal notes. But for the rules of thumb article in particular,
when I had like, I don't
know, 50, 100 rules of thumb in here, I was like, okay, maybe other people would find them either
useful or amusing. Applying them is hard because a lot of them are derived from experience and
you'll probably only agree with them if you've had the experience that gave rise to that particular
rule of thumb. So, whether they're actually useful to people is,
I think, somewhat questionable, but they're certainly very amusing to those of us who
have been in the trenches. You don't have one of my favorites on here.
Every sensor is a temperature sensor. Some sensors measure other things as well.
That's a great rule of thumb. I'm going to add
that. Okay. So you have your rules of thumb, which I agree, a lot of these things you don't
get until you have experienced it. And so it's only in retrospect that you can
use that information. You can't pre-apply it. On the other hand, one of your other favorite articles is improving software with five processes you can adopt this month.
And you go through Jenkins and Builds and...
Let's see. Let me actually do all five.
Fix all your warnings. Yes.
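As an illustration of why this matters: compiling with something like `-Wall -Wextra -Werror` catches whole classes of real bugs. A classic example is assignment-in-condition (the function here is invented):

```c
#include <stdbool.h>

/* With warnings enabled, writing `if (status = 0)` by mistake would be
 * flagged (assignment used as a condition), catching a bug where the
 * success branch silently becomes unreachable. */
bool check_status(int status) {
    if (status == 0) {   /* correct: comparison, not assignment */
        return true;     /* success */
    }
    return false;        /* error path is actually reachable */
}
```

Treating warnings as errors (`-Werror`) keeps the count at zero, so new warnings are impossible to ignore.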
Set up static analysis support, like Clang or PCLint. Measure and tackle
complexity. That's hard. You're going to make me do that in one month?
It's hard. You could start doing that in a week. You could find the problems within a day. You can
get complexity analysis set up in a day. Are you going to fix
it all in a day? No. But you also won't know there's a problem unless you're looking.
Create auto-formatting rules. Why is formatting such a big deal?
For me, I propose auto-formatting because I don't think it's a big deal and I wish teams
would not talk about it as much. And so if you have an auto-formatting system, it's a way to avoid the formatting
arguments. Because I've been a part of too many code reviews that don't focus on the actual
implementation at hand, but only focus on the tabs versus white space argument or the way the braces
are indented. And so for me, this is a strategy to eliminate the
bike shedding that happens with formatting and to help teams focus on the stuff that actually
matters. And you also suggest doing code reviews, which is another thing that I think probably takes
longer than a month to even start. How do you do code reviews given that you kind of work by yourself?
Code reviews are the thing that I miss the most about working in a company. I would say that
that's really has been a major boost to just exposure to new ideas and learning different
approaches to solving problems. So now that I'm a contractor, I try as much as possible to
work with my clients on reviewing the code that I write, which is a surprisingly difficult task.
It's like they don't care.
It's like they don't care. And I've even had plenty of deliverables where I know they don't
care because then three months later, they told me they didn't use it, which is always a sad thing.
But people pay a lot of money for code they won't use, and I will never understand that.
Yeah, I've been paid a lot for code that people resented.
Like, oh, I could have done that. Like, okay, you should have done it.
You know, I'm just a consultant. I did what you told me.
Right.
I have thought sometimes that the consultants that I know should band together to do code reviews exchange, but then you have NDA problems and all of that.
You can't do that, but I do, I miss it. And it is so often that I will do a code review for a client and at the
end, they'll be just like, why did you do this? We could have done a PowerPoint in five minutes.
And I'm like, but you needed to see the code. I have to say in my current situation,
you have like 25 people. You might not miss it.
If you are going through what I go through every week. There is such a thing as too much code review. There's, yeah, there's a sweet spot and
finding that sweet spot I think is the challenge that takes more than a month.
Yeah. Especially. And it takes, it takes discussion and understanding of what a code
review means, what's expected of the reviewer, what's valuable and what's not valuable, what kinds of comments are going to hold up your code or what kinds of comments are things that should be tackled later, and being really clear about all of that stuff.
Because when that's not clear, it's just a free-for-all and people are just throwing food at your code and you don't even know necessarily what things to address
and what things not to and where the important parts are.
So I don't have any experience recently with that.
You have some best practices for code review
and some discussion of the social aspects of code review,
which is often an adventure.
My code is not me. My code is not me.
Your code is not you.
I'm not criticizing you when I criticize your code.
I'm not saying you're stupid.
I'm not saying any of those things.
I'm just saying this code could have been better.
Why are you pointing at me?
I'm just having flashbacks.
Yeah, there's ways of being neutral and still giving comments.
I actually started going to a writing group about a year ago, not necessarily because I really need
help with my writing, but because I wanted to learn how to give criticism better. And that was
what the group focused on. and so I could watch people
give better updates better feedback and that has helped me with code reviews for other people or
design reviews usually and it's not just that I am naturally sort of blunt or naturally sound angry to people, even when I'm not.
It's just the, we aren't cogs.
We aren't replaceable with each other.
We do have feelings and code reviews are hard on feelings.
What are some of the things that you learned from your writing groups about giving
feedback? You can't give too much. I mean, you can't, no, I phrased that weirdly.
I can't. There is too much feedback. If you give too much, none of it will be taken. So you don't necessarily want to give only three pieces of feedback, which is
advice I've heard from other people. Because in code, you can't do that. I mean, more than three
pieces per line is too much. But sometimes you have a big file, you have a bunch of comments.
But you do have to figure out if you can,
instead of pointing out each error, point out the thought process error. Instead of circling
where all of the commas are wrong, you tell the person, this is how you use commas, and this is how I would like you to use commas. And so you give them the tools to improve
for the future. That may mean this piece, this code, this writing doesn't get the benefit,
but you're helping them become better as a whole instead of making them feel criticized for
circling all these commas and they don't even know why. They don't have a tool to use in the future.
And so that's the thing that I have been a lot more focused on with code reviews is,
okay, there are things wrong here, but what can I say that will allow them in the future
to not make these mistakes as opposed to just fix these right now?
And I think another way to complement that approach is, and this I think
applies more toward code reviews than writing, is I don't think people ask enough why the person
implemented it in that way. So if I think that there's a problem with the implementation,
it would help me significantly to understand why it was implemented a certain way, because I may be criticizing something that I also don't understand, which isn't productive at all.
And if I'm a senior engineer talking to a junior engineer, you know, there's an inherent power and
respect differential that's going to play into that. And I think that we're not always as aware
of that. And asking for a thought process can help slow us down
and prevent some of these easy-to-make mistakes from actually happening. And if somebody figures out their own issue before you have to point it out, then the criticism doesn't come from you, and so they don't have to get defensive at you.
Oh, yeah.
Like, "Could you explain your thought process?" is great.
"I'm confused why you did X" is not great.
Yeah, that's the famous consultant's line.
We receive a lot.
And actually, that does apply to the writing group.
We do that a lot.
I don't understand why your character does this now.
Could you walk me through your thought process of making this stream of consciousness instead of some reasonable writing style?
Of course, you only say the first half of that sentence.
And do you prioritize your code comments? That's always been the big one for
me that I used to do for the writing group and they don't care anymore. What do you mean by
prioritize my code comments? Well, I separate them into nits, which are things that I need to tell
you about, but you don't need to fix. Maybe, which is things that I think may be wrong, but I'm not sure.
So if you agree, please fix it.
Ought to, which is this is wrong, and if you're not going to fix it, tell me.
Bug, which is this is very wrong.
Stop right now.
Don't proceed here.
Either we need to talk about this or we need to file a
ticket or something. And then I also have kudos, which is I really liked this. And I really try to
use kudos because we forget to say, I mean, it's a review. You do want to encourage the good
behaviors too. I absolutely do that. I absolutely do that. And
I learned that from my first manager at Apple, who I went on to work with at a startup following
that. And he was the most excellent reviewer, particularly because he would give you the kudos
and particularly because he would ask you why you did something before he explained another approach.
And usually it was, I think, his style was, I think there would be a different way to consider
this problem that you might try, such as X. And then that's much less personal. And every team
I've worked on sort of has different categorizations of comments, but I think everything you listed is certainly,
you know, I tend to bucket my review comments in that way.
And I usually summarize at the end of a code review,
just to be clear,
the ones that I expect to actually be fixed before something should land.
Yeah, that helps.
Because it is so easy to just have
little nits all over the place that nobody
really cares about but need to be documented because it isn't following the style guide
i don't want to talk about the style guide but we have one so use it that sort of thing
usually it's we have a style guide that nobody looks at, but you should use it anyway.
Yes.
Two style guides.
Okay, so what are some of your other favorite posts?
The other posts I enjoy a lot are the series on implementing dispatch queues with RTOSs because I think it's a really interesting concept that can clean up a lot of
threading problems that I see across embedded systems. And I also am really proud of the Boeing
article, which started as a one-paragraph newsletter update and ended up as a 25-page-long harangue about that whole saga. Okay, this is the Boeing 737 Max thing,
where there was a sensor,
and you had to have software to turn off the alarm,
you had to pay extra to turn off the alarm,
and it crashed, and it was bad.
Most of that's wrong, but okay.
Okay.
So tell me what I should have said about it.
What is it about?
There were two crashes involving the 737 MAX with, as you said, seemed to be related to a software system that behaved in a way that the humans didn't understand and prevented them from controlling the plane in
a way that they didn't understand is the base problem. As the situations unfolded, there's a
lot of other contributing factors that have come to light, such as Boeing and Airbus are competitors,
and Airbus announced a new airframe where they were going for improved fuel efficiency,
and Boeing needed to respond to that so they didn't lose business to Airbus.
This competition leads to specific timelines and cost targets and certification requirements
to minimize the cost and training time for airlines,
and a lot of corners appear to have been cut through that process. So Boeing could
maintain their type certification and not have to have pilots go through expensive training to fly
this airplane. The goal was, this is your grandfather's 737. You'll be able to fly it the
same way that you've flown every other 737. The way they achieved fuel efficiency increases was with bigger engines,
which they had to mount differently on the airframe, which caused different aerodynamic
behaviors, especially with regards to stall angles. And so the software system was implemented
to prevent pilots from entering into an angle which could cause the airplane to stall.
That seems to be a big point of contention on whether that should have been allowed,
whether there should have been a type rating change, and how Boeing implemented that specific software also appears to be problematic. There are two angle of attack sensors on the plane,
one on each side.
And the flight computer, which has this new software that prevents the pilot from entering too high of an angle, only reads from the currently active side of the plane.
So if the co-pilot's flying the plane, the co-pilot's sensor is read.
If the pilot's flying the plane, the pilot's sensor is read. And as you mentioned, there was an optional upgrade that would check both and let you know
if the sensors disagreed. For both of the crashes, it appears that there was a faulty
angle of attack sensor reading off by dozens of degrees, which caused the plane to...
Yeah, dozens of degrees. And they're known to be off or to be easily damaged, which is
another ding against the implementation, I think. If you have a sensor that is notorious for
being off, you should probably check both. Or if you're a pilot,
you know, if my AOA sensor is saying, is telling me it's 20 degrees off and I look out
the window and I see that that's not true, that I'm not going to change how the plane's flying.
But a computer getting one data point doesn't necessarily know what's happening and is just
going to make the decisions it's programmed to make. Okay. I don't understand because when I
worked on aircraft stuff, it was just for little planes and we often had to
have three sensors agree or they would vote for things like horizon sensing which is very similar
to angle of attack.
You're not Boeing.
This should have been... they should never have been
allowed to certify anything with a single sensor.
Well, yeah, the certification process is under the microscope too.
Oh, yeah. Yeah.
and they appear to have been able to take this approach because of a specific
Um, I don't know the correct aviation term, so I'm probably going to get this wrong, but say the failure rating. So it wasn't rated as a catastrophic failure if the sensor is wrong, because the pilot could still theoretically control the plane.
Theoretically.
And they were allowed to self-certify some of this, if I recall correctly. Yeah, for the past 20, 30 years, the FAA has been increasingly outsourcing much of the certification work to the different manufacturers.
And so Boeing is responsible for a lot of it. The FAA wasn't aware of some of
the changes that Boeing made to the MCAS, this is the software under scrutiny, the MCAS software.
It was changed after the certification process happened. And so when Boeing released data after
the second crash, the FAA was like, hey, we didn't actually know that the plane could
be controlled to this degree by the system. So that's also another eye-opening problem that's,
I'm sure, being looked at by multiple parties. One of the things I liked about your article,
in contrast to another article that was going around the Internet around the same time, from IEEE Spectrum, was that you seem to correctly characterize it as a multi-system, multi-organizational failure and not just the software. Part of that, I think, is we have this idea that Boeing can say, we corrected the software, we fixed
the problem. And a lot of the other systemic problems still exist, and the factors that led
us to this still exist. Right. So it almost seems like the software is being used as a
distraction from the rest of the problems that have happened under the hood. Yeah, they can fix this and then they can go on to the next
bad decision that does something similar.
Right.
How do we, as a software person,
even if we don't blame the software,
because I agree that this is a systemic business
and environment and quality control and management problem.
But as a software engineer, I do feel like at some point you have to stand up and say,
wait a minute, if this, this, and this happens, we could crash planes.
And I'm not having that happen on my watch.
Is it realistic that you would even know
when a system is complex? Would you be responsible for such a small area that you might not know how
it's all interacting? It's true because it had to do with both the autopilot and the sensor system.
And if I was the sensor system engineer, I would figure that the autopilot people
had their crap together. And if I was an autopilot engineer, I might not know how broken the sensor system is. I see that a lot, even outside of Boeing and organizations I've been a part of.
It doesn't seem like in most organizations, there's a steward of the system as a whole.
And the difficult thing about complex systems is if you take all of their pieces and you
just look at your piece, you can't really extract the behavior of the system out from all of the
pieces. You have to be thinking about the entire thing as one thing that behaves on its own and
probably is going to behave in ways that you're not expecting. And in the case of the MCAS software, I do think
that, yeah, if you're on that team, then it probably would have been the right idea to say,
okay, well, I need more than one reading to really verify this. On the other hand, maybe as the
software developer, you don't actually know the inherent flaws in that particular AOA sensor or
the problems that are seen in the field.
I've certainly seen a lot of disconnect between what the engineering team believes about a product and what the support team believes about a product, how customers use it and what the bugs are.
That is so true. And it isn't even just the support team. Sometimes it's the customer. You're
like, oh no, we never meant for it to do that. I'm glad it was working for you and now it's broken, but that was never the goal.
What is the most important thing to learn from the saga? I mean, is there like one thing you
can take away? The architecture, there should be a system architect or a systems engineer
who has a view into the whole system.
Are there other things you think we should take away from that?
I don't know that I have answers that are satisfactory, but I think there are lessons
that we can learn that should drive ways that we handle this situation in the future. We're starting to build systems that are so
complex that whole organizations cannot accurately grapple with the consequences of the behavior of
the system or their decisions with relation to the system. And that's not going to change.
Everything we're building is getting more and more complicated. Somehow we're going to have to get a handle on that.
And I don't necessarily know the right way to do that.
And I think the other key takeaway is it's not like Boeing's unique in the decisions
that they made that led them to the outcome.
They're unique in that the decisions that were made led to two crashes, which resulted
in the deaths of 346 people.
And I work with dozens of organizations that make very similar compromises. They make very
similar mistakes. And sure, you might ship your widget and there's no real critical customer
impact, but it's not like Boeing's doing something that every other company isn't doing in
some way. And so it's easy to point fingers and say, oh, look at Boeing and their bad behavior.
But I think we should also take a look within our own organizations and see how we're cutting
corners and what are the consequences of the things that we're focusing on? Because you're
probably not thinking about that. And somebody probably should think about that.
Yeah, risk and hazard analysis are two things that most people don't do.
And just in terms of security with IoT devices and things,
that should be something that every company does.
Because every time you build something like that,
there's a potential for something catastrophic.
You're not going to crash a plane, but you might expose hundreds of thousands of people's private data or whatever.
Or even like some of the alleged Nest hacking attempts where you have a baby in a room and the heater's just turned on full blast.
Whether or not that one is actually substantiated, it's definitely the kind of possibility when you skip out on security for connected devices. You can come up with questions to put on your risk list by saying, okay, let's pretend it's two years later and
whatever the worst possible thing could happen has happened. How did it happen? And for Boeing,
that actually should be part of their process of let's pretend that a plane crashed because of the software. How did this happen?
And work backwards and then try to minimize the risk at each stage.
That is not that hard of a process to do when you're working with software. I mean,
it doesn't have to be that critical. It can just be the worst thing that happened was we lost all our client data or the worst thing that happened was
someone got shocked. And so you work backwards. How would you get shocked from
the light switch? Okay, well, maybe we shouldn't make it out of metal and electrify it.
You know, you're saying we need black hats for trying to crash planes. You need black
hats for the risk assessment. So many of us go forward. I didn't think about reverse
engineering for a long time. It was something somebody else did, but then I got into a BBA
and I realized it is the same thing I'm doing usually.
It's just debugging something I didn't write, which I do a lot of anyway.
But it's the same process of, it's backwards.
It feels backwards.
And yet it is just as important as what feels natural.
And it should, with practice, it doesn't feel backwards anymore.
Did that make any sense?
Yeah.
Yeah?
Philip didn't say yeah, but I think we're going to go with it.
My question for you is whether you think we're running up against, just call it a fundamental limit of human psychology,
and how do you train teams to think in a way that's sort of antithetical to the way that we all think by default? For the most part, we're all thinking of the happy case, right? We're all going to make it
rich and we're only going to pick the things that we need to pick to get to the part where we're
going to be rich. And all the things that are going to cause us to not be rich, we'll deal
with them when they come up. I think that's the human tendency. So I wonder if there's a way to
work around that and instill that kind of culture where you are looking at the risks, even though it's not natural and it's not fun and it feels like a step back from getting you towards your end goal.
Yeah, because sometimes that step back, you realize, oh my God, this path doesn't work.
I can't do security on this chip.
And it just becomes untenable and terrifying.
And so you have to give people time.
You can't just ask for the answer.
You can't just ask for the happy path.
You can't just ask for people to fill out all of the features.
You also have to ask them to think about the system.
And that is not something we get a lot of time to do as developers. And so it's sort of time we
have to take and say, look, this is part of my job. This is part of what I do to be a developer,
to be an engineer, to be a programmer, whatever
word you want to use. Even as a hacker, I would like people to consider, okay, I have hacked the
nest to do bad things to people's houses. What's the worst thing that can happen to this? For me,
it was just a fun thing. But if I put it online, what's the worst thing that happens?
And you do take some responsibility for the consequences of your actions.
Christopher's making faces.
I don't know.
No, no, because I'm thinking about QA, right?
And engineers don't test their code well.
No, because we think of how it should work.
Well, we think of how it should work, but also we know how it works.
And so there's an inherent blind spot there. Yes. And so whether or not you're consciously
avoiding doing things that you know might break it because you know how it works or whether that's
unconscious or whether you're just not creative enough at creating something and breaking it.
Yeah. I think that's probably not going to be, it's not going to be super effective to have the engineers doing that.
Because this is why we have QA teams and dedicated testers, because they're better at coming at something with a fresh mind and finding ways to break it.
So I think the same thing applies to risk analysis.
You kind of need somebody who does that, at least for a significant portion of their time, and isn't just an engineer trying to turn their brain upside down for a day a week.
I mean, this is the security model, right?
There's people who are penetration testers and people who, you know, consult on here's what's wrong with here.
Here's what I'm finding are holes with your system.
And they're not the people necessarily writing the code.
And maybe that's what's needed in some organizations where the risk is high is to have, you know, dedicated adversarial people not testing necessarily, but doing what we're talking about, you know, imagining the worst case scenarios and trying to construct a way to get there.
Okay, I have a new idea for business.
We're going to call it Agents of Chaos.
And it's going to be all risk assessment.
Well, wasn't that Netflix's Chaos Monkey thing
where they just deliberately injected automated breakages?
And then they made sure that their system was robust.
But they did that on production.
Chaos Monkey still runs.
Yeah, I can see that sometimes.
Oh, Philip, I'm sorry.
We do sometimes end up talking to ourselves.
It's okay.
It's fun to listen to.
Let me get back to one of the articles.
Actually, we have a question from a listener,
Krusty Auklet.
Krusty Auklet, really?
I've met him. Is he Krusty
or an Auklet? No, no.
Okay.
There are a few articles on Embedded
Artistry, your site, about
retargeting modern C++
using FreeRTOS and
ThreadX.
Krusty Auklet is curious
what you think the pain points are
and/or the limits of GCC and newlib.
For example, it seems like thread,
standard thread,
is not great since you can't tweak
the stack size of threads.
What? You can't?
That's really important.
You cannot.
Okay, that was a pretty long question.
I will just leave you to answer it. Go ahead.
So I'm going to start my answer by prefacing that all of my retargeting C++ experience is using libc++, which is Clang's implementation.
So I don't actually have any comments on the GCC C++ library. But overall, it's very clear to me that the C++
standards committee implemented the threading library support simply based on how Pthread works.
And there's almost like a direct one-to-one mapping for the API set, including like, you don't need
to set thread stack sizes in Pthread because there's a pretty good default thread stack size. And like, you really need to know what you're doing if
you're modifying that. So as the reader mentioned, like, that tends to be a problem. The other types
actually work pretty well. Mutex is relatively straightforward to implement. Using it makes
sense. There's a lock, there's an unlock, there's all sorts of useful
helper types that you can lock a lock when you enter a function, and no matter where you return
from the function, it will automatically unlock. Using that is wonderful. Implementing it,
the only difficulty is because of this pthread C++ standard library dependency,
the standard mutex constructor has to be a constant expression,
which I've never actually seen apply for any RTOS or other OS that I've used.
And so that tends to be like the one tricky thing.
You have to check before you lock your lock whether you've initialized this mutex
and if not, call some initialization code.
That's some unneeded overhead.
Yeah.
But, you know, using it once that's done and doing that is straightforward.
Condition variable, also easy to use once it's implemented,
but implementing a condition variable can be a tricky problem space
if your OS doesn't have a primitive already for that
because the obvious approaches to handling it
lead to logical flaws.
So there's a great Microsoft Research white paper,
which I'll send you the link to,
and you can add to the show notes,
which goes through an algorithm
for actually implementing that,
where you need a semaphore and a queue
where you keep track of threads
waiting on a condition variable and the
ability to directly suspend and resume threads. And so if you have those things, you can implement
the condition variable in a straightforward way. Wait a minute, let's go back. A condition
variable tells you which threads are running, the state of the thread? No. A condition variable
is a concept where I want to sleep my thread until a specific condition is true.
And that might be waiting for a variable value or some functor to return true or any number of things.
But essentially, when whatever my predicate is returns true, I'm going to wake the thread and resume running. Okay, so a condition variable, for me, when I think about not using this sort of thing,
it's the global variable I set in the interrupt
that tells the loop to go deal with the data I just collected.
That's more of a single flag.
So a condition variable could be a set of flags
or a value that you're waiting for, right?
Yeah, I think it's more in line with like an event group
where my thread is actually sleeping until some condition which I specified in the flag get call
is true. Okay, so it could be a series of flags, or it could be a series of states happening,
things being ready. Okay. Let's see.
So that's condition variables.
So then for thread, like the reader mentioned,
implementing the threading support
is relatively straightforward,
but the APIs are very limited.
So if you need to create a thread
and you have a good system default stack size
and good system default priority,
and you need some generic concept
where just an average old thread will do, then it works well. But, you know, on embedded systems you
often want to control your thread priority, because I want my thread that's handling my interrupt
callbacks to run before anything else, and you might have different priorities for different
hardware-component-managing threads.
And stack sizes are definitely variable.
My LED blinking thread doesn't need a 128 kilobyte thread stack.
So we often want to tune those values.
And my particular solution has been to, I have a, I guess, a separate set of C++ threading concept APIs, which lets you tweak
all of those things. And I'm currently working on validating those APIs by using them for Pthread
and ThreadX and FreeRTOS. And I'll probably pick two other RTOSs to make sure the abstractions
work. And I'll release those as open source interfaces if you wanted a good
C++ RTOS abstraction layer. And then you can always go the route that I went, which is
I have STL support for their types based on these APIs I wrote. And so you sort of get the best of
both worlds. If you have a C++ app developer who wants to use standard mutex in the app layer of your firmware and
doesn't actually care about setting all the other little OS bits that might come with
that, like priority inheritance settings, that really works well and is a huge time
saver and sort of opens up the possibilities of who can work on your system.
Do you have examples of where this might be used?
As you write tests, are you
writing little demo programs too? I have, I have a general, I guess, framework demo thing that I'm
working on, which is sort of nebulous at the moment. I don't actually know how to talk about
that to answer your question in good details. Oh, that's fine. I'm just, some of the concepts you've talked about,
I know what they are, and yet I'm still a little on the fence of, okay, so if I have FreeRTOS
running and have C++, how do I glue these Lego blocks together with your code?
The Clang libc++ has a nice external threading abstraction layer. So there's
a set of functions that if you wanted to use the standard C++ types, you just need to supply an
implementation for those particular functions. And so if you wanted to use my code, I have an
external threading header that maps directly to that. And I will be, in the future, releasing some demo apps that do show how to use that
and how to hook the different pieces together.
But that is under development.
Okay, so I have more questions about various blog posts and going more in depth.
But we don't have that much time.
Okay.
So I want to go back to the question of,
you have this website full of pretty useful information.
Why?
I mean, you're a consultant.
You get paid by the hour, I assume, unless you do a lot of fixed bids,
but most of us don't.
So, yeah, why? I mean, you do a lot of fixed bids, but most of us don't. So, yeah, why?
I mean, you could be making money.
I actually started the website before I started my business.
And I alluded to this earlier in the episode.
When I was learning about embedded systems, there wasn't really, there wasn't a class that I could take,
there wasn't a good book to turn to that I could find, there weren't really good websites on it.
And so most of the aspects of how to write firmware and how to debug embedded systems and
do all of the various things I needed to do, I had to do a lot of research for. I had to bug senior engineers and try to figure
out how they did it. I had to, you know, read through thousands of pages of data sheets to figure
out how something worked, read through the ARM architecture documentation to figure
out how arguments were passed, all sorts of things like that. And you slowly piece together
various aspects. And I'm a
pretty prolific note taker. So I ended up with thousands of pages of notes. And one day it just
hit me that I'm probably not the only one in the world struggling with all of this stuff. And
the internet today is a bit better in that you can Google things and get a lot more answers. And
there tends to be more embedded resources. But even those, I think, are geared heavily toward
the hobbyist and maker space. And so I just started going through all of my old notes and
cleaning them up and adding some examples and having my wife edit them. She worked in publishing
for four years. So that's a handy person to have on your team. And just slowly releasing those.
I mean, over time, I think the first year, 200 people read my website for the entire year.
And now we're hitting 50,000 people a month.
So it really seems like there's a need.
And I really derive a lot of enjoyment from taking what I learn and the things that I'm struggling with and that I'm researching and
just writing it down and publishing them and then seeing that other people find that useful and
helpful. I understand that feeling. And I still get a little confused about it.
I mean, it is time. It's time we could be doing other things. Why are we spending our time helping other people for free? Do you ever, do you worry about that? Or do you just enjoy the sharing enough? I would like to find a way to make money doing that because I enjoy that. I enjoy
publishing the information and helping other developers learn a lot more than I enjoy doing
consulting contracts, which quite honestly end up being the same thing with a different
set of requirements, maybe. But most of that work is the same. And that can get old
after a while. And the exploration of the unknown is certainly more exciting and helping people is
certainly more exciting. So I have struggled on how to find a way to have more of my income
come in from doing the website without relying on ads. Because, I mean, you look at the big embedded in
electrical engineering news sites, they're mostly product placement ads. And it's really disappointing
to me. Even the articles, it's just super disappointing to see. And I don't want to
go down that route. But I understand why they do because you get offers, you can make money, but it's just that part would be sad to go
that route. So it bugs me a lot to answer your question. And something I think about a lot how I
could make more income from that and be able to focus my, you know, my daily work activities on
it. You already spend quite a bit of each day working on the posts.
About how long does it take per week? I spend, I would say, an hour to
an hour and a half every morning researching or writing. Depending on the length of the post,
most of them take five to ten days, plus four hours of editing. So let's call that 20 hours,
I think, on a post of any significant length. So that, you know, it certainly adds up,
like you were saying, time is money. It is tough. It's tough to have that mental
shift of giving things away. But it is sort of advertising, isn't it?
Do you consider it advertising for your consulting services?
I do.
I've certainly received work from my website.
And I would actually say the most rewarding projects
that I've worked on came from my website
because they were from clients
who shared the same values that I shared
and who cared about what I had to
say rather than just thinking that I was some generic firmware developer contractor who would
sit in a desk for them and type code at a specific number of hours a week. So it certainly has had
it, it's paid its dividends beyond just, you know, getting enjoyment from it. And it does land me
work and introductions,
and I meet a lot of great engineers who reach out to me about the site.
There's all sorts of nice network effects that come from that.
Yeah, I understand that.
Have you considered writing a book?
I would love to write a book,
but I'm not sure yet what I need to put in a book that I can't put into an essay.
And for me, that's the bar.
And I've had some publishers reach out to me about publishing a book on embedded systems or embedded C++,
but they always seem to have strange, almost startup-esque deadlines where it's like,
we need you to write this book in three months and we're going to pay you $5,000 to do that,
which, you know, is a full-time job. And who knows if that could actually be done in three months. And $5,000
certainly isn't enough to live on for three months while you're doing that work. So, that's also
been presented, but I haven't found the right thing yet.
We should talk more offline.
Would love to. I do think I should
let you go. We are about
out of time. And now
I get to ask you this
final thoughts, questions, or
last thoughts. People didn't like...
Last thoughts? That's even worse.
On my way to the grave?
Philip, before you
go,
do you have any thoughts you'd like to... Do you have any last words? Do you have thoughts you'd like to leave us with?
Yes, I would like to say that I publish a monthly newsletter. It comes out on the first Monday of every month. If you're interested in keeping up with
ongoing developments in the embedded systems industry, you can sign up on our website at
embeddedartistry.com slash newsletter. A modified version of our Boeing essay is going to be
published in the June edition of the Software Quality Professional Journal, so keep an eye out
for that. And I want to end on a serious note, so, you know, I'm going to get real for a second, related to the Boeing discussion we had,
which is an idea that's just been sitting in my mind since I've been researching the Boeing 737
MAX crashes. And it's rooted in my studies in philosophy and psychology. And it's a simple
idea that I think all of us really understand deep in our bones, which is where we're aiming at is the most important factor
in determining where we're going to end up.
And I'm not going to say what the aim should be.
It's certainly up for debate.
But the goals and the values that we select
and that we focus on change how our brain sees the world
and how we make decisions.
So I just want to ask all the listeners to take a few moments and think about where you're aiming
in your life and where your organizations are aiming. And are you happy with where that's
going to end up? And if you're not happy with the consequences, or even if you are,
I hope that you'll take the time to pick a new aim, raise your aim a little
bit higher. And I think we'll all be pleasantly surprised at the positive outcomes that that
will have on our own lives and the lives of the people around us. Very nice. Our guest has been
Philip Johnston, founder and principal at Embedded Artistry, an embedded consulting firm in San Francisco, California. You really should check
out his blog, embeddedartistry.com slash blog. Thank you for being with us, Philip.
Thanks for having me. It was really a pleasure and I hope we get to do this again sometime.
Thanks, Philip.
Thank you also to Christopher for producing and co-hosting. And thank you all for your
patience over the last few weeks as we missed a few shows.
Also, thank you for listening.
You can contact us once again
at show at embedded.fm
or hit the contact link on embedded.fm.
Now a quote to leave you with.
This I am getting from
Philip's Rules of Thumb post,
and it's from Jacky Cancel.
It's one of my favorite quotes from him.
Study after study shows that commercial code,
in all of the realities of its undocumented chaos,
costs $15 to $30 per line.
A lousy thousand lines of code,
and it's hard to do much in a thousand lines of code,
has a very real cost of perhaps $30,000.
The old saw, it's only a software change,
is equivalent to, it's only a brick of gold bullion.
Embedded is an independently produced radio show that focuses on the many aspects of engineering. Thank you.