Embedded - 335: Patching on the Surface of Mars
Episode Date: June 25, 2020

Joel Sherrill (JoelSherrill) spoke with us about choosing embedded operating systems and why open source RTEMS (RTEMS_OAR) is a good choice.

Embedded #307: Big While Loop: Chris and Elecia talk about when and where they'd use RTOSs
Embedded #93: Delicious Gumbo: Joel gave an introduction to the RTEMS RTOS

Joel works at OAR Corp (oarcorp.com) on RTEMS (rtems.org). RTEMS runs on many development boards including the BeagleBone, Raspberry Pi, and two FPGA boards: ARM Zynq-7000 and the Arty Board.

Joel recommends the operating systems book by Alan Burns and Andy Wellings. It comes in many flavors and editions, including Real Time Systems and Programming Languages: Ada 95, Real-Time Java and Real-Time C/POSIX (3rd Edition).

NASA Core Flight System (https://cfs.gsfc.nasa.gov/)
Experimental Physics and Industrial Control System (EPICS) (https://epics-controls.org/)
Transcript
Discussion (0)
Welcome to Embedded. I am Elecia White. I'm here with Christopher White. Last October,
in episode 307, Chris and I talked about how and when to use an operating system. Of course,
we were wrong. We have only what we know to work with, and other perspectives are valid and interesting, so I am
super happy to welcome Joel Sherrill of RTEMS back to talk about RTOSes. Hey Joel, welcome back.
Hi, thanks for having me back. It has been 200 episodes since we've had you. Is that a unit of time? It is to me. Oh my, that is a long time.
Could you tell us about yourself?
I'm Joel Sherrill.
I've spent the past three decades, at least part-time, working with RTEMS.
But I've also worked on a number of open standards in aviation and space and other domains.
So basically doing a lot of systems type things. A lot of that's fed back into RTEMS with support for like POSIX standards and for the FACE technical standard.
Okay. And if people haven't heard of RTEMS?
The simplest way to think of it is: if Linux was the open source alternative to traditional Unixes, RTEMS is like the open source alternative to VxWorks.
That sounds like a tall order.
But RTEMS has an enormous number of features, including about 85% of the full POSIX API. It includes the FreeBSD TCP/IP stack, including IPv4 and IPv6.
We have multiple file systems from everything from RAM to JFFS2 to DOS to NFS.
Symmetric multiprocessing support for about 18 processor architectures.
And when we say that, we mean ARM 32-bit is one of those 18.
So we've been really lucky to have been adopted by NASA and the European Space Agency.
So we are on a lot of their missions, especially their high-profile ones.
I think both of the solar projects right now are heavily based on RTEMS.
So, I mean, we've been really lucky to have been a part of some scientific discoveries along the way.
And being in space, I mean, that's kind of a situation where an RTOS shouldn't go out to lunch.
No, and it's kind of interesting because we have the traditional open source history starting as a research project that was released.
And our users have tended to gravitate toward really long lifespan projects like building automation, building safety systems, space, large science, where your development and deployment lifecycle might
be decades.
So they've kind of taken on the attitude and adopted our philosophy that open source
allows you to kind of control your own destiny for a long period of time.
And that's actually led to our new effort, which is being sponsored at the moment by the European Space Agency,
to actually bring in the requirements traceability into the open source project.
So the European Space Agency, we know, has done three level B flight qualifications on RTEMS,
but they've never shared the artifacts back with the community.
This time, we'll have the technical data required to help put it through a safety certification.
And I guess I should say, when I say level B, I'm thinking in terms of the DO-178 FAA flight
certifications, where level A is man-rated and level B generally means either there's no human on board, which is true for
science satellites, or I think, what is it, the plane will still land under control. I think it's
a serious hazard, but not enough to take the plane out of the sky, I think, is a general way to look
at it. I think, if I remember right, you've mentioned the medical ones run opposite of that,
right? Don't some of the other safety standards, A, is not as critical as B?
I don't remember if they use those numbers.
I don't.
Do you remember?
I did FAA, so I actually don't remember medical right now either.
That's sort of embarrassing.
All of those safety standards, even though they have different kind of rules, they all boil down to the same core process.
Yeah, it's classes with FDA.
So class one, class two.
But, I mean, it's all about requirements, traceability, high coverage testing, proper process and review. And surprisingly, open source projects and the open source infrastructure
is pretty good at most of that, except the requirements part and the requirements traceability.
So we're unfortunately doing what a lot of projects do, and we are going back and writing
requirements for 30 years of history. But they're not terribly difficult because we were largely based on open standards.
So hopefully we'll see some requirements tracing tools from our project back in the open source.
Well, they are now.
The tools we have are completely available.
So I'm hoping this sets a bar for other open source projects to come in and try to improve things.
But it's pretty amazing what you can do now with open source tools
and GitHub or GitLab compared to what you could do 20 years ago. You can automate coverage testing
and things that were just impossible 20 years ago. So many things have changed. Before we get
more into operating systems, you listen to the show, so you are familiar with Lightning Round.
And somewhat terrified by it.
Thought you were going
to get away without it.
All right, Joel,
what is in gumbo?
In gumbo?
Okra,
tomatoes,
and something else.
But I usually like étouffée, so I kind of lean toward the brown roux with rice and, uh, crawfish.
So how much, how much of that did I get?
I don't know.
I'm going to go with, you got it exactly right.
No, there's another ingredient. I can't remember what it is.
There's at least one other.
Which Sesame Street character best represents you?
I think probably Big Bird, or either Ernie or Bert.
But I'm not that tall, so Big Bird's kind of out there.
But I generally try to be happy and have a good disposition.
And I don't have a bottle cap collection, so that eliminates Ernie, right?
Right.
Or Bert.
I think it was Ernie.
What planet or moon would you visit if you could, provided you survive?
Both ways.
Both ways.
The both ways is pretty important.
That one-way journey to Mars, wasn't that a European thing a couple years ago?
I think, what is it, Europa that's frozen,
that we think there may be life in the seas.
I mean, that just sounds so much more interesting than some of the others.
But I've also seen some of the science fiction movies
where there's this giant reptilian snake thing
that jumps out and eats everybody.
So that is a little intimidating.
It was in the Europa Report, was that right?
I think that was it.
Yeah, that single-camera Blair Witch Project in space thing.
Favorite ice cream flavor?
I generally like the white bases with a lot of chocolate candy in them, so it's probably some kind of cookies and cream.
Do you have a favorite bug? Not an insect, like something that went wrong. You shouldn't have specified.
You could have gone either way.
Do you have a favorite insect or computer bug?
My dad owned a pest control company, and I still couldn't have answered that. My favorite was a bug that only worked because the string was allocated in units of, it appeared to be, four
bytes by the heap manager we were using. So it only failed if your input string happened to be
the exact right number of characters. Otherwise it worked. Somebody asked me
to step in and help, and I spent about half a day feeding different inputs into it until I realized that it was like 11, 12, 13, 14.
There was some combination on a multiple of four where it would work.
And then if you put the wrong number of characters in, it failed.
And it's like, ah, I know that one.
There's some inner block buffering, buffer space left.
And it was an off by one bug.
It was a pretty hard one to find.
A lot of that debugging is really something
that's hard to teach.
That's just something, if you've seen enough bugs,
you start to smell them.
8051 or 6502?
I'd probably lean toward the 6502,
but I actually used the 6800 series back in the dark ages.
So I was more of a 6803, 6809 type person.
Do you have a tip that everyone should know?
Assume you're going to make a mistake and plan for what you're going to do to find it and not let it cause problems.
Wow, thinking ahead.
Thinking ahead.
I had a friend that I picked on over the years, and when he worked on his roof,
he would assume he was going to drop something or like he was cleaning his gutters.
So he would cover his shrubs with a tarp.
So if he dropped something, it would be easier to find.
I mean, that's kind of perfect defensive programming.
It's assuming you're going to make a mistake, and how can I make it
easier to find the nail or the hammer in the bushes?
Genius. Or paranoid.
Still genius. So
last fall, last October, we talked about RTOSes and when we thought people should use them, and we answered some questions.
Happily, we've forgotten everything we said.
So even if we were completely wrong, we no longer take any responsibility.
Exactly.
But you sent me an email after the show
and pointed out some things we missed
and I want to go over them.
But I think my favorite part was the end,
which was selection of an embedded OS is hard.
And so I want to ask you,
what makes it hard?
Well, it's almost like
you're at least going into a long-term relationship if you're not getting married to it.
I mean, if you pick the wrong RTOS, it could ultimately end up killing your project.
I mean, you could pick something where some feature that you need is not there, but you didn't realize it.
Or if you really make a boneheaded mistake, you find out it doesn't support your
processor or something like that, or maybe it doesn't have a feature and you're responsible
for integrating some third-party library, which I think now we've seen at least two
serious security bugs from third-party TCP/IP stacks. So there's a lot of things that can go wrong. And I mean,
another thing is longevity and support. There's one commercial RTOS that has been through
a few acquisitions. And there is a high-profile space project that has a unique version of that
RTOS. And there apparently is no one left in the company who actually knows about it.
So they actually go out and have to hire the people who used to work for the company.
So, I mean, you can find yourself for various reasons just in a bad spot
because of something you didn't see.
I mean, as a trivial example, a lot of vendors port GCC and the toolchain to their architecture, and some are responsible and submit the port upstream.
But sometimes when you download the vendor's toolchain, you'll notice that it's six or seven years old and it's never been updated, and they haven't submitted it.
Well, that's probably a hint that that's not a good toolchain to go with because the vendor doesn't even care that you have current tools.
So I generally encourage people,
you have to kind of evaluate the whole ecosystem. There's a lot of things that can go wrong.
I mean, you guys know that. You've seen plenty of things that have gone wrong over the years.
Yeah, I've been the cause of many of them.
And selecting something because it's new
and cool is almost always the wrong answer. But it's so shiny. I've worked on projects where new
and shiny is what you have to aim for, because if you've got a five-plus-year development cycle,
you kind of do have to take that risk, because what looks stable today is probably not going to be available in five or
seven years, if that's how long your development cycle is. So yeah, there's risks both ways.
Sometimes having the wrong RTOS can cause all kinds of problems. And after learning the chip,
the peripherals, your application, the toolchain, it seems like learning a whole new RTOS is another thing.
It seems like a burden.
Well, if you're lucky, then you don't have to learn as much about that because the RTOS came with support for that.
And I know, like with RTEMS, all of our toolchains are based on the GNU tools or LLVM,
so they're quite familiar. One of the things when I teach an RTEMS class, I show how similar using
the debugger is embedded versus using it on a native program. I think the only thing that matters
is how do you connect to the target and how do you load the code. Once you're debugging,
it's exactly the same.
So, yeah, I mean, if you're switching after you've done a lot of development, that could be a problem. But if you
start out and you end up getting all the drivers, and you're already past the startup code,
and you've got a working console and time passage and network stack, sure. I mean, that can save you the effort if the RTOS comes with that.
And I guess that's one of the distinguishing factors.
You guys were right.
Everything I think you said in the when do you switch to an RTOS was correct.
You kind of outgrow rolling your own tool chain or using some vendor-supplied tool chain,
and then you realize you need threads or a file system or a TCP/IP stack or something else,
and you either end up being responsible for everything on your own
or you somehow either buy that support so you're not as responsible for it
or you find something open source that has an active community, so you're sharing in the
responsibility of that. In your email, you said
it just saves you from being solely responsible for something.
Is that a good thing?
Yeah, I think it is, because I noticed
just on the RTEMS mailing list that people will just ask general questions and you'll get answers.
I've seen people ask questions about very specific parts and somebody will come back and ask, which revision is it?
And, you know, it does give you a place to ask and gain knowledge. But the other thing I've noticed
over the years is everybody gets only so much time to make something work. So the first
person does the device driver, the second person may port it to a different architecture or optimize
it. And then the third person usually does the other of those. And then at that point, the code tends to be
almost incredibly rock stable for years.
So you get it for free.
So say you got two weeks
to bring up a driver.
If it comes up easily,
you'll probably improve it
because you got two weeks to work on it.
I mean, our DOS file system
is a good example of that.
The first person used it
for light duty input of data.
And a few years later,
somebody used it on a data logging project, which had really high data rate requirements. And they
tuned the file system to get those rates out. So you build on the community's work and you share
in it and you can share in the knowledge.
And believe it or not, most people in the RTEMS community will never let you know what they're working on.
They use Gmail or some other anonymous email.
And that way, I mean, we may know you're on an ARM Zynq or some other board, but you're asking such generic questions that it's impossible to know what you're making.
I do like the idea of somebody else writing my drivers.
I mean, I personally like that.
I think one of the things that taught me that I don't like to do that is I had written three or four Z8530 UART drivers and made the same mistake in every one.
And I just decided I'd never wanted to do it again.
So it is, I mean, you have to buy into the fact that it's a shared experience and where you have
to buy the business case that your value is developing the application. And eventually,
it's hard with being a consultant because you move from thing to thing. Like when you built toys, your goal was to have a toy.
Your goal was not necessarily to have a generic analog driver or a toy OS.
So you can focus on where your value is.
I think that's a great point. It's one I've used at companies a lot, especially when people
come up with like, oh, we should try this, or we should write our own this, or hey, we thought
about doing this in this language, we could rewrite this whole thing. And my comeback usually,
which I try to convey to management is, okay, what is the company's job? What is it that we do here?
Do we make compilers? Do we make device drivers?
No, we're making X product.
And sometimes that's a useful framing, like you said, to make everybody take a step back and say,
okay, this would be fun to do, but it's not actually helping our end customer or the company to produce the product.
Right. And one of the things that's kind of what we've done with RTEMS is
we've used a permissive license and have gently convinced people to contribute. And this has
worked over our three decades. And the question is: is it really your secret sauce that you did a driver
for a commercial part that everybody in the world can buy? Or should you just share that?
And then hopefully you'll pitch in a little bit as people use it. But next time, the driver's just
there and it's updated to the next version of the OS. And there's a number of RTEMS users who have
built up corporate infrastructure across multiple projects, and then they will pick whether
they're going to use, although these are old, a Coldfire or an ARM or a PowerPC board for
a specific application, but all of their code comes over because the core infrastructure
base is the same.
And that's really where you get the value.
You're not locked into one particular processor architecture. And we've seen a lot of
RTEMS users now with a requirement to host as much of the code as possible on both Linux and RTEMS.
And that is generally quite easy to do if you just stay away from the things that shouldn't work.
Like, I mean, you're not going to run X11 on RTEMS, although we have graphics packages.
So if you stay away from
the things you probably shouldn't
be doing anyway in the embedded system,
your code moves pretty well.
It's hard, though, because I think people,
especially firmware developers, have this sense
that they're supposed to be writing device drivers.
That's embedded development, application
development. Well, that's for somebody else.
Right?
Right.
But I guess if you find it fun to write device drivers, I'm certainly not going to criticize you.
But I think it's interesting to help people on different applications.
And listening to you guys over the years, you've worked on a lot of different applications.
And I think that's where it's interesting and fun is to, you know, one day you're talking to somebody doing a medical device and the next day you're talking to somebody doing a toy and or, you know, going to Mars or something.
And every application is so different.
But the core of it is you read inputs under certain conditions, perhaps time constraints.
You do some computations and some control algorithms with some time constraints.
And you write outputs.
And you do your best to validate inputs and ensure you meet those deadlines.
And that's common across all of these embedded real-time systems.
And that's where picking the infrastructure can help you.
I mean, one of the things RTEMS has that is a little unique to it, it's been there since the early days, is we have support for making true periodic threads that you can use to perform rate monotonic scheduling and rate monotonic analysis.
They inherently ensure that you're scheduled on the period, and they keep track of how much CPU time
and how much wall time you took each period, and whether you actually overran your period. And this
can be used to do rate monotonic analysis and ensure that all your threads that are critical
will meet their deadlines. And I don't think anybody wants to build that infrastructure;
you just want to use it.
Yes.
Yes, so much.
I mean, there's probably a couple people who like building that stuff, but that's their application.
They're scheduler people.
Yeah, and I'd say that's one of the things that's interesting about RTEMS: we haven't focused as much on hand optimizations as we have on algorithmic optimizations. So we've focused a lot on the academic literature and algorithms that are either constant execution time or bounded by
either a design constraint in RTEMS or by a design constraint in your application.
So for example, one of the common ones is we can configure the number of priorities.
But if you've got 256 priorities, a log2(n) algorithm has a max of eight comparisons.
So that's bounded.
So that's one of our real focuses. And the type of users we have on the high end, particularly the space industry, they have
focused on some of the formal analysis and academic viewpoint to make sure that we have
done things that are analyzable.
So, I mean, that's a big benefit.
Are there chips or boards or dev kits that are particularly useful for getting started with RTEMS?
It's funny how even though we support 175 or more board support packages and 18 architectures, a few are very popular. There are a lot of users on the ARM Zynq, and there's full support with SMP on the regular Zynq, like the ZedBoards and that kind of thing.
There's a lot of people on that.
There's also, surprisingly, I don't know that I've heard it mentioned much.
There is the SPARC LEON, a V7/V8 processor that's used in space applications a lot.
And we have a lot of users on that.
And you actually can get, I think it's an Arty board, an FPGA board.
I think you can get that for $100, $150 US.
And that's easy to work on.
You can also use things like Beagle boards.
You can use PCs.
So there's the ATSAMV, the STM32 micros, a lot of those common boards that have above, say, 128K of total RAM.
Generally, you can do a system in 64K, but you're pushing the limits.
If you want to have some real application code, you're going to need a little more space. But if you've got one of the embedded controllers with, say, 256K or
512K code space and 64K RAM, you can certainly field a small application. You mentioned Beagle
Board. Is Raspberry Pi on the list, too, then? Oh, that's the one I forgot. Yes, the Raspberry
Pi is on the list. And I think the Beagle and the Pi both, we even have small embedded graphics toolkits for those.
That's kind of nice.
I've used those for prototyping, but then I'm always a little leery of sending anything out, even for big demos, that are Linux-based.
Well, and there's also the issue that, I think, is that hardware built to withstand any
type of vibration or heat or longevity, power spikes? I mean, it seems to be very reliable
when you use it as a consumer, but when you start talking industrial or long-term applications,
you start worrying about those things. I know there are hardened Beagle boards. I don't know about Raspberry Pi. I imagine there must be,
but Beagle boards can be built by anybody. So getting them hardened just requires you to find
the right people who are building them. And the Beagle was sponsored by TI,
which would make more sense that it had a path toward industrialization.
So one of the things I didn't mention is that our symmetric multiprocessing is completely optional. You build with or without it. It's one of the few features
that's not just linked in and enabled. But we actually have been deployed on 24-core systems at this
point, that I know of. So there's the 24-core PowerPC QorIQ series.
So there's a wide spread of deployed systems out there, from really tiny ARMs to very large systems,
and I'm still kind of amazed at how the embedded space has shifted over time.
Yeah, we get real processors with real amounts of memory now. It's amazing.
And we're using things I learned in distributed programming classes in college that I thought were cool, but I'd never see in work.
Imagine, let's put a Raspberry Pi and a couple of BeagleBoards against a Cray.
And so you mentioned symmetric multiprocessing, SMP. And then you said 24 cores. Does it just mean you can run multiple cores and the scheduler
knows how to deal with it? Or is there more to symmetric multiprocessing?
That's the core of it. We have multiple ways to address symmetric multiprocessing. At the
base level, you can just run one scheduler that runs, say, the 24 highest priority threads and spreads them out.
We also have the ability to do what's called pinning threads to particular cores.
We also have thread affinity. Pinning is a single core; affinity is a set.
With pinning, it's impossible to create forced thread migration.
With affinity, there are ways to allocate the threads to cores where you can unintentionally have migration,
and it's a little more complicated to schedule.
But that also is available under Linux and FreeBSD with the same APIs you have on RTEMS. But one of the cool things we have is the ability to do what's called clustered or partitioned scheduling,
where you can assign a subset of cores to a scheduler instance.
So let's say in a four-core system, you could have a uniprocessor scheduler for core zero
and perhaps put all of the I/O and first-order I/O processing on one core
and then perhaps use the other three just for computation and use different scheduler instances.
So, and you can do that either to help the schedulability, the analyzability, you can do it
to help the cache locality so you're not moving data in and out across the bus.
I mean, that's one of the real problems in SMP systems is when you start getting cache contention
and then you end up basically thrashing the memory bus.
You can end up having really horrible performance or really good performance in an application and scalability
depending on how well you partition your
computations across the cores. So, related computations should tend to be on the same
cores. And that ignores the complexity of, you know, how many cache levels. If you're on a
four-core system that's actually two four-core modules, you tend to want to stay on a core module.
You don't want to schedule for, say, the fourth and fifth core.
You don't want to have threads spread across that because that actually is a hardware boundary inside the processor.
So that's one of the unfortunate things: now, to be really efficient and to take advantage of your hardware properly,
you have to know a little bit about how the cores and the cache are actually designed. But it is really cool
with the clustered scheduling. It is a really cool feature, and I really don't know that anybody
else has done that. There's a lot of literature on it, but it works pretty well.
One of the things with pinning is you can put the other processors into low power states.
Do you support that sort of thing?
The way this works in SMP is there's an idle thread per core, and the idle thread by default will go into some low-power state. You actually can have a
processor-model-specific, BSP-specific, or application-specific override of the idle thread. So you could shut down as much as you wanted, watch how long the system had been idle, and start turning off
peripherals if you wanted to. But by default, if there's a low-power instruction or sleep or something,
the default idle thread is going to take advantage of that if it's there.
SMP was on your list of parameters that we missed.
When we talked about choosing an RTOS is a large parametric search,
trying to figure out what's important and what you need and what things they advertise but you probably don't need.
On that list also included priority inheritance protocol and promotion.
I know we talked a little bit about priority inversion, but could you tell me more about that?
Sure. The priority inversion is a problem really when you use mutexes in multi-threaded systems,
and you end up with a low-priority thread holding onto something that a high-priority thread wants.
And there have been fielded systems where that actually has happened. I recall that the Mars Pathfinder lander actually had this problem and it had
to be fixed remotely. The solutions generally are called, the one most, probably most of us are
familiar with is priority inheritance. And so, the low priority thread will temporarily inherit the priority
of the higher priority thread it's blocking. So as an example, if you've got a student intern
who's using the copier, the department head may not be able to interrupt the job on the copier,
but they can make sure nobody else comes in the way of that student finishing their job.
So the idea is to let the low priority thread execute to completion and give up the resource
and not take it away. So if you don't do that, you can have extremely long delays in the system
that don't make sense. My debugger is just going and printing all the messages, even though everything
else is halted and waiting for it to finish. Yeah, and I will admit to one that was really
surprisingly easy to create. We had a system where the highest priority thread in the system
read all of the analogs and discretes, and the lowest priority thread in the system would go out and
read various values and write them out to logging. And occasionally, you would see a hiccup,
and mainly we would see it in the rate monotonic statistics where the highest priority threads
period would be overrun. What would happen was the low priority logging thread would be accessing a variable
that was protected with a mutex while the high priority thread wanted to update it.
So you would begin to write the log right before the high priority thread ran, read a new value,
and wanted to set it. So the logging thread was holding off the highest priority thread in the
system. And in that particular case, you would see a thread that normally took about a millisecond to run,
and was the highest priority thread in the system, take on the order of 17 milliseconds
to actually get one millisecond of CPU time. It normally got its one millisecond immediately
because it was the highest priority thread in the system. These are real easy problems to create. Priority inheritance will avoid that. You can be very
careful and try to design against it, and that's always best if you can. But even when you're
really experienced, it's a pretty easy problem to trip into. The other one is priority ceiling protocol, which is similar,
but instead of waiting for a conflict,
priority ceiling moves the low-priority thread to a high priority that's predetermined,
and that's the highest priority of any thread that's allowed to access the resource.
These two are fairly common in the higher-end RTOSs,
but there are a lot of lower-end RTOSs that don't have these.
These two are pretty well analyzed and proven correct if you implement them correctly.
There's a lot of literature on these, particularly from the early 90s.
There was a resurgence in real-time theory. It's funny because I get the impression, and maybe it's not correct anymore, but people use multithreading just as a matter of course now.
And I remember in the 90s when it was kind of a newer thing for most processors to even be able to do this.
You had operating systems with processes, but threads were something that felt a little scary and like, well, do you
really want to make your process multi-threaded? You know, here's the problems with it. And people,
people would shy away. At least I would, I felt like that was something new and different and
scary. And, you know, do some research on these kinds of problems that can come up. And I feel
like now a lot of times people just, you know, multi-threading is just a thing you do. And, oh, there's these problems.
Oh, well, you know, when you run into a bug, that happens.
And the bugs are really, the bugs can be very difficult.
But the other side of that is we also have the benefit, at least if you're willing to use a little bit of C++, that you can put a thread inside a class, or mutual exclusion.
And we're all used to writing get and set functions.
Well, now just put a mutex inside the class.
And perhaps there's a thread in the background.
So when you ask for something to get done, perhaps you send a message to a thread that's completely hidden inside the class.
There's still the system integration issue of how all these threads behave together, but a good object-oriented programming can hide a lot of the things that lead to those coupling problems.
But you still have to make it work.
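A minimal sketch of that idea in C++ (the class name and members here are hypothetical, not from RTEMS): callers only see get and set, and the mutex never leaks out of the class.

```cpp
#include <mutex>

// Hypothetical example: the lock lives inside the class, so every caller
// gets mutual exclusion "for free" through the ordinary get/set interface.
class LatestReading {
public:
    void set(double value) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = value;
    }
    double get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    mutable std::mutex mutex_;  // mutable so get() can lock while staying const
    double value_ = 0.0;
};
```

A background worker thread could be hidden inside the class the same way, with set() posting a message to it instead of touching the data directly.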
Yeah, and I think people get a little bit intimidated by priorities.
Like, okay, I have this system.
How do I arrange the – I have 256 options for each
thread. Do I need to use them all? Do I
divide it into two
priority domains? Well, that doesn't work because maybe
I need a third one. And it gets
a little tricky, especially
if you haven't done it before or
you're kind of intimidated by all the options,
right, to say, oh,
where do I put the UI? Should that be
low or high priority? Well, it needs to be responsive to the user, so it should be high priority. But on the other hand, I have to do these sensor reads and I don't want the UI getting in the way of that. That, I think, is where rate monotonic scheduling helps: it has a priority assignment rule.
And the priority assignment rule is that it encourages you to think of things as either
periodic or how can I make them periodic. And then the things that run at the highest rate,
the shortest period, they get the highest priority. So you end up with this band of critical periodic tasks.
And below that, you end up with less critical tasks. And below that, you end up with,
I'm assuming, non-critical tasks. So if they don't always make their periods, they're okay.
And then below that, you get things like background tasks and things like that. But the cool thing about this is one of the early large OpenGL accelerator cards ran RTEMS,
and they had a requirement for a certain number of frames a second.
So they used a rate monotonic period to drive the frame rate.
So that set them up, and if they ever missed their frame rate, they got an error.
So thinking in that way.
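The assignment rule itself is mechanical enough to write down. Here is a sketch; the Task struct and the bigger-number-means-more-urgent convention are my assumptions (many kernels, RTEMS included, use lower numbers for higher priority, so flip the numbering for your RTOS):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Task {
    std::string name;
    int period_ms;  // how often the task must run
    int priority;   // to be assigned; here, bigger number = more urgent
};

// Rate monotonic priority assignment: the shortest period gets the highest
// priority, the longest period gets the lowest.
std::vector<Task> assign_rate_monotonic(std::vector<Task> tasks) {
    std::sort(tasks.begin(), tasks.end(),
              [](const Task& a, const Task& b) { return a.period_ms < b.period_ms; });
    int priority = static_cast<int>(tasks.size());
    for (Task& t : tasks)
        t.priority = priority--;  // shortest period ends up most urgent
    return tasks;
}
```

So a 20 ms sensor read outranks a 33 ms display refresh, which outranks a 100 ms UI poll, with no hand-wringing about which "feels" more important.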
So in that case, perhaps you want the display
refreshed at a particular rate
for whatever technology you've got.
Then the user input may have mechanical limits
on how often it can happen anyway,
because you can only press buttons every so often.
There's a physical limit to how
fast a human input device can give you input if it's working properly. And you can just start
taking those factors into account and bounding and bucketing things. And I think that's really
important as we often don't think in terms of the physical characteristics of something and how fast it can really change.
I've worked on systems where, the one that comes to mind is a temperature sensor,
well, if it can only read to a half degree, how fast can the sensor reading actually change?
Not very fast.
Unless there's a catastrophe.
Right.
Or a vehicle: a Roomba is only moving a small distance. Let's take something larger, like an outdoor robot. If it can only move five miles an hour, depending on
your resolution of its location, you don't need to check it that often. It just is not going to
move that fast in computer terms. So we don't think about things like that very often. So sometimes you just
have to analyze your inputs and how frequently you need to control them. I worked on a robotic
system where when you analyzed it, the person had, the engineer who did the sensing of the rotation
of the wheels gave us millimeter accuracy on the rotation, which was actually insane because it was a tracked vehicle.
So they couldn't have given millimeter accuracy on the rotations on concrete.
And they were giving us 4,000 interrupts a second.
So we talked him down to like a centimeter and a half or something.
And it's like that still was probably unrealistic in the real world, but we weren't getting
4,000 interrupts a second anymore.
That's the difference between precision and accuracy.
Yeah.
You can be very precise, but not accurate.
And this is computer math. I mean, ultimately, there are limits to what you can do.
And knowing those limits is super important when you're designing a system.
Right. That also usually helps you avoid tripping into some weird behavior because
the inherent nature of the system has limitations that you didn't understand. Rate monotonic scheduling. I feel like this was something I should know. And I have used
the idea of shorter things should be higher priority before, but I didn't know there was
a word for it or a phrase for it. I didn't know this was a thing. How come nobody
told me that this was a good way to prioritize threads? Because I certainly have sat there and
gone, okay, what thread should be highest priority? And how do I figure out what it depends on so that
I don't do the priority inversion thing we're all so afraid of. The thing that's almost embarrassing is,
how old do you think the paper is that first proposed this?
Early 70s. I have the Wikipedia page open.
Oh, that's cheating. I think it's in the Journal of the ACM in 1973 by Liu and Layland.
And if my memory serves, it was funded by the Apollo program. So this is,
that makes it 50 years old, almost, probably?
No, no, no, no, no, no, no, no.
Well, the paper's not, but their research – Okay, that's fair.
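That 1973 paper also gives the classic schedulability test: n periodic tasks are guaranteed to meet their deadlines under rate monotonic priorities if total CPU utilization stays at or below n(2^(1/n) − 1). A small sketch of the arithmetic (the function names are mine):

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Liu & Layland's bound: n periodic tasks always meet their deadlines under
// rate monotonic priorities if total utilization <= n * (2^(1/n) - 1).
// As n grows, this approaches ln 2, about 69%.
double rm_utilization_bound(int n) {
    return n * (std::pow(2.0, 1.0 / n) - 1.0);
}

// Each task is (worst-case execution time, period), in the same time unit.
bool rm_schedulable(const std::vector<std::pair<double, double>>& tasks) {
    double utilization = 0.0;
    for (const auto& t : tasks)
        utilization += t.first / t.second;  // C_i / T_i
    return utilization <= rm_utilization_bound(static_cast<int>(tasks.size()));
}
```

The test is sufficient but not necessary: a task set above the bound may still be schedulable, it just needs a more detailed analysis.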
So, I mean, we probably should be seeing some of this stuff in school.
But, I mean, we've hired some new people over the past year,
and I've had to explain why you use namespaces in C++ to segregate code because they never did anything large enough to use namespaces.
So, I mean, there's a bit of programming in the large that's not taught.
I mean, the idea of having requirements for software was surprising to some of them.
So there seems to be some disconnect between what happens in really critical systems and
what gets taught in school.
It seems like it would be good to teach the best practices and then if you don't use them
on a non-critical application, well, at least you knew the best way to do something.
But that may be because a lot of people don't know those anyway. Sometimes there's a track between academic
and industry. And there's a disconnect between academic and industry. And I know in physics,
there was always the question of, are you preparing them to be physics academics,
or are you preparing them to do physics in the real world? And I think that's been true for CS as well.
Yeah, and I think CS has also suffered because at one point there was just computer science,
and then it split into computer engineering and information systems. And so some of the
focus has been shifted. Now, in fairness, there are some
really good textbooks out there that go over this. I helped out on a textbook, which I hope
falls into this category, but it's a large field. As a general plug, almost anything from
Andy Wellings and Alan Burns is going to go into some of these topics really well if it's one of their books in this area.
But you have to have people who can teach the classes on real-time and safety-critical systems before you get there, or even just general good multi-threaded design.
So it's hard.
It's hard to teach.
And not everybody needs it. I mean, if you're making websites, it may not be important.
Well, they have other, I mean...
Yeah, they have a whole list of other things they need to consider.
Stuff beyond embedded has whole different things going on. Let me tell you what's out there
in the application world.
There's another odd thing that RTEMS has.
RTEMS actually comes with a web server, a Telnet daemon, and a shell, and all of this is completely optional. The shell is kind of like BusyBox, and I think it's got about 150 commands, some of which are ported over from FreeBSD.
So our current TCP/IP stack actually is the FreeBSD stack from their current release.
We've figured out a way to automatically import updates,
and we have their entire TCP/IP stack, all of the drivers, the USB stack.
We have support for USB removable media because of this.
Some of the graphics support. There's other things in there.
But the cool thing is the network stack is
configured exactly like the FreeBSD system administration handbook.
So we actually have the ifconfig and route commands and netstat and ping from FreeBSD.
That's really nice. I'm used to lwIP and stuff like that.
I mean, we also support lwIP too,
but a lot of the embedded systems are large enough and want that.
And the FreeBSD comes with a lot of drivers
and has really high performance.
So you get a lot of features.
They track the vulnerabilities, and we just pick them up.
So another part of that, doing what you're good at.
We focused on the kernel, and we implemented the FreeBSD device driver kernel interface.
So the code comes over unchanged.
And that let us adopt their code, which they're really good at, and ends up being a nice integration.
So that's kind of part of doing what you're good at and leveraging work from others.
We should be starting to use larger, more capable middleware and libraries, right?
The embedded chips are getting more capable and larger, so sticking with the things that used to fit on, I don't know, an 8051 or an 8-bit or even a 16-bit microcontroller,
when we're using 32-bit with lots of RAM and, well, comparatively lots of RAM and code space, it's time to move to more capable things, I think.
And I thought it was interesting, a few years ago, Qt had some things on their blog and some videos of embedded Qt on RTEMS.
And what they considered a small system was actually on the large side for an RTEMS system, but they were running GUI demos and GUI applications on RTEMS.
And that was the whole point. The person from Qt never asked a question on the RTEMS list
because the features are just the POSIX APIs that they expected were just there.
So apparently the Linux port came over pretty easy. They turned off the things that obviously
weren't going to work in Qt, like access to printer services and things like that. But
there's just value in having a common core base that doesn't particularly tie
you to some completely unique API in your application. And that's worked out really well
because it allows, there's a couple of core frameworks. The NASA has something called the
Core Flight System that works out of the box with RTEMS. And the science instrument and high energy
physics community has something called EPICS, the Experimental Physics and Industrial Control System,
which allows you to build these huge science things like cyclotrons and linear accelerators
and remote telescopes, radio telescopes and things like that. And there's a large number of RTEMS users at these
research facilities because they buy equipment, they deploy it, and it stays there for decades.
And they need to be able to share their applications and their infrastructure with
other researchers. And it's just an affordable solution that matches the requirements of having something a long time and having to share it.
POSIX compliance was one of the parametric things that we didn't talk about that probably should be
on most people's search for an RTOS. It basically means that the APIs to things like drivers are
well understood. They're open, close, read, write, ioctl.
But there are a whole bunch of POSIX APIs.
And we had a question recently on the Patreon Slack about what API should I look at in order to build something people will understand.
And POSIX wasn't quite the answer, but it should have been the answer. POSIX is huge.
There are about 1,300 APIs in it.
But one of the things that's important is it includes the entire C library.
So that gets you on the order of 300 to 400 APIs.
And RTEMS uses the same core C library that Cygwin uses,
which has a strong heritage in embedded systems.
It's actually the same C library, I think, that ships with the ARM tools.
So there's a core user base there.
Plus, then you get standard things like there's a lot of POSIX thread APIs,
condition variables, mutexes, and you're right.
On top of that, we forget that termios is the only standard API for accessing serial ports and doing device management.
You get standard IOCTLs.
So we have standard device driver infrastructure stacks to bring up a serial port and have it be fully termios conformant.
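As a rough illustration of what termios conformance buys an application, here is the portable configuration code you would run against any POSIX serial device. The function name is my own; note cfmakeraw is a widespread BSD/glibc extension rather than strict POSIX.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

// Configure an already-open serial fd for raw 115200-baud I/O using only
// the standard termios interface.
bool configure_raw_115200(int fd) {
    termios tio;
    if (tcgetattr(fd, &tio) != 0) return false;  // fetch current settings
    cfmakeraw(&tio);                             // raw mode: no echo, no line editing
    cfsetispeed(&tio, B115200);                  // 115200 baud in...
    cfsetospeed(&tio, B115200);                  // ...and out
    tio.c_cc[VMIN] = 1;                          // read() waits for at least 1 byte
    tio.c_cc[VTIME] = 0;                         // no inter-byte timeout
    return tcsetattr(fd, TCSANOW, &tio) == 0;    // apply immediately
}
```

On target hardware the fd would come from opening the serial device itself; with no UART handy, the same calls also work against a pseudo-terminal from posix_openpt, which is how you can try this on a desktop.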
And you have different driver stacks.
I mean, we've got a block device driver infrastructure that gives you caching and buffering.
And it's things like that that you forget.
And that actually,
that buffering is a good example.
It was implemented and a couple of years later,
somebody came back and went,
so what's the best algorithm right now
for buffer management?
And they found it
and they implemented it.
And that kind of thing
is what you get
from the open source community
because I don't think anybody doing that for their own project would – they would do what they did the first time and they probably would never get the opportunity to revisit it.
And so you're kind of getting the collective wisdom of the masses.
The many eyes makes bugs shallow.
Or you could look at it as you got the Borg.
So, I mean, whichever – however you want it to be, nice or evil.
I have some listener questions that we tried to answer before that we'd like you to try for.
I'm sorry, that didn't come out well.
That we'd like you to answer. How do I set up my RTOS to avoid common problems like race conditions, non-re-entrancy?
And what are some other common problems I need to watch out for?
And which is the best RTOS?
This is all from Nathan Jones.
So I'll start with which is the best RTOS. From working on the FACE technical standard, there are like five large RTOS vendors involved, and even when you take that small set and that small market, they are different, and they have different characteristics and different licensing models, and you have to evaluate them.
And that's only a space of five.
There's probably a half dozen or so top-tier, top-feature RTOSs with a lot of integrated features. And then there are dozens, if not hundreds below that,
that if you start considering whether they're architecture specific
or somebody did it for a single project
and threw it out on GitHub
or do all the features come with it
or are there third-party packages
from another vendor?
So I usually just couch my answer
in terms of there are probably
a half dozen at the top
that are really, really good
and worth considering
because I'm not prone
to trashing competition,
especially when I actually have
consulted on other projects
that have used a few of those.
And every one of them has the same problem that if they don't support your
hardware and your board, you're stuck having to write a device driver. And that is the same pain
for every OS. And it's usually a matter of how well the hardware is documented and if it behaves
or is buggy. So the other one was, how do you set up the RTOS to avoid common problems like race
conditions and reentrancy? So the OS itself hopefully doesn't have any race conditions,
and it supports APIs that are reentrant. The C library in particular has some APIs that are not
reentrant, and you should avoid them. But a lot of this comes down to you're writing your own code
and you have to avoid designing in race conditions and reentrancy issues.
And the race conditions, the answer is to either use like mutexes
to lock data when you're accessing it and follow the rules for those.
There is a newer class of data structures called lockless,
which I would not pretend to be an expert on,
but those are definitely worth looking at,
particularly in the area of symmetric multiprocessing.
There is a lot of research and development
on algorithms and data structures
which allow multiple cores to access them at the same time
without having lock contention.
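Not RTEMS internals, but as a small taste of the lock-free style: a counter shared across cores, updated with a single atomic read-modify-write instead of a mutex, so no thread ever blocks on it (and there is nothing to priority-inherit around). The names are mine.

```cpp
#include <atomic>
#include <thread>  // used by the concurrent-update example below
#include <vector>

// samples_processed is shared by every core; fetch_add is one atomic
// read-modify-write, so concurrent increments never block each other and
// there is no lock to contend for.
std::atomic<long> samples_processed{0};

void record_samples(int n) {
    for (int i = 0; i < n; ++i)
        samples_processed.fetch_add(1, std::memory_order_relaxed);
}
```

Running record_samples from several threads at once still gives an exact total; a mutex-protected counter would too, but only by serializing the threads.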
And when I teach the RTEMS class, one of the things I emphasize is how many times we have reworked the internal time data structure in RTEMS.
And I thought it was done when we went to 64-bit nanoseconds since 1970 because that has like a lifespan of 500 years, which I thought solved the problem
forever. And then it turned out that somebody in the FreeBSD community had a lockless SMP safe
time data structure. And it's like, oh, well, I guess when you start dealing with time,
system time across a couple dozen cores, you need something that's lockless. So,
you know, that's a very important
thing to consider. Okay, so another question from a listener from Jakey Poo. Is there some middle
ground between big loop programming and RTOSs? And why is that wrong? Why is the middle ground
wrong or big loop wrong? Why is the middle ground wrong? Well, so the big executive control loop, I mean, is a technique which has
been used a lot. What you end up doing is you have essentially a cycle in your system where you say,
for every 100 milliseconds, I'm going to do the same activity. And you partition all the
computations in that 100 millisecond time into hopefully discrete steps
that can occur in a fixed order. And sometimes if things get complicated, you have to break
logical computations down into arbitrary subparts. So the big loop, as it grows, is a very hand-scheduled timeline. The actions are decomposed not on a functional or object basis, but on how long they take to execute, which is very dependent on your hardware. So the execution scheduling is very hand-tuned and arbitrarily impacts the design and layout of your software.
The next step is to try to look at it like, so really each of those activities in that big loop
are running generally, unless they're executed multiple times in that 100 millisecond period,
are executed at certain rates. And you start to partition the activities into logical components.
Like I need to read that sensor every 20 milliseconds.
So you set up a thread that runs every 20 milliseconds.
That's really the, I don't know what the middle ground is,
because a control loop is essentially taking a set of periodic activities and combining them into one loop, which breaks one of the rules of rate monotonic scheduling.
You're not supposed to combine activities of different timescales into the same loop. So you want to break them into independent control activities, independent threads of activity, and prioritize them accordingly.
And then make sure you've accounted for data dependencies between the threads.
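That "read the sensor every 20 milliseconds" thread body can be sketched portably. Sleeping until an absolute deadline, rather than for a relative delay, keeps the period from drifting by however long the work takes. (This uses standard C++ chrono as an illustration; an RTEMS application might instead use its rate monotonic period services.)

```cpp
#include <chrono>
#include <thread>

// Run `step` once per `period` for `iterations` cycles. Sleeping until an
// absolute deadline (not "sleep for period") prevents the schedule from
// drifting by the execution time of `step` itself.
template <typename Fn>
void run_periodic(std::chrono::milliseconds period, int iterations, Fn step) {
    auto next = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        step();                               // the work formerly done in the big loop
        next += period;                       // absolute deadline for the next cycle
        std::this_thread::sleep_until(next);
    }
}
```

Each activity pulled out of the big loop becomes one such thread with its own period and its rate-monotonic priority.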
So I'm not sure that there is a middle ground between – once you decide to break the single thread of control up, you've got multiple threads.
And then you probably have some OS, I think.
Or a mess.
I've seen the mess.
It exists.
I guess you can do an event-driven loop,
and that effectively is writing your own scheduler.
And if that gets complicated enough, you might have developed your own mini OS.
A bad one.
Well, usually.
Don't you remember the days when every project started with writing your own OS?
It felt like that for so long. I don't think I did that.
There was certainly a lot of choosing an OS and then trying to beat it into submission.
There were a lot of projects I took over and tossed the OS and bought one because...
It's always nice to have someone to yell at.
Yeah.
One of the things I definitely missed on,
didn't predict correctly, on RTEMS
is RTEMS actually has what we call a SuperCore.
And all of the public APIs for threading and synchronization
are essentially thin wrappers or facades on top of that.
And I, from day one, thought someone would write their own facade comparable to the POSIX
that would be there to migrate their own in-house RTOS without changing the application.
But I have never heard of anyone doing that.
But I certainly remember a lot of in-house RTOSes, but I haven't seen one in a long time.
One more listener question from Thomas.
Are there RTOS solutions that provide ways to upgrade in place and manage project components individually,
or do I have to build slash buy a bootloader and component management if I'm using an RTOS?
Some RTOSs actually support dynamic loading,
usually used to load libraries as part of the boot up.
I know RTEMS and VxWorks support this.
So this was envisioned, the support in RTEMS was envisioned that you could
break your code into subsystems and you would build essentially libraries that you could upgrade
individual libraries. So in that sense, you can build a base application that has all the
standard device drivers, required device driver support.
And then you just replace those dynamic libraries in your non-volatile memory.
So there are ways to do it.
But ultimately, at some level, if you have to replace the base image that loads other images,
then you're going to have to come up with some
magic below that layer. And there's some really cool presentations from something called the
Flight Software Workshop. A few years ago, there was one on 10 years of patching software on Mars
that was very interesting on different techniques for doing this. And you come out with new respect
for the guys at JPL and how they haven't bricked a rover or something
in years of binary patching software on the surface
of Mars. So using the
RTEMS dynamic loader, which I don't know how many people do,
is a technique for doing that.
Chris, do you have a question?
I have one slightly mean question,
and it comes from a place of,
I haven't been in a position to select an OS
for a project in a long time,
but having talked to you and read about RTEMS,
it seems like it would be on the top of my list
to explore, were I doing that.
I mean, it sounds like it has a lot of great features.
Why wouldn't someone choose RTEMS? What's the reason that RTEMS might not be the best choice?
Let's say, for example, you had some requirement to fork and exec processes. RTEMS doesn't do that.
Okay.
So that's one. I actually have a slide where I show there's a spectrum of operating systems. And at one end, you've got
these embedded OSs that are basically single process, multi-threaded. And there's a series
of types of real-time OSs. And if you go far enough in the other spectrum, you end up on
general purpose OSs, which don't try to be real time.
And you really just want to pick the OS that's right for your application. I mean, you don't want to be the only real time program on an OS. That's probably the worst thing you could do.
Or if the OS doesn't promote some particular type of application, I'm trying to think of something like how we're doing enterprise computing.
I don't think you want to host a Mongo database or try to replace Amazon Web Services with RTEMS.
A lot of it is making sure you pick the right kind of thing.
You don't want to go pick up a scoop of sand in a Miata, I mean, either.
So, I mean, we have different vehicles for different reasons, and there's different OSs for different reasons.
Yeah, okay. That makes great sense. The one thing you will get with RTEMS is you're going to get easy-supported GNU or LLVM-based tools that will come up on standard hardware platforms, and then you'll get all the library support you want. We support C, C++, and Ada generally out of the box.
So that stuff just builds where possible.
We know of open source simulators, so you can run on simulators.
And we do a lot of automated testing.
That's one of the things we've really emphasized.
I haven't used the word reproducible.
We provide the full test suite and we encourage people to reproduce test results on their own hardware, which is something you can do when something's open source.
I think it's important to step across the API boundary
and just go, oh, I passed a bad priority.
That's why it's returning me that invalid number or invalid priority
or whatever the error is.
So it's handy and it's good to know the value of open source
is that we can share with you the knowledge of the internals, and there's no NDA.
It's just there.
You mentioned LLVM.
Have there been folks using Rust and other non-C/C++ languages to target RTEMS?
There have been Ada applications.
There's interest in Rust, but I don't know of anybody trying that.
We did the first embedded port of GNU Fortran.
One of our Google Summer of Code students did that a few years ago.
So GNU Fortran works on RTEMS, of all things.
That's probably esoteric. So, eventually Rust will be supported as long as the LLVM support matures.
I'm sure somebody will bite it off.
But it's open source.
It has to be somebody interested enough to do it or pay for it, which is how I generally make a living is either people get us to do things because they know our areas of expertise,
either for operating system, embedded systems. Believe it or not, we've got somebody on the
SysML standard body committee. We do a lot of consulting and model-based engineering.
And so some of the work we do is directly RTEMS or RTEMS application related. Some of it's just because we've done a lot of embedded system design and are very aware
of standards and processes and just consult on those things.
And I guess, like you guys know, if the work is interesting and somebody's paying the bill,
there's a good chance you'll take the work.
I'm looking at the RTEMS page
and it looks like Python and MicroPython
are supported, as well
as Fortran, Erlang,
Ada, and C++.
Oh man, I left off Erlang and Python.
That's horrible. Well, I think
Python was the big one to leave off.
Yeah.
Yeah,
MicroPython is a pretty good embedded Python. Full Python works, but I think you have to be careful about some of the packages. There's at least a couple of fielded applications that provide some scripting capabilities.
I'm thinking of a security camera that the scripting of the scan and sweep patterns for the cameras
is ultimately done in Python.
I mean, I can't just go import TensorFlow, NumPy, SciPy,
and expect it'll all work right away.
Matplotlib.
Matplotlib.
You know, some of that actually would work
because a lot of that is just math, and it's going to work.
That's true, yeah.
I mean, that's where you have to go, oh, okay,
that's just a math library. It's probably going to
work on the embedded system.
But, you know,
if you're going to fork
a process
or expect to have a full desktop
GUI, that's where things
you know, those are kind of the big boundaries.
Erlang is kind of, I guess, an esoteric embedded systems language. It's a functional language that came out of Ericsson. And its promoter, he has a system that is used in the automotive industry in factories to
program multiple microcontrollers as they come down the production line.
So they always get the latest version of firmware.
And Erlang is such a fault-tolerant language; that was one of its goals, coming out of the telecom industry.
One of the stories he tells is they had one of the units where the power supply was failing, but the system would reboot
and the program would take over reprogramming right where it left off. And they thought there was
a software problem until they realized it was the power supply failing and the software was
actually recovering as well as it could given the power supply failing, which is pretty phenomenal. I mean, that's a credit to him
to recover from that kind of stuff, though.
And Lua is also supported.
And I've actually used Tcl from the
old, old dusty ages.
Anything that's meant to work in a core POSIX environment
is probably going to come over fairly easy.
I mean, we actually support the standard NTP,
Network Time Protocol code, the standard SNMP, Google ProtoBuf.
There's a lot of add-on libraries that just work.
If somebody wants to start with RTEMS, what is their first step?
So one of the nice things, I'll give a shout out to the Google open source program office.
We've always tried to focus on on-ramping people, but since we're not starting out anymore, we don't tend to see the pitfalls.
We've worked with college students and the
Google high school programs. And so we now have a Hello World and our user's guide is focused on
getting you up to speed and running. There's an example of trying to get to the point where you
can run Hello World on a simulator. That's what we usually recommend. So you don't get involved
in the treacherous world of cabling and hardware issues that don't have anything to do with RTEMS itself and show you how to use GDB on that.
And depending on the speed of your computer and your network link, you can generally download, build all the tools in a couple hours.
I think my computer, I can do it in an hour, an hour and 15 minutes for one of the architectures. One of the things there is we have something that actually called the Artem Source Builder.
And once you clone that Git repository, you execute one command and you can actually build an entire software tool chain,
including every supported add-on library that's in the source builder at this point right out of the box.
So the one that I didn't mention, like the curl library, is included in that standard build.
And if it's not one of the full-up network BSPs, you can run one command to build a tool chain,
one command to build RTEMS, and then you're off.
So it's generally, we've tried to make it easy. High school students
have done it and we have tried to gear it toward that level. The other nice thing about being open
source is, and encouraging you to build from sources, you end up with, we want you to have
the source and any patches so you can put it under configuration control on your project if you want to. So we have the ability to help you make your own snapshots and distributions
because we know people do that.
We just encourage people to do that.
We also have had many users submit things for other OSs,
like we have build instructions for the Mac, Linux, Windows, Raspberry Pi.
Somebody did that.
It took a while.
To build on a Raspberry Pi?
Build RTEMS on a Pi?
Yes.
Yes, built RTEMS on a Pi.
Actually ran a processor simulator.
And the amazing thing about that is the build speed was about the same as a mid-90s SPARCstation.
Yeah, I was going to say.
Remember how things used to be.
Yeah.
That allows us to support host operating systems and target processor boards
that honestly are probably not commercially viable.
That's another odd thing.
It allows us to,
as long as there's a few users
who are willing to pipe up periodically and test
and say it works,
there's no reason to remove something.
It's not like, if we don't have 50 users on a platform,
it's going to get deleted because it's not profitable. That's not what
it's there for. It's just a different business model.
We sell training, support, services,
custom development, and the software is there to use
and actually give us all a sense
of pride that it's been used in so many
cool systems.
Joel, are there any thoughts you'd
like to leave us with?
I guess that everybody needs to remember that
you want to have fun doing this.
If embedded
systems are your thing, then just
have fun doing them and
keep being good for people in the world
and throw yourself out there. That seems like good advice. Our guest has been Joel Sherrill,
Director of R&D at OAR Corp and Benevolent Dictator for Life of RTEMS. Thanks, Joel.
Thank you. Thank you to Christopher for producing and co-hosting. Thank
you to our Patreons for Joel's mic. And thank you for listening. You can always contact us at
show at embedded.fm or hit the contact link on embedded.fm. I actually don't have a quote to
leave you with. Let's see. It appears from genetic research that it isn't that male birds have gotten more colorful over time in order to stand out, but that female birds have gotten less colorful over time.
Embedded is an independently produced radio show that focuses on the many aspects of engineering.
It is a production of Logical Elegance, an embedded software consulting company in California.
If there are advertisements in the show, we did not put them there and do not receive money from them.
At this time, our sponsors are Logical Elegance and listeners like you.