Embedded - 477: One Thousand New Instructions
Episode Date: May 16, 2024

Kwabena Agyeman joined Chris and Elecia to talk about optimization, cameras, machine learning, and vision systems.

Kwabena is the head of OpenMV (openmv.io), an open source and open hardware system that runs machine learning algorithms on vision data. It uses MicroPython as a development environment so getting started is easy. Their GitHub repositories are under github.com/openmv. You can find some of the SIMD details we talked about on the show:

150% faster: openmv/src/omv/imlib/binary.c
1000% faster: openmv/src/omv/imlib/filter.c
Double pumping: openmv/src/omv/modules/py_tv.c

Kwabena has been creating a spreadsheet of different algorithms in camera frames per second (FPS) for Arm processors: Performance Benchmarks - Google Sheets. As time moves on, it will grow. Note: this is a link on the OpenMV website under About. When M55 stuff hits the market, expect 4-8x speed gains.

The OpenMV YouTube channel is also a good place to get more information about the system (and vision algorithms).

Kwabena spoke with us about (the beginnings of) OpenMV on Embedded 212: You Are in Seaworld.

Elecia is giving a free talk for O'Reilly to advertise her Making Embedded Systems, 2nd Edition book. The talk will be an introduction to embedded systems, geared towards software engineers who are suddenly holding a device and want to program it. The talk is May 23, 2024 at 9:00 AM PDT. Sign up here. A video will be available afterward for folks who sign up.
Transcript
Before we get started, I want to let you know that I'm giving a free talk for O'Reilly
on an introduction to embedded systems.
If you or one of your colleagues or managers are interested, it is Thursday, May 23rd,
2024 at 9 a.m. Pacific.
It will be recorded if you miss it.
But if you go there live,
you can ask questions. There will be a sign-up link in the show notes.
Welcome to Embedded. I am Elecia White alongside Christopher White. Our guest this week is Kwabena Agyeman, and we're going to talk about my plans to create a wasp-identifying camera system.
Hi, Kwabena. Thanks for coming back after being on the show already and knowing what we're all about.
Yeah, it's been seven long years, but, you know, I thought it was time.
Could you tell us about yourself as if we had never met?
Gotcha. Well, nice to meet you. I'm
Kwabena. I run a company called OpenMV. We do computer vision on microcontrollers.
Seven years ago, we were one of the first companies thinking about this kind of idea.
Back then, we were deploying computer vision algorithms on a Cortex-M4, and we had
just upgraded to the Cortex-M7, which was kind of the hottest thing that was coming out back in
2017 timeframe. Since then, we moved on to a Cortex, sorry, the STM32H7, which is a higher-performance processor, and now the i.MX RT, which has even
more performance. And so I am excited about the future and plan to talk to you all today about
how these things are going to get even faster. That seems unlikely, but okay.
It seems unlikely.
I heard Moore's Law was over.
Well, not for microcontrollers.
For microcontrollers, it's still happening, but the gains aren't like 2%.
It's like 400% each generation, actually.
Microcontrollers are still on like, you know, 50 nanometer process.
So we have a long way to go.
Oh, not even that.
Not even that.
The latest ones are coming out at 12 to 16 nanometer for MCU.
Oh, okay.
So it has moved down.
Yeah, but that used to be like top-of-the-line processors, and now it's coming to you for nothing.
All right, so clearly we have a lot to talk about, including wasps.
Yes.
But first, we have a lightning round. Are you ready?
Okay, yeah.
Hardware or software?
Both.
Python or C?
C.
Marketing or engineering?
Both.
Cameras or machine learning?
Cameras.
AI or ML?
ML.
Favorite vision algorithm?
April tags.
Okay, we're going to break the rules.
What is April tags?
What is April Tags?
So you ever seen those QR code-like things that the Boston Dynamics robots look at to figure out what's a fridge and what's a door and such?
And so they put them all around.
That's called an April Tag.
It's like a QR code, but easier to read.
And it also tells you your translation and rotation away from it.
So if you see the code, you can tell, given where you are,
how it's rotated in 3D space and translated in 3D space.
Oh, that's very cool.
It's like little fiducial markers for the world.
Yeah, each one of them encodes just the number,
but then you just know, okay, number zero means coffee machine,
number one means door frame or something.
And that's how they get robots to, like, navigate around without actually fully understanding the environment.
Oh.
Complete one project or start a dozen?
One.
Favorite fictional robot?
I like WALL-E.
I love the movie.
You have a tip everyone should know.
Tip everyone should know.
You should learn SIMD performance optimization.
We're going to talk about that today.
It's something that blew my mind, and I think everyone should really think about it more often.
You can double or triple the speed of the process you're working on very easily if you just put a little work in.
Okay, what is SIMD?
Single instruction, multiple data.
Okay, that sounds interesting for machine learning things,
but can I actually use it?
Yes, you can.
The way to say it would be,
so back seven years ago when we were working,
when I was doing OpenMV,
I thought it was a hotshot programmer.
And I wrote vision algorithms to run
on these microcontrollers. And I kind of wrote them, you know, straightforward. I just wrote
them in a way that would be the textbook answer. And just assumed, okay, that's the performance that
it runs at. That's the speed that it runs at. Good enough. Let me move on. And wait,
these algorithms, they're like FFT and convolution and like what?
Yeah, kind of stuff like that.
The best one example would be something called a median filter.
Sure.
So the median filter is basically take a single pixel and then look at the pixels around it, the neighborhood.
So let's say you look to the left, right, up, down.
So all eight directions.
So there's eight pixels around it for a 3x3 median filter.
And then you just sort those, and you take the middle number,
and then you replace the pixel there with that.
The median filter has a nice effect on that.
It blurs the image.
It kind of gets rid of sharp, jaggy things, but it keeps lines.
So strong lines in the image still remain.
They don't get blurred, but then it blurs the areas that don't have strong lines. And so it produces a really nice,
beautiful effect. It's a nonlinear filter though. So the mathematics to make it happen are a little
bit tricky. But yeah, that's the median filter. And so when I wrote this originally on the OpenMV
cam and we had it running seven years ago, that ran at one frame a second.
And I was like, yeah, that's how fast these things are.
Can't do any better.
I know what I'm doing.
And then I hired a performance optimizer, Larry Bank.
I don't know if he's been on the show, but this guy blew my mind.
I asked him, hey, can you make this better?
And he got it a thousand percent
performance increase on me. One thousand percent. So that's about 16x. That algorithm went
from one frame a second to 16. And I was blown away. When someone kind of does that and
is able to beat you by that badly, it's kind of like, you know, you have to wake up and start thinking, what am I leaving on the table?
And he just did two things to make the algorithm go faster.
One, I was doing boundary checks to make sure I wasn't running off the edge of the image every single pixel.
Oh, no.
Yeah.
Yeah.
So he just made a loop.
He made two loops.
One that checks to see, are you near the edge, that it does boundary checks.
And one that if you're not near the edge, it doesn't do boundary checks.
Massive performance gain.
Second loop.
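[Editor's note: a minimal C sketch of the loop-splitting trick described here, with illustrative names; it is not OpenMV's code, and a 3x3 dilate (max filter) stands in for the median to keep it short.]

    #include <stdint.h>

    /* Two-loop idea: a checked pass for the 1-pixel border and an unchecked
     * pass for the interior, so the hot loop has no per-pixel bounds tests.
     * Assumes w and h are at least 2. */

    static uint8_t max3(uint8_t a, uint8_t b, uint8_t c)
    {
        uint8_t m = (a > b) ? a : b;
        return (m > c) ? m : c;
    }

    static uint8_t pixel_clamped(const uint8_t *src, int w, int h, int x, int y)
    {
        if (x < 0) x = 0; else if (x >= w) x = w - 1;  /* clamp reads to the edge */
        if (y < 0) y = 0; else if (y >= h) y = h - 1;
        return src[(y * w) + x];
    }

    void dilate3x3(const uint8_t *src, uint8_t *dst, int w, int h)
    {
        /* Checked pass: border pixels only, with clamped neighbor reads. */
        for (int y = 0; y < h; y++) {
            /* whole row at top/bottom, otherwise just the two edge columns */
            int xstep = ((y == 0) || (y == h - 1)) ? 1 : (w - 1);
            for (int x = 0; x < w; x += xstep) {
                uint8_t m = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        uint8_t p = pixel_clamped(src, w, h, x + dx, y + dy);
                        if (p > m) m = p;
                    }
                }
                dst[(y * w) + x] = m;
            }
        }
        /* Unchecked pass: interior pixels, all nine neighbors known valid. */
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                const uint8_t *r0 = &src[((y - 1) * w) + x];
                const uint8_t *r1 = &src[(y * w) + x];
                const uint8_t *r2 = &src[((y + 1) * w) + x];
                dst[(y * w) + x] = max3(max3(r0[-1], r0[0], r0[1]),
                                        max3(r1[-1], r1[0], r1[1]),
                                        max3(r2[-1], r2[0], r2[1]));
            }
        }
    }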
The second change he made was to do something called a histogram kind of.
So when you're doing a sorted list of numbers, let me back up.
How do I explain this?
The median requires a sorted list.
Yes.
Because you need to know which is in the middle.
So you need to know which is higher and which is lower.
And so you end up, even if at three by three, you still have to order all three numbers.
Yeah. So instead of doing that, what we did is you can kind of maintain this thing called a histogram.
So a bunch of bins, right? And what you do is when you're at the edge
of the image, you initialize the histogram with all the pixels in that three by three. And then
you walk the histogram really quick to figure out where the middle pixel is. So you just start at
the beginning and kind of walk it and do this thing called the CDF, where you kind of sum up
the bins until you see which one is bigger than 50% of how large it
could be. And that tells you the middle pixel. And that can be done pretty quickly. You still
have to do this every pixel, but it's a nice, fast, linear for loop. So processors can execute
that really quick. But the big change he did was instead of initializing the histogram every pixel,
you just drop a column and add a column.
Yeah.
And so what this does is, even if your kernel becomes 11 by 11 or something like that, you just drop a column and add a column. So you're not doing the work of reinitializing the histogram every pixel. And that's where the 16x performance came in, by just doing that little change and going from O(n²) to O(2n). Just a massive difference in performance, and on these MCUs, that matters. On a desktop you can kind of code this stuff with the minimal-effort approach I originally took and it doesn't cost you anything, but on an MCU, putting that effort in to actually do this well, it really is a huge game changer.
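[Editor's note: a plain-C sketch of the sliding-histogram median described above, for one interior row of a 3x3 filter on 8-bit grayscale; it is not OpenMV's code, and the border pixels are assumed to be handled separately per the loop-splitting trick.]

    #include <stdint.h>
    #include <string.h>

    /* y must be an interior row (1 <= y <= h-2). Per pixel: walk the CDF to
     * the 5th of 9 samples, then slide the window by dropping the leaving
     * column and adding the entering one. */
    void median3x3_row(const uint8_t *src, uint8_t *dst, int w, int y)
    {
        uint16_t hist[256];
        memset(hist, 0, sizeof(hist));

        /* Seed the histogram with the first 3x3 window (centered at x = 1). */
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                hist[src[((y + dy) * w) + (1 + dx)]]++;
            }
        }

        for (int x = 1; x < w - 1; x++) {
            /* Walk the CDF until we pass half the window (9 pixels -> 5th). */
            int sum = 0, med = 0;
            for (med = 0; med < 256; med++) {
                sum += hist[med];
                if (sum >= 5) break;
            }
            dst[(y * w) + x] = (uint8_t)med;

            if ((x + 2) < w) {
                /* Slide right: drop the leaving column, add the entering one. */
                for (int dy = -1; dy <= 1; dy++) {
                    hist[src[((y + dy) * w) + (x - 1)]]--;
                    hist[src[((y + dy) * w) + (x + 2)]]++;
                }
            }
        }
    }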
But then here's where the SIMD comes in.
It turns out you can actually compute two columns or up to four columns of that histogram at the same time. Because on a Cortex-M4,
there is an instruction that allows you to grab a single long
and basically add that, split it into four bytes at a time,
and add those four bytes as accumulators to each other.
So it'll do four additions in parallel
and not have them overflow into each other.
Ooh, okay.
Yeah, yeah.
It's existed there since the Cortex-M4, so I swear it's in every single processor that
has been shipped for a decade now.
It's just been sitting there and no one uses this stuff.
But if you break out the manual, it's available.
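[Editor's note: a hedged sketch of reaching that instruction from C. __UADD8 is the CMSIS-Core intrinsic for the UADD8 instruction on Cortex-M4/M7; the function and names below are illustrative, and the buffers are assumed to be 4-byte aligned with n a multiple of 4.]

    #include <stdint.h>
    #include "cmsis_gcc.h"  /* normally pulled in via your device header */

    /* Four independent byte-wise adds per instruction; each lane wraps on
     * its own rather than carrying into its neighbor. Valid while every
     * lane stays below 256 (e.g., packed byte accumulators). */
    void add_rows_packed(const uint8_t *row_a, const uint8_t *row_b,
                         uint8_t *out, int n)
    {
        const uint32_t *a = (const uint32_t *)row_a;  /* 4 pixels per word */
        const uint32_t *b = (const uint32_t *)row_b;
        uint32_t *o = (uint32_t *)out;
        for (int i = 0; i < (n / 4); i++) {
            o[i] = __UADD8(a[i], b[i]);  /* four byte adds at once */
        }
    }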
So I've known about SIMD instructions on various processors for a while.
This sounds like a really standard one for Arm Cortex. I know Arm has another whole set of things called Neon,
which is, I think, is this part of Neon,
or is Neon a bigger set of even more SIMD things?
No, Neon's a bigger set.
That only runs on their application processors.
Oh, okay, okay, right.
So on their MCUs, they have this very,
technically these instructions are also available
on their desktop CPUs also.
They're not utilized heavily.
What ARM did is they wanted to make DSP a little bit more accessible, well, faster.
And so there's a lot of stuff in the ARM architecture
that allows you to do something called double pumping,
where basically you can split the 32 bits into 16 bits.
And they have something called SADD16,
so you can add the two 16 bits at the same time or subtract. There's an instruction that'll take two registers,
two 32-bit registers, split them into 16 bits, and then multiply the bottom 16 bits by the bottom 16
bits of one register and the top 16 bits by the top 16 bits of the other register, and then add
them together. And then also do another add from an accumulator. So you'll get two multiplies and two adds in the same clock cycle.
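[Editor's note: the instruction being described is SMLAD. A minimal Q15 dot-product sketch using the CMSIS intrinsic, with illustrative names; assumes n is even and the arrays are 4-byte aligned.]

    #include <stdint.h>
    #include "cmsis_gcc.h"

    /* Each __SMLAD computes lo*lo + hi*hi + accumulator: two 16-bit
     * multiplies and two adds per cycle, as described above. */
    int32_t dot_q15(const int16_t *x, const int16_t *y, int n)
    {
        const uint32_t *px = (const uint32_t *)x;  /* two Q15 samples per word */
        const uint32_t *py = (const uint32_t *)y;
        uint32_t acc = 0;
        for (int i = 0; i < (n / 2); i++) {
            acc = __SMLAD(px[i], py[i], acc);
        }
        return (int32_t)acc;
    }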
Do I have to write assembly to do this? Or are there C compilers that are smart?
Or are there Cortex libraries that I should be using?
GCC just has intrinsics. So it's just like a function call where you just pass it a 32-bit
number. And then it just gets compiled down to a
single assembly instruction, basically. So you're kind of able to write normal C code, and then when
you get to the reactor core of a loop, like the innermost loop of something, you can just
kind of sprinkle these in there and get that massive performance. One of them that's really
valuable is something called USAT ASR. So did you know that whenever you need to do like clamping,
like the min and max comparisons for a value,
ARM has an assembly instruction that'll do that for you in one clock.
But you shouldn't go out and use this unless that is part of your loop.
You shouldn't do the min-max during your initialization of your processor
with this fancy instruction.
No, it's pointless.
You should limit these to optimizations of things that you need to run faster.
Yeah.
Not optimizations because they're fun.
But the sigh was because I wished I'd known about this for several projects in the past.
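[Editor's note: a small sketch of the saturation intrinsic mentioned above. __USAT is the CMSIS-Core intrinsic for the USAT instruction; the hardware version can also fold in an ASR shift, though whether the shift below fuses into the same instruction depends on how the intrinsic is implemented in your toolchain.]

    #include <stdint.h>
    #include "cmsis_gcc.h"

    /* Clamp a signed fixed-point result into [0, 255] without the usual
     * pair of compare-and-branch min/max operations. */
    static inline uint8_t clamp_to_pixel(int32_t acc_q8)
    {
        /* Shift back from Q8, then saturate to 8 bits with one USAT. */
        return (uint8_t)__USAT(acc_q8 >> 8, 8);
    }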
Well, it's just kind of like, here's an example.
We actually have something I wrote recently,
which I'm really proud of,
is we actually have a bilinear,
sorry, a nearest neighbor, a bilinear,
and a bicubic image scaler in our code base now.
And I scoured the internet looking for someone
who had written a free version of this
that was actually performant, and none such existed. So we're like the only company that
actually has bothered to create this for a microcontroller. And, you know, it actually
does allow you to scale an image at, you know, up or down at any resolution using bicubic image
scaling and bilinear. So bicubic basically can take like a really pixelated image
and then produce like these nice colorful regions. Like it'll do, you know, it really blends things
well. It looks beautiful when you use it. And to do that though, it's like doing a crazy amount
of math per pixel. And so being able to use the SIMD instructions when you're doing the blend
operation for upscaling these things, it made a huge amount of difference.
Like, if you don't do this, the code runs 2 to 4x slower.
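[Editor's note: not OpenMV's scaler, just a plain-C sketch of the bilinear blend at the heart of an upscaler; fx and fy are Q8 fixed-point fractional coordinates. This per-pixel blend is the inner loop that the packed-SIMD tricks above speed up.]

    #include <stdint.h>

    /* Weighted average of the 2x2 input neighborhood around the sample. */
    static uint8_t bilerp(uint8_t p00, uint8_t p10,   /* top-left, top-right */
                          uint8_t p01, uint8_t p11,   /* bottom-left, bottom-right */
                          uint32_t fx, uint32_t fy)   /* fractions in 0..255 (Q8) */
    {
        uint32_t top = ((p00 * (256 - fx)) + (p10 * fx)) >> 8;  /* blend along x */
        uint32_t bot = ((p01 * (256 - fx)) + (p11 * fx)) >> 8;
        return (uint8_t)(((top * (256 - fy)) + (bot * fy)) >> 8); /* then y */
    }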
I've always known that graphics code is an area where there are a lot of optimizations that are non-obvious.
I mean, there's the Stanford graphics page about optimization hacks.
Have you seen that?
Maybe. I've seen like the
bit, if you've ever like had
fun searching around stuff, there's always like the
bit magic things where you know
if you've ever seen, what is it
Doom
the original Doom, there was one thing which was
like the inverse square root. The magic number,
yeah. Well, and there's
Hacker's Delight, that's similar.
Yeah, Hacker's Delight.
So, is this, are these algorithms optimizable because they're graphics?
No, I mean, the instruction he was talking about with breaking up the 32-bit into 16
and doing that, I mean, it sounds like matrix stuff.
That'd be easily applicable to matrix stuff
or FIR filters, I would assume.
Yeah, no, I think it's meant for FIR filters.
And also, it's not FFT so much,
maybe if you were doing fixed point.
But definitely for audio, though.
Like if you wanted to mix two audio channels together,
for example, it would be probably good for that.
You could set a gain on each channel, right?
And then it would automatically mix them as long as you were doing both in two 16-bit samples at the same time.
So the cool thing is you can set all this up.
Your DMA system to receive audio could be producing audio chunks in a format that actually is applicable to the processor, then churning through it this way.
One of the tricks you have to know what to do when you're trying to do this stuff
is the data does have to be set up
to feed well into these instructions.
You can't actually utilize them
if you have to reformat the data constantly
because then your speed gain will be lost
on data movement, more or less.
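[Editor's note: a hedged sketch of the two-channel mix described above, using the CMSIS __SMUAD intrinsic (dual 16-bit multiply with add); names are illustrative. If the DMA delivers interleaved 16-bit samples, the packed pair below is just one 32-bit load, which is the data-layout point made here.]

    #include <stdint.h>
    #include "cmsis_gcc.h"

    /* Mix two channels with per-channel Q15 gains: one __SMUAD does both
     * multiplies plus the mixing add. */
    void mix_two_channels_q15(const int16_t *ch_a, const int16_t *ch_b,
                              int16_t gain_a, int16_t gain_b,
                              int16_t *out, int n)
    {
        uint32_t gains = ((uint32_t)(uint16_t)gain_b << 16) | (uint16_t)gain_a;
        for (int i = 0; i < n; i++) {
            uint32_t pair = ((uint32_t)(uint16_t)ch_b[i] << 16) | (uint16_t)ch_a[i];
            int32_t mixed = (int32_t)__SMUAD(gains, pair);  /* ga*a + gb*b */
            out[i] = (int16_t)__SSAT(mixed >> 15, 16);      /* back to Q15, saturated */
        }
    }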
You just said fixed point. Of course
you're doing fixed point, aren't you?
Yeah, it's all in fixed point.
Oh, okay.
Yeah, but no, the reason why
this is so cool is that as I
got into this,
when I first was doing the microcontroller
stuff, it was just kind of, hey,
we're just having fun here, just trying out cool
things. Is this going to go anywhere? Not really
sure.
Performance isn't there, speed isn't there, usability isn't there.
And honestly, after I met Larry and we started actually making things go faster,
it started to dawn upon me that, well, hey, if you're getting a thousand percent speed up here,
and this is before we added the SIMD part.
Once you add that, you can get even another 2x speed up on top of that. So 2000 percent
speed up. Basically, you're going from one frame a second to now you're at 30,
at 320 by 240. That's on a microcontroller. Now let's say the microcontroller's clock speed
doubles, and instead of being at
400 megahertz, you're at 800 megahertz. And then there's this new thing called ARM Helium that's
coming out. And ARM Helium offers an additional 4 to 8x speedup on all algorithms. And this is for the new Cortex-M55 MCUs that are coming.
And ARM Helium is actually closer to ARM Neon,
so it's not a very limited DSP set,
but it's actually like a thousand new instructions
that will allow you to do up to 8 or 16 elements at a time math.
And it also works in floating point too.
You can do doubles, floats, four floats at a time,
even does 16-bit floating point too.
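[Editor's note: a hedged sketch of the Helium (M-profile Vector Extension) intrinsic style, eight 16-bit lanes per operation; assumes a Cortex-M55-class core compiled with MVE enabled and n a multiple of 8.]

    #include <stdint.h>
    #include <arm_mve.h>  /* Helium (MVE) intrinsics */

    /* Add two rows of 16-bit pixels with saturation, 8 lanes at a time. */
    void add_rows_helium(const uint16_t *a, const uint16_t *b,
                         uint16_t *out, int n)
    {
        for (int i = 0; i < n; i += 8) {
            uint16x8_t va = vld1q_u16(&a[i]);        /* load 8 elements */
            uint16x8_t vb = vld1q_u16(&b[i]);
            vst1q_u16(&out[i], vqaddq_u16(va, vb));  /* 8 saturating adds */
        }
    }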
I love these RISC CPUs with thousands of instructions.
The R is just there to make the...
I mean, if they were just called ISC,
everybody would be like, what does that even mean? But RISC sounds
cool. So yeah, the Helium stuff was what I
was actually thinking of when I said neon that
I had read a little bit about because we're using
M55s on a project.
Have you actually gotten into using
some of the new stuff yet? No, I was just
reading through data sheets and saying,
oh, that looks cool. I don't know what to do with it yet.
But yeah, no, that's amazing.
It reminds me,
I've always felt like
we're leaving performance on the table
because we tend to,
these days things move so quickly,
we tend to just wait for the next CPU
if we can't do something fast enough.
Whereas I'm going to,
you know, back in the old days,
you know, people were doing amazing things
because they only had a 6502 or an 8080 or something.
And, well, I need this to, there's no other computer.
I need this to go as fast as it can.
So I'm going to hand optimize assembly.
And I'm not suggesting people hand optimize assembly, but looking for the optimizations that the vendors have already provided and that people just don't know about, I think, is something people miss.
Oh, yeah. But it's actually really, really huge, though. It's much bigger than you think.
Okay, so let me say it like this. We actually got one algorithm called erode and dilate.
I have that running at, it's able to hit 60 FPS now on our latest-gen system, the OpenMV Cam RT1062. And it runs at 60 FPS at VGA.
So when we get to a Helium-based processor system for our next-generation OpenMV Cams,
we're going to be able to 4x that number. So now you're talking 1280 by 960 at 60 FPS, right? And then that's like a 1.3 megapixel image.
Two megapixel is 1080p, right?
So if we go to 30 FPS,
now we're talking we're able to run
1080p image processing
on a $4 microcontroller.
That's really insane.
Now you're running into,
do I have enough RAM for this?
Well, that's the thing.
Microcontrollers are also coming out.
There's the Alif Ensemble, for example.
The thing has 10 megabytes of RAM
on chip. Does it now?
10 megabytes of RAM
on chip. I'm sorry. I'm using that
right now, so I have to
keep my mouth shut.
But this is the
thing, though.
We're kind of going to be crossing this chasm where these MCUs are really going to be able to do things that previously needed a Linux application processor and OpenCV to do.
Like, once you get to 1080p, that's good enough for most people.
They don't really care to even have more resolution.
I mean, even 1280.
I mean, the Nintendo Switch, it's sold quite well. And that's a 1280 by 720 system.
Like, you don't need to go to 4K or 8K to have something that will be in the market.
Unless you talk to my other client.
Pushing all of Christopher's buttons. It's great.
What if we had
four or six 4K cameras?
Anyway.
Okay.
Yeah, pushing the pixels.
But OpenMV,
if I remember,
works with MicroPython,
which is really cool,
and I loved working with MicroPython,
and yet it wasn't the fastest.
Well, the thing is that we don't write the algorithms in Python.
Python is just a layer to assemble things together.
So we actually have the algorithms, they're written in C,
and then we use the SIMD intrinsics,
and we try to write things so that they're fast in C.
And what MicroPython is really just doing is being an orchestration layer to tie everything together. And it really, really
helps the use case because what we're trying to do is to pull in all the non-embedded engineers
to give embedded systems a try. So if you say like, hey, you need to learn how to use all these
crazy tools, get the JTAG out.
By the way, you got to buy that and it's going to be $1,000 just to program it.
And here's a crazy Makefile build system and all this other stuff.
It's really going to run into a lot of brick walls for most people.
But when you get someone who's worked on the desktop, they're used to Python scripts, and you say, here's the library, here's the API.
You can basically write normal Python
language and just look at the API for what you're allowed to call. It's a much easier transition for
folks. And it's part of the key on why our product's been successful. It's also been nice in
that a lot of middleware Python libraries now can run on the system. We have a system
called ulab that actually runs on board.
ulab gives you a
NumPy-like programming interface.
So if you want to do NumPy-like operations,
you can actually do that.
Yeah, yeah. And so it supports stuff like
matrix multiplies.
They're adding singular value
decomposition right now.
And so you have that on board.
So you can actually write a lot of the standard matrix math you would have on a desktop and just port it right over.
Additionally, we also have a sockets library and a system for Bluetooth.
And so the sockets library allows you to write pretty much a desktop app that would normally control low-level sockets.
And you can port that onto the OpenMV cam,
and now you can connect to the internet,
do Python urequests, and do API calls,
and so on and so forth.
So it makes it really powerful, actually.
So we had a listener, Tim, ask a question
that seems really relevant at this point.
Is this intended for genuine production usage
or more for hobbyist prototype work?
The OpenMV homepage and docs
have a heavy emphasis on MicroPython.
Are there plans to provide a C or C++ API?
Yeah, so we're never going to provide
really a C API.
You can take our firmware
and actually just write C code directly.
So what we found is once...
So a lot of customers who look at it and think,
oh, yeah, this is one thing, and then don't give it a try.
And then we have a lot of customers who say, this is amazing,
and go and modify it for whatever ways and shapes they need.
So what we see a lot of times is a customer will take the system,
and if they don't like MicroPython,
they'll just literally rip that out of the code base
since everything's Makefile-based,
and they'll just shove it into whatever they're actually using.
Since it's open source,
you can kind of actually do Frankenstein edits like that.
And as long as you follow some good practices
and don't completely edit our code and have an unmaintained fork,
you can do a pretty decent job of staying in sync with upstream
while not having a totally broken system.
But no, we plan to keep it in MicroPython.
And the reason for that, as I mentioned,
we want to get the larger developers
who are not working in embedded systems
to kind of jump on board.
But also, we found that it's pretty usable in production
for a lot of people with the Python interface.
So I was just talking to a customer this week, actually,
who's putting these things in power plants.
And they're loving it, actually.
For them, they just needed to do some basic editing of,
they were just using it as a sensor
that would connect to their infrastructure and then do some remote sensing.
And I don't want to mention exactly what they're doing to not spill the beans.
They're monitoring the nuclear fuel rods.
Yeah, but they loved how we had a system that was flexible.
One of the big things for them is they didn't want a black box system
that could not quite do what they needed.
They wanted something that was open and available for them to tweak it in any ways they needed.
It's called OpenMV, so you'd think that people would recognize that it's open source.
But it is actually open source and open hardware and all of these instructions and GCC intrinsics
we're talking about. You can go to the code and look up.
Yeah, you can.
You can also... we should rename ourselves Closed AI, I guess, maybe.
What is the, I know we've probably talked about this in the past,
but people probably ask, what license,
what open source license does OpenMV come under?
Yeah, so we're actually under the MIT license for most of our code.
Yay.
We do have, yeah, yeah.
Party time.
Honestly, trying to enforce this, it'd be insane, right?
I mean, people, it has led to some weird situations.
So we're actually really popular in China.
And so much so that people actually take our IDE. Our changes are MIT licensed, but the IDE base is GPL.
So you do have to keep that to be open source.
But for that system, we actually see people who want to compete with us.
They actually take our IDE, take the source code, they remove our logo, put
their name and logo on it, and then sell a product that has the same similar use situation as us
and actually try to compete with us with our own tools. It's crazy.
And by crazy, you mean exceedingly frustrating?
Exceedingly frustrating. But hey, you know, it's kind of flattery, though, at the same time.
Like, they like our stuff so much, they're not going to build their own.
They're going to just copy and paste what we've been doing.
So you said closed AI.
Are you closed AI?
Where was I headed with that?
I think that was just a joke.
That's a joke on open AI.
Oh, I see.
Okay.
I mean, we've talked a little bit about some of the graphics things with the erode and dilate,
but you have a whole bunch of machine learning stuff too.
Yeah, yeah.
So we integrated TensorFlow Lite for microcontrollers a long time ago,
and we've been working with Edge Impulse, and that's been great.
They basically enabled people back when this was super hard,
and we didn't have this back when I did an interview with you, seven years ago also.
They made it easy for folks to
basically train a neural network model
and get it deployed on the system.
At first, we started with image classification,
but that's moved on to something called
FOMO.
I think it's called Faster Objects,
More Objects, but obviously it's a play
on YOLO. Right, right. I remember that. Yeah. Okay. Yeah. I knew it was connected to YOLO somehow,
but I forgot. Yeah. Yeah. It allows you to, it basically does image segmentation. So let's say
it'll take a 96 by 96 input image, and then basically it figures out the
centroids of objects in that image and will output a 16 by 16 pixel array where the centroid of an
object is. And this allows you to do multi-object tracking on an MCU, and you can do 30 FPS.
Okay, where do the Jetson TXs fit in here?
I mean, I
thought those were super
powerful and did all the super
amazing things, but now he's
got microcontrollers doing this.
Jetson would be able to do
a much larger image than 96x96.
Oh, yeah, absolutely.
Of course, but Jetson's
also running Linux and doing a bunch of other stuff that slows it down.
Right, but it's a big real GPU.
I don't think it slows it down.
No, no, Jetsons are awesome.
I think here's the problem.
So if you look at the latest Orin, right?
I'm looking at it right now.
Yeah, the high-end one.
So I was actually, my last thing I was doing,
so by the way, I only recently went full-time on OpenMV.
I was side hustling this forever.
When we started this company, it was always a side hustle.
And I recently, last year,
the company I was working for, Embark Trucks,
they had gone public and I had joined them at like employee 30
and ridden with them for five years.
I was a ride or die employee there.
And they went public for $5 billion in 2021, part of a SPAC process.
And the company shut down then in 2023 and was sold for $78 million.
So I got to see...
So in the meantime, there were constant parties in the Caribbean?
Yeah, no.
For the engineers, we were just working hard to make that stock go back up.
But it was an interesting ride.
I say that, though, because I'm at full-time now on OpenMV.
But one of my last jobs at Embark before we shut down
was I was trying to figure out how to get an NVIDIA Orin into our system.
And that thing's amazing. It can replace so much. before we shut down was I was trying to figure out how to get an NVIDIA Orin into our system.
And that thing's amazing.
It can replace so much.
But it's also $1,000 plus?
$2,000 on Amazon.
Yeah, so here's the thing.
Also has a 60-watt power supply
or something like that.
Yeah, you need that.
I was doing serious engineering.
I was actually building a $10,000 PCB, by the way.
$10,000.
More than two layers?
Like 18 or something.
It was crazy.
We had lots of fun stuff.
Not going to mention more than that.
But it was an amazing system.
We were really pushing the limit.
I was like, this is an incredible system for what we're trying to do. Self-driving truck brain? Yes, absolutely. But the challenge is when you have a system that costs that much, this means for your final sale price, for your robot or whatever, you're going to need to sell a $10,000 system at minimum
to make some cash back.
It's really hard to kind of make those margins
make sense if you're not selling it
in a high-priced system.
Yeah.
And the thing is,
I mean, there's a lot of other costs
that go with that.
Like, you know,
if you're building a system
with something that powerful,
power becomes a big issue,
especially if you're on batteries.
And, you know, weight and size. And they do come in modules that you can get smaller carriers for, but it's not
the same as building a custom PCB with a Cortex-M55 on it or something, if you can get away with that.
Oh yeah. So I've actually heard from some of our suppliers that NVIDIA's position is that they're focused on where the money really is for them, and that's in the cloud.
The rise of what ARM is doing with TinyML and all these other processors, it's really going to be the future.
There's an EE Times article where the previous CEO of Movidius, he now works at ST running all of their microcontrollers.
And his position is that there's a wave of tiny ML coming.
And it's basically from microcontrollers becoming super, super powerful.
Like when you're, this is why I'm going full-time on OpenMV,
because I see this wave happening where, you know,
what does it mean when your MCU can now process 1080p video
and cost $4 and has instant on capabilities?
It draws less power.
It produces less heat.
It doesn't need SDRAM or eMMC.
So the bill of materials is like $10 off
from what you'd pay for a Linux-based system.
And it's also less physical size
because you're now down to one chip versus three.
So you've got four wins.
How do you compete against that?
Again, also, it can go into low power on demand
and wake up instantly.
And this is that future
where these things are becoming really, really powerful.
And what they need is a software library.
And so that's what we're focused on is really building out that algorithm base.
So instead of you having to sit down and say, how do I write efficient SIMD code that makes this algorithm go super fast?
It's already built.
And you can just use an OpenMV cam to do what you want to do.
Okay.
Talking about what I want to do.
Cool, let's go.
Okay, so I have an application idea.
It's a terrible idea, but I want to try it anyway.
And mostly this is an exercise in how would I actually use OpenMV
to accomplish my goals and possibly to make my own product,
and where do I make the decisions.
Okay.
Okay.
So let's say I want to find and identify wasps.
I have a big book of wasps.
A listener, Brent, noted that his spouse wrote a big book of wasps
after I said I liked bees.
And it's very comprehensive.
I have many, many pictures.
My desktop wasp ID in TensorFlow works fine.
Now what I want to do is I want to mount it on my roof,
and I want it to identify all the wasps in the forest.
In one direction or multi-directions?
All the directions.
Okay.
So you have a 360-degree wasp scanner.
Right.
All right.
Okay. Okay. Question for you. How good of an image of a wasp do you have? Do you have, like, nice high-resolution images where you can see, like, the hair on a wasp?
Yeah. They have little back legs. They have little serrations. At least some wasps. I mean, there's so many wasps. But then they can use that
to wipe off the fungus that tries to attack them and take over their brains.
Oh, yeah, I've heard about that.
They definitely have high-quality images, not only the hairs, but the serration on the hairs.
Okay, so even before you get into OpenMV, I think this is the problem setup thing you have
to ask yourself, which is, how do you actually take an image that's that high quality of a wasp that's flying around in your backyard?
That's the first question.
Are we talking a DSLR image that's on top of your roof, just kind of pointing at wasps and then snapping really awesome pictures with a super great lens?
Is that what we're looking at?
No, no. I don't want more pictures of wasps. I want wasp identification.
Right, but if you need a feature size that's very fine to identify one wasp from another, that informs how high resolution your camera has to be.
Or how close the wasp has to be to the camera.
Yes, yeah.
Right.
Because if I can tell you how to find wasps...
How many pixels on wasp do you need to identify a wasp?
Yes, that's the best way to say it.
Thank you.
And so, yeah, this is a good concept.
So, and Chris said it really well,
but maybe we need to think about the visual here.
If a wasp is far away, it may only take up four pixels and we won't be able to see very much
about it because it's far away. Just because the camera resolution, if I had a higher resolution
camera, it would take up more pixels. Or if the wasp came closer, then it would take up more pixels. And so what was the phrase you used?
Pixels on WASP.
The pixels on the item of identification is really important.
Or P-O-W.
Pixels on WASP.
And so, yeah, that is a big choice for me is do I want higher resolution cameras or am I willing to accept things to be closer?
Well, I think it's actually both really because a lot of times more pixels in an image doesn't actually do anything for you.
Most cameras can't resolve optically a lot of the extra pixels. They just become noise.
So it's really about the quality of the optics that you're dealing with.
Like, can they actually produce an image that's focused and sharp for every pixel?
Because you can shove an 8-megapixel or 12- or 43-megapixel camera in with a bad lens,
and you'll have no better image quality than if you actually just improve the lens itself.
I swear this is like talking to myself in a meeting three weeks ago.
Okay, I don't want to buy $10,000 cameras.
So then you're going to want to have some zoom action.
That's kind of what needs to happen.
I think if you want to identify WASP, you're going to need to do two things. You're going to need to have one camera that has a really nice quality lens that can do ranging,
where it can zoom in on the wasp, and then it can track it and follow it.
So I have one that identifies flying objects from my background,
and one camera that I say, go there, take a good picture.
And then I send that to my wasp identification as opposed to my motion identification.
Yeah, but now you need a gimbal.
Yep, you need a gimbal.
This is getting expensive, Elecia.
What about an array, a larger array of crappier cameras?
Yeah, like a wasp's eyeball, a compound eyeball for cameras. OpenMV
compound. I think that would work
too. You could do a bunch
of zoomed-in cameras
that would be like a detection field
where if a wasp flew in front of them,
you could see what's going
on.
You're going to need...
If you're not doing a gimbal, I would
say it's probably out of the spec of our system now.
But you probably need like an NVIDIA system on this.
But even then, it's still going to be challenging.
Because at the end of the day, I think the gimbal system is the most likely to happen.
But if you wanted to do something like you just had a bunch of cameras and you create like a detection field.
The challenge is each of them has like a different zoom and area they can see.
So then you'll need like multiple cameras, like at different focal lengths.
You'll need to have one that's wide angle and one that's, you know, more zoom and one that's more zoom and et cetera to kind of see every position and such.
So getting away, I think the gimbal is actually better because you've got a gimbal with like a zoom lens. That would probably do the best job.
What if?
Okay, so I like that, but I also don't want it to have a moving part.
So gimbals are probably out.
What if I didn't have them have multiple zooms?
What if I had a fixed zoom on all of them, but this allows me to look in lots of directions and have them
be slightly overlapping at their edges. You could do that. It's really hard to set up.
A better way would be, could you force the wasps to walk into something where they're going to be
in a fixed focal distance? Could you do a little hive or something where the wasps have to fly
through? Then they'll all be in the same area and about the same size.
And that really simplifies the problem at that point.
Yeah, I think a lot of ML problems, people haven't thought about the social engineering aspect of it.
The social engineering of the wasps.
Yeah.
And then he's going to tell me that I just need to have good lighting and to have them go one by one, walking through like some sort of little wasp fashion show.
Okay, okay. I don't want to corral the wasps. For one thing, they sometimes eat each other, or do weird mind control, or just lay babies in each other. So we don't want that. We want the wasps to be free-flying.
But it sounds like because I don't have enough pixels on the wasps,
this won't
happen unless I can...
Unless you have more processing,
higher resolution, better optics.
But I really like the idea of having
a whole little
360,
eight cameras, and each one identifies a wasp.
The problem with the world is it's large.
Oh, and so then there's pixels, and they have to go on the world, which is large. And then too many pixels.
Well, I think you probably could do it. I mean, you could do one of those 360 camera things. I've seen people with NVIDIA Jetsons do that, where they have two cameras that are mounted back-to-back, and they're doing a 360 view.
But the challenge is the level of detail.
But the optical resolution.
Yeah.
The optics are not as good, and the resolution is not enough, and your pixels on identified object are just too small.
Yeah.
So it could tell you that a wasp was flying around or like a thing
was flying around, but it couldn't actually tell you what version of that thing it was. Okay. So two
large cameras in this instance with fisheye lenses is better than eight small ones because it just
changes your field of view. Yeah. You get like a 360 field of view, but then your challenge is how
close are you to the particular wasp? That's what's really going to matter. So like maybe
if they're within the distance of like a foot or like maybe three feet, you might be able to see
them if you have enough resolution on the cameras and then you could possibly do it.
But then, I mean, of course you want to put this on your roof, though. And so the wasps aren't
even going to get near it.
That's the challenge. So you need to get
that wasp fashion show thing again
and have some bait to get them
to fly near you. That's like those bird feeders that identify
birds for you. They're bird feeders.
Well, yes. All I need
for wasps, really, is a tuna can.
Gross.
Won't a bird come in and eat that?
Or a cat.
Yes.
Or a raccoon, more likely.
This is going to be the greatest video set ever.
Okay, so let's say I go ahead and I have the little,
I have some area where I can cover,
and it's not exactly the Wasp fashion show, but they're, they're a foot
to three feet apart away from my camera. And I have, I don't know, let's say I don't want to
shell out for the huge cameras. I have like four open MVs and I've, I've pointed them and
I just want to do the best I can. Okay. What algorithms am I looking at here?
Yeah. Yeah. So there's a few different things you
can do if you want to work with this. Yeah, as you mentioned, does the lighting have to be good?
Yeah. So if you actually want to be able to take nice pictures in the dark or in the day, you're
going to need to have some good lighting on that. And then there's, of course, the problem that the
wasps are going to fly into the light. So what do you do there? There is
something you can do with thermal cameras to see them. That's like a really easy way to pick out
wasps during day or night because they're going to pick up, they're going to be visible in the
background. There's also something to do with event cameras. So we have some customers right
now.
Yeah, I was reading about these. Please tell me.
Oh yeah. There's a company called Prophesee, for example. They're making an event camera. And more or less, these things run at
literally whatever FPS you want. If you want a thousand frames a second, they can do that.
And they literally just give you an image. For every pixel, what they're doing is they
check to see, did the charge go up or did the charge go down?
And then based on that, they produce an image. And so they can actually, they're kind of like HDR in
the sense that even if they're staring into the sun, they can still detect if a pixel increased
in charge or decreased in charge. And so it doesn't really matter what's going on in the
background or et cetera. They basically just give you a difference image of what kind of moved around and such.
And that actually creates these interesting kind of convex hulls of things.
So you can really see like blobs moving very, very easily because of that.
It's not going to be useful though for identifying what the wasp is per se,
but it will tell you like there's a wasp though walking there.
But then you can easily overlay that with the regular colored image and you can tell what's going on there. Or you can do everything directly from the color image itself. It's just going to
be harder when it gets nighttime and you don't have lighting because then you'll need to somehow
boost that image quality to see still.
What was the name of the, I heard event?
Yeah, event cameras, I think.
Event cameras, yeah.
They basically do, it's like the architecture of the camera itself does the things you would do in software to do motion vectors.
I'm making stuff up, but.
Frame differencing.
Yeah, frame differencing, and then figuring out motion directly.
So it's just done in the camera, and it has such a high frame rate
that it can do that much better than, say,
doing that on a 60-frame-per-second camera in software.
Yeah, well, the benefit is that camera can sample
at one microsecond each pixel.
Yeah.
And so you can actually go beyond 1,000 FPS if you want.
Yeah, it was something crazy, yeah.
So it's technically a million FPS,
but you probably couldn't read out the data that quickly.
But it allows you to do really, really fast object tracking.
That's the best way to say it.
So this will allow you to actually find the wasp in the image that they're flying around and actually track them with such precision you know exactly where they are.
The trick is, though, then the color camera can't keep up with that. So now you're back to, you have convex hulls of wasps flying around, but at least you could see them in the daytime and nighttime. And here's the interesting thing, though. Assuming the wasps
are all about the same size, then if you just wanted to identify whether or not you had a bigger
wasp versus a smaller wasp, you could probably do that on board. Because you'd have this outline of them.
Would it be... How would I be able to tell a close
wasp from a far bird? Well, they're all in this
wasp thing, right? Where they're corralled.
Oh, right. We're back to corralling. Okay. Sorry. I was thinking open sky.
Well, you could probably do open sky, too.
Yeah.
You could train your model on shapes.
Well, yes, but with the event camera.
Yeah.
It gives you a shape.
It gives you a shape, but they're blobby shapes, aren't they?
Or are they pretty crisp?
I think it's the outline.
Oh, yeah.
I don't know.
It depends on the...
It's the outline.
This is what Prophesee is actually trying to sell on,
is that they believe that you don't actually need the full image, the full color.
They say you can do everything from outline.
And it's not wrong.
I remember back at my day job at Embark,
we actually did vehicle identification and stuff all based on the LIDAR scans from
objects. And LIDAR scans didn't contain anything but like, you know, if you hit the back of a
truck, you'd only have, you know, like a crescent shape, right, to see part of a vehicle. You
wouldn't see the entire shape of it. And so we actually had neural networks that ran on board
that identified what was a truck, what was a car, what was a motorcycle,
all based on just partial side scans of them. Okay. So this would be really awesome for tracking
paths and for identifying things without having to worry about light. And it's outlined. So
it's again, going to have some number of pixels on the WASP. I think they're pretty low resolution
right now, if I remember.
They're 320x240.
That's for the cheaper thing
that they're selling, but
they also have some 1280x720
cameras.
But don't ask about the price on that one,
because you can't afford it.
And we mentioned frame differencing, which is something that I think would
be really useful if I'm dealing with things that are flying around or moving quickly in ways that I don't expect.
Yeah, that's simply where you just have one image in RAM and the next image
comes in, and you just subtract the two, and boom. By the way, on the ARM Cortex-M4 processors, there's an instruction that basically takes four bytes of a word
and another four bytes of another word,
subtracts every four bytes from each other,
and does absolute value on it,
and then adds them all together in one instruction.
So if you want to do frame differencing on the Cortex-M4,
we can do that very, very fast on the OpenMV Cam.
So super easy to get the
max FPS on a large resolution
thanks to features like that.
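[Editor's note: the instruction described above is USAD8/USADA8; a minimal frame-differencing sketch using the CMSIS intrinsic. Assumes the frame buffers are 4-byte aligned and n is a multiple of 4.]

    #include <stdint.h>
    #include "cmsis_gcc.h"

    /* Sum of absolute differences between two frames: each __USADA8 computes
     * |a0-b0| + |a1-b1| + |a2-b2| + |a3-b3| for four packed bytes and adds
     * the running accumulator, all in one instruction. */
    uint32_t frame_diff_sad(const uint8_t *prev, const uint8_t *cur, int n)
    {
        const uint32_t *p = (const uint32_t *)prev;  /* 4 pixels per word */
        const uint32_t *c = (const uint32_t *)cur;
        uint32_t sad = 0;
        for (int i = 0; i < (n / 4); i++) {
            sad = __USADA8(p[i], c[i], sad);
        }
        return sad;  /* total motion energy between the two frames */
    }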
But
you still run into a challenge
that the camera itself
is going to have a limited frame rate and it'll
cause a lot of motion blur. And so you're really
going to want a global shutter imager at that point.
But then you run into
now the lighting needs to be improved to go faster.
The constant upselling.
Yes. So when I saw on your website
frame differencing, and I was thinking about how to track things, I went straight to convolution,
which is a far more expensive algorithmic process.
We actually had a customer for that.
It seems a lot more accurate than what you're talking about.
Yeah, no, it is.
So frame differencing is just one way.
I'll say this. We had a customer that I actually put some work into doing SIMD optimizations
for our morph algorithm, which lets you do custom convolutions on the OpenMV Cam. And we're capable of doing about 200 frames a second at 160 by 120.
And yes, we can do that. And we had to do this for this customer because they wanted to track
a single pixel of an object with background noise.
And so it turns out you can do something called a masked filter,
which is kind of like, it basically is a convolution
that suppresses all pixels that aren't just a single bright pixel.
And this allowed us to track an IR LED in the daytime. So imagine an IR LED in the daytime, the sun emits IR light.
Yes, yes.
Very, very hard, but we managed to do it. So we could see this object moving around.
So that is something you can use. It is, though, very specific, I would say.
It was a good use case for their algorithm, for their problem.
I don't know if it'll work that well for wasps, though, since a wasp might be more than one pixel.
Yeah, I wouldn't think it would be a single pixel.
I mean, that means I have no features and it might as well be a speck of dust.
What if you put IR emitting LEDs on wasps?
Yes.
Or those April tags.
That would be very useful.
They could just carry around little billboards.
That would be so much easier than what I'm talking about. We just April tag all the wasps and each one will have its own little number and reference on where it is in the world.
I will say this.
Someone did actually use the FOMO algorithm with a regular color camera to count bees.
So that's definitely possible.
Their goal, though, wasn't to identify the difference between bees, though.
They just wanted to know where there were objects of a similar size flying by in the image. And I think Edge Impulse had a tutorial about this. They had the
Raspberry Pi running with FOMO, and it was totally capable of checking bee movements and
seeing and counting the number of bees entering and exiting a hive.
Excuse me while I Google bee FOMO.
Okay, so I guess I had questions here about LSTM ML algorithms and trying to track my wasps, but I feel like I'm on a totally wrong path here with my wasp identification project.
I think it's possible. You just have to solve the physics problems first. And this is unrelated to the compute. It's just first, you got to get an image that's high enough quality of these really,
really small things. And that's the challenge is that the wasps are so small. If you had like,
you know, if you're trying to track badgers running around in the fields or a groundhog, that'd be a lot easier.
I think this is true of a lot of machine learning problems, is that we get so excited about how computers can do so much and how machine learning empowers things and forget that, oh, physics, that thing.
Who cares about physics when we have machine learning?
Well, it's a continuum, right?
Because you can apply lots of compute to bad data sometimes,
like he was talking about the IR light.
But if you want to, you can make the computer think really hard and try to clean
up bad images sometimes. Or you can spend more on getting good data and do less work. Yes.
But you also have the problem that sometimes what you want to do
is so not really suitable for the hardware you're working on. Right.
Well, I mean, that is the case.
I think it's all about just the problem setup first.
I mean, this is something I see a lot of our customers and people wanting to do computer vision is folks just like,
they have an idea on what they want to do.
And they haven't taken the step of sitting down and thinking,
okay, what does this look like exactly? What am I trying to answer?
And that's always really important for any one of these problems, and especially in vision, that you have to go through that setup of trying to do the work of actually engineering what is actually reasonable and what I'm trying to accomplish.
And it does involve that physics aspect. I think we've seen a lot of demos that show off really, really strong ML happening.
But even for back in Embark, our computer systems cost as much as a house, right?
So unlimited budget.
And even then, though, you had the best engineers working on this stuff, running the biggest algorithms with the biggest GPUs.
And it was still challenging.
And that was unlimited power, unlimited budget, unlimited compute, unlimited image resolution.
But you still had to actually make an ML algorithm perform and do a good job at segmenting these images well and locking on to what objects were being tracked.
Reliably.
Yeah.
And it's like drawing a bounding box that jitters all over the place
isn't really good for your self-driving truck, right? It's got to be like super locked, no jitter,
really, really high quality. And so labeling, figuring out what's bad data versus good data,
lighting situations, like it, you know, in the real use cases, even when you have,
you know, enough power to do anything, you still have to work really, really hard on getting good data in. As you are now full-time at OpenMV,
how much of your time is spent trying to help people with applications and convince them that
what they want to do isn't exactly suitable versus being able to say, oh yeah, I can help you with
that? It's about 50-50.
We have a lot of folks who will ask us random questions
and I don't want to waste their time
and I don't want my time to be wasted.
So I try to make sure we steer them in the right direction.
If they need a higher end system,
they should go forward to that.
I'm also driving though towards the image
and future I want to create.
And so right now it's a lot of engineering
work and developing, trying to build out the company and build out what we're trying to do.
Truly, last year was a lot of pulling the company out of the ditch, to be honest.
While Embark was my sole focus, I kind of went AWOL on OpenMV, like 2021 to 2022. I
wasn't really at the helm. Let's just say it like that.
So we were just doing,
you know, we were staying alive,
but we were out of stock
because of the chip shortage.
I hadn't foreseen how bad that was going to be.
I don't know if any of y'all
tried to buy STM32s ever.
But that was some unobtainium
for about three years, right?
So that really hurt. But luckily, at the end of last year, we managed to do two things.
One, an order of about 5,000 STM32H7 chips finally arrived after waiting for two and a half years,
so we managed to get back in stock finally. And then we also pivoted and supported
NXP's IMX RT. And so this gives us then two verticals. Now we're not dependent on just the
STM32, but now we have NXP also. And this allowed us to produce the new OpenMV Cam RT1062. And
because of learnings we had with our partnership with Arduino, we tried
to really include a lot of features that we saw customers really wanting on this system. So built
in Wi-Fi and Bluetooth. And I'm also proud of myself recently because we're going through FCC
certification and CE and other certifications for the product. And so far, it looks like it's going to pass. So we'll have a certified Wi-Fi and Bluetooth enabled product. But we also built
in things like battery charging and low power. One of the biggest features on board is being
able to drop down to 30 microamperes on demand and then wake up on an I/O pin toggling. And so we had a lot of customers ask for such things
so that they can deploy this in low-power environments.
But we also added Ethernet support now.
So you can actually, we have a PoE shield
we're selling on our website,
and this allows it to connect and get online that way.
So this is a PoE-powered microcontroller
if you want to make that.
And we do have an RTSP video streamer, so if you want to stream
1080p JPEGs to
VLC or FFmpeg,
we've got demo code that shows it able
to do that. This is what your Raspberry Pi
was doing back in 2013, so
we're kind of at that level of performance
now. But even farther,
like Raspberry Pi 2,
not quite 3, but about
1 to 2 with our current system.
RTSP streaming from a Cortex-M?
Yeah, I know. Crazy, right?
But yeah, no, totally legit.
We are sending Ethernet packets or Wi-Fi packets and streaming video.
Yeah, the future is coming.
I'm telling you.
Do I have to still use GStreamer, though?
Yes, you still have to use GStreamer.
He said FFmpeg, but he meant GStreamer.
When can I stop using GStreamer?
That's what I want to know.
At least on the device, you don't have to use it.
Yeah, right.
It's actually kind of funny,
because one of the first things I had to do at Embark
was I had to produce a driver interface camera for our trucks.
Basically, we wanted to know what the driver was
doing in the vehicle, right? Oh, right. Okay. Yeah. Yeah. So we had to have a camera that just
sat inside the cab and looked at people and would record video of the driver. And so I was like,
sure, this is easy. Went online, went to Amazon. There was like one company that said, hey, 4K HDR
webcam, $100. I'm like, cool, we buy it.
You have to go into their like GUI and figure out how to set it up to stream RTSP.
It's a little annoying.
There's like a mandatory password, you know, of course, a mandatory password, which means that, you know, your techs assembling these are going to have to go through this hour-long process to get these things set up. And, you know, of course, it has to be on its own network. Like, you have to have it on the public network first and then use their tool that, you know,
uses some identification thing before you can log into its GUI. And then in its GUI, you can set it to be on a static IP and then force it to stream.
So a lot of setup just to get these things working.
Then we deploy it in the truck.
And it turns out you hit a bump and the Ethernet would just drop.
Like the connection would just disconnect, go down and then come back.
And my boss was like, hey, Kwab, you know, it's like gone for like a second or two every now and then.
I'm like, ah, interesting.
Huh.
It's like we could have an accident in that second or two.
That's a pretty big liability for the company.
We need a new webcam. So this ends up being a months-long project of me trying out many different webcams,
the same annoying GUI setup, trying to get them to stream video. And finally, we settle on a $700
IP cam, not $100. From Axis, right? They're always from Axis.
No, Opticon.
Opticon makes webcams
that can literally survive explosions.
We didn't
buy the one that cost that much.
That was like a $2,000 one.
But we ended up buying the $700 one,
which is still very expensive compared to
the $100 cam.
But it didn't drop its connection. Rock solid.
And so I say that story to say that it's funny that I'm able to replicate
now what I spent all that time on, on a microcontroller.
Could I buy an OpenMV for
my truck so that I can watch people? I'm just wondering
should I get his product now that
he doesn't?
Does OpenMV drop
Ethernet packets?
No. I think he's going to say no.
No, it doesn't.
It doesn't. That's one of the nice things.
So that's the focus there.
No, it doesn't at all.
And even if it did, hey, you can at least go in the firmware and figure out why.
That's the big thing.
It's kind of like with the previous system, it was kind of like, huh, we're going to have to go to each truck and physically remove these and put a new thing in.
Wish I could just do a firmware update or ask the manufacturer what's wrong.
But that was not possible. But anyway, we built in a lot of features
into this system just to make it
easier for customers to really build
things they want. And so that's
what we're excited about. But then moving
forward, there's
these new Cortex M55 processors
coming out. And so that's the
exciting thing. I actually want to ask Chris
about that. What do you think, Chris?
You've been playing around with one.
I've been playing around with one at a very high level.
So I haven't really explored the feature set
and I'm using vendor tools and things.
So, I mean, it seems fine.
It's very capable.
I mean, they're very high clock.
I mean, I'm used to Cortex-M3s and M4s. These are
clocked, I think, at 400 or 600 megahertz or something
like that. Yeah.
So it's a bit unusual to be using something that's
not Linux, running
that fast, and putting
Zephyr or FreeRTOS or
ThreadX or whatever on it.
It feels like
overkill in
some ways. It is a very big hammer.
But the stuff we're doing with it, which I can't talk about,
needs a lot of processing.
So then data throughput and things like that.
Yeah.
Well, I'm excited about the new ones that are coming along.
Like 2025 is going to be an exciting year.
Just imagine doubling the clock speed of what you just mentioned.
Integrated hardware modules
you would only see on application processors,
so actual video encoding in hardware.
Yeah.
That kind of stuff,
like large-resolution camera support,
and then even more ML.
One of the coolest things is, you know,
we were talking about all this processor performance that's coming,
but that's not even the important bit.
The important bit is the ARM Ethos kind of processors.
These offer like hundreds of gigaflops of compute now
for these microcontrollers.
And so what that means is if you wanted to run a neural network on board
some of these chips,
they'll actually outperform the Raspberry Pi 5
and they won't draw any power either.
Like something like the Alif,
that has a 200-gigaflop neural network accelerator.
And so if you ran the Raspberry Pi 5
at 100%, every single core pegged to the limit,
you get 100 gigaflops of performance.
And it would, like, catch fire.
And so one of these MCUs will draw 20 milliamps, 20 to 30 milliamps of power at 200 gigaflops, thanks to onboard neural network accelerators.
Do you really think it's machine learning that's important here, or do you think it's these other features, like outline detection and HOG algorithms and convolution? Do you think it's the features?
Well, that's stuff that's not, as far as I know, the neural engines aren't applicable for.
Well, no, these would feed into that.
Yeah, yeah, yeah.
I just, how much time should we be doing straight ML on raw camera frames versus giving them
some hints?
Doing some actual image processing.
Algorithms and heuristics and all of that.
Yeah, it's actually both.
So, I mean, definitely you want to use ML as much as you can.
Transformer networks being one of the new things
that people are really excited about.
Those require a little bit more RAM, though.
Like a lot of these network accelerators,
they're really good at addressing that. You know,
we needed more compute.
And so these offer literally 100x compute
of where things were previously.
And so now you can actually run these large networks.
But then you need more RAM if you actually want to run
the new transformer models,
which are dynamically creating weights on the fly, more or less.
But you still need to do a pre-processing for these things.
Like an example being, if you want to do audio,
you don't want to feed the network the raw PCM audio 16-bit samples. You want to take an FFT first and then
feed it slices of an FFT that are overlapping each other. And so that's where the processing
performance with the Cortex-M55 comes in and having that extra oomph there. That allows you
to churn out these FFTs and to generate those slices so that you can feed the neural network processor something that's going to be directly usable for it.
And being able to compose these two things together is what really brings the awesome performance and power that you're going to see.
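As a concrete illustration of that audio pipeline, here is a minimal numpy sketch of turning raw 16-bit PCM into overlapping FFT slices; the frame and hop sizes are illustrative assumptions, not anything specified on the show.

```python
# Sketch: raw int16 PCM -> overlapping magnitude-FFT slices for a neural network.
import numpy as np

def pcm_to_fft_slices(pcm, frame_len=512, hop=256):
    """Split PCM into 50%-overlapping windows and FFT each one."""
    x = pcm.astype(np.float32) / 32768.0           # normalize 16-bit samples
    window = np.hanning(frame_len)                 # taper to limit spectral leakage
    n_frames = 1 + (len(x) - frame_len) // hop
    slices = np.empty((n_frames, frame_len // 2 + 1), dtype=np.float32)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        slices[i] = np.abs(np.fft.rfft(frame))     # magnitude spectrum per slice
    return slices                                   # (frames, bins), ready for the accelerator

# One second of 16 kHz audio yields 61 overlapping slices with these sizes.
demo = pcm_to_fft_slices(np.zeros(16000, dtype=np.int16))
```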
Similarly, for video data, scaling has to happen before you feed it to the accelerator, since your resolution is going to need to be at some limit, like maybe 200 by 200 or 300 by 300 pixels.
It's not going to be the full 1080p, though, of what the camera sees.
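Similarly on the video side, a minimal sketch of that preprocessing: a nearest-neighbor downscale from the camera's full frame to a fixed network input size. The 224-by-224 target is an assumption for illustration.

```python
# Sketch: downscale a full camera frame to the fixed input size a network expects.
import numpy as np

def downscale(frame, out_h=224, out_w=224):
    """Nearest-neighbor resize of an (H, W, 3) frame to (out_h, out_w, 3)."""
    h, w = frame.shape[:2]
    rows = (np.arange(out_h) * h) // out_h   # map each output row to a source row
    cols = (np.arange(out_w) * w) // out_w   # map each output column to a source column
    return frame[rows[:, None], cols]

full = np.zeros((1080, 1920, 3), dtype=np.uint8)  # the 1080p frame the camera sees
small = downscale(full)                           # 224 x 224 x 3 for the accelerator
```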
And transformers? I missed transformers. I've been out of machine learning for a couple of years, which means that I'm 5 million years out of date.
I think it's what the LLMs use.
It's less training and no recurrent units, even though it seems like it's sort of a recurrent
neural architecture.
So not as much feedback.
No feedback?
It does have feedback kind of internally, I guess, though.
I'm not an expert on this stuff either.
I'll say it like this, though.
From what I've learned so far,
more or less,
they look at the data inputs coming in
and then dynamically adjust their weights
based on what they see.
So the network isn't static.
It is dynamically adjusting what it's doing
based on what it's seeing.
And it can remember that through a stream of tokens that are coming in.
So whatever is being sent to it is tokenized.
And that stream of tokens is used to dynamically update relationships
between those tokens while it's running.
So it doesn't necessarily have memory inside of it,
but the memory comes from the tokens that are going to it.
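One standard way to read that description is scaled dot-product attention, where the mixing weights are computed from the incoming tokens rather than fixed at training time. A generic numpy sketch, not any particular model:

```python
# Sketch of attention: the weights are derived from the tokens themselves.
import numpy as np

def attention(tokens, Wq, Wk, Wv):
    """tokens: (n, d) array; Wq, Wk, Wv: learned (d, d) projections."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # token-to-token relationships
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # row-wise softmax
    return w @ V                                    # each output mixes all tokens

d = 8
rng = np.random.default_rng(0)
out = attention(rng.normal(size=(5, d)), *(rng.normal(size=(d, d)) for _ in range(3)))
```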
The T in ChatGPT is transformer.
And they're also used for translation,
language translation and things like that.
The big thing is that they're just like a bulldozer.
Pretty much every problem people have been trying to solve
is now solved instantly by them.
Okay.
So there were LSTMs and then LLMs and Transformers are after that.
And I need to read up some.
Okay.
Cool, cool, cool.
Me too.
Well, what do you wish you had told yourself when you were on the show last in 2017, on episode...
219? Was it 219? 212. Darn. Which, I believe, at that point, we were in Seaworld. You Are in Seaworld, I think, is the title of that show.
What I wish I had told myself...
What do you wish you could, or what do you think we should have told you about starting a business? Or what would have been good information?
I don't know.
It's been like a really weird random walk, just kind of doing OpenMV.
I would say definitely it was a good idea not to go full time too early on it.
There's definitely like a window of opportunity when you're trying to run a business.
Now I see that opening with these new faster processors coming down the line where we can really do some amazing things with the concept that OpenMV has.
If you tried to do this earlier, I think it would have been just kind of like pain and suffering to the max, especially when the chip shortage happened.
So I'm glad I didn't go full throttle on it originally.
Honestly, I think the performance thing is the biggest.
It's just, if I had known about this beforehand,
maybe I would have created less random code
and really focused more on things that had a lot of value.
Kind of like, I think at the beginning when we were doing OpenMV,
we were just trying to write as much stuff as possible
and throwing things at the wall to see what sticks.
And now it's a little bit more focused on actually providing good features that people
really, really want and making those work really well. And so like putting more
time into one thing versus trying to spread it all over the place.
And Kwabena, do you have any thoughts you'd like to leave us with?
So I want to ask you guys.
It's been seven years of running Embedded FM.
So what episode are we on now?
400.
470?
468?
This will be 477.
Awesome.
Awesome.
So tell me about your experience with Embedded FM.
I want to know.
It's been good. I mean, I still meet interesting people and you've given us a lot to think about.
And you mentioned hearing Ralph Hempel on, and he gave us a lot to think about.
And I like that. But we've both talked openly about burnout and our disillusionment with some of the AI features that are happening.
Which is not necessarily disillusioned with ML writ large.
I actually work on ML stuff and I enjoy it.
But there's parts of, quote, AI that have been bothering me.
How it's being used, yes.
And so that's, I don't know.
I like doing the show because I like talking to people,
but we've gone to every other week, which has been really good.
I suspect we'll go to once a month at some point in the next year or two.
Really?
I don't know.
We haven't really talked about that.
What about you, Christopher?
You hate everything because your computer died this morning.
Well, computers have been bothering me since 20, since 1984. No, I don't know. You know,
this stuff objectively is very exciting to me. It's cool to see the capabilities,
these microcontrollers getting so much more power in a very short amount of time.
Since, I mean, it wasn't that long ago that, you know,
an Atmel AVR was the microcontroller of the day and some PICs, right?
And now we're talking about close to gigahertz.
Megabytes of RAM.
A few bucks for something that's close to a gigahertz.
There's literally a gigahertz processor if you want to.
They have one.
But I also feel like, well, I mean, maybe it's time for me to let other people do that stuff.
Because I miss the small processors.
I mean, on the one hand, it's extremely exciting, and it's cool what you can do.
But I feel like, yeah, but 128K of RAM is kind of fun to try to make something happen in.
I don't get to optimize nearly as much anymore.
It's a lot of trying to figure out what vendors are doing and putting together their Lego blocks so they work.
And then optimizing little pieces of it.
But I never get to sit down and think, oh, here's a new algorithm.
How can I make it go as fast as possible?
And how can I learn the chip deeply enough to find the instructions?
Like, I remember you talked about SIMD, and I remember working on a TI DSP, oh, probably 2001, maybe 2002.
And it had some neat caching systems, but it was all pipelined and you had to do everything manually.
And so I wrote a program in C and optimized the assembly by modifying my C so it would use the caches the way I wanted them to.
And it meant really understanding what was happening with the processor and the RAM and what the algorithm, not what the client told me the algorithm was
supposed to do, but what they actually wanted it to do. And I liked that piece. And I kind of miss
the deep optimizations that I haven't gotten to do lately. But that's partially client selection.
Yeah. And as much as I complain every week almost about, I don't want to write another
linked list or I don't want to write another spy driver for something.
On the other hand, I've been using Zephyr
lately, and most
of the coding has been, you know, editing
config files. It's like, oh, okay,
I need a driver for this.
Well, it has that. It's
buried in a directory somewhere, and you just have to edit the right
DTS file, and then suddenly, and you don't even have to write
a program. You just pull
up the shell and make sure that the thing works by doing, you know, sensor get thing, and it automatically just works.
Anyway, I mean, it does feel like things are getting easier, which is good. But it's a little bit of a shock for people who've been working in this industry for a very long time, because it's a change.
I mean, I loved learning MicroPython
and developing C modules for it.
It was amazing.
But I didn't...
It's different work.
It's very different work than needing to optimize things.
And it's probably good.
Embedded shouldn't be the way it has been.
Yes.
And everything's going in the right direction.
But we existed in the wrong direction for so long that we convinced ourselves it was fun.
Yes, we convinced ourselves it was fun. So now it's hard to change.
Yeah. Well, this is what I got started on. So my first processor was the BASIC Stamp.
Oh yeah, yeah, yeah. Okay. I had one of those. And then I graduated to the Propeller chip.
Right. Lots of cores.
And I loved writing stuff. Yeah, no, it was so cool.
Yeah. No, I actually did a whole bunch of drivers in pure assembly
where I would... I think the one I was most proud of is...
Okay, I did a pixel driver.
It could drive 640 by 480 VGA with one core running at 20 megahertz.
And what it would do is it would have a character.
So the Propeller chip had a character map, like 32 kilobytes of ROM
that had like characters.
Okay.
Yeah.
The characters were like 16 by 32, well, 32 by 16 or something.
And so the way the frame buffer worked: the frame buffer just encoded a single byte that told you what character to look up, right?
And that meant the frame buffer was way smaller than encoding every pixel as three bytes or so, right?
Because you only had 32 kilobytes of RAM to play with, so you didn't have much memory at all. And so that was the way the frame buffer worked. You'd have a byte that told you what character to present.
And then, in assembly, you had to feed a video output system that was like a shift register that had a certain frequency. So your loop had to hit a certain frequency.
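A rough sketch of that character-mapped frame buffer idea, with illustrative sizes rather than the Propeller's exact layout: each byte names a glyph in a fixed font, and pixels are generated at scan-out time.

```python
# Sketch: a byte per character cell instead of three bytes per pixel.
FONT_W, FONT_H = 16, 32                     # glyph size in pixels (illustrative)
COLS, ROWS = 640 // FONT_W, 480 // FONT_H   # 40 x 15 character cells for VGA

frame_buffer = bytearray(COLS * ROWS)       # 600 bytes versus ~900 KB of raw RGB
font_rom = [[0] * (FONT_W * FONT_H) for _ in range(256)]  # stand-in glyph ROM

def pixel(x, y):
    """Resolve an on-screen pixel by looking up its cell's glyph."""
    cell = (y // FONT_H) * COLS + (x // FONT_W)
    glyph = font_rom[frame_buffer[cell]]
    return glyph[(y % FONT_H) * FONT_W + (x % FONT_W)]
```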
Right, right.
There was literally, it was like the difference of one assembly instruction: one too much, it broke.
One or two less, it worked.
And so you were just staring: how do I get rid of one assembly instruction to make this function?
And here's the thing I did, though, that was so awesome. So during the vertical sync time, I would download the pixel maps of the characters. So you have a mouse cursor, right? I would blit the mouse cursor onto them, and then re-upload them back to the main memory and put them somewhere where the system could seamlessly swap out the character glyphs of the actual characters for the ones that I had blitted a mouse cursor onto. And so it looked like there was a mouse cursor being overlaid on the image, but the actual
frame buffer didn't include any mouse cursor in it. I would just look up the XY position of a
mouse cursor and present that. And it was the coolest thing because I was actually able to
make a GUI with text boxes where you can move a mouse cursor and click a button and actually do this.
And it was 32 kilobytes of RAM.
This is the kind of stuff I'm talking about.
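And the cursor trick, continuing the same sketch with the same assumed constants: during vertical sync, copy the glyph under the cursor, draw the cursor into the copy, stash it in a spare slot, and repoint that cell, so the real frame buffer never contains a cursor.

```python
# Sketch: overlay a cursor without touching the frame buffer's real contents.
def overlay_cursor(frame_buffer, font_rom, x, y, cursor_mask, spare=255):
    """cursor_mask: (dx, dy) offsets of cursor pixels; clipped to one cell for brevity."""
    cell = (y // FONT_H) * COLS + (x // FONT_W)
    patched = list(font_rom[frame_buffer[cell]])   # copy the glyph under the cursor
    for dx, dy in cursor_mask:
        px, py = x % FONT_W + dx, y % FONT_H + dy
        if 0 <= px < FONT_W and 0 <= py < FONT_H:  # real code patches all touched cells
            patched[py * FONT_W + px] = 1
    font_rom[spare] = patched                      # stash the patched copy in a spare slot
    frame_buffer[cell] = spare                     # point the cell at the patched glyph
```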
Thinking about it, I miss having to do the mathematical, analytical puzzle, where now I spend a lot more time reading and digesting information, which is also good and interesting and useful. But I prefer the information about wasps.
And so I guess maybe I was never that interested in computers.
No, I am, but I really do miss the sitting down and thinking about the analytics.
The challenge of there's limits.
The limits.
And I have no limits now.
I think removing the limits
makes it less interesting for us in some ways.
Yeah, it's all about the puzzle.
Like what really made me super happy
with the SIMD optimization on OpenMV
is just like,
it's just like when you unlock that 200...
Okay, like the most recent one I mentioned,
of erode and dilate,
we've had the same code there sitting forever.
And so this year, I was just like, I need to optimize stuff.
We actually scored a contract with Arm,
where ARM has recognized what we're doing is so interesting
that we're actually getting paid to optimize some of our code
and produce benchmarks to show, hey, you can do serious things with these MCUs.
These are real algorithms that people actually use being optimized
and not just the CMSIS-DSP library,
which sometimes is less performant than what you could write yourself.
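For reference, erode is at heart a small per-pixel neighborhood test; a plain-Python sketch of binary 3x3 erode follows. The optimized SIMD versions compute many pixels per instruction, but the logic is this.

```python
# Sketch: binary 3x3 erode; a pixel survives only if its whole neighborhood is set.
import numpy as np

def erode3x3(img):
    """img: 2D array of 0/1 pixels; returns an eroded copy (borders left clear)."""
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y - 1:y + 2, x - 1:x + 2].all()
    return out
```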
Definitely.
Yeah, that's the thing.
But it's useful.
These things move so fast, and I said it already in the show,
but if things move slower,
people would squeeze out all the performance that they could out of these.
And now you don't have to as much
because there's always going to be something faster, right?
I don't know. I don't even know what my point is.
But what you're talking about is like, ARM didn't bother
to squeeze out all the performance of their own things in CMSIS.
They didn't have to.
Yeah. Well, they should.
It's their thing.
Well, I think it's just they didn't have the market response,
right? I think that's the new thing
happening now, though, is now that people are going to be able to see that, oh, hey, you could actually replace a more complex system with one of these things.
Now there's actually some juice behind, oh, maybe we should actually try to see what we could do with these.
But back to the algorithms I was mentioning, I got a 150% speedup on it using some of the Cortex-M SIMD instructions. 150%... Sorry,
more than that. No, no, yeah, 150%. That's two and a half times performance. Two and a half times
speedup. I mean, you go from, oh man, it's kind of slow to, wow. Being able to pull out two and a
half times and 4x speedups on things, it's, I mean, you know, what do we get for processors nowadays?
It's like a 7% speedup and
people are like amazed by that.
And it's like, you know, when
you're at the 150% level,
it's like that's a whole different chip at that point.
Anyway.
Well, we should go. It is
time for us to eat.
And if we start talking
again, we will just start talking again for quite a while.
We'll have you back sooner next time.
Yes, I think that's the key.
Absolutely.
Absolutely. I super enjoyed this. It's awesome
talking to folks who've been in embedded systems
for a long time and equally
enjoy kind of making the hard
things happen and solving these
puzzles. Our guest has been Kwabena Agyeman, President, CEO, and co-founder of OpenMV.
Thanks, Kwabena.
Thank you.
Thank you to Christopher for producing and co-hosting.
Thank you to our Patreon listener Slack group for your questions.
Sorry to John and Tom, whose questions I did not get to.
And of course,
thank you for listening. You can always contact us at show at embedded.fm or hit the contact link on Embedded FM. And now a quote to leave you with. I have a nice quote that I actually like to read.
So the founder of Luxonis, this is my friend Brandon. He
passed away recently.
But he has one
from Theodore Roosevelt
that I absolutely love. It's a little bit long,
but I'd like to say it.
It is not the critic who counts, not the man who points out how the strong man
stumbles or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena,
whose face is marred by dust and sweat and blood,
who strives valiantly, who errs and comes short again and again,
because there is no effort without error and shortcoming,
but who does actually strive to do the deeds,
who knows great enthusiasm, the great devotions,
who spends himself in a worthy cause,
who at the best knows in the end the triumph of high achievement,
and who at the worst, if he fails, at least he fails daring greatly,
so that his place shall never be with those cold and timid souls
who neither know victory nor defeat.
And that's how it feels when you optimize C code.