Embedded - 477: One Thousand New Instructions
Episode Date: May 16, 2024

Kwabena Agyeman joined Chris and Elecia to talk about optimization, cameras, machine learning, and vision systems.

Kwabena is the head of OpenMV (openmv.io), an open source and open hardware system that runs machine learning algorithms on vision data. It uses MicroPython as a development environment so getting started is easy. Their GitHub repositories are under github.com/openmv. You can find some of the SIMD details we talked about on the show:

150% faster: openmv/src/omv/imlib/binary.c
1000% faster: openmv/src/omv/imlib/filter.c
Double pumping: openmv/src/omv/modules/py_tv.c

Kwabena has been creating a spreadsheet of different algorithms in camera frames per second (FPS) for Arm processors: Performance Benchmarks - Google Sheets. As time moves on, it will grow. Note: this is a link on the OpenMV website under About. When M55 stuff hits the market, expect 4-8x speed gains.

The OpenMV YouTube channel is also a good place to get more information about the system (and vision algorithms).

Kwabena spoke with us about (the beginnings of) OpenMV on Embedded 212: You Are in Seaworld.

Elecia is giving a free talk for O'Reilly to advertise her Making Embedded Systems, 2nd Edition book. The talk will be an introduction to embedded systems, geared towards software engineers who are suddenly holding a device and want to program it. The talk is May 23, 2024 at 9:00 AM PDT. Sign up here. A video will be available afterward for folks who sign up.
Transcript
Before we get started, I want to let you know that I'm giving a free talk for O'Reilly
on an introduction to embedded systems.
If you or one of your colleagues or managers are interested, it is Thursday, May 23rd,
2024 at 9 a.m. Pacific.
It will be recorded if you miss it.
But if you go there live,
you can ask questions. There will be a sign-up link in the show notes.
Welcome to Embedded. I am Elecia White alongside Christopher White. Our guest this week is Kwabena Agyeman, and we're going to talk about my plans to create a wasp-identifying camera system.
Hi, Kwabena. Thanks for coming back after being on the show already and knowing what we're all about.
Yeah, it's been seven long years, but, you know, I thought it was time.
Could you tell us about yourself as if we had never met?
Gotcha. Well, nice to meet you. I'm
Kwabena. I run a company called OpenMV. We do computer vision on microcontrollers.
Seven years ago, we were one of the first companies thinking about this kind of idea.
Back then, we were deploying computer vision algorithms on a Cortex-M4, and we had
just upgraded to the Cortex-M7, which was kind of the hottest thing that was coming out back in
2017 timeframe. Since then, we moved on to a Cortex, sorry, the STM32H7, which is a higher-performance processor, and now the i.MX RT, which has even
more performance. And so I am excited about the future and plan to talk to you all today about
how these things are going to get even faster. That seems unlikely, but okay.
It seems unlikely.
I heard Moore's Law was over.
Well, not for microcontrollers.
For microcontrollers, it's still happening, but the gains aren't like 2%.
It's like 400% each generation, actually.
Microcontrollers are still on like, you know, 50 nanometer process.
So we have a long way to go.
Oh, not even that.
Not even that.
The latest ones are coming out at 12 to 16 nanometer for MCU.
Oh, okay.
So it has moved down.
Yeah, but that used to be like top-of-the-line processors, and now it's coming to you for nothing.
All right, so clearly we have a lot to talk about, including wasps.
Yes.
But first, we have a lightning round. Are you ready?
Okay, yeah.
Hardware or software?
Both.
Python or C?
C.
Marketing or engineering?
Both.
Cameras or machine learning?
Cameras.
AI or ML?
ML.
Favorite vision algorithm?
April tags.
Okay, we're going to break the rules.
What is April tags?
What is April Tags?
So you ever seen those QR code-like things that the Boston Dynamics robots look at to figure out what's a fridge and what's a door and such?
And so they put them all around.
That's called an April Tag.
It's like a QR code, but easier to read.
And it also tells you your translation and rotation away from it.
So if you see the code, you can tell, given where you are,
how it's rotated in 3D space and translated in 3D space.
Oh, that's very cool.
It's like little fiducial markers for the world.
Yeah, each one of them encodes just the number,
but then you just know, okay, number zero means coffee machine,
number one means door frame or something.
And that's how they get robots to, like, navigate around without actually fully understanding the environment.
Oh.
Complete one project or start a dozen?
One.
Favorite fictional robot?
I like WALL-E.
I love the movie.
You have a tip everyone should know.
Tip everyone should know.
You should learn SIMD performance optimization.
We're going to talk about that today.
It's something that blew my mind, and I think everyone should really think about it more often.
You can double or triple the speed of the process you're working on very easily if you just put a little work in.
Okay, what is SIMD?
Single instruction, multiple data.
Okay, that sounds interesting for machine learning things,
but can I actually use it?
Yes, you can.
The way to say it would be,
so back seven years ago when we were working,
when I was doing OpenMV,
I thought it was a hotshot programmer.
And I wrote vision algorithms to run
on these microcontrollers. And I kind of wrote them, you know, straightforward. I just wrote
them in a way that would be the textbook answer. And just assumed, okay, that's the performance that
it runs at. That's the speed that it runs at. Good enough. Let me move on. And wait,
these algorithms, they're like FFT and convolution and like what?
Yeah, kind of stuff like that.
The best one example would be something called a median filter.
Sure.
So the median filter is basically take a single pixel and then look at the pixels around it, the neighborhood.
So let's say you look to the left, right, up, down.
So all eight directions.
So there's eight pixels around it for a 3x3 median filter.
And then you just sort those, and you take the middle number,
and then you replace the pixel there with that.
The median filter has a nice effect on that.
It blurs the image.
It kind of gets rid of sharp, jaggy things, but it keeps lines.
So strong lines in the image still remain.
They don't get blurred, but then it blurs the areas that don't have strong lines. And so it produces a really nice,
beautiful effect. It's a nonlinear filter though. So the mathematics to make it happen are a little
bit tricky. But yeah, that's the median filter. And so when I wrote this originally on the OpenMV
cam and we had it running seven years ago, that ran at one frame a second.
And I was like, yeah, that's how fast these things are.
Can't do any better.
I know what I'm doing.
And then I hired a performance optimizer, Larry Bank.
I don't know if he's been on the show, but this guy blew my mind.
I asked him, hey, can you make this better?
And he got it a thousand percent
performance increase on me. One thousand percent. So that's about 16x. That algorithm went
from one frame a second to 16. And I was blown away. When someone kind of does that and
is able to beat you by that badly, it's kind of like, you know, you have to wake up and start thinking, what am I leaving on the table?
And he just did two things to make the algorithm go faster.
One, I was doing boundary checks to make sure I wasn't running off the edge of the image every single pixel.
Oh, no.
Yeah.
Yeah.
So he just made a loop.
He made two loops.
One that checks to see, are you near the edge, that it does boundary checks.
And one that if you're not near the edge, it doesn't do boundary checks.
Massive performance gain.
Second loop.
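[Editor's note: a minimal C sketch of the loop-splitting trick described here, with illustrative names; it is not OpenMV's code, and a 3x3 dilate (max filter) stands in for the median to keep it short.]

    #include <stdint.h>

    /* Two-loop idea: a checked pass for the 1-pixel border and an unchecked
     * pass for the interior, so the hot loop has no per-pixel bounds tests.
     * Assumes w and h are at least 2. */

    static uint8_t max3(uint8_t a, uint8_t b, uint8_t c)
    {
        uint8_t m = (a > b) ? a : b;
        return (m > c) ? m : c;
    }

    static uint8_t pixel_clamped(const uint8_t *src, int w, int h, int x, int y)
    {
        if (x < 0) x = 0; else if (x >= w) x = w - 1;  /* clamp reads to the edge */
        if (y < 0) y = 0; else if (y >= h) y = h - 1;
        return src[(y * w) + x];
    }

    void dilate3x3(const uint8_t *src, uint8_t *dst, int w, int h)
    {
        /* Checked pass: border pixels only, with clamped neighbor reads. */
        for (int y = 0; y < h; y++) {
            /* whole row at top/bottom, otherwise just the two edge columns */
            int xstep = ((y == 0) || (y == h - 1)) ? 1 : (w - 1);
            for (int x = 0; x < w; x += xstep) {
                uint8_t m = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        uint8_t p = pixel_clamped(src, w, h, x + dx, y + dy);
                        if (p > m) m = p;
                    }
                }
                dst[(y * w) + x] = m;
            }
        }
        /* Unchecked pass: interior pixels, all nine neighbors known valid. */
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                const uint8_t *r0 = &src[((y - 1) * w) + x];
                const uint8_t *r1 = &src[(y * w) + x];
                const uint8_t *r2 = &src[((y + 1) * w) + x];
                dst[(y * w) + x] = max3(max3(r0[-1], r0[0], r0[1]),
                                        max3(r1[-1], r1[0], r1[1]),
                                        max3(r2[-1], r2[0], r2[1]));
            }
        }
    }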
The second change he made was to do something called a histogram kind of.
So when you're doing a sorted list of numbers, let me back up.
How do I explain this?
The median requires a sorted list.
Yes.
Because you need to know which is in the middle.
So you need to know which is higher and which is lower.
And so you end up, even if at three by three, you still have to order all three numbers.
Yeah. So instead of doing that, what we did is you can kind of maintain this thing called a histogram.
So a bunch of bins, right? And what you do is when you're at the edge
of the image, you initialize the histogram with all the pixels in that three by three. And then
you walk the histogram really quick to figure out where the middle pixel is. So you just start at
the beginning and kind of walk it and do this thing called the CDF, where you kind of sum up
the bins until you see which one is bigger than 50% of how large it
could be. And that tells you the middle pixel. And that can be done pretty quickly. You still
have to do this every pixel, but it's a nice, fast, linear for loop. So processors can execute
that really quick. But the big change he did was instead of initializing the histogram every pixel,
you just drop a column and add a column.
Yeah.
And so what this does is, even if your kernel becomes 11 by 11 or something like that, you just drop a column and add a column. So you're not doing the work of reinitializing the histogram every pixel. And that's where the 16x performance came in, by just doing that little change and going from O(n²) to O(2n). Just a massive difference in performance, and on these MCUs, that matters. On a desktop you can kind of code this stuff with the minimal-effort approach I originally took and it doesn't cost you anything, but on an MCU, putting that effort in to actually do this well, it really is a huge game changer.
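[Editor's note: a plain-C sketch of the sliding-histogram median described above, for one interior row of a 3x3 filter on 8-bit grayscale; it is not OpenMV's code, and the border pixels are assumed to be handled separately per the loop-splitting trick.]

    #include <stdint.h>
    #include <string.h>

    /* y must be an interior row (1 <= y <= h-2). Per pixel: walk the CDF to
     * the 5th of 9 samples, then slide the window by dropping the leaving
     * column and adding the entering one. */
    void median3x3_row(const uint8_t *src, uint8_t *dst, int w, int y)
    {
        uint16_t hist[256];
        memset(hist, 0, sizeof(hist));

        /* Seed the histogram with the first 3x3 window (centered at x = 1). */
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                hist[src[((y + dy) * w) + (1 + dx)]]++;
            }
        }

        for (int x = 1; x < w - 1; x++) {
            /* Walk the CDF until we pass half the window (9 pixels -> 5th). */
            int sum = 0, med = 0;
            for (med = 0; med < 256; med++) {
                sum += hist[med];
                if (sum >= 5) break;
            }
            dst[(y * w) + x] = (uint8_t)med;

            if ((x + 2) < w) {
                /* Slide right: drop the leaving column, add the entering one. */
                for (int dy = -1; dy <= 1; dy++) {
                    hist[src[((y + dy) * w) + (x - 1)]]--;
                    hist[src[((y + dy) * w) + (x + 2)]]++;
                }
            }
        }
    }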
But then here's where the SIMD comes in.
It turns out you can actually compute two columns or up to four columns of that histogram at the same time. Because on a Cortex-M4,
there is an instruction that allows you to grab a single long
and basically add that, split it into four bytes at a time,
and add those four bytes as accumulators to each other.
So it'll do four additions in parallel
and not have them overflow into each other.
Ooh, okay.
Yeah, yeah.
It's existed there since the Cortex-M4, so I swear it's in every single processor that
has been shipped for a decade now.
It's just been sitting there and no one uses this stuff.
But if you break out the manual, it's available.
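[Editor's note: a hedged sketch of reaching that instruction from C. __UADD8 is the CMSIS-Core intrinsic for the UADD8 instruction on Cortex-M4/M7; the function and names below are illustrative, and the buffers are assumed to be 4-byte aligned with n a multiple of 4.]

    #include <stdint.h>
    #include "cmsis_gcc.h"  /* normally pulled in via your device header */

    /* Four independent byte-wise adds per instruction; each lane wraps on
     * its own rather than carrying into its neighbor. Valid while every
     * lane stays below 256 (e.g., packed byte accumulators). */
    void add_rows_packed(const uint8_t *row_a, const uint8_t *row_b,
                         uint8_t *out, int n)
    {
        const uint32_t *a = (const uint32_t *)row_a;  /* 4 pixels per word */
        const uint32_t *b = (const uint32_t *)row_b;
        uint32_t *o = (uint32_t *)out;
        for (int i = 0; i < (n / 4); i++) {
            o[i] = __UADD8(a[i], b[i]);  /* four byte adds at once */
        }
    }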
So I've known about SIMD instructions on various processors for a while.
This sounds like a really standard one for Arm Cortex. I know Arm has another whole set of things called Neon,
which is, I think, is this part of Neon,
or is Neon a bigger set of even more SIMD things?
No, Neon's a bigger set.
That only runs on their application processors.
Oh, okay, okay, right.
So on their MCUs, they have this very,
technically these instructions are also available
on their desktop CPUs also.
They're not utilized heavily.
What ARM did is they wanted to make DSP a little bit more accessible, well, faster.
And so there's a lot of stuff in the ARM architecture
that allows you to do something called double pumping,
where basically you can split the 32 bits into 16 bits.
And they have something called SADD16,
so you can add the two 16 bits at the same time or subtract. There's an instruction that'll take two registers,
two 32-bit registers, split them into 16 bits, and then multiply the bottom 16 bits by the bottom 16
bits of one register and the top 16 bits by the top 16 bits of the other register, and then add
them together. And then also do another add from an accumulator. So you'll get two multiplies and two adds in the same clock cycle.
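[Editor's note: the instruction being described is SMLAD. A minimal Q15 dot-product sketch using the CMSIS intrinsic, with illustrative names; assumes n is even and the arrays are 4-byte aligned.]

    #include <stdint.h>
    #include "cmsis_gcc.h"

    /* Each __SMLAD computes lo*lo + hi*hi + accumulator: two 16-bit
     * multiplies and two adds per cycle, as described above. */
    int32_t dot_q15(const int16_t *x, const int16_t *y, int n)
    {
        const uint32_t *px = (const uint32_t *)x;  /* two Q15 samples per word */
        const uint32_t *py = (const uint32_t *)y;
        uint32_t acc = 0;
        for (int i = 0; i < (n / 2); i++) {
            acc = __SMLAD(px[i], py[i], acc);
        }
        return (int32_t)acc;
    }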
Do I have to write assembly to do this? Or are there C compilers that are smart?
Or are there Cortex libraries that I should be using?
GCC just has intrinsics. So it's just like a function call where you just pass it a 32-bit
number. And then it just gets compiled down to a
single assembly instruction, basically. So you're kind of able to write normal C code, and then when
you get to the reactor core of a loop, like the innermost loop of something, you can just
kind of sprinkle these in there and get that massive performance. One of them that's really
valuable is something called USAT ASR. So did you know that whenever you need to do like clamping,
like the min and max comparisons for a value,
ARM has an assembly instruction that'll do that for you in one clock.
But you shouldn't go out and use this unless that is part of your loop.
You shouldn't do the min-max during your initialization of your processor
with this fancy instruction.
No, it's pointless.
You should limit these to optimizations of things that you need to run faster.
Yeah.
Not optimizations because they're fun.
But the sigh was because I wished I'd known about this for several projects in the past.
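[Editor's note: a small sketch of the saturation intrinsic mentioned above. __USAT is the CMSIS-Core intrinsic for the USAT instruction; the hardware version can also fold in an ASR shift, though whether the shift below fuses into the same instruction depends on how the intrinsic is implemented in your toolchain.]

    #include <stdint.h>
    #include "cmsis_gcc.h"

    /* Clamp a signed fixed-point result into [0, 255] without the usual
     * pair of compare-and-branch min/max operations. */
    static inline uint8_t clamp_to_pixel(int32_t acc_q8)
    {
        /* Shift back from Q8, then saturate to 8 bits with one USAT. */
        return (uint8_t)__USAT(acc_q8 >> 8, 8);
    }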
Well, it's just kind of like, here's an example.
We actually have something I wrote recently,
which I'm really proud of,
is we actually have a bilinear,
sorry, a nearest neighbor, a bilinear,
and a bicubic image scaler in our code base now.
And I scoured the internet looking for someone
who had written a free version of this
that was actually performant, and none such existed. So we're like the only company that
actually has bothered to create this for a microcontroller. And, you know, it actually
does allow you to scale an image at, you know, up or down at any resolution using bicubic image
scaling and bilinear. So bicubic basically can take like a really pixelated image
and then produce like these nice colorful regions. Like it'll do, you know, it really blends things
well. It looks beautiful when you use it. And to do that though, it's like doing a crazy amount
of math per pixel. And so being able to use the SIMD instructions when you're doing the blend
operation for upscaling these things, it made a huge amount of difference.
Like, if you don't do this, the code runs 2 to 4x slower.
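[Editor's note: not OpenMV's scaler, just a plain-C sketch of the bilinear blend at the heart of an upscaler; fx and fy are Q8 fixed-point fractional coordinates. This per-pixel blend is the inner loop that the packed-SIMD tricks above speed up.]

    #include <stdint.h>

    /* Weighted average of the 2x2 input neighborhood around the sample. */
    static uint8_t bilerp(uint8_t p00, uint8_t p10,   /* top-left, top-right */
                          uint8_t p01, uint8_t p11,   /* bottom-left, bottom-right */
                          uint32_t fx, uint32_t fy)   /* fractions in 0..255 (Q8) */
    {
        uint32_t top = ((p00 * (256 - fx)) + (p10 * fx)) >> 8;  /* blend along x */
        uint32_t bot = ((p01 * (256 - fx)) + (p11 * fx)) >> 8;
        return (uint8_t)(((top * (256 - fy)) + (bot * fy)) >> 8); /* then y */
    }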
I've always known that graphics code is an area where there are a lot of optimizations that are non-obvious.
I mean, there's the Stanford graphics page about optimization hacks.
Have you seen that?
Maybe. I've seen like the
bit, if you've ever like had
fun searching around stuff, there's always like the
bit magic things where you know
if you've ever seen, what is it
Doom
the original Doom, there was one thing which was
like the inverse square root. The magic number,
yeah. Well, and there's
Hacker's Delight, that's similar.
Yeah, Hacker's Delight.
So, is this, are these algorithms optimizable because they're graphics?
No, I mean, the instruction he was talking about with breaking up the 32-bit into 16
and doing that, I mean, it sounds like matrix stuff.
That'd be easily applicable to matrix stuff
or FIR filters, I would assume.
Yeah, no, I think it's meant for FIR filters.
And also, it's not FFT so much,
maybe if you were doing fixed point.
But definitely for audio, though.
Like if you wanted to mix two audio channels together,
for example, it would be probably good for that.
You could set a gain on each channel, right?
And then it would automatically mix them as long as you were doing both in two 16-bit samples at the same time.
So the cool thing is you can set all this up.
Your DMA system to receive audio could be producing audio chunks in a format that actually is applicable to the processor, then churning through it this way.
One of the tricks you have to know what to do when you're trying to do this stuff
is the data does have to be set up
to feed well into these instructions.
You can't actually utilize them
if you have to reformat the data constantly
because then your speed gain will be lost
on data movement, more or less.
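[Editor's note: a hedged sketch of the two-channel mix described above, using the CMSIS __SMUAD intrinsic (dual 16-bit multiply with add); names are illustrative. If the DMA delivers interleaved 16-bit samples, the packed pair below is just one 32-bit load, which is the data-layout point made here.]

    #include <stdint.h>
    #include "cmsis_gcc.h"

    /* Mix two channels with per-channel Q15 gains: one __SMUAD does both
     * multiplies plus the mixing add. */
    void mix_two_channels_q15(const int16_t *ch_a, const int16_t *ch_b,
                              int16_t gain_a, int16_t gain_b,
                              int16_t *out, int n)
    {
        uint32_t gains = ((uint32_t)(uint16_t)gain_b << 16) | (uint16_t)gain_a;
        for (int i = 0; i < n; i++) {
            uint32_t pair = ((uint32_t)(uint16_t)ch_b[i] << 16) | (uint16_t)ch_a[i];
            int32_t mixed = (int32_t)__SMUAD(gains, pair);  /* ga*a + gb*b */
            out[i] = (int16_t)__SSAT(mixed >> 15, 16);      /* back to Q15, saturated */
        }
    }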
You just said fixed point. Of course
you're doing fixed point, aren't you?
Yeah, it's all in fixed point.
Oh, okay.
Yeah, but no, the reason why
this is so cool is that as I
got into this,
when I first was doing the microcontroller
stuff, it was just kind of, hey,
we're just having fun here, just trying out cool
things. Is this going to go anywhere? Not really
sure.
Performance isn't there, speed isn't there, usability isn't there.
And honestly, after I met Larry and we started actually making things go faster,
it started to dawn upon me that, well, hey, if you're getting a thousand percent speed up here,
and this is before we added the SIMD part.
Once you add that, you can get even another 2x speed up on top of that. So 2000 percent
speed up. Basically, you're going from one frame a second to now you're at 30,
at 320 by 240. That's on a microcontroller. Now let's say the microcontroller's clock speed
doubles, and instead of being at
400 megahertz, you're at 800 megahertz. And then there's this new thing called ARM Helium that's
coming out. And ARM Helium offers an additional 4 to 8x speedup on all algorithms. And this is for the new Cortex-M55 MCUs that are coming.
And ARM Helium is actually closer to ARM Neon,
so it's not a very limited DSP set,
but it's actually like a thousand new instructions
that will allow you to do up to 8 or 16 elements at a time math.
And it also works in floating point too.
You can do doubles, floats, four floats at a time,
even does 16-bit floating point too.
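[Editor's note: a hedged sketch of the Helium (M-profile Vector Extension) intrinsic style, eight 16-bit lanes per operation; assumes a Cortex-M55-class core compiled with MVE enabled and n a multiple of 8.]

    #include <stdint.h>
    #include <arm_mve.h>  /* Helium (MVE) intrinsics */

    /* Add two rows of 16-bit pixels with saturation, 8 lanes at a time. */
    void add_rows_helium(const uint16_t *a, const uint16_t *b,
                         uint16_t *out, int n)
    {
        for (int i = 0; i < n; i += 8) {
            uint16x8_t va = vld1q_u16(&a[i]);        /* load 8 elements */
            uint16x8_t vb = vld1q_u16(&b[i]);
            vst1q_u16(&out[i], vqaddq_u16(va, vb));  /* 8 saturating adds */
        }
    }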
I love these RISC CPUs with thousands of instructions.
The R is just there to make the...
I mean, if they were just called ISC,
everybody would be like, what does that even mean? But RISC sounds
cool. So yeah, the Helium stuff was what I
was actually thinking of when I said neon that
I had read a little bit about because we're using
M55s on a project.
Have you actually gotten into using
some of the new stuff yet? No, I was just
reading through data sheets and saying,
oh, that looks cool. I don't know what to do with it yet.
But yeah, no, that's amazing.
It reminds me,
I've always felt like
we're leaving performance on the table
because we tend to,
these days things move so quickly,
we tend to just wait for the next CPU
if we can't do something fast enough.
Whereas I'm going to,
you know, back in the old days,
you know, people were doing amazing things
because they only had a 6502 or an 8080 or something.
And, well, I need this to, there's no other computer.
I need this to go as fast as it can.
So I'm going to hand optimize assembly.
And I'm not suggesting people hand optimize assembly, but looking for the optimizations that the vendors have already provided and that people just don't know about, I think, is something people miss.
Oh, yeah. But it's actually really, really huge, though. It's much bigger than you think.
Okay, so let me say it like this. We actually got one algorithm called erode and dilate.
I have that running at, it's able to hit 60 FPS now on our latest-gen system, the OpenMV Cam RT1062. And it runs at 60 FPS at VGA.
So when we get to a Helium-based processor system for our next-generation OpenMV Cams,
we're going to be able to 4x that number. So now you're talking 1280 by 960 at 60 FPS, right? And then that's like a 1.3 megapixel image.
Two megapixel is 1080p, right?
So if we go to 30 FPS,
now we're talking we're able to run
1080p image processing
on a $4 microcontroller.
That's really insane.
Now you're running into,
do I have enough RAM for this?
Well, that's the thing.
Microcontrollers are also coming out.
There's the Alif Ensemble, for example.
The thing has 10 megabytes of RAM
on chip. Does it now?
10 megabytes of RAM
on chip. I'm sorry. I'm using that
right now, so I have to
keep my mouth shut.
But this is the
thing, though.
We're kind of going to be crossing this chasm where these MCUs are really going to be able to do things that previously needed a Linux application processor and OpenCV to do.
Like, once you get to 1080p, that's good enough for most people.
They don't really care to even have more resolution.
I mean, even 1280.
I mean, the Nintendo Switch, it's sold quite well. And that's a 1280 by 720 system.
Like, you don't need to go to 4K or 8K to have something that will be in the market.
Unless you talk to my other client.
Pushing all of Christopher's buttons. It's great.
What if we had
four or six 4K cameras?
Anyway.
Okay.
Yeah, pushing the pixels.
But OpenMV,
if I remember,
works with MicroPython,
which is really cool,
and I loved working with MicroPython,
and yet it wasn't the fastest.
Well, the thing is that we don't write the algorithms in Python.
Python is just a layer to assemble things together.
So we actually have the algorithms, they're written in C,
and then we use the SIMD intrinsics,
and we try to write things so that they're fast in C.
And what MicroPython is really just doing is being an orchestration layer to tie everything together. And it really, really
helps the use case because what we're trying to do is to pull in all the non-embedded engineers
to give embedded systems a try. So if you say like, hey, you need to learn how to use all these
crazy tools, get the JTAG out.
By the way, you got to buy that and it's going to be $1,000 just to program it.
And here's a crazy Makefile build system and all this other stuff.
It's really going to run into a lot of brick walls for most people.
But when you get someone who's worked on the desktop, they're used to Python scripts, and you say, here's the library, here's the API.
You can basically write normal Python
language and just look at the API for what you're allowed to call. It's a much easier transition for
folks. And it's part of the key on why our product's been successful. It's also been nice in
that a lot of middleware Python libraries now can run on the system. We have a system
called ulab that actually runs on board.
ulab gives you a
NumPy-like programming interface.
So if you want to do NumPy-like operations,
you can actually do that.
Yeah, yeah. And so it supports stuff like
matrix multiplies.
They're adding singular value
decomposition right now.
And so you have that on board.
So you can actually write a lot of the standard matrix math you would have on a desktop and just port it right over.
Additionally, we also have a sockets library and a system for Bluetooth.
And so the sockets library allows you to write pretty much a desktop app that would normally control low-level sockets.
And you can port that onto the OpenMV cam,
and now you can connect to the internet,
do Python urequests, and do API calls,
and so on and so forth.
So it makes it really powerful, actually.
So we had a listener, Tim, ask a question
that seems really relevant at this point.
Is this intended for genuine production usage
or more for hobbyist prototype work?
The OpenMV homepage and docs
have a heavy emphasis on MicroPython.
Are there plans to provide a C or C++ API?
Yeah, so we're never going to provide
really a C API.
You can take our firmware
and actually just write C code directly.
So what we found is once...
So a lot of customers who look at it and think,
oh, yeah, this is one thing, and then don't give it a try.
And then we have a lot of customers who say, this is amazing,
and go and modify it for whatever ways and shapes they need.
So what we see a lot of times is a customer will take the system,
and if they don't like MicroPython,
they'll just literally rip that out of the code base
since everything's Makefile-based,
and they'll just shove it into whatever they're actually using.
Since it's open source,
you can kind of actually do Frankenstein edits like that.
And as long as you follow some good practices
and don't completely edit our code and have an unmaintained fork,
you can do a pretty decent job of staying in sync with upstream
while not having a totally broken system.
But no, we plan to keep it in MicroPython.
And the reason for that, as I mentioned,
we want to get the larger developers
who are not working in embedded systems
to kind of jump on board.
But also, we found that it's pretty usable in production
for a lot of people with the Python interface.
So I was just talking to a customer this week, actually,
who's putting these things in power plants.
And they're loving it, actually.
For them, they just needed to do some basic editing of,
they were just using it as a sensor
that would connect to their infrastructure and then do some remote sensing.
And I don't want to mention exactly what they're doing to not spill the beans.
They're monitoring the nuclear fuel rods.
Yeah, but they loved how we had a system that was flexible.
One of the big things for them is they didn't want a black box system
that could not quite do what they needed.
They wanted something that was open and available for them to tweak it in any ways they needed.
It's called OpenMV, so you'd think that people would recognize that it's open source.
But it is actually open source and open hardware and all of these instructions and GCC intrinsics
we're talking about. You can go to the code and look up.
Yeah, you can.
You can also... we should rename ourselves Closed AI, I guess, maybe.
What is the, I know we've probably talked about this in the past,
but people probably ask, what license,
what open source license does OpenMV come under?
Yeah, so we're actually under the MIT license for most of our code.
Yay.
We do have, yeah, yeah.
Party time.
Honestly, trying to enforce this, it'd be insane, right?
I mean, people, it has led to some weird situations.
So we're actually really popular in China.
And so much so that people actually take our IDE. Our changes are MIT licensed, but the IDE base is GPL.
So you do have to keep that to be open source.
But for that system, we actually see people who want to compete with us.
They actually take our IDE, take the source code, they remove our logo, put
their name and logo on it, and then sell a product that has the same similar use situation as us
and actually try to compete with us with our own tools. It's crazy.
And by crazy, you mean exceedingly frustrating?
Exceedingly frustrating. But hey, you know, it's kind of flattery, though, at the same time.
Like, they like our stuff so much, they're not going to build their own.
They're going to just copy and paste what we've been doing.
So you said closed AI.
Are you closed AI?
Where was I headed with that?
I think that was just a joke.
That's a joke on open AI.
Oh, I see.
Okay.
I mean, we've talked a little bit about some of the graphics things with the erode and dilate,
but you have a whole bunch of machine learning stuff too.
Yeah, yeah.
So we integrated TensorFlow Lite for microcontrollers a long time ago,
and we've been working with Edge Impulse, and that's been great.
They basically enabled people back when this was super hard,
and we didn't have this back when I did an interview with you, seven years ago also.
They made it easy for folks to
basically train a neural network model
and get it deployed on the system.
At first, we started with image classification,
but that's moved on to something called
FOMO.
I think it's called Faster Objects,
More Objects, but obviously it's a play
on YOLO. Right, right. I remember that. Yeah. Okay. Yeah. I knew it was connected to YOLO somehow,
but I forgot. Yeah. Yeah. It allows you to, it basically does image segmentation. So let's say
it'll take a 96 by 96 input image, and then basically it figures out the
centroids of objects in that image and will output a 16 by 16 pixel array where the centroid of an
object is. And this allows you to do multi-object tracking on an MCU, and you can do 30 FPS.
Okay, where do the Jetson TXs fit in here?
I mean, I
thought those were super
powerful and did all the super
amazing things, but now he's
got microcontrollers doing this.
Jetson would be able to do
a much larger image than 96x96.
Oh, yeah, absolutely.
Of course, but Jetson's
also running Linux and doing a bunch of other stuff that slows it down.
Right, but it's a big real GPU.
I don't think it slows it down.
No, no, Jetsons are awesome.
I think here's the problem.
So if you look at the latest Orin, right?
I'm looking at it right now.
Yeah, the high-end one.
So I was actually, my last thing I was doing,
so by the way, I only recently went full-time on OpenMV.
I was side hustling this forever.
When we started this company, it was always a side hustle.
And I recently, last year,
the company I was working for, Embark Trucks,
they had gone public and I had joined them at like employee 30
and ridden with them for five years.
I was a ride or die employee there.
And they went public for $5 billion in 2021, part of a SPAC process.
And the company shut down then in 2023 and was sold for $78 million.
So I got to see...
So in the meantime, there were constant parties in the Caribbean?
Yeah, no.
For the engineers, we were just working hard to make that stock go back up.
But it was an interesting ride.
I say that, though, because I'm at full-time now on OpenMV.
But one of my last jobs at Embark before we shut down
was I was trying to figure out how to get an NVIDIA Orin into our system.
And that thing's amazing. It can replace so much. before we shut down was I was trying to figure out how to get an NVIDIA Orin into our system.
And that thing's amazing.
It can replace so much.
But it's also $1,000 plus?
$2,000 on Amazon.
Yeah, so here's the thing.
Also has a 60-watt power supply
or something like that.
Yeah, you need that.
I was doing serious engineering.
I was actually building a $10,000 PCB, by the way.
$10,000.
More than two layers?
Like 18 or something.
It was crazy.
We had lots of fun stuff.
Not going to mention more than that.
But it was an amazing system.
We were really pushing the limit.
I was like, this is an incredible system for what we're trying to do. Self-driving truck brain? Yes, absolutely. But the challenge is when you have a system that costs that much, this means for your final sale price, for your robot or whatever, you're going to need to sell a $10,000 system at minimum
to make some cash back.
It's really hard to kind of make those margins
make sense if you're not selling it
in a high-priced system.
Yeah.
And the thing is,
I mean, there's a lot of other costs
that go with that.
Like, you know,
if you're building a system
with something that powerful,
power becomes a big issue,
especially if you're on batteries.
And, you know, weight and size. And they do come in modules that you can get smaller carriers for, but it's not
the same as building a custom PCB with a Cortex-M55 on it or something, if you can get away with that.
Oh yeah. So I've actually heard from some of our suppliers that NVIDIA's position is that they're focused on where the money really is for them, and that's in the cloud.
The rise of what ARM is doing with TinyML and all these other processors, it's really going to be the future.
There's an EE Times article where the previous CEO of Movidius, he now works at ST running all of their microcontrollers.
And his position is that there's a wave of tiny ML coming.
And it's basically from microcontrollers becoming super, super powerful.
Like when you're, this is why I'm going full-time on OpenMV,
because I see this wave happening where, you know,
what does it mean when your MCU can now process 1080p video
and cost $4 and has instant on capabilities?
It draws less power.
It produces less heat.
It doesn't need SDRAM or eMMC.
So the bill of materials is like $10 off
from what you'd pay for a Linux-based system.
And it's also less physical size
because you're now down to one chip versus three.
So you've got four wins.
How do you compete against that?
Again, also, it can go into low power on demand
and wake up instantly.
And this is that future
where these things are becoming really, really powerful.
And what they need is a software library.
And so that's what we're focused on is really building out that algorithm base.
So instead of you having to sit down and say, how do I write efficient SIMD code that makes this algorithm go super fast?
It's already built.
And you can just use an OpenMV cam to do what you want to do.
Okay.
Talking about what I want to do.
Cool, let's go.
Okay, so I have an application idea.
It's a terrible idea, but I want to try it anyway.
And mostly this is an exercise in how would I actually use OpenMV
to accomplish my goals and possibly to make my own product,
and where do I make the decisions.
Okay.
Okay.
So let's say I want to find and identify wasps.
I have a big book of wasps.
A listener, Brent, noted that his spouse wrote a big book of wasps
after I said I liked bees.
And it's very comprehensive.
I have many, many pictures.
My desktop wasp ID in TensorFlow works fine.
Now what I want to do is I want to mount it on my roof,
and I want it to identify all the wasps in the forest.
In one direction or multi-directions?
All the directions.
Okay.
So you have a 360-degree wasp scanner.
Right.
All right.
Okay. Okay. Question for you. How good of an image of a wasp do you have? Do you have, like, nice high-resolution images where you can see, like, the hair on a wasp?
Yeah. They have little back legs. They have little serrations. At least some wasps. I mean, there's so many wasps. But then they can use that
to wipe off the fungus that tries to attack them and take over their brains.
Oh, yeah, I've heard about that.
They definitely have high-quality images, not only the hairs, but the serration on the hairs.
Okay, so even before you get into OpenMV, I think this is the problem setup thing you have
to ask yourself, which is, how do you actually take an image that's that high quality of a wasp that's flying around in your backyard?
That's the first question.
Are we talking a DSLR image that's on top of your roof, just kind of pointing at wasps and then snapping really awesome pictures with a super great lens?
Is that what we're looking at?
No, no. I don't want more pictures of wasps. I want wasp identification.
Right, but if you need a feature size that's very fine to identify one wasp from another, that informs how high resolution your camera has to be.
Or how close the wasp has to be to the camera.
Yes, yeah.
Right.
Because if I can tell you how to find wasps...
How many pixels on wasp do you need to identify a wasp?
Yes, that's the best way to say it.
Thank you.
And so, yeah, this is a good concept.
So, and Chris said it really well,
but maybe we need to think about the visual here.
If a wasp is far away, it may only take up four pixels and we won't be able to see very much
about it because it's far away. Just because the camera resolution, if I had a higher resolution
camera, it would take up more pixels. Or if the wasp came closer, then it would take up more pixels. And so what was the phrase you used?
Pixels on WASP.
The pixels on the item of identification is really important.
Or P-O-W.
Pixels on WASP.
And so, yeah, that is a big choice for me is do I want higher resolution cameras or am I willing to accept things to be closer?
Well, I think it's actually both really because a lot of times more pixels in an image doesn't actually do anything for you.
Most cameras can't resolve optically a lot of the extra pixels. They just become noise.
So it's really about the quality of the optics that you're dealing with.
Like, can they actually produce an image that's focused and sharp for every pixel?
Because you can shove an 8-megapixel or 12- or 43-megapixel camera in with a bad lens,
and you'll have no better image quality than if you actually just improve the lens itself.
I swear this is like talking to myself in a meeting three weeks ago.
Okay, I don't want to buy $10,000 cameras.
So then you're going to want to have some zoom action.
That's kind of what needs to happen.
I think if you want to identify WASP, you're going to need to do two things. You're going to need to have one camera that has a really nice quality lens that can do ranging,
where it can zoom in on the wasp, and then it can track it and follow it.
So I have one that identifies flying objects from my background,
and one camera that I say, go there, take a good picture.
And then I send that to my wasp identification as opposed to my motion identification.
Yeah, but now you need a gimbal.
Yep, you need a gimbal.
This is getting expensive, Elecia.
What about an array, a larger array of crappier cameras?
Yeah, like a wasp's eyeball, a compound eyeball for cameras. OpenMV
compound. I think that would work
too. You could do a bunch
of zoomed-in cameras
that would be like a detection field
where if a wasp flew in front of them,
you could see what's going
on.
You're going to need...
If you're not doing a gimbal, I would
say it's probably out of the spec of our system now.
But you probably need like an NVIDIA system on this.
But even then, it's still going to be challenging.
Because at the end of the day, I think the gimbal system is the most likely to happen.
But if you wanted to do something like you just had a bunch of cameras and you create like a detection field.
The challenge is each of them has like a different zoom and area they can see.
So then you'll need like multiple cameras, like at different focal lengths.
You'll need to have one that's wide angle and one that's, you know, more zoom and one that's more zoom and et cetera to kind of see every position and such.
So getting away, I think the gimbal is actually better because you've got a gimbal with like a zoom lens. That would probably do the best job.
What if?
Okay, so I like that, but I also don't want it to have a moving part.
So gimbals are probably out.
What if I didn't have them have multiple zooms?
What if I had a fixed zoom on all of them, but this allows me to look in lots of directions and have them
be slightly overlapping at their edges. You could do that. It's really hard to set up.
A better way would be, could you force the wasps to walk into something where they're going to be
in a fixed focal distance? Could you do a little hive or something where the wasps have to fly
through? Then they'll all be in the same area and about the same size.
And that really simplifies the problem at that point.
Yeah, I think a lot of ML problems, people haven't thought about the social engineering aspect of it.
The social engineering of the wasps.
Yeah.
And then he's going to tell me that I just need to have good lighting and to have them go one by one, walking through like some sort of little wasp fashion show.
Okay, okay. I don't want to corral the wasps. For one thing, they sometimes eat each other, or do weird mind control, or just lay babies in each other. So we don't want that. We want the wasps to be free-flying.
But it sounds like because I don't have enough pixels on the wasps,
this won't
happen unless I can...
Unless you have more processing,
higher resolution, better optics.
But I really like the idea of having
a whole little
360,
eight cameras, and each one identifies a wasp.
The problem with the world is it's large.
Oh, and so then there's pixels, and they have to go on the world, which is large. And then too many pixels.
Well, I think you probably could do it. I mean, you could do one of those 360 camera things. I've seen people with NVIDIA Jetsons do that, where they have two cameras that are mounted back-to-back, and they're doing a 360 view.
But the challenge is the level of detail.
But the optical resolution.
Yeah.
The optics are not as good, and the resolution is not enough, and your pixels on identified object are just too small.
Yeah.
So it could tell you that a wasp was flying around or like a thing
was flying around, but it couldn't actually tell you what version of that thing it was. Okay. So two
large cameras in this instance with fisheye lenses is better than eight small ones because it just
changes your field of view. Yeah. You get like a 360 field of view, but then your challenge is how
close are you to the particular wasp? That's what's really going to matter. So like maybe
if they're within the distance of like a foot or like maybe three feet, you might be able to see
them if you have enough resolution on the cameras and then you could possibly do it.
But then, I mean, of course you want to put this on your roof, though. And so the wasps aren't
even going to get near it.
That's the challenge. So you need to get
that wasp fashion show thing again
and have some bait to get them
to fly near you. That's like those bird feeders that identify
birds for you. They're bird feeders.
Well, yes. All I need
for wasps, really, is a tuna can.
Gross.
Won't a bird come in and eat that?
Or a cat.
Yes.
Or a raccoon, more likely.
This is going to be the greatest video set ever.
Okay, so let's say I go ahead and I have the little,
I have some area where I can cover,
and it's not exactly the Wasp fashion show, but they're, they're a foot
to three feet apart away from my camera. And I have, I don't know, let's say I don't want to
shell out for the huge cameras. I have like four open MVs and I've, I've pointed them and
I just want to do the best I can. Okay. What algorithms am I looking at here?
Yeah. Yeah. So there's a few different things you
can do if you want to work with this. Yeah, as you mentioned, does the lighting have to be good?
Yeah. So if you actually want to be able to take nice pictures in the dark or in the day, you're
going to need to have some good lighting on that. And then there's, of course, the problem that the
wasps are going to fly into the light. So what do you do there? There is
something you can do with thermal cameras to see them. That's like a really easy way to pick out
wasps during day or night because they're going to pick up, they're going to be visible in the
background. There's also something to do with event cameras. So we have some customers right
now.
Yeah, I was reading about these. Please tell me.
Oh yeah. There's a company called Prophesee, for example. They're making an event camera. And more or less, these things run at
literally whatever FPS you want. If you want a thousand frames a second, they can do that.
And they literally just give you an image. For every pixel, what they're doing is they
check to see, did the charge go up or did the charge go down?
And then based on that, they produce an image. And so they can actually, they're kind of like HDR in
the sense that even if they're staring into the sun, they can still detect if a pixel increased
in charge or decreased in charge. And so it doesn't really matter what's going on in the
background or et cetera. They basically just give you a difference image of what kind of moved around and such.
And that actually creates these interesting kind of convex hulls of things.
So you can really see like blobs moving very, very easily because of that.
It's not going to be useful though for identifying what the wasp is per se,
but it will tell you like there's a wasp though walking there.
But then you can easily overlay that with the regular colored image and you can tell what's going on there. Or you can do everything directly from the color image itself. It's just going to
be harder when it gets nighttime and you don't have lighting because then you'll need to somehow
boost that image quality to see still.
What was the name of the, I heard event?
Yeah, event cameras, I think.
Event cameras, yeah.
They basically do, it's like the architecture of the camera itself does the things you would do in software to do motion vectors.
I'm making stuff up, but.
Frame differencing.
Yeah, frame differencing, and then figuring out motion directly.
So it's just done in the camera, and it has such a high frame rate
that it can do that much better than, say,
doing that on a 60-frame-per-second camera in software.
Yeah, well, the benefit is that camera can sample
at one microsecond each pixel.
Yeah.
And so you can actually go beyond 1,000 FPS if you want.
Yeah, it was something crazy, yeah.
So it's technically a million FPS,
but you probably couldn't read out the data that quickly.
But it allows you to do really, really fast object tracking.
That's the best way to say it.
So this will allow you to actually find the wasp in the image that they're flying around and actually track them with such precision you know exactly where they are.
The trick is, though, then the color camera can't keep up with that. So now you're back to, you have convex hulls of wasps flying around, but at least you could see them in the daytime and nighttime. And here's the interesting thing, though. Assuming the wasps
are all about the same size, then if you just wanted to identify whether or not you had a bigger
wasp versus a smaller wasp, you could probably do that on board. Because you'd have this outline of them.
Would it be... How would I be able to tell a close
wasp from a far bird? Well, they're all in this
wasp thing, right? Where they're corralled.
Oh, right. We're back to corralling. Okay. Sorry. I was thinking open sky.
Well, you could probably do open sky, too.
Yeah.
You could train your model on shapes.
Well, yes, but with the event camera.
Yeah.
It gives you a shape.
It gives you a shape, but they're blobby shapes, aren't they?
Or are they pretty crisp?
I think it's the outline.
Oh, yeah.
I don't know.
It depends on the...
It's the outline.
This is what Prophesee is actually trying to sell on,
is that they believe that you don't actually need the full image, the full color.
They say you can do everything from outline.
And it's not wrong.
I remember back at my day job at Embark,
we actually did vehicle identification and stuff all based on the LIDAR scans from
objects. And LIDAR scans didn't contain anything but like, you know, if you hit the back of a
truck, you'd only have, you know, like a crescent shape, right, to see part of a vehicle. You
wouldn't see the entire shape of it. And so we actually had neural networks that ran on board
that identified what was a truck, what was a car, what was a motorcycle,
all based on just partial side scans of them. Okay. So this would be really awesome for tracking
paths and for identifying things without having to worry about light. And it's outlined. So
it's again, going to have some number of pixels on the WASP. I think they're pretty low resolution
right now, if I remember.
They're 320x240.
That's for the cheaper thing
that they're selling, but
they also have some 1280x720
cameras.
But don't ask about the price on that one,
because you can't afford it.
And we mentioned frame differencing, which is something that I think would
be really useful if I'm dealing with things that are flying around or moving quickly in ways that I don't expect.
Yeah, that's simply where you just have one image in RAM and the next image
comes in, and you just subtract the two, and boom. By the way, on the ARM Cortex-M4 processors, there's an instruction that basically takes four bytes of a word
and another four bytes of another word,
subtracts every four bytes from each other,
and does absolute value on it,
and then adds them all together in one instruction.
So if you want to do frame differencing on the Cortex-M4,
we can do that very, very fast on the OpenMV Cam.
So super easy to get the
max FPS on a large resolution
thanks to features like that.
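[Editor's note: the instruction described above is USAD8/USADA8; a minimal frame-differencing sketch using the CMSIS intrinsic. Assumes the frame buffers are 4-byte aligned and n is a multiple of 4.]

    #include <stdint.h>
    #include "cmsis_gcc.h"

    /* Sum of absolute differences between two frames: each __USADA8 computes
     * |a0-b0| + |a1-b1| + |a2-b2| + |a3-b3| for four packed bytes and adds
     * the running accumulator, all in one instruction. */
    uint32_t frame_diff_sad(const uint8_t *prev, const uint8_t *cur, int n)
    {
        const uint32_t *p = (const uint32_t *)prev;  /* 4 pixels per word */
        const uint32_t *c = (const uint32_t *)cur;
        uint32_t sad = 0;
        for (int i = 0; i < (n / 4); i++) {
            sad = __USADA8(p[i], c[i], sad);
        }
        return sad;  /* total motion energy between the two frames */
    }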
But
you still run into a challenge
that the camera itself
is going to have a limited frame rate and it'll
cause a lot of motion blur. And so you're really
going to want a global shutter imager at that point.
But then you run into
now the lighting needs to be improved to go faster.
The constant upselling.
Yes. So when I saw on your website
frame differencing, and I was thinking about how to track things, I went straight to convolution,
which is a far more expensive algorithmic process.
We actually had a customer for that.
It seems a lot more accurate than what you're talking about.
Yeah, no, it is.
So frame differencing is just one way.
I'll say this. We had a customer that I actually put some work into doing SIMD optimizations
for our morph algorithm, which lets you do custom convolutions on the OpenMV Cam. And we're capable of doing about 200 frames a second at 160 by 120.
And yes, we can do that. And we had to do this for this customer because they wanted to track
a single pixel of an object with background noise.
And so it turns out you can do something called a masked filter,
which is kind of like, it basically is a convolution
that suppresses all pixels that aren't just a single bright pixel.
And this allowed us to track an IR LED in the daytime. So imagine an IR LED in the daytime, the sun emits IR light.
Yes, yes.
Very, very hard, but we managed to do it. So we could see this object moving around.
So that is something you can use. It is, though, very specific, I would say.
It was a good use case for their algorithm, for their problem.
I don't know if it'll work that well for wasps, though, since a wasp might be more than one pixel.
Yeah, I wouldn't think it would be a single pixel.
I mean, that means I have no features and it might as well be a speck of dust.
What if you put IR emitting LEDs on wasps?
Yes.
Or those April tags.
That would be very useful.
They could just carry around little billboards.
That would be so much easier than what I'm talking about. We just April tag all the wasps and each one will have its own little number and reference on where it is in the world.
I will say this.
Someone did actually use the FOMO algorithm with a regular color camera to count bees.
So that's definitely possible.
Their goal, though, wasn't to identify the difference between bees, though.
They just wanted to know where there were objects of a similar size flying by in the image. And I think Edge Impulse had a tutorial about this. They had the
Raspberry Pi running with FOMO, and it was totally capable of checking bee movements and
seeing and counting the number of bees entering and exiting a hive.
Excuse me while I Google bee FOMO.
Okay, so I guess I had questions here about LSTM ML algorithms and trying to track my wasps, but I feel like I'm on a totally wrong path here with my wasp identification project.
I think it's possible. You just have to solve the physics problems first. And this is unrelated to the compute. It's just first, you got to get an image that's high enough quality of these really,
really small things. And that's the challenge is that the wasps are so small. If you had like,
you know, if you're trying to track badgers running around in the fields or a groundhog, that'd be a lot easier.
I think this is true of a lot of machine learning problems, is that we get so excited about how computers can do so much and how machine learning empowers things and forget that, oh, physics, that thing.
Who cares about physics when we have machine learning?
Well, it's a continuum, right?
Because you can apply lots of compute to bad data sometimes,
like he was talking about the IR light.
But if you want to, you can make the computer think really hard and try to clean
up bad images sometimes. Or you can spend more on getting good data and do less work. Yes.
But you also have the problem that sometimes what you want to do
is so not really suitable for the hardware you're working on. Right.
Well, I mean, that is the case.
I think it's all about just the problem setup first.
I mean, this is something I see a lot of our customers and people wanting to do computer vision is folks just like,
they have an idea on what they want to do.
And they haven't taken the step of sitting down and thinking,
okay, what does this look like exactly? What am I trying to answer?
And that's always really important for any one of these problems, and especially in vision, that you have to go through that setup of trying to do the work of actually engineering what is actually reasonable and what I'm trying to accomplish.
And it does involve that physics aspect. I think we've seen a lot of demos that show off really, really strong ML happening.
But even for back in Embark, our computer systems cost as much as a house, right?
So unlimited budget.
And even then, though, you had the best engineers working on this stuff, running the biggest algorithms with the biggest GPUs.
And it was still challenging.
And that was unlimited power, unlimited budget, unlimited compute, unlimited image resolution.
But you still had to actually make an ML algorithm perform and do a good job at segmenting these images well and locking on to what objects were being tracked.
Reliably.
Yeah.
And it's like drawing a bounding box that jitters all over the place
isn't really good for your self-driving truck, right? It's got to be like super locked, no jitter,
really, really high quality. And so labeling, figuring out what's bad data versus good data,
lighting situations, like it, you know, in the real use cases, even when you have,
you know, enough power to do anything, you still have to work really, really hard on getting good data in. As you are now full-time at OpenMV,
how much of your time is spent trying to help people with applications and convince them that
what they want to do isn't exactly suitable versus being able to say, oh yeah, I can help you with
that? It's about 50-50.
We have a lot of folks who will ask us random questions
and I don't want to waste their time
and I don't want my time to be wasted.
So I try to make sure we steer them in the right direction.
If they need a higher end system,
they should go forward to that.
I'm also driving though towards the image
and future I want to create.
And so right now it's a lot of engineering
work and developing, trying to build out the company and build out what we're trying to do.
Truly, last year was a lot of pulling the company out of the ditch, to be honest.
While Embark was my sole focus, I kind of went AWOL on OpenMV, like 2021 to 2022. I
wasn't really at the helm. Let's just say it like that.
So we were just doing,
you know, we were staying alive,
but we were out of stock
because of the chip shortage.
I hadn't foreseen how bad that was going to be.
I don't know if any of y'all
tried to buy STM32s ever.
But that was some unobtainium
for about three years, right?
So that really hurt. But luckily, at the end of last year, we managed to do two things.
One, an order of about 5,000 STM32H7 chips finally arrived after waiting for two and a half years,
so we managed to get back in stock finally. And then we also pivoted and supported
NXP's IMX RT. And so this gives us then two verticals. Now we're not dependent on just the
STM32, but now we have NXP also. And this allowed us to produce the new OpenMV Cam RT1062. And
because of learnings we had with our partnership with Arduino, we tried
to really include a lot of features that we saw customers really wanting on this system. So built
in Wi-Fi and Bluetooth. And I'm also proud of myself recently because we're going through FCC
certification and CE and other certifications for the product. And so far, it looks like it's going to pass. So we'll have a certified Wi-Fi and Bluetooth enabled product. But we also built
in things like battery charging and low power. One of the biggest features on board is being
able to drop down to 30 microamperes on demand and then wake up on an I/O pin toggling. And so we had a lot of customers ask for such things
so that they can deploy this in low-power environments.
But we also added Ethernet support now.
So you can actually, we have a PoE shield
we're selling on our website,
and this allows it to connect and get online that way.
So this is a PoE-powered microcontroller
if you want to make that.
And we do have an RTSP video streamer, so if you want to stream
1080p JPEGs to
VLC or FFmpeg,
we've got demo code that shows it able
to do that. This is what your Raspberry Pi
was doing back in 2013, so
we're kind of at that level of performance
now. But even farther,
like Raspberry Pi 2,
not quite 3, but about
1 to 2 with our current system.
RTSP streaming from a Cortex-M?
Yeah, I know. Crazy, right?
But yeah, no, totally legit.
We are sending Ethernet packets or Wi-Fi packets and streaming video.
Yeah, the future is coming.
I'm telling you.
Do I have to still use GStreamer, though?
Yes, you still have to use GStreamer.
He said FFmpeg, but he meant GStreamer.
When can I stop using GStreamer?
That's what I want to know.
At least on the device, you don't have to use it.
Yeah, right.
It's actually kind of funny,
because one of the first things I had to do at Embark
was I had to produce a driver interface camera for our trucks.
Basically, we wanted to know what the driver was
doing in the vehicle, right? Oh, right. Okay. Yeah. Yeah. So we had to have a camera that just
sat inside the cab and looked at people and would record video of the driver. And so I was like,
sure, this is easy. Went online, went to Amazon. There was like one company that said, hey, 4K HDR
webcam, $100. I'm like, cool, we buy it.
You have to go into their like GUI and figure out how to set it up to stream RTSP.
It's a little annoying.
There's like a mandatory password, you know, of course, a mandatory password, which means that, you know, your techs assembling these are going to have to go through this hour-long process to get these things set up. And, you know, of course, it has to be on its own network. Like, you have to have it on the public network first and then use their tool that, you know,
uses some identification thing before you can log into its GUI. And then in its GUI, you can set it to be on a static IP and then force it to stream.
So a lot of setup just to get these things working.
Then we deploy it in the truck.
And it turns out you hit a bump and the Ethernet would just drop.
Like the connection would just disconnect, go down and then come back.
And my boss was like, hey, Kwab, you know, it's like gone for like a second or two every now and then.
I'm like, ah, interesting.
Huh.
It's like we could have an accident in that second or two.
That's a pretty big liability for the company.
We need a new webcam. So this ends up being a months-long project of me trying out many different webcams,
the same annoying GUI setup, trying to get them to stream video. And finally, we settle on a $700
IP cam, not $100. From Axis, right? They're always from Axis.
No, Opticon.
Opticon makes webcams
that can literally survive explosions.
We didn't
buy the one that cost that much.
That was like a $2,000 one.
But we ended up buying the $700 one,
which is still very expensive compared to
the $100 cam.
But it didn't drop its connection. Rock solid.
And so I say that story to say that it's funny that I'm able to replicate
now what I spent all that time on, on a microcontroller.
Could I buy an OpenMV for
my truck so that I can watch people? I'm just wondering
should I get his product now that
he doesn't?
Does OpenMV drop
Ethernet packets?
No. I think he's going to say no.
No, it doesn't.
It doesn't. That's one of the nice things.
So that's the focus there.
No, it doesn't at all.
And even if it did, hey, you can at least go in the firmware and figure out why.
That's the big thing.
It's kind of like with the previous system, it was kind of like, huh, we're going to have to go to each truck and physically remove these and put a new thing in.
Wish I could just do a firmware update or ask the manufacturer what's wrong.
But that was not possible. But anyway, we built in a lot of features
into this system just to make it
easier for customers to really build
things they want. And so that's
what we're excited about. But then moving
forward, there's
these new Cortex M55 processors
coming out. And so that's the
exciting thing. I actually want to ask Chris
about that. What do you think, Chris?
You've been playing around with one.
I've been playing around with one at a very high level.
So I haven't really explored the feature set
and I'm using vendor tools and things.
So, I mean, it seems fine.
It's very capable.
I mean, they're very high clock.
I mean, I'm used to Cortex-M3s and M4s. These are
clocked, I think, at 400 or 600 megahertz or something
like that. Yeah.
So it's a bit unusual to be using something that's
not Linux, running
that fast, and putting
Zephyr or FreeRTOS or
ThreadX or whatever on it.
It feels like
overkill in
some ways. It is a very big hammer.
But the stuff we're doing with it, which I can't talk about,
needs a lot of processing.
So then data throughput and things like that.
Yeah.
Well, I'm excited about the new ones that are coming along.
Like 2025 is going to be an exciting year.
Just imagine doubling the clock speed of what you just mentioned.
Integrated hardware modules
you would only see on application processors,
so actual video encoding in hardware.
Yeah.
That kind of stuff,
like large-resolution camera support,
and then even more ML.
One of the coolest things is, you know,
we were talking about all this processor performance that's coming,
but that's not even the important bit.
The important bit is the ARM Ethos kind of processors.
These offer like hundreds of gigaflops of compute now
for these microcontrollers.
And so what that means is if you wanted to run a neural network on board
some of these chips,
they'll actually outperform the Raspberry Pi 5
and they won't draw any power either.
Like something like the Alif,
that has a 200-gigaflop neural network accelerator.
And so if you ran the Raspberry Pi 5
at 100%, every single core pegged to the limit,
you get 100 gigaflops of performance.
And it would, like, catch fire.
And so one of these MCUs will draw 20 milliamps, 20 to 30 milliamps of power at 200 gigaflops, thanks to onboard neural network accelerators.
Do you really think it's machine learning that's important here, or do you think it's these other features, like outline detection and HOG algorithms and convolution? Do you think it's the features?
Well, that's stuff that's not, as far as I know, the neural engines aren't applicable for.
Well, no, these would feed into that.
Yeah, yeah, yeah.
I just, how much time should we be doing straight ML on raw camera frames versus giving them
some hints?
Doing some actual image processing.
Algorithms and heuristics and all of that.
Yeah, it's actually both.
So, I mean, definitely you want to use ML as much as you can.
Transformer networks being one of the new things
that people are really excited about.
Those require a little bit more RAM, though.
Like a lot of these network accelerators,
they're really good at addressing that. You know,
we needed more compute.
And so these offer literally 100x compute
of where things were previously.
And so now you can actually run these large networks.
But then you need more RAM if you actually want to run
the new transformer models,
which are dynamically creating weights on the fly, more or less.
But you still need to do a pre-processing for these things.
Like an example being, if you want to do audio,
you don't want to feed the network the raw PCM audio 16-bit samples. You want to take an FFT first and then
feed it slices of an FFT that are overlapping each other. And so that's where the processing
performance with the Cortex-M55 comes in and having that extra oomph there. That allows you
to churn out these FFTs and to generate those slices so that you can feed the neural network processor something that's going to be directly usable for it.
And being able to compose these two things together is what really brings the awesome performance and power that you're going to see.
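As a concrete illustration of that audio pipeline, here is a minimal numpy sketch of turning raw 16-bit PCM into overlapping FFT slices; the frame and hop sizes are illustrative assumptions, not anything specified on the show.

```python
# Sketch: raw int16 PCM -> overlapping magnitude-FFT slices for a neural network.
import numpy as np

def pcm_to_fft_slices(pcm, frame_len=512, hop=256):
    """Split PCM into 50%-overlapping windows and FFT each one."""
    x = pcm.astype(np.float32) / 32768.0           # normalize 16-bit samples
    window = np.hanning(frame_len)                 # taper to limit spectral leakage
    n_frames = 1 + (len(x) - frame_len) // hop
    slices = np.empty((n_frames, frame_len // 2 + 1), dtype=np.float32)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        slices[i] = np.abs(np.fft.rfft(frame))     # magnitude spectrum per slice
    return slices                                   # (frames, bins), ready for the accelerator

# One second of 16 kHz audio yields 61 overlapping slices with these sizes.
demo = pcm_to_fft_slices(np.zeros(16000, dtype=np.int16))
```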
Similarly, for video data, scaling has to happen before you feed it to the accelerator, since your resolution is going to need to be at some limit, like maybe 200 by 200 or 300 by 300 pixels.
It's not going to be the full 1080p, though, of what the camera sees.
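Similarly on the video side, a minimal sketch of that preprocessing: a nearest-neighbor downscale from the camera's full frame to a fixed network input size. The 224-by-224 target is an assumption for illustration.

```python
# Sketch: downscale a full camera frame to the fixed input size a network expects.
import numpy as np

def downscale(frame, out_h=224, out_w=224):
    """Nearest-neighbor resize of an (H, W, 3) frame to (out_h, out_w, 3)."""
    h, w = frame.shape[:2]
    rows = (np.arange(out_h) * h) // out_h   # map each output row to a source row
    cols = (np.arange(out_w) * w) // out_w   # map each output column to a source column
    return frame[rows[:, None], cols]

full = np.zeros((1080, 1920, 3), dtype=np.uint8)  # the 1080p frame the camera sees
small = downscale(full)                           # 224 x 224 x 3 for the accelerator
```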
And transformers? I missed transformers. I've been out of machine learning for a couple of years, which means that I'm 5 million years out of date.
I think it's what the LLMs use.
It's less training and no recurrent units, even though it seems like it's sort of a recurrent
neural architecture.
So not as much feedback.
No feedback?
It does have feedback kind of internally, I guess, though.
I'm not an expert on this stuff either.
I'll say it like this, though.
From what I've learned so far,
more or less,
they look at the data inputs coming in
and then dynamically adjust their weights
based on what they see.
So the network isn't static.
It is dynamically adjusting what it's doing
based on what it's seeing.
And it can remember that through a stream of tokens that are coming in.
So whatever is being sent to it is tokenized.
And that stream of tokens is used to dynamically update relationships
between those tokens while it's running.
So it doesn't necessarily have memory inside of it,
but the memory comes from the tokens that are going to it.
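One standard way to read that description is scaled dot-product attention, where the mixing weights are computed from the incoming tokens rather than fixed at training time. A generic numpy sketch, not any particular model:

```python
# Sketch of attention: the weights are derived from the tokens themselves.
import numpy as np

def attention(tokens, Wq, Wk, Wv):
    """tokens: (n, d) array; Wq, Wk, Wv: learned (d, d) projections."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # token-to-token relationships
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # row-wise softmax
    return w @ V                                    # each output mixes all tokens

d = 8
rng = np.random.default_rng(0)
out = attention(rng.normal(size=(5, d)), *(rng.normal(size=(d, d)) for _ in range(3)))
```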
The T in ChatGPT is transformer.
And they're also used for translation,
language translation and things like that.
The big thing is that they're just like a bulldozer.
Pretty much every problem people have been trying to solve
is now solved instantly by them.
Okay.
So there were LSTMs and then LLMs and Transformers are after that.
And I need to read up some.
Okay.
Cool, cool, cool.
Me too.
Well, what do you wish you had told yourself when you were on the show last in 2017, on episode...
219? Was it 219? 212. Darn. Which, I believe, at that point, we were in Seaworld. You Are in Seaworld, I think, is the title of that show.
What I wish I had told myself...
What do you wish you could, or what do you think we should have told you about starting a business? Or what would have been good information?
I don't know.
It's been like a really weird random walk, just kind of doing OpenMV.
I would say definitely it was a good idea not to go full time too early on it.
There's definitely like a window of opportunity when you're trying to run a business.
Now I see that opening with these new faster processors coming down the line where we can really do some amazing things with the concept that OpenMV has.
If you tried to do this earlier, I think it would have been just kind of like pain and suffering to the max, especially when the chip shortage happened.
So I'm glad I didn't go full throttle on it originally.
Honestly, I think the performance thing is the biggest.
It's just, if I had known about this beforehand,
maybe I would have created less random code
and really focused more on things that had a lot of value.
Kind of like, I think at the beginning when we were doing OpenMV,
we were just trying to write as much stuff as possible
and throwing things at the wall to see what sticks.
And now it's a little bit more focused on actually providing good features that people
really, really want and making those work really well. And so like putting more
time into one thing versus trying to spread it all over the place.
And Kwabena, do you have any thoughts you'd like to leave us with?
So I want to ask you guys.
It's been seven years of running Embedded FM.
So what episode are we on now?
400.
470?
468?
This will be 477.
Awesome.
Awesome.
So tell me about your experience with Embedded FM.
I want to know.
It's been good. I mean, I still meet interesting people and you've given us a lot to think about.
And you mentioned hearing Ralph Hempel on, and he gave us a lot to think about.
And I like that. But we've both talked openly about burnout and our disillusionment with some of the AI features that are happening.
Which is not necessarily disillusioned with ML writ large.
I actually work on ML stuff and I enjoy it.
But there's parts of, quote, AI that have been bothering me.
How it's being used, yes.
And so that's, I don't know.
I like doing the show because I like talking to people,
but we've gone to every other week, which has been really good.
I suspect we'll go to once a month at some point in the next year or two.
Really?
I don't know.
We haven't really talked about that.
What about you, Christopher?
You hate everything because your computer died this morning.
Well, computers have been bothering me since 20, since 1984. No, I don't know. You know,
this stuff objectively is very exciting to me. It's cool to see the capabilities,
these microcontrollers getting so much more power in a very short amount of time.
Since, I mean, it wasn't that long ago that, you know,
an Atmel AVR was the microcontroller of the day and some PICs, right?
And now we're talking about close to gigahertz.
Megabytes of RAM.
A few bucks for something that's close to a gigahertz.
There's literally a gigahertz processor if you want to.
They have one.
But I also feel like, well, I mean, maybe it's time for me to let other people do that stuff.
Because I miss the small processors.
I mean, on the one hand, it's extremely exciting, and it's cool what you can do.
But I feel like, yeah, but 128K of RAM is kind of fun to try to make something happen in.
I don't get to optimize nearly as much anymore.
It's a lot of trying to figure out what vendors are doing and putting together their Lego blocks so they work.
And then optimizing little pieces of it.
But I never get to sit down and think, oh, here's a new algorithm.
How can I make it go as fast as possible?
And how can I learn the chip deeply enough to find the instructions?
Like, I remember you talked about SIMD, and I remember working on a TI DSP, oh, probably 2001, maybe 2002.
And it had some neat caching systems, but it was all pipelined and you had to do everything manually.
And so I wrote a program in C and optimized the assembly by modifying my C so it would use the caches the way I wanted them to.
And it meant really understanding what was happening with the processor and the RAM and what the algorithm, not what the client told me the algorithm was
supposed to do, but what they actually wanted it to do. And I liked that piece. And I kind of miss
the deep optimizations that I haven't gotten to do lately. But that's partially client selection.
Yeah. And as much as I complain every week almost about, I don't want to write another
linked list or I don't want to write another spy driver for something.
On the other hand, I've been using Zephyr
lately, and most
of the coding has been, you know, editing
config files. It's like, oh, okay,
I need a driver for this.
Well, it has that. It's
buried in a directory somewhere, and you just have to edit the right
DTS file, and then suddenly, and you don't even have to write
a program. You just pull
up the shell and make sure that the thing works by doing, you know, sensor get thing, and it automatically just works.
Anyway, I mean, it does feel like things are getting easier, which is good. But it's a little bit of a shock for people who've been working in this industry for a very long time, because it's a change.
I mean, I loved learning MicroPython
and developing C modules for it.
It was amazing.
But I didn't...
It's different work.
It's very different work than needing to optimize things.
And it's probably good.
Embedded shouldn't be the way it has been.
Yes.
And everything's going in the right direction.
But we existed in the wrong direction for so long that we convinced ourselves it was fun.
Yes, we convinced ourselves it was fun. So now it's hard to change.
Yeah. Well, this is what I got started on. So my first processor was the BASIC Stamp.
Oh yeah, yeah, yeah. Okay. I had one of those. And then I graduated to the Propeller chip.
Right. Lots of cores.
And I loved writing stuff. Yeah, no, it was so cool.
Yeah. No, I actually did a whole bunch of drivers in pure assembly
where I would... I think the one I was most proud of is...
Okay, I did a pixel driver.
It could drive 640 by 480 VGA with one core running at 20 megahertz.
And what it would do is it would have a character.
So the Propeller chip had a character map, like 32 kilobytes of ROM
that had like characters.
Okay.
Yeah.
The characters were like 16 by 32, well, 32 by 16 or something.
And so the way the frame buffer worked: the frame buffer just encoded a single byte that told you what character to look up, right?
And that meant the frame buffer was way smaller than encoding every pixel as three bytes or so, right?
Because you only had 32 kilobytes of RAM to play with, so you didn't have much memory at all. And so that was the way the frame buffer worked. You'd have a byte that told you what character to present.
And then, in assembly, you had to feed a video output system that was like a shift register that had a certain frequency. So your loop had to hit a certain frequency.
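A rough sketch of that character-mapped frame buffer idea, with illustrative sizes rather than the Propeller's exact layout: each byte names a glyph in a fixed font, and pixels are generated at scan-out time.

```python
# Sketch: a byte per character cell instead of three bytes per pixel.
FONT_W, FONT_H = 16, 32                     # glyph size in pixels (illustrative)
COLS, ROWS = 640 // FONT_W, 480 // FONT_H   # 40 x 15 character cells for VGA

frame_buffer = bytearray(COLS * ROWS)       # 600 bytes versus ~900 KB of raw RGB
font_rom = [[0] * (FONT_W * FONT_H) for _ in range(256)]  # stand-in glyph ROM

def pixel(x, y):
    """Resolve an on-screen pixel by looking up its cell's glyph."""
    cell = (y // FONT_H) * COLS + (x // FONT_W)
    glyph = font_rom[frame_buffer[cell]]
    return glyph[(y % FONT_H) * FONT_W + (x % FONT_W)]
```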
Right, right.
There was literally, it was like the difference of one assembly instruction: one too much, it broke.
One or two less, it worked.
And so you were just staring: how do I get rid of one assembly instruction to make this function?
And here's the thing I did, though, that was so awesome. So during the vertical sync time, I would download the pixel maps of the characters. So you have a mouse cursor, right? I would blit the mouse cursor onto them, and then re-upload them back to the main memory and put them somewhere where the system could seamlessly swap out the character glyphs of the actual characters for the ones that I had blitted a mouse cursor onto. And so it looked like there was a mouse cursor being overlaid on the image, but the actual
frame buffer didn't include any mouse cursor in it. I would just look up the XY position of a
mouse cursor and present that. And it was the coolest thing because I was actually able to
make a GUI with text boxes where you can move a mouse cursor and click a button and actually do this.
And it was 32 kilobytes of RAM.
This is the kind of stuff I'm talking about.
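And the cursor trick, continuing the same sketch with the same assumed constants: during vertical sync, copy the glyph under the cursor, draw the cursor into the copy, stash it in a spare slot, and repoint that cell, so the real frame buffer never contains a cursor.

```python
# Sketch: overlay a cursor without touching the frame buffer's real contents.
def overlay_cursor(frame_buffer, font_rom, x, y, cursor_mask, spare=255):
    """cursor_mask: (dx, dy) offsets of cursor pixels; clipped to one cell for brevity."""
    cell = (y // FONT_H) * COLS + (x // FONT_W)
    patched = list(font_rom[frame_buffer[cell]])   # copy the glyph under the cursor
    for dx, dy in cursor_mask:
        px, py = x % FONT_W + dx, y % FONT_H + dy
        if 0 <= px < FONT_W and 0 <= py < FONT_H:  # real code patches all touched cells
            patched[py * FONT_W + px] = 1
    font_rom[spare] = patched                      # stash the patched copy in a spare slot
    frame_buffer[cell] = spare                     # point the cell at the patched glyph
```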
Thinking about it, I miss having to do the mathematical, analytical puzzle, where now I spend a lot more time reading and digesting information, which is also good and interesting and useful. But I prefer the information about wasps.
And so I guess maybe I was never that interested in computers.
No, I am, but I really do miss the sitting down and thinking about the analytics.
The challenge of there's limits.
The limits.
And I have no limits now.
I think removing the limits
makes it less interesting for us in some ways.
Yeah, it's all about the puzzle.
Like what really made me super happy
with the SIMD optimization on OpenMV
is just like,
it's just like when you unlock that 200...
Okay, like the most recent one I mentioned,
of erode and dilate,
we've had the same code there sitting forever.
And so this year, I was just like, I need to optimize stuff.
We actually scored a contract with Arm,
where ARM has recognized what we're doing is so interesting
that we're actually getting paid to optimize some of our code
and produce benchmarks to show, hey, you can do serious things with these MCUs.
These are real algorithms that people actually use being optimized
and not just the CMSIS-DSP library,
which sometimes is less performant than what you could write yourself.
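For reference, erode is at heart a small per-pixel neighborhood test; a plain-Python sketch of binary 3x3 erode follows. The optimized SIMD versions compute many pixels per instruction, but the logic is this.

```python
# Sketch: binary 3x3 erode; a pixel survives only if its whole neighborhood is set.
import numpy as np

def erode3x3(img):
    """img: 2D array of 0/1 pixels; returns an eroded copy (borders left clear)."""
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y - 1:y + 2, x - 1:x + 2].all()
    return out
```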
Definitely.
Yeah, that's the thing.
But it's useful.
These things move so fast, and I said it already in the show,
but if things move slower,
people would squeeze out all the performance that they could out of these.
And now you don't have to as much
because there's always going to be something faster, right?
I don't know. I don't even know what my point is.
But what you're talking about is like, ARM didn't bother
to squeeze out all the performance of their own things in CMSIS.
They didn't have to.
Yeah. Well, they should.
It's their thing.
Well, I think it's just they didn't have the market response,
right? I think that's the new thing
happening now, though, is now that people are going to be able to see that, oh, hey, you could actually replace a more complex system with one of these things.
Now there's actually some juice behind, oh, maybe we should actually try to see what we could do with these.
But back to the algorithms I was mentioning, I got a 150% speedup on it using some of the Cortex-M SIMD instructions. 150%... Sorry,
more than that. No, no, yeah, 150%. That's two and a half times performance. Two and a half times
speedup. I mean, you go from, oh man, it's kind of slow to, wow. Being able to pull out two and a
half times and 4x speedups on things, it's, I mean, you know, what do we get for processors nowadays?
It's like a 7% speedup and
people are like amazed by that.
And it's like, you know, when
you're at the 150% level,
it's like that's a whole different chip at that point.
Anyway.
Well, we should go. It is
time for us to eat.
And if we start talking
again, we will just start talking again for quite a while.
We'll have you back sooner next time.
Yes, I think that's the key.
Absolutely.
Absolutely. I super enjoyed this. It's awesome
talking to folks who've been in embedded systems
for a long time and equally
enjoy kind of making the hard
things happen and solving these
puzzles. Our guest has been Kwabena Agyeman, President, CEO, and co-founder of OpenMV.
Thanks, Kwabena.
Thank you.
Thank you to Christopher for producing and co-hosting.
Thank you to our Patreon listener Slack group for your questions.
Sorry to John and Tom, whose questions I did not get to.
And of course,
thank you for listening. You can always contact us at show at embedded.fm or hit the contact link on Embedded FM. And now a quote to leave you with. I have a nice quote that I actually like to read.
So the founder of Luxonis, this is my friend Brandon. He
passed away recently.
But he has one
from Theodore Roosevelt
that I absolutely love. It's a little bit long,
but I'd like to say it.
It is not the critic who counts, not the man who points out how the strong man
stumbles or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena,
whose face is marred by dust and sweat and blood,
who strives valiantly, who errs and comes short again and again,
because there is no effort without error and shortcoming,
but who does actually strive to do the deeds,
who knows great enthusiasm, the great devotions,
who spends himself in a worthy cause,
who at the best knows in the end the triumph of high achievement,
and who at the worst, if he fails, at least he fails daring greatly,
so that his place shall never be with those cold and timid souls
who neither know victory nor defeat.
And that's how it feels when you optimize C code.