Embedded - 497: Everyone Likes Tiny

Episode Date: March 20, 2025

OpenMV has a new Kickstarter, so CEO Kwabena Agyeman chatted with us about more powerful (and smaller!) programmable cameras. See OpenMV's site for their existing cameras. See their (already funded!) Kickstarter page for the super powerful N6 and the ridiculously small AE3. Note that OpenMV is still committed to open source; see their GitHub if you want to know more. Edge AI is the idea of putting intelligence in the devices (instead of in the cloud). There is an advocacy and education foundation called the Edge AI Foundation; this organization was formerly the TinyML Foundation. Edge Impulse and Roboflow are companies that aid in creating and training AI models that can be put on devices. ARM talks about their Ethos-U55 NPU (and how to write software for it).

Transcript
Starting point is 00:00:00 Welcome to Embedded. I am Elecia White alongside Christopher White. Our guest this week is Kwabena Agyeman and we are going to talk about cameras and optimizing processors and neural networks and Helium instructions. I'm sure it will all make sense eventually. Hi, Kwabena. Welcome back. Hey, nice to see you guys again. Thanks for having me on.
Starting point is 00:00:34 Could you tell us about yourself as if we didn't record about a year ago? Hi. So, I'm Kwabena. I run a company called OpenMV, and what we've been doing is really exploring computer vision on microcontrollers. We got our start back in 2013. Well, actually a little bit before that. I'm part of the original CMU Cam crew.
Starting point is 00:00:56 So I built the CMU Cam 4 back in 2011. And that was basically doing color tracking on a microcontroller. Since then, we founded a company called OpenMV that is doing basically higher level algorithms, so line tracking, QR code detection, AprilTags, face detection through Haar cascades, and other various vision algorithms. And we put that on board pretty much the lowest end processors,
Starting point is 00:01:26 well, sorry, not the lowest end processors, the highest end microcontrollers, but lowest end CPUs compared to a desktop CPU. And so we got software that was meant for the desktop running on systems that had barely any resources compared to the desktop. Anyway, that was 100,000 OpenMV cams ago when we started. And since then, as I said, we've sold over 100,000 of these things into the world. People love them and use them
Starting point is 00:01:54 for all kinds of embedded systems. And we're launching a Kickstarter. We've already launched it right now. That is about the next gen, which is about 200 to 600 times the performance, and really kind of takes microcontrollers, and what you can do with them in computer vision applications in the real world, to the next level, where you can really deploy systems into the world that are able to run on batteries all day and can actually process data using neural network accelerators on chip and make mobile applications. All right. So now I want to ask you a bunch of questions that have
Starting point is 00:02:36 nothing to do with any of that. That sounds good. That was a mouthful. I apologize. It's time for a lightning round, where we will ask you short questions and we want short answers and if we're behaving ourselves, we won't ask you for additional details. Are you ready? Yes.
Starting point is 00:02:54 What do you like to do for fun? What do I like to do for fun? Right now, it's really just working out in the morning. I do like to walk around the Embarcadero in San Francisco and try to get a run in in the morning. What is the coolest camera system that is not yours? Coolest camera system that is not mine? Let's see.
Starting point is 00:03:18 There's a lot of different stuff. I'm really impressed by just, and did self-driving in a previous life. And so it was awesome to see what you were able to do with that and being able to rig up trucks and such that could drive by themselves back at my last job. And then seeing Waymo's and etc. drive around downtown in San Francisco has been incredible. And in fact, getting to take my 80 year old parents in them and thinking about how my dad grew up without electricity back in the 40s and seeing him in a car that's driving around now,
Starting point is 00:03:57 that's a crazy amount of change. Specifics of your preferred Saturday morning drink when you are indulging yourself? My Saturday morning drink when I'm indulging myself? These are good lightning round questions. Let's see. Well, you know, I've had to cut down sugar, so I can't say I have that many interesting things.
Starting point is 00:04:19 However, I can tell you about beers I like, which is I like to go for those Old Rasputins. Oh, yeah. A really dark beer. Yeah, I used to love those. They kind of do the job of getting you drunk, or getting you tipsy, in one go, and also filling you up at the same time. Since you like machine vision and cameras and things,
Starting point is 00:04:39 are you aware of the burgeoning smart telescope market? No, I haven't. But we had someone use an OpenMV cam to actually do astro... They made the camera follow the stars so they could do a long exposure. There's a lot of really cool new low-cost products that have nice optics and good cameras and little computers. And I think they're similar to OpenMV's compute power and stuff.
Starting point is 00:05:09 But they can look at the stars and plate solve and figure out where it's looking, so then do all the go-to stuff plus imaging. So it's a nice integrated thing that might be a place to explore at some point. I need a little air horn or sound maker, because you're breaking lightning round rules. A little bell. A little bell. What do you wish you could tell the CMU Cam 4 team? The CMU Cam 4 team, well, that was me back in college.
Starting point is 00:05:37 I would say I never expected I would end up doing this for so long. It's been a journey now at this point. Do you have any advice you would give your past self? Like, really don't put in those fix me laters because you're going to actually have to fix them later? The only thing I would say is focus on the most important features and not do everything and anything. Everything you build, you have to maintain and continue to fix and support. And so a smaller surface area is better.
Starting point is 00:06:16 Build nothing. If you could teach a college course, what would you want to teach? I really think folks should learn assembly and do some kind of real application in it. And I say that because if you remove all the abstractions and really get into how the processor works, you learn a lot about how to write performant code, where to understand when the compiler is just not putting in the effort it should be, and how to think about things. It's important when a lot of our problems in life come down to, well, as engineers, we optimize things, we fix problems.
Starting point is 00:06:59 And not doing that optimization in the beginning, or at least thinking about it, can end up costing you in various ways later on, having to redo things. See also last week's episode, if you're interested in learning about the basic building blocks of computing. Okay, enough lightning round. Yeah. So you talked a little bit about what OpenMV is, and what I heard was small camera systems that are intelligent. Is that right?
Starting point is 00:07:38 Yeah, yeah. We build a camera that's about one inch by 1.5 inches, well, 1.75 inches. So it's pretty tiny, a little bit smaller than a Raspberry Pi, like half of the size or so. And what we do is, as I said, we find the highest end microcontrollers available in the market and we make them run computer vision algorithms. And we've been a little bit of a thought leader here. When we got started, no one else was doing this. There was just an OpenCV thread. Well, what is it? Not Reddit. Stack Overflow. There was a Stack Overflow thread where someone was asking, can I run OpenCV on a microcontroller?
Starting point is 00:08:17 And the answer was no. And so we set out to change that. That no longer is the highest ranking thing on Google search when you search for computer vision on a microcontroller. But let me tell you, when we started, it stayed up there for quite a few years. And you actually, you just said you have 1.7 by 1.6? Yeah, about 1.7 inches by like 1.3 or so, I think. That's our standard cam size. So it's pretty tiny.
Starting point is 00:08:46 And this is a board. This isn't a full camera. This is something you put into another system. Well, it's a microcontroller with a camera attached to it. So we basically have a camera module and a microcontroller board and then I/O pins. And so you can basically sense, you can get camera data in, detect what you wanna do, and then
Starting point is 00:09:06 toggle I/O pins based on that. So kind of everything you need in one system. It was meant to be programmable, so you wouldn't have to attach a separate processor to figure out what the data was. This is important because a lot of folks struggle with inter-processor communication, and so not having to do that and just being able to have one system that can run your program makes it a lot easier to build stuff. And when you say programmable, I mean, MicroPython.
Starting point is 00:09:38 Not just we have to learn all about the camera and about the system and start over from scratch, but just MicroPython. Yes. So we have MicroPython running on board. MicroPython's been around for 10 plus years now. And it basically gives you an abstraction where you can just write Python code and be able to process images based on that.
Starting point is 00:10:03 And so all the actual image processing algorithms and everything else, those are in C, and they're optimized. And then Python is just an API layer to invoke the function to call a computer vision algorithm and then return results. A good example would be, let's say you want to detect QR codes. There's a massive C library that does the QR code detection.
Starting point is 00:10:24 So you just give it an image object, which is also produced by a massive amount of C code that's doing DMA accelerated image capture. And then that returns a list of Python objects, which are just the QR codes and the value of each QR code in the image. So it makes it quite simple to do things. But from my perspective as a user, it's just take picture, get QR code information. Yes.
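For a sense of what that take-picture, get-QR-code flow looks like, here is a minimal sketch in OpenMV-style MicroPython. The sensor and find_qrcodes names follow OpenMV's documented API, but treat the details as an approximation and check the examples that ship with OpenMV IDE.

```python
# Minimal OpenMV-style loop: grab a frame, find QR codes, print payloads.
# The heavy lifting (capture DMA, QR decode) happens in optimized C below
# this Python layer.
import sensor

sensor.reset()                       # initialize the camera
sensor.set_pixformat(sensor.RGB565)  # color output
sensor.set_framesize(sensor.VGA)     # 640x480 frame

while True:
    img = sensor.snapshot()          # DMA-accelerated capture
    for code in img.find_qrcodes():  # returns a list of QR code objects
        print(code.payload())        # the decoded string for each code
```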
Starting point is 00:10:55 Yes. And so we make it that simple. Isn't that so nice? But QR codes, OK, so we know how to interpret those. But I can do my own AI on this. I can make an orange detector. Or as I call it, machine learning. Yes.
Starting point is 00:11:15 Or as I call it, regression. Linear regression at scale. Sorry. So what's new with the OpenMV Cam is that we've actually launched two new systems, one called the OpenMV Cam N6 and one called the OpenMV AE3. And so these two systems are super unique and super powerful. Microcontrollers now come with neural network processing units on board. And these are huge because they offer an incredible amount of performance improvement.
Starting point is 00:11:55 Case in point, it used to be on our OpenMV Cam H7 Plus, which was one of our highest end models that we currently sell. If you wanted to run like a YOLO object detector that detects people in a frame or oranges or whatever object you train it for, that used to run at about 0.6 frames a second. And the system would draw about 180 milliamps or so at 5 volts running. With the new systems, we've got that down to 60 milliamps while running, so 3x less power, and the performance goes from 0.6 frames a second to 30. And so if you do the math there, it's about a 200x performance increase. So crazy. It's like Moore's Law has gone out of control.
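(Rough numbers behind that: 30 fps ÷ 0.6 fps = 50x the throughput, while 180 mA ÷ 60 mA = 3x less current, so on the order of 150x more frames per unit of energy, the couple-hundred-times ballpark being described.)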
Starting point is 00:12:40 Well, that's what happens when you put dedicated hardware on something. Yes. Well, and the AE3 is the small one, the cheaper one. It's like 80 bucks. Yes. And I made a point earlier about saying the size. One inch by one inch. But yes, this one is tiny.
Starting point is 00:12:56 The size of a quarter. We got everything down to fit into a quarter. Ha. I mean, your camera lens is the biggest part on that, isn't it? And we wanted to make that smaller too. It ended up being that big though because the, so for the audience, the OpenMV AE-3, we got two models. The N6 is kind of a successor to our standard OpenMV cam line.
Starting point is 00:13:20 So it's got tons of I/O pins. It's got a removable camera module that you can put in multiple cameras and other various things, and it's got all this hardware on board, full featured. And then we built a smaller one though called the OpenMV AE3, which is honestly, the way it came about is an interesting story, and so I do want to go into that in a little bit. But it came out to be a really, really good idea, making this such a small, tiny camera. It's so tiny you can put it in a light switch. I mean, at one inch by one inch you can put it almost anywhere. And it features a neural network processor, a camera, a time of flight sensor, a microphone, an accelerometer,
Starting point is 00:14:02 and a gyroscope. So five sensors all in one, and a one inch by one inch form factor with a very powerful processor on board. I'm just boggled. I mean, it's got everything, and it's so small. And I just... You work with microcontrollers? I know, and I work with small stuff. This also has the camera.
Starting point is 00:14:27 It's always been for me, once you have a camera, everything gets big again because it's just... Do you have your phone? I do want to ask about the cameras, just briefly, so we can get that out of the way because I'm camera focused these days. So what kind of... What are these? One megapixel. One megapixel and what kind of field of view do those two options have? Both of them are just to try to aim for around 60 degrees field of view. It's pretty standard.
Starting point is 00:14:57 Both of them though actually have removable lenses, which is really, really nice. So for the OpenMV Cam N6, that's a standard M12 camera module. Got it. And so you can put any lens you want on it. And we have a partnership with a company called Pixart Imaging. So Pixart, we actually met them at something called TinyML, which got rebranded to the Edge AI. No.
Starting point is 00:15:25 Edge Impulse. No, it's not Edge Impulse. Okay, there's too many Edge things. Anyway, it used to be called TinyML. It's now the Edge AI Foundation, I think. And- Wait, those aren't related? They're the same organization.
Starting point is 00:15:40 It's strange names. No, no, Edge Impulse and Edge AI aren't related. One's a company, one's a conference. Well, yes, but- Oh yeah, one's a company and one's a consortium. Oh, I don't know. But are they... I mean, Edge Impulse is a member company of the Edge AI Foundation.
Starting point is 00:15:56 It's too confusing. And then you have the Edge AI Vision Alliance, which is a different thing. No, no. We're not doing it. And then you have the Edge AI hardware. What is it? Edge AI hardware conference, which is another thing. So there's too many edges here.
Starting point is 00:16:17 Makes it a little challenging. Anyway, but yeah, what we were talking about, cameras. So you have, we have two different types, M12, which is a standard camera module, and we have a partnership with Pixart Imaging. And so they actually hooked us up with 1 megapixel color global shutter cameras. And so we're making the standard on both systems. So the N6 has a one megapixel color global shutter camera and this can run at 120 frames a second at full resolution.
Starting point is 00:16:48 So that's 1280 by 800. And then the OpenMV AE3 has the same camera sensor but we shrunk the lens down to an M8 lens which is also removable. So you can put like a wide angle lens or a zoom lens on there if you want. And that'll be able to run at a similar frame rate. It has less resources than the N6,
Starting point is 00:17:07 so it can't actually achieve the same speed and performance of the N6, but we're still able to process that camera at full res. So it could do like maybe 30 frames a second at 1280 by 800. But for most of our customers, we expect people to be at the VGA resolution, which is 640 by 400 or so on this camera.
Starting point is 00:17:27 And that'll give you about 120 frames a second. OK, I want to switch topics because I have never used an NPU and I don't know what it is. And I mean, it's a neural processing unit. I got that. So it has some magical AI stuff that honestly seems like a fake. It's a whole bunch of... How is it different from a GPU?
Starting point is 00:17:50 It's all the parts of GPU without the graphics necessary. So it's all the linear algebra stuff. Yeah. So it's just an adder multiplier? Pretty much. How is it different from an ALU? It's got a billion of them. It's a vectorized ALU?
Starting point is 00:18:04 Yeah. Pretty much. This is actually what I wanted to talk about more. Not to try and do the sales pitch on this. I think folks will figure out things themselves, but I wanted to talk to some embedded engineers here about cool trends in the industry. So, NPUs, what are they? Yeah, basically there was an unlock for a lot of companies, I think, that they realized, hey, we've got all these AI models people want to run now. And this is a way to actually use sensor data. So you've had this explosion of the IoT hardware revolution.
Starting point is 00:18:42 People were putting internet on microcontrollers and connecting them to the cloud, and you'd stream data to the cloud. But the challenge there is that that's a lot of data being streamed to the cloud, and then you have to do something with it. And so you saw folks just making giant buckets of data that never got used for anything.
Starting point is 00:19:02 You might add like an application to visualize the data, but you technically never actually put it to use. You just have buckets and buckets of recordings and timestamps. And that's all very expensive to maintain, to have. And while it's nice per se, if it's not actionable, what good is that? And a lot of times, it's not quite clear how do you make an algorithm that actually, how do you use accelerometer data and gyroscope data directly? If you've seen the kind of filters and things
Starting point is 00:19:34 you need to do with that classically, they're pretty complicated. And it's like, how do you make a step detector or a wrist shaking? There's not necessarily a closed form mathematical way to do that. You process this data using a model where you capture what happens, how you move your hand, etc. Then you regress and train a neural network to do these things.
Starting point is 00:20:01 That unlock has allowed us to make sensors that had very interesting data outputs and turn those into things that could really detect real world situations. And of course, this becomes more and more complicated, the larger amount of data you have. So with a 1D sensor, it's not too bad. You can still run an algorithm on a CPU, but once you go
Starting point is 00:20:26 to 2D, then it starts to become mathematically challenging. That's the best way to say it. And your 1D and 2D here, for an accelerometer, are the X and Y channels, or do you have a different dimension? It would just be like, an accelerometer is just a linear time series, right? So to build a neural network model for that, you only need to process so many samples per second, versus an image, where you have to process the entire image every frame. Sorry, when you went from 1D to 2D and you were still talking about accelerometers, I was like, is that XYZ or something else?
Starting point is 00:21:06 Well, you also have 3D accelerometers, so it's very six channels if you think about it, right? And usually there's gyros and sometimes there's magnetometers. So yes, you throw all the sensors on there, but those are still kind of 1D signals as opposed to the camera, which is a 2D signal because... 2D and large dimension too. Right, right. Yeah, yeah.
Starting point is 00:21:31 Like an accelerometer, right? You might have the window of samples you're processing. Maybe that's, I don't know, like several thousand at once per ML model and a thousand different data points per time you do run the model. That sounds like a lot, but not really compared to several hundred thousand that would be for images. More millions, yes.
Starting point is 00:21:55 Yeah. There's a reason there's an M in megapixel. Yes, yeah. There's a lot of pixels. Yes, absolutely. So there's a lot more. So anyway, enter the NPU. Basically processor vendors have been doing this for a while, like your MacBook has it,
Starting point is 00:22:13 where they've been putting neural network processors on systems. And what these are, are basically giant multiply and accumulate arrays. So if we look at something like the STM32N6, it'll have 288 of these in parallel, running at a gigahertz. And so that's 288 times 1 gigahertz for how many multiply and accumulates it can do per second. OK, my brain just broke. Let's break that down a little bit.
Starting point is 00:22:42 OK, 288 parallel add multiply units. Yes. And so any single step is going to be 288, but I can do a whole heck of a lot in one second. Yes. So that's 288 billion multiply and accumulates per second. Then it also features a few other things, like there's an operation called rectified linear unit. That's also counted as an op, it's basically a max operation. That's done in hardware.
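(To put the arithmetic in one place: 288 multiply-accumulate units × 1 GHz = 288 billion multiply-accumulates per second, and since a multiply-accumulate is conventionally counted as two operations, a multiply and an add, that is already on the order of 576 billion operations per second before the extra hardware operations are included.)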
Starting point is 00:23:14 Then you go from 288 to 500 effectively, and there's a few other things they can do in hardware for you also. All that combined, it's equivalent to about 600, you know, 600 billion operations per second for basically running any ML model you want. But there's a problem here. I don't know that I believe in AI or ML.
Starting point is 00:23:41 It seems like, I mean, it's- Well, let's not confuse two things. The philosophical and the technical? No, I mean, I was going to have a disclaimer about we're not talking about LLMs. Ah, okay. Which is what I, I traditionally am up in arms about. This is a copyright issue. This is image classification.
Starting point is 00:24:02 No, not anyway. Everybody knows my opinion on AI qua LLMs. And if you don't, it'll just be me cussing next week so we can find out. I work on machine... one of my clients works on machine learning stuff. I've worked on machine learning stuff for years. It's very useful for these kinds of tasks
Starting point is 00:24:20 of classification and- And self-driving. Detection, it's not useful for self-driving because that doesn't seem to work very well. It worked fine when I did it. Yes, your truck on a dirt road following a different truck at 20 miles an hour. It worked. Yes.
Starting point is 00:24:40 Anyway, what was your point? Well, her point. Your point is probably better than both of our points. My point was we are seeing a lot of funding, a lot of things that go into processors that are called neural processing units, like they're supposed to be used for neural networks. And yet we're also seeing some difficulties with the whole ML and AI in practice. I don't think those difficulties are actually related to what you're working on,
Starting point is 00:25:15 but do you see them reflected in either your customers or your funders or people just talking to you and saying, why are you doing this? Because it's not really working out as well as people say it is. Well, I think there's some difference there. One, we were using the branding AI because that's what everyone uses nowadays. You have to. Just to be clear, I would prefer to call it ML,
Starting point is 00:25:40 but that's old school now. So everyone's using AI. So we had to change the terminology just to make sure we're keeping up. We're just doing CNN accelerators. And so these are probably pre-ChatGPT, really. Convolutional neural networks, yeah, yeah. And so what they're doing is... so most of the object detector models, for example, let's say you want to do something like human body pose, facial landmarks, figure out the digits on where your hand is, like your hand detection, figuring out how your fingers are, your finger joints, things like
Starting point is 00:26:18 that. These are all built off these convolutional neural network architectures that basically imagine small image patches that are being convolved with the image. So like imagine a three by three activation pattern, and that gets slid over the image and produces a new image. And then you're doing that in parallel. Like, it's way too hard to describe how convolutions work. Let me try.
Starting point is 00:26:46 At a high level. You have a little image that you remember from some other time, and maybe it's a dog's face, and you slide it over the image you have here in front of you, and you say, does this match, does this match, does this match? Or how well does it match? How well does it match? And then at some point, if you hit a dog's face,
Starting point is 00:27:06 it matches really well. And now you can say, oh, this is dog. Now you do that with eight billion other things you have remembered through the neural network. And you can say, well, this is a face, or this is where I have best highlighted a cat face, and this is where it best highlighted a dog face and there's a 30% chance it's one or the other.
Starting point is 00:27:28 That sort of, the convolving is about saying, so I have this thing and I have what's in front of me and I want to see if this thing that I remember matches what's in front of me. And there are lots of ways to do it. You can have different sizes of your remembered thing because your dog face might be big or smaller in your picture. And if you look inside the networks after they've learned and then to kind of interrogate the layers, you can see what it's learning.
Starting point is 00:27:57 Like it'll learn to make edge detectors and lots of even more fine features than just a face. It might just be, okay, there's a corner, and a corner might mean a nose, but it might mean this, and it combines all of that. So it gets very sophisticated in the kinds of things it looks for. You can look inside a convolutional neural network that's been trained and kind of get a sense for what it's doing. Yeah. And it's, I mean, this is the way things are going nowadays for how people do these higher level algorithms.
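A bare-bones sketch of that slide-the-template-and-score idea, stripped of everything a real CNN adds (multiple channels, many learned filters, strides, nonlinearities), written in plain Python for clarity rather than speed:

```python
# Slide a small 3x3 kernel over a grayscale image and record how well the
# underlying patch matches it at each position (a single correlation pass,
# the core operation a convolutional layer repeats with learned kernels).
def correlate(image, kernel):
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (w - kw + 1) for _ in range(h - kh + 1)]
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            score = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    score += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = score  # high score = this patch looks like the kernel
    return out

# Example kernel: a tiny vertical-edge detector, the kind of low-level
# feature early CNN layers tend to learn on their own.
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]
```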
Starting point is 00:28:26 And honestly, you really couldn't solve it before without using these kinds of techniques. Just because most of the world is with these weird amorphous problems where there's no closed mathematical form to describe like what is a face, right? You actually have to build these neural networks that are able to solve these problems.
Starting point is 00:28:47 And it's funny to say it, because this started blowing up 10 plus years ago now. And so it's like, it's actually been here for a long time. It's not necessarily even new anymore. Definitely not. And so when I fuss about is AI still a thing, or is it going to be a thing, it's not this class
Starting point is 00:29:05 of problem. This class of problem, the machine learning parts are really well studied and very effective. And with these, and this is the last thing I'll say on this, you do get with the output a confidence level. Like it says, this is a bird, and it says, I'm pretty sure, 75% or 85%. You can use those in your post-processing to say, well, what action should I take based on this confidence, as opposed to certain other kinds of AI things that do not do that. Yeah.
Starting point is 00:29:36 So the chat GPT-like stuff, that's a whole different ball game. Let's come back next year maybe and talk about that. Yeah. You mentioned YOLO, which is an algorithm. Could you tell us basically the 30-second version of what YOLO is? Yeah, yeah. So YOLO is you only look once. So when you're trying to do object detection, right, the previous way you did this was that
Starting point is 00:30:02 you would slide that picture of a dog over the entire image, checking every single patch at the same time, well, you know, one patch after another. And you can imagine that's really computationally expensive and doesn't work that well. It takes, like, literally the algorithm would run for seconds to determine if something was there. So with you only look once, it's able to do a single snapshot. It runs the model on the image once, and it outputs all the bounding boxes that surround all the objects that it was trained to find.
Starting point is 00:30:34 That's why it's called you only look once. Because instead of it, before these, there's another one called single-shot detector, SSD. Before these were developed, yes, the way that you would find multiple images or multiple things in an image would be that you would slide your classifier, basically a neural network that could detect if, you know, a certain patch of the image was one thing or other.
Starting point is 00:30:59 And you would just slide that over the image at every possible scale and rotation, checking every single position. And that would be, hey, the algorithm could run on your server class machine and it would still take 10 seconds or so to return a result of these are all the detections. And so you can imagine on a microcontroller that would be, you'd run the algorithm, come back a couple days later and you get the results. But you have YOLO running pretty fast on the new Kickstarter processors. Did you code that yourself?
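To make the contrast just described concrete, here is the shape of the two approaches as a hedged Python sketch; classify_patch and yolo_model stand in for trained models and are not real APIs:

```python
# Old approach: run a patch classifier at every position (and in practice
# every scale). Cost grows with the number of positions times scales.
def sliding_window_detect(image, classify_patch, window=64, stride=16):
    h, w = len(image), len(image[0])
    detections = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = [row[x:x + window] for row in image[y:y + window]]
            label, score = classify_patch(patch)   # one forward pass per patch
            if score > 0.5:
                detections.append((label, score, (x, y, window, window)))
    return detections

# YOLO-style approach: one forward pass over the whole image returns every
# bounding box at once -- "you only look once."
def yolo_style_detect(image, yolo_model):
    return yolo_model(image)   # e.g. [(label, score, (x, y, w, h)), ...]
```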
Starting point is 00:31:31 No, no. It's thanks to these neural network processors. And so the way they work, actually, the easiest one to describe is the one on the AE3, which is the ARM Ethos NPU. That one uses a library called TensorFlow Lite for Microcontrollers, and so it's an open source library that's available. They have a plugin for different accelerators. So basically, if you don't have an Ethos NPU, you can do it on the CPU, just a huge difference in performance. And if you have the Ethos NPU available, then the library will offload computation to it.
Starting point is 00:32:12 And so you just give it a TensorFlow Lite file, which basically represents the network that was trained. And as long as that has been quantized to 8-bit integers for all the model weights, it just runs. And the NPU is quite cool in that you can actually execute the model in place. So you can place the model in your flash, for example. And the NPU, you just give it a memory area
Starting point is 00:32:36 to work with called the tensor arena for its partial outputs for when it's working on things. And it'll run your model off flash, execute it in place, and produce the result and then spit out the output. And it goes super fast. We were blown away by the speed difference. For small models, for example,
Starting point is 00:32:57 it's so crazy fast that it basically gets it done instantly. An example would be there's a model called FOMO, Faster Objects, More Objects, developed by Edge Impulse. Yeah, the name is on the nose, right? And then that model went from running at about 12 to 20 frames a second on our current OpenMV Cams to 1,200.
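Roughly what loading and running one of those quantized .tflite files looks like from the MicroPython side, sketched here with module and method names (ml.Model, predict) modeled on OpenMV's documented ML interface; they may differ by firmware version, so treat them as assumptions and check the current docs:

```python
# Hedged sketch: run an int8-quantized TFLite model on an OpenMV camera.
# Weights execute in place from flash; only the tensor arena (scratch
# activations) lives in RAM, and inference is offloaded to the NPU when
# one is present.
import sensor
import ml   # assumed module name; older firmware exposed a similar "tf" module

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)

model = ml.Model("person_detect.tflite")   # hypothetical model file name

while True:
    img = sensor.snapshot()
    outputs = model.predict([img])          # one inference pass per frame
    print(outputs)
```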
Starting point is 00:33:23 I was going to ask how you, models can be fairly sizable and with microcontrollers we're usually not talking about many megabytes of RAM, so I was, yeah, you answered the question. So it gets stored in flash and executed directly out of flash. And 8-bit, and that's an optimization that's happening all over, is that it turns out you don't need your weights to be super bit heavy. You can have lower resolution weights and still get most of the goodness of your models. Yeah, yeah, you can. And this also does amazing things for memory bus bandwidth, because if you imagine if you're moving floating point numbers, like a double or a float, that's the four to eight times more data you need to process things.
Starting point is 00:34:06 And so if 8-bit, yeah, it's just a lot snappier trying to get memory from one place to another. Quantizing to 16-bit is not too difficult. Quantizing to 8-bit, which I've tried a few times for various things, there's some steps required there that are a little above and beyond just saying, here's my model, please change it to 8-bit for me, right? You have to- Yeah. Yeah. Typically what you need is to actually do something called,
Starting point is 00:34:35 you wanna do quantization aware training, where when the model is created, whatever tool chain you're using to do that or tool set, those actually need to know that you're quantizing, that you're going to be doing that. Otherwise, when it tries to do it, it'll just result in the network being broken, basically. You can't just quantize everything without any idea of what data is flowing through it. Otherwise it'll not work out so well. And when we say 8 bits, we don't mean 0 to 255.
Starting point is 00:35:05 I mean, we do in some cases, but there are actually 8-bit floating point, and that's part of this quantization issue. I think these are your integers. No, it's not floating point. It's just scaling and offset. So each layer basically has a scale and offset that's applied globally to all values in that layer. And there's some more complex things folks are trying to do, like making it so that's even further refined, where you have different parts of a layer being broken up
Starting point is 00:35:35 into separate quantization steps. But so far right now for TensorFlow Lite for microcontrollers, it's just each layer has its own quantization. It's not more fine grained than that right now. OK, I wasn't aware it was per layer likeers. It's just each layer has its own quantization. It's not more fine-grained than that right now. OK, I wasn't aware it was per layer like that. That's cool. I didn't realize that the NPUs basically
Starting point is 00:35:52 take TensorFlow Lite. That gives the power to a lot of people who are focused more on TensorFlow and creating models and training them. So what did you have to do? Well, let's say it's not that easy. Really? Working with TensorFlow is not that easy.
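(For reference, the per-layer scheme just described is a simple affine mapping: each int8 value q in the range -128 to 127 stands for a real value r = scale × (q − zero_point), with one scale and zero_point stored per layer. For example, scale = 0.02 and zero_point = −5 make q = 45 represent 0.02 × (45 − (−5)) = 1.0. The hardware does the heavy math in 8-bit integers and only rescales at layer boundaries.)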
Starting point is 00:36:15 Yeah, it's not that easy. Let me say it like that. And for no good reason, I think. But anyway. Yeah, yeah, no. Well, I mean, well, what happens is actually different manufacturers have different ways of doing this. So for ST, for example, they do not use TensorFlow Lite
Starting point is 00:36:33 for microcontrollers. They have their own separate AI system called the ST Neural-ART Accelerator. And so their NPU is more powerful. But totally different software library and package. None of the code is applicable. You need to use their tool chain to build models and their libraries to run. Let's talk about this, ST. It's not a good idea. Well, I mean, the reason they did that is because
Starting point is 00:36:59 they wanted to have more control over it. it totally makes sense. And it lets them optimize in ways their processor is optimized for instead of with TensorFlow Lite, where you have to optimize the things everybody's doing instead of what you're specifically. Never mind. Yeah, and I think that's the reason why they went for it. Another reason is, and this is a weird architecture divergence, but with TensorFlow Lite, you
Starting point is 00:37:25 have basically a standard library runtime that includes a certain number of operations. And so for us with the OpenMVCam, we have to enable all of them, even if you don't use them. And so ST was trying to be more conscious to their customers and say, okay, for customers who have less RAM or flash available on their chips, we want to then instead compile the network into a shared library file, basically, that just has the absolute minimum amount of stuff needed. And then that way it's executed on your system.
Starting point is 00:37:56 And if you don't have an NPU, it works on the processor. And if you do, then it goes faster. Yeah. The only challenge of that is that means your entire firmware is linked against this model. It's not like a replaceable piece of software anymore. So it's optimum from a size standpoint, but it means that being able to swap out models without having to reflash the entire firmware becomes a challenge. And for us with MicroPython, one of our goals is so that you don't have to constantly update the firmware to change any little piece of the system.
Starting point is 00:38:33 And so it was a lot easier to get the integration done for the ARM Ethos NPU, because it was kind of built that way, where the library is fixed and the model is fungible. Can you run multiple models? Like if I, you said the OpenMV had a camera as well as some other sensors, can I have a model running the vision part and one looking at the sensors for gestures and things? Okay. Yeah, yeah, that's part of the cool feature. With the OpenMV Cam AE3, for example, you can actually have,
Starting point is 00:39:04 we actually have two cores on it, so I wanted to get into that in a little bit. But you can basically, once you finish running the model, you can have multiple models loaded into memory and you just call inference and pass them the data buffer, whatever you want. Obviously only one of them can be running at a time, but you can switch between one or another
Starting point is 00:39:23 and have all of them loaded into RAM. And so if you wanted to have five or six models running and doing what you want, you can. And again, the weights are stored in flash, just the activation buffers are in RAM. So as long as the activation buffers aren't too big, there's really not necessarily any limit to this. It's just how much RAM is available on the system. We're going to come back to some of these more technical details in a second, but if I got the AE3 from your Kickstarter, which just launched and you will fulfill eventually,
Starting point is 00:39:55 but I got one today, what is the best way to start? I mean, do I go to tinyml.com, which now will redirect me somewhere else? And how do I start? Yeah, we've actually thought about that for you. So there's two things. One, we built into OpenMV IDE our software, a model zoo. And so this basically means you're going to be able to have all the fun, easy to use models like human body pose, face detection, facial landmarks, people detection, all of that. There's well-trained models for that, and those are going to be things you can deploy immediately.
Starting point is 00:40:40 And so we'll have tutorials for that. And then for training your own models, though, we're actually in partnership with Edge Impulse and another company called RoboFlow, which is a big leader in training AI models. And so with both of their support, they actually allow you to make customized models using data. And one of the awesome things that RoboFlow does, for example, and Edge Impulse,
Starting point is 00:41:02 is that they do automatic labeling for you in the cloud using large language models. There's these ones called vision language models that are kind of as smart as chat GPT, but for vision. So you can just say, draw a bounding box around all people in the image and it'll just do that. You don't need to do it yourself. And it'll find like most of the people in an image and draw bounding boxes around them. Or you can say oranges or apples or whatever you're looking for.
Starting point is 00:41:31 And then using that model, that basically helps you create, you just take the raw data you have, ask the vision language model to mark it up with whatever annotations are required to produce a data set that can then be trained to build one of these more purpose-made models that would run on the camera. So it's kind of like extracting the knowledge of a smarter AI model and then putting it
Starting point is 00:41:56 I'm familiar with Edge Impulse, but Roboflow is new to me. Have they been around for very long? Are they robotics focused or is it just now it's everything machine learning and vision for them? Well, Roboflow is just focused on machine vision. They're actually quite big in the more desktop ML space. And so like Nvidia Jetson folks and all the developers who are at a higher
Starting point is 00:42:26 level than microcontrollers, that's where they have been playing. But they are one of the leaders in the industry for doing this and making it easy for folks to run models. And we're working with them to kind of help bring these people into the market, to help make it so that you can train a model easily. What they do is they'll provide you with a way to train a YOLO model, for example, that can detect objects, and the object can be anything. And as I mentioned, they'll help bootstrap that so you don't even have to draw bounding boxes yourself or label your data. You just go out and collect pictures of whatever you want, put that into the system, ask the vision language
Starting point is 00:43:05 model to label everything, and then you can train your own model quick and easy. Is it really? Well, the deployment might be a challenge. We've got to work through those issues, but the hope is it will be by the time we ship. Last time we talked, we hinted at the stuff with the Helium SIMD. Actually, I should start that because I'm not gonna assume other people have heard that episode. What is the Helium SIMD and why is it important?
Starting point is 00:43:36 Especially since you have this NPU, that's what I was gonna ask, because you mentioned it, yeah. Yeah, well, there's two big changes that we're seeing actually on these new microcontrollers. I think the first thing to mention is, well, there's two big changes that we're seeing actually on these new microcontrollers. I think the first thing to mention is yes, they all have neural network processing units on board. So these offer literally 100x performance speedups.
Starting point is 00:43:55 I mean, people should be aware of that. I don't know where you get 100x performance speedups out of the box on things. Two orders of magnitude is a pretty big deal. But even more so, ARM also added the Cortex M55 processor for microcontrollers, which features vector extensions. So last year, we were just getting into this, and we were thinking about what does it look like to program with vector extensions.
Starting point is 00:44:19 And I hadn't done any programming yet. I was just talking about the future, what it could be. But now, with launching the OpenMVCAM AE3 and N6, I spent a lot of time writing code in Helium. In particular, the OpenMVCAM AE3 is actually somewhat of a pure microcontroller. It does not have any vision acceleration processing units. There's nothing in there specifically
Starting point is 00:44:44 to make it easier to receive, to process camera data. It has MIPI CSI, which allows you to receive camera data, and it also has a parallel camera bus. But there's no... normally, processors nowadays have something called an image signal processor that will do things like image debayering. It'll do scaling on the image, color correction, a bunch of different math operations that have to be done per pixel. So it ends up being an incredible amount of stuff the CPU would have to do. That doesn't actually exist on the OpenMV Cam AE3 in hardware.
Starting point is 00:45:19 The N6 from ST has that piece of logic. And so it's able to have 100% hardware offload to bring an image in. That's why it's able to achieve a higher performance from the camera, because there's nothing, you don't have to do any CPU work for that. But what we did for the OpenMV Cam AE3 is we actually managed to process the image entirely on chip using the CPU. So the camera itself outputs this thing called a Bayer image, which is basically each row is red green red green red green, and then the next row is green blue green blue green blue, and then it alternates back and forth. And so, to get any pixel, if you want to get
Starting point is 00:45:59 the color of any particular pixel location, you have to look at pixels to the left, right, up, down, and diagonal from it. And then compute. And that changes. That pattern changes every other pixel. Because depending on the location you're at, you're looking at different color channels to grab the value of the pixel.
Starting point is 00:46:16 And you have to compute that per pixel to figure out what the RGB color is at every pixel location. And so if you just think about what I just said in your head, it's a lot of CPU just to even turn a Bayer image into a regular RGB image that you can even use to process and do anything with. And basically, every digital camera sensor works this way. Color resolution is far lower than the absolute pixel
Starting point is 00:46:41 resolution because of these filters. Because a camera doesn't know about color, right? It's just measuring light intensity. And so to get color, you have to put filters in front of it and then, yeah, do this kind of math. Yeah. And so what we had to do is we're debayering the entire image on the OpenMV Cam AE3 in software using Helium.
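Roughly what that per-pixel work looks like, sketched in plain Python for just one pixel class of an RGGB mosaic; a real debayer handles all four pixel phases and the image borders, and OpenMV does it in Helium-vectorized C rather than Python:

```python
# Toy bilinear debayer step: reconstruct RGB at a "red" site (even row,
# even column) of an RGGB mosaic by averaging its green and blue neighbors.
# The neighbor pattern changes for the other three pixel phases.
def rgb_at_red_site(raw, x, y):
    r = raw[y][x]
    g = (raw[y][x - 1] + raw[y][x + 1] +
         raw[y - 1][x] + raw[y + 1][x]) / 4          # 4 green neighbors
    b = (raw[y - 1][x - 1] + raw[y - 1][x + 1] +
         raw[y + 1][x - 1] + raw[y + 1][x + 1]) / 4  # 4 diagonal blue neighbors
    return r, g, b
```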
Starting point is 00:47:02 And what we were able to achieve is this is a 400 megahertz CPU. And so we were able to achieve is, this is a 400 megahertz CPU. And so we're able to do about 120 frames a second at the VGA image resolution, which was about 0.3 megapixels at, yeah, 120 frames a second. So what is that? What's the math on that? 0.3 megapixels times 120? Yeah, about 36 million pixels a second with the processor.
Starting point is 00:47:29 What's crazy here though is that you try to do that on a normal, regular ARM processor. Which I have. Yeah, for M7, the previous generation. You wouldn't get that. There's about a, Helium offers probably around 16 times the performance increase, realistically. And that's huge. I mean, again, it takes an algorithm that would be totally not workable. Like you'd get maybe 20 frames a second, 30 at the best, and now we're at 100.
Starting point is 00:47:55 Well, yeah, like 20 or 30 maybe at the best, and now we're at 120 frames a second, right? I mean, that's crazy. Sorry. Yeah, it's good. I'm used to thinking about, like, if you're going to do image stuff, if you're going to get complicated or you aren't quite sure what you need to do, you probably need to go to the NVIDIA Orin or the TX2 or whatever. Usually I would first go to NVIDIA's website and see whatever their processor is. And what dev kit I could get there,
Starting point is 00:48:37 which then involves Linux and all of that. When did the microcontrollers catch up? Did they catch up? Or are they just one step behind and I'm three steps behind? Well, they're still one step behind. Like if you look at NVIDIA or etc., those have a hundred TOPS. And so that means a hundred tera-ops, so a hundred trillion operations a second. Right.
Starting point is 00:49:02 So microcontrollers are now just starting to hit up to one tera-op. So there's still a hundred x performance difference there. But what's important to understand is, with the current performance of these things, they're good enough to do useful applications. And what's valuable is that they can run on batteries. That's the big unlock here. So if we look at the Open- My Orin can run on batteries. They just weigh 10 pounds. And are carried on a six foot wingspan drone. I don't see the problem.
Starting point is 00:49:36 Yeah, yeah. So as long as you have a big vehicle, it's no issue, right? Right. But that's the challenge here, is that you need to have a big vehicle for that. So what we're looking at is, OK, with the OpenMVCAM AE3, for example, it's going to draw 60 milliamps.
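(For scale: a typical alkaline AA cell holds roughly 2,000 to 2,500 milliamp-hours, so 2,000 mAh ÷ 60 mA ≈ 33 hours, which is about a day of continuous full-power operation from a pair of AA cells before any duty cycling or deep sleep.)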
Starting point is 00:49:54 And this is like, I can't go over this number. It draws 60 milliamps of power at full power. That's amazing. At full power. And in terms of operations per watt, that's way beyond what Orin or anything does. Well, think about it like this. A Raspberry Pi 5 without the Hailo, without an external AI accelerator, that gives you
Starting point is 00:50:14 100 giga ops if you peg every core at 100%. And this thing is able to give you double that with that much less power consumption. These AI accelerators are incredible in the performance. I mean, like again, 100x performance increases, nothing to laugh at. It's a pretty big deal. But we're looking at 60 milliamps power draw, full bore. So 0.25 watts or so. And we got it down to about 2.5 milliwatts at deep sleep, but there's some software optimization we still need to do because we think we can get it below 1 milliwatt while it's sleeping. Anyway, the reason to mention that though is that, okay, two AA batteries, that's one
Starting point is 00:50:58 day of battery life at 60 milliamps, two AA batteries. So you can have the camera just running all the time, inferencing, like, you know, if you want to do one of those chest-mounted cameras like the Humane AI Pin, for example, this little thing could do that and give you all-day battery life. And again, two Energizer alkaline AA batteries, nothing particularly special, cost a dollar each. And so if you put a little bit more effort in, and you actually have like maybe a $30 battery or something, you have more than a couple days of battery life. And then if you think about, okay, maybe I can put it into deep sleep mode,
Starting point is 00:51:32 where it's waiting on some event to happen, like maybe every 10 minutes it turns on and takes a picture and processes that, now you have something that you can build an application out of. Like, let's say you want to detect when people are throwing garbage in your recycling can. You could put this camera in there and every 10 minutes or every hour. Wait, wait, wait. Why would I want an image of me making a mistake all the time?
Starting point is 00:51:56 Sorry. I live in a condo, so we have a shared place. It's a problem constantly with people complaining because in San Francisco, at least, Recology does hand out fines for you violating that. Yes. I can make a little beepy sound or change the chute or... Like the time of flight sensor. If I wanted to make a smart bird feeder, I could wait
Starting point is 00:52:25 until a bird was actually detected physically without using the camera first before starting to take an image. Yeah, you get an interrupt from your accelerometer that the thing has got a bird. Bird detected. Oh yeah, no, very easily. And this processor can be in a low power state waiting on that. And the second it happens, yeah, you wake up and you proceed to take an image and check to see what, you know, let's say you want to know what birds are appearing in your bird feeder,
Starting point is 00:52:53 right? Well, when birds touch the bird feeder, they cause it to shake, right? So you've got a nice acceleration event. And so you could have the accelerometer in a really low performance state, just waiting for when it sees any motion detection. And when that happens, then the camera turns on, takes a picture, runs inference, and then if it sees a bird there, it could then connect to your Wi-Fi, or we're also going to be offering a cellular shield for the OpenMV Cam N6 and AE3, and it could, you know, connect to the cellular system and send a message, like a text message, to you and then go back to sleep. And maybe it could text message the entire image too, if you wanted.
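As a sketch of that event-driven pattern in OpenMV-style MicroPython; the wake-source configuration, the bird detector, and the notification call (configure_accel_wakeup, detect_birds, send_text) are hypothetical placeholders for whatever the shipping firmware and the cellular or Wi-Fi shield actually expose, not confirmed APIs:

```python
# Hypothetical event-driven bird-feeder flow: deep sleep until the
# accelerometer sees a shake, wake, grab a frame, run inference, notify.
import sensor
import machine

def handle_wake():
    sensor.reset()
    sensor.set_pixformat(sensor.RGB565)
    sensor.set_framesize(sensor.QVGA)
    img = sensor.snapshot()
    if detect_birds(img):                 # placeholder: quantized model inference
        send_text("Bird at the feeder!")  # placeholder: cellular/Wi-Fi shield call

# On most MicroPython ports, waking from deep sleep restarts the script,
# so the top of main.py checks why it booted.
if machine.reset_cause() == machine.DEEPSLEEP_RESET:
    handle_wake()

configure_accel_wakeup()   # placeholder: arm the accelerometer motion interrupt
machine.deepsleep()        # microamp-level draw until motion wakes us
```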
Starting point is 00:53:29 So these things are going to be doable. And the best feature here is, again, it could last on batteries and then you could also use like a solar panel, for example, and have that attached. And then, you know, then the battery life is really, at that point, infinite. The bird feeders do exist. Do they already have open MV cameras in them? No, they're using something else right now. But I don't imagine it actually does much processing on board though to, you know, determine
Starting point is 00:53:57 what bird type or etc. Probably just get an image. It's cloud stuff. Which means you're sending your data to the cloud where, you know, is someone else's computer in your yard. And they might be spying on your bird. Well, I know. I know.
Starting point is 00:54:14 An example would be trail cams. Yeah. So trail cameras right now, a big complaint of them currently is that they take a lot of images of nothing because anytime there's any motion or whatever, they turn on snap an image. And so you might have, you know, a...
Starting point is 00:54:29 Wind. Yeah. Wind is such a problem with such cameras. Especially in your trees, which they are generally for trails. Yeah. Yeah, trail cameras just, they're known right now to take tons and tons of images of nothing.
Starting point is 00:54:45 And you have limited SD card space on these, right? So if your trail camera is taking tons of images of nothing, then when it actually comes time to take a picture of something useful, it might have run out of disk space. It might have used up all of its batteries, right? Or just if you want to go and actually do something with that data, now you have thousands of images you have to look through, trying to find the one that actually has the picture of the animal you're looking for. And so yeah, having this intelligence at the edge, there we go again,
Starting point is 00:55:14 using the word edge. Having this intelligence at these devices really allows you to make them much easier to use, really, if you think about it, because now the system is actually doing the job you want versus capturing a lot of unrelated things you don't care about. And there is a privacy argument there too, because the more we- That's where I was headed. The more you push the intelligence to the edge, the less you have to move stuff to the cloud where it's vulnerable. And for certain applications, you might have an entirely closed system that's totally inaccessible to anyone outside without physical access, which is not possible if you're shipping stuff up to a cloud server to run on a GPU. Well, and bandwidth isn't that expensive anymore, but it's still not like I want to send videos all the time.
Starting point is 00:55:59 It's sending a text message that said, I saw this gecko you've been waiting for, would be way more useful. Yeah, no, absolutely. Because otherwise right now it would be, here's a picture of a gecko you were waiting for, it's a picture of like wind, and repeatedly over and over again. So yeah, no, it's gonna be fun
Starting point is 00:56:20 what you can do with these smart systems and what they're gonna be able to do. And being able to run in these low power situations is important. So I mentioned earlier, I wanted to touch on how we got to the OpenMV Cam AE3, for example, being so tiny. Why did we create a one inch by one inch camera? So tiny.
Starting point is 00:56:38 Yeah. Well, you know, honestly, I didn't think this direction in the company would be something we were going to support. I kind of wanted to keep the camera at the normal OpenMV Cam size. But the actual reason we ran into this and we made the OpenMV Cam AE3 is because the Alif chip was actually super hard to use at the beginning of last year. We were... I remember some complaints around here too.
Starting point is 00:57:03 In a mission failure kind of mode with it, to be honest. Yeah. There were a lot of promises for the Alif. Yeah, there were a lot of promises. There were bugs in the chip. I know that if you listen to our last episode, you'll have... Oh wait, I'm free to talk about this now. Yeah, yeah.
Starting point is 00:57:23 There were some issues. In particular, USB was broken. I/O pins did weird things. You had to set the I/O pins, the I2C bus, to push-pull for it to work, which, if you know about I2C, should be open drain. Stuff like that; repeated stops on the I2C bus didn't work. Did you encounter, I certainly didn't, did you encounter any power issues, brownouts, flash corruption kinds of things?
Starting point is 00:57:54 No, luckily we didn't encounter those, but we had issues with the camera driver. When you put it into continuous frame capture mode, it just overwrites all memory with pictures of frames because it never resets its pointers when it's capturing images. It just keeps going and incrementing forever. OK.
Starting point is 00:58:14 Yes, Alif. But you got it working. We got it working, though. And now it's the best thing ever. It's crazy how your whole interpretation of these microcontrollers changes once you get past all of the, oh my god, we're about to, you know, this is the worst idea ever, bugs. Because the way that the OpenMV AE3 came about was these bugs were so bad,
Starting point is 00:58:37 we were running into so many issues. Because it's a brand new chip, by the way. This is a new processor. Yeah, that's the issue. Since beta... beta, you know, when I started using it, it was beta silicon. Yeah, yeah, beta silicon. They have finally got to production grade silicon now, so a lot of these bugs you won't encounter anymore; they fixed them. But we were kind of in the full bore of that. And what we did, actually, is said to ourselves, okay, you know, we put so much time and effort into this chip and trying to make a product out of it, we need to ship something. And, you know, we were just like, hey, what can we do to ship it? And it's
Starting point is 00:59:10 like, well, okay, if we remove all of the features that make the regular OpenMV Cam fun, like the removable camera modules, and we just make everything fixed, then there's a possibility, a hope that we could actually build a product that makes sense. And so we were just like, okay, removable camera module gone, let's just make the camera module fixed. IO pins gone, there's a lot of issues with the peripherals, let's just get rid of that, make it so there's,
Starting point is 00:59:37 you know, minimal peripherals, minimal IO pins exposed, this way we don't have to solve two billion issues. And so we just went down the line basically just fixing everything instead of it being super, super, you know, having every single feature possible exposed and usable. We just said we're cutting this, cutting that, cutting this, cutting that. Removing your flexibility in order to optimize. Yeah, we kind of reduced the flexibility. Versus the N6, which is super flexible and can do all this stuff, the AE3, we removed a lot of the flexibility,
Starting point is 01:00:11 but that ended up like actually creating one of our best ideas ever. It's, I share this with the audience just to say like, hey, good things can come out of going on a bad journey, basically. And I'm also constantly in favor of constraints. I think constraints can actually enhance creativity sometimes and lead you places you wouldn't necessarily
Starting point is 01:00:32 have gone if you tried to just solve every problem or be a general thing. Yeah. Yeah. And for us, what we decided was, OK, well, we don't know how to, like, so much stuff is having trouble on this chip. Let's just, you know, make it tiny, right?
Starting point is 01:00:47 Everyone likes tiny. Use the smallest package they have. Just reduce the features. Not going to try to use every IO pin. OK, that actually, though, yielded a lower cost. But then we started to do fun stuff and level up our abilities. We're like, OK, well, I guess we're going to make it tiny. We're going to use all 0201 components, we'll use all the tiniest chips. And, you know, over the course of like,
Starting point is 01:01:09 I think it took me about three weeks or so to design it originally, we managed to cram everything for this camera into a one inch by one inch form factor. And it's been, I mean, just talking to people and showing this off pre-launch, everybody's been blown away. They're like, this is a camera that's one inch by one inch. You've got everything on there: processor, GPU, NPU, RAM. What makes the Alif chip so special is it has 13 megabytes of RAM on chip, meaning you don't even need external RAM to use the system, like all of these things.
Starting point is 01:01:48 And so yeah, that emerged through this weird process where we thought we were going one direction and going to make a normal system and ended up somewhere entirely different. And you've made a system. Okay, so I have to admit the N6 is a super cool addition to your product line, makes a lot of sense. But the AE3 just gives me ideas. Goosebumps, right? It makes me think about things differently, like different directions. And I mean, one by one is too big to swallow, but there are a lot of places you could fit such a self-contained system. Yeah, yeah, that's why everyone has been... I mean, I'm glad we ended up making this tiny camera.
Starting point is 01:02:30 Because I wouldn't have gotten there. I was constrained by my own thought process on what our system should be, given our previous form factor. But yeah, now it's like, yeah, it is legitimately small enough to put it inside of a light switch, right? Like anything you can think of, one inch by one inch fits almost anywhere. I mean, then your problems go back to how do you light it? How do you light the image well enough that you can use the machine learning? But that's a separate issue that is everywhere.
Starting point is 01:02:59 Well, are they, do you have an IR filter? It does for some of them. FLIR boards that look really cool. Right, but you can use an IR flood. Oh, you mean just take out the little lens? You can use an IR light. Oh, oh. It takes a lot of power.
Starting point is 01:03:16 People can't see, but. On detection, you can, yeah. Illuminate with IR. Well, you could do that. We potentially might make a different variant of the AE3. Potentially, I'm not saying we're going to do that. But you could use different cameras. Like we're supporting this new camera sensor called the Prophesee GenX320, which is an event camera. It only sees motion. So pixels that don't move, it doesn't see. And this one can work in very dark environments.
Starting point is 01:03:42 It's an HDR sensor, so it can work in bright and dark environments. And that one, for example, is also very privacy preserving, because literally it doesn't see anything but motion. So pixels don't really have color. So if you just wanted to track if someone's walking by or something, that one could be used for that. Also, because we're using, thanks
Starting point is 01:04:04 to our good relationship with PixArt, we actually have data sheet access for these cameras and support. How? How did you do that? I know, right? It's great. You can never talk to camera manufacturers. Yeah, no, we actually have the field application engineers on the line. We ask for help and they can respond. They probably tell you how to initialize them.
Starting point is 01:04:22 Yes, we got all of that. It was amazing. Damn it. It was amazing. Damn it. Christopher is jealous. So much time trying to get just cameras up and running. It's just so painful. For the audience, if you don't know about this, a big camera everyone uses is the Omnivision stuff because they built so many of them. And Omnivision provides no help support whatsoever to anyone using their products who aren't cell phone vendors. So you have to reverse engineer everything from a data sheet that has basically no descriptions. Or you pick it up from the internet and you send it a random set of bytes that you don't
Starting point is 01:04:57 know what they mean, but you know if you change it, something might go horribly wrong. Or it might get better. You don't know. Things go horribly wrong though. That's the thing about these cameras is that for the bytes, like you try to figure out what's the minimum set and then you realize that the default register settings don't work. Right. Like it does not even produce images or function at all
Starting point is 01:05:14 with the default settings on power on. You have to give it a special mixture of bytes, some of which are undocumented, they're just bytes. Although they're almost all undocumented. They have reserved registers you'd be writing to. And it's like, what is this byte pattern to this reserved register? Ah, yes. I did manage to get the data sheet for the OmniVision camera we were working with, which helped some, but like 80% of the registers being written to weren't in that data sheet.
Starting point is 01:05:39 Yeah, yeah. No, it makes it challenging. In particular, what's challenging about that is you can't do stuff like actually being able to set your image exposure correctly. So with these two cameras for the N6 and AE3, we can actually control the exposure, control the gain, trigger the camera, all the features you'd want out of a global shutter camera.
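For a sense of what that looks like from the scripting side, here is a hedged sketch in the style of the OpenMV MicroPython sensor API. Which controls exist and what ranges they accept depend on the specific sensor and firmware build, so treat the values as placeholders.

```python
# Hedged sketch: manual camera control, OpenMV sensor API style.
import sensor

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)      # global shutter modules are often mono
sensor.set_framesize(sensor.VGA)
sensor.set_framerate(60)                    # pin the frame rate, if supported

sensor.set_auto_gain(False, gain_db=6.0)            # lock analog gain
sensor.set_auto_exposure(False, exposure_us=5000)   # lock exposure to 5 ms
sensor.set_auto_whitebal(False)                     # matters for color sensors

# Some global shutter sensors can also be hardware triggered so exposure
# starts exactly when a frame is requested (constant names vary by port):
# sensor.ioctl(sensor.IOCTL_SET_TRIGGERED_MODE, True)

img = sensor.snapshot()
print(img.width(), img.height())
```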
Starting point is 01:06:00 Oh, also change the frame rate. Like everything you want to be able to control, we actually have the ability to control precisely and correctly now. We have kept you a little bit past the time and yet I still have listener questions. Do you have a few more minutes? Yeah, let's go into them. Matt, who I think might actually be able to help you if you answer this question correctly,
Starting point is 01:06:21 if you were to wave a magic wand and add three new and improved features to MicroPython, what would they be? Yeah, well, we actually have been waving that. We're working with Damien George from MicroPython directly to help launch these products. We've been big supporters of MicroPython from the beginning, and so each purchase of an OpenMV cam actually supports the MicroPython project. We actually want to fund this and make sure that these systems, when we're able to sell products based on MicroPython,
Starting point is 01:06:53 MicroPython is also being financially supported. And what we've worked to improve, for example, is the Alif port; Damien has helped directly with that. And so you'll find that support for the Alif chip is actually going to be mainstreamed into MicroPython with MIT licensing. Our special add-ons for image stuff will be proprietary
Starting point is 01:07:18 to OpenMV, but we will be mainstreaming the default Alif setup. And so anyone who wants to use this Alif chip now will have had someone else already fix all the bugs for them. So you will not have to fight through the giant, you'll not have to wade through all of the crazy problems; that'll have been done for you and generally available to everyone in MicroPython. Similarly, the same thing for the N6, the general purpose support for that. So, you know, we're bringing these things to the community.
Starting point is 01:07:45 People are going to be able to use a lot of the features we're putting efforts on and do things with them. There's also a new feature to MicroPython that we've supported called ROMFS. And so this is very, very cool. Remember how I mentioned those neural network processors execute models in place? So the way we actually make that easy to work with is that there's
Starting point is 01:08:06 something called a ROM file system on MicroPython that is about to be merged. So this allows you to basically, you can use desktop software to concatenate a whole bunch of files and folders like a zip file, and you can then create a ROM file system binary from that. And then that can be flashed to a location on the microcontroller.
Starting point is 01:08:29 And once that's done, then it appears as a slash ROM slash your file name or folder name directory. And so you have a standard directory structure that can be used to get the address of binary images in Flash. And so what this means is that we can take all of the assets that would be normally baked into the firmware and then actually put them on a separate partition where they can be updated. And your program then just references them by path versus address.
Starting point is 01:08:57 So this actually allows you then to technically ship new ROM file system images that could be new models, new firmware for like your Wi-Fi driver or whatever and etc. It's a very powerful feature. I mean, this is, I love this. You could also use it as assets. And so I immediately went to display assets. Whatever you want. And it's mapped into Flash, so it's memory mapped. Yeah. That's nice.
Starting point is 01:09:24 Yeah. That's awesome. That's awesome. We're finally entering the 90s. I just. Shh. I mean, to everyone who's listening here, so right now what you have to do is you have to bake things directly into the firmware, but this means the address of that stuff always changes constantly.
Starting point is 01:09:40 Or build your own weird abstraction thing that has, yeah, that's mapping addresses. It's perfectly fine. It was an image library. No, but it's super helpful. And it's actually a problem because if you think about the fat file system, the problem with fat file systems is they get fragmented, right? Files aren't stored necessarily linearly. Every four kilobytes or so is chunked up. And they can be located all over a disk, and so it's impossible to execute those in place. So it's a magically good feature.
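As a rough illustration of that workflow, hedged because the exact tooling can differ from what is sketched here: once a ROMFS image is flashed, its contents show up under a stable path in memory-mapped flash, and ordinary file APIs (or an execute-in-place model loader) can reference them there. The file names below are hypothetical.

```python
# Hedged sketch of using ROMFS assets from MicroPython. The /rom mount point
# follows the convention described above; the asset names are made up.
import os

print(os.listdir("/rom"))          # e.g. ['person_detect.tflite', 'splash.ppm']

# Because /rom lives in memory-mapped flash, a model runner can execute the
# network in place; plain file APIs still work if you only want the bytes.
with open("/rom/person_detect.tflite", "rb") as f:
    header = f.read(16)            # peek at the model header, for example
print(header)
```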
Starting point is 01:10:13 And then one other thing we're working on, which is a request for comment, I guess, right now, but hopefully it'll get worked on, is the ability to have segregated heaps in MicroPython. What I mean by this is right now you have one giant heap and it has all the same block size. And this is a problem because guess what? On the OpenMV Cam AE3, we have a four megabyte heap on chip just for your application, four megabytes. And what this means now is that you can just allocate
Starting point is 01:10:43 giant data structures and do whatever the heck you want. But it also means that if you're storing the heap as 16-byte blocks still, that you actually have a lot of small allocations. And so it takes forever to do garbage collection then. And so what we're trying to do is have it so MicroPython can have different heaps and different memory regions, where you have small blocks in one heap, larger blocks in another heap, and then the heap allocator will actually traverse through the
Starting point is 01:11:08 different heaps looking for blocks that make sense, size-wise, to give you. All right. You talked there about features that are going into MicroPython, and that last one was a new feature. We're trying to get that one. That one has not actually been implemented yet. That one's the one we want to see happen though, because it's important for dealing with megabytes of RAM now, which is like, you know, we didn't think we'd ever get there, right?
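To see why the block layout matters, here is a hedged MicroPython sketch you could run on a board with a multi-megabyte heap (the sizes assume something like the AE3's four megabytes). The numbers are made up, but the shape of the result, collection time growing with the number of live blocks, is the problem segregated heaps are meant to address.

```python
# Hedged illustration: garbage collection cost scales with how many live
# blocks the collector has to walk, not just with how many bytes are used.
import gc
import time

def time_gc():
    t0 = time.ticks_ms()
    gc.collect()
    return time.ticks_diff(time.ticks_ms(), t0)

# One large buffer: a single allocation for the collector to track.
big = bytearray(1024 * 1024)
print("GC with one 1 MB buffer:   ", time_gc(), "ms")

# Roughly the same memory as thousands of small objects: many more blocks.
small = [bytearray(64) for _ in range(16 * 1024)]
print("GC with 16k small buffers: ", time_gc(), "ms")
```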
Starting point is 01:11:30 Everyone would be stuck on kilobytes of RAM with MicroPython, but nope, megabytes now. Why would you want more than 42k of RAM? It's very funny that we're talking about still microcontroller level quantities of RAM and ROM. I know. We're still talking about megabytes. With this huge thing on the microcontroller level quantities of RAM and ROM. I know. We're still talking about megabytes.
Starting point is 01:11:47 With this huge thing on the side that does a ton of compute. Some portions are moving up to desktop class and some are still stuck in the... I mean, that's just the way it is. If you want it to be cheap and small, some things are going to have to be cheap and small, but it's nice that some things are advancing. All right, so I think we got two more listener questions. Couple more, yeah. Simon wanted to know more about edge inferencing
Starting point is 01:12:15 and optimization and all of the stuff that we did kind of talk about. But we didn't mention one of the features of the N6 that you are very excited about. Yeah, yeah. So the STM32 N6 actually has an amazing new hardware feature, which is it's got H.264 hardware encoding support on board.
Starting point is 01:12:32 Oh, very good. And so that means now- What does that mean for those of us without perfect memories? Oh, yeah, it can record MP4 videos. Oh, yes! So this means you no longer need to be running a system that has Linux on board to have something that can stream H.264 or MP4 videos.
Starting point is 01:12:51 So I mentioned the N6 has an NPU, so it's got the AI processor. It's got something called the ISP, the image signal processor, so it can actually handle camera images up to 5 megapixels with zero CPU load. And then with the H.264 onboard, it can then record high quality video streams. And that can either go over ethernet, and it's got a one gigabit ethernet interface on board. Again, a microcontroller, one gigabit ethernet.
Starting point is 01:13:14 It's so funny, we're putting these high powered things all around the edge of this tiny little CPU. Well, that's the thing, the CPU isn't tiny. It's 800 megahertz with vector instructions. It outperforms the A-class processors of yesteryear, right? Yeah. Yeah. Yeah. It's new tech. But then you also have Wi-Fi and Bluetooth, so you could stream H.264 over the internet, or you can send that to the SD card. And even the SD card interface has
Starting point is 01:13:41 UHS-I speeds, which is 104 megabytes a second or so. And so yeah, you can push that data anywhere you want and actually do high quality video recording. Amazing. Okay, Tom wanted to ask about OpenMV being the Arduino of machine vision, but then that led me down a garden path where you actually worked with Arduino. Could you talk about working with Arduino before we ask Tom's question? Yeah, working with Arduino has been excellent, really, really good. You know, thanks to their support, we really were able to level up the company and get in touch with customers that
Starting point is 01:14:18 we never would have met. You know, obviously, as a small company, people don't necessarily trust you. And so having Arduino be a partner of ours has really helped us grow and meet different customers who are doing some serious industrial applications that wouldn't have considered us otherwise. So it's been a really good deal. We're super happy for Arduino and working with them, actually. And we're super glad that they really supported us in this way and that they wanted to work with us. And this was the Arduino Nicla Vision and now Arduino is manufacturing it? Yes, yes, the Arduino Nicla Vision.
Starting point is 01:14:55 That was where we've been working with our partnership, and also the Arduino Portenta, that's where we started. And we also support the Arduino Giga. So those three platforms run our firmware and we basically make those work and have been, you know, have really shown off the power of those systems. So then I think you had something else you wanted to mention. Well, I did, but then I wanted to ask about Arduino versus MicroPython because I, you know,
Starting point is 01:15:21 can't actually go in a linear manner. Yeah. Well, this is one of those divisive things I wanted to talk about a little bit more. It's not really Arduino. It's really more of a C versus MicroPython question. Where do you think things are going? Yeah. The best way to say that would be, again, I mentioned that these microcontrollers have megabytes of RAM for heap now, megabytes of RAM.
Starting point is 01:15:50 I mentioned that we can allocate giant data structures. We have neural network processing units. We actually have something called ULab, which is a library that looks like NumPy running on board the OpenMV Cam, which kind of lets you do up to 4-dimensional array processing. And this is quite useful for doing all the mathematics for pre and post processing neural networks and also doing, like you want to do matrix multiplications,
Starting point is 01:16:21 matrix inversions, determinants, all the things you would normally need linear algebra for to do signal processing and sensor data processing, all of that's on board. And so a question I pose to many people is where do you want to be doing all of that high-level mathematical work? Do you still want to be writing C code to do that, or would you like to be in Python? So much Python. I want to be in Python. I want to be in Python. I want to be in a Jupyter notebook on Python. Well, that's the beautiful thing. With MicroPython and with NumPy kind of like support, this means you can actually kind of take your Jupyter notebook code almost directly and run
Starting point is 01:17:03 it on board the system. And so this is where I think things are going, which is it's just now that we're starting to get free of the limitations. Again, like the Alif Ensemble chip, that's a six by seven millimeter package with 13 megabytes of RAM on board, in the OpenMV Cam, and then the N6 processor comes in different package sizes, but you can get one down to a similar size and footprint. It has less RAM on board, so it might need some external memory, but it still comes with four megabytes on chip. And so when you look at these things, it's kind of like, yeah, we're in a new world where it kind of
Starting point is 01:17:43 starts to make sense to actually invest in standard libraries and higher level programming. The best way to say it would be, what's the default FFT library for C? Is there one? The one I wrote for whatever project I'm running. No, at least use CMSIS. Well, you've got CMSIS. It may not be efficient, but it's okay.
Starting point is 01:18:06 The one out of numerical recipes in C from 1984. That's the challenge is that, okay, well, let's say that one exists. What about doing matrix inversion? Yeah, yeah, yeah. Which one are you using there? And so these are all things where like having a standard package, but then also you can accelerate this with helium. So all of these standard libraries that are in MicroPython could be helium accelerated
Starting point is 01:18:33 under the hood. Now you're talking about a system where developers can all work on the same library package and improve it. And everyone can utilize that and be efficient at coding and getting more done with less mucking around versus rewriting all your C code with Helium, which is a recipe for lots of bugs and a lot of challenges if you're doing it brand new from each system. How often does somebody suggest you rewrite the whole thing in Rust? Actually, not so much.
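To make the ULab point concrete, here is a hedged sketch of the NumPy-style workflow, assuming a MicroPython firmware built with ulab's numpy compatibility layer and its fft and linalg modules enabled. Note that builds with complex support return a single complex array from fft instead of the (real, imaginary) pair unpacked below.

```python
# Hedged sketch: NumPy-style signal processing and linear algebra on-device
# via ulab. Input data here is synthetic.
from ulab import numpy as np

# A small window of fake sensor samples (length is a power of two for fft).
samples = np.array([0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7] * 4)

real, imag = np.fft.fft(samples)             # spectrum of the window
power = real * real + imag * imag
print("dominant bin:", np.argmax(power))

# Small linear algebra jobs, no hand-rolled C required.
A = np.array([[2.0, 0.1], [0.1, 1.5]])
b = np.array([1.0, 1.0])
print("solution:", np.dot(np.linalg.inv(A), b))
```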
Starting point is 01:19:03 I hear about all these conversations theoretically. I don't know though if I hear too many about them actually from a practitioner standpoint. Okay, so Tom's question. Tom assumes you are targeting some low-cost hardware. Is there some higher-end hardware also? He wants a global frame shutter that has been tested to just work in the spirit of Arduino. He also wants a lens upgrade, a Nikon mount, and a microscope mount. He's shopping on our podcast.
Starting point is 01:19:32 I know. This is like, okay, so now I want to go to the lens aisle and then... but you have a lot of these. Yeah, I would say we're targeting global shutters by default now in our systems. We just are going to have that. And what that means for people is rolling shutter, it's the way the camera reads out the frame. Cheaper cameras, actually most cameras with electronic digital sensors, do rolling shutter
Starting point is 01:20:02 where they'll read out rows of the image kind of slowly, not all at once. So if there's motion or something while it's reading out the image, you can get this weird artifact where like something moving. My head's over here and my body's over there. Not quite, but something moving might be diagonalized. Yeah. Right, because some part was here while it was reading out, and now it's over here, so it's skewed. Global shutters read out the entire, well, they capture and read out.
Starting point is 01:20:32 They expose the image at once. Yes, that's it. It exposed the entire thing at once instead of by rows. Yes. It's getting tripped up and read out in exposure. Yeah. But we're going to make that standard now, though. We think it should be, because we try to do machine vision.
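To put rough numbers on the skew being described, here is a back-of-the-envelope calculation with made-up figures; actual readout times depend on the sensor and mode.

```python
# Hedged arithmetic: how far a moving edge "leans" with a rolling shutter.
rows = 480
readout_time_s = 0.016          # assumed time to clock out the whole frame
line_time_s = readout_time_s / rows
speed_px_per_s = 600            # assumed horizontal motion of the subject

skew_px = speed_px_per_s * readout_time_s
print("per-row delay: %.1f us" % (line_time_s * 1e6))
print("edge lean top-to-bottom: %.1f pixels" % skew_px)   # about 10 pixels
# A global shutter exposes every row over the same interval, so the lean is zero.
```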
Starting point is 01:20:46 We're not trying to necessarily take pretty pictures, but we'll also have a regular HDR rolling shutter for high megapixel counts. And then we actually are working on plans for two megapixel versions of the camera soon. Those will launch later, but we have a path forward for actually increasing the resolution for the global shutters. Regarding the Nikon mount and microscope mount, given the small size of our company, we're probably going to leave that up to the community, but we think people are definitely going to be able to build that. Once the system gets out there, we'll see people making
Starting point is 01:21:24 these things. That's just a 3D printer thing, right? Yeah. Okay. Because we have 3D files now in CAD for everything we do. Those are available. And so you just take that and you take your microscope and you print up what you need to translate them, says the person who did not even look at the new 3D printer that arrived at the house. Anyway, and you already do lens shifts.
Starting point is 01:21:50 I saw a bunch of those. Lens shifts. Lens upgrades. You have different lenses. Yeah, we do lens upgrades. All righty. I mean, not for the AE-3 because that's the super small one and it's been optimized, but for just about everything else you have.
Starting point is 01:22:02 Yeah, for the N6, tons of lenses. We actually have a lot of features for the N6. So we've got a thermal camera. We're actually gonna have a dual color and thermal camera, so you can do a FLIR Lepton and a global shutter at the same time. So you can do like thermal and regular vision at the same time.
Starting point is 01:22:17 We also have got a FLIR Boson, which is high resolution thermal. Then we have the Prophesee GenX320, which is an event camera. Then we've got an HDR camera, as I mentioned, five megapixels, which will be like your high res camera. And then it comes with the default one megapixel global shutter.
Starting point is 01:22:32 So you've got options. And then all of those besides the GenX 320 have removable, well, sorry, the regular cameras, the color ones, just have removable camera lenses. So you can change those out too. All right. All right, I have to go write down some of these ideas that I've gotten through this podcast with what I want to do with small cameras and different cameras and microscopes
Starting point is 01:22:52 and moss and tiny snails. Do you have any thoughts you'd like to leave us with? Yeah, no. Just thank everybody for listening. We're excited about what people are gonna be able to do with these new systems. And if you have a chance, check out our Kickstarter and take a look and buy a camera if you were so inclined. Our guest has been Kwabena Agyeman, president and co-founder at OpenMV. The Kickstarter for the AE3 and the N6 just went live, so it should be easy to find.
Starting point is 01:23:29 You can check the show notes to find the OpenMV website, which is openmv.io, so it shouldn't be hard. And there are plenty of cameras there that you don't have to wait for, along with all of these other accessories we've talked about. Thanks, Kwabena. All right. Thank you. Thank you to Christopher for producing and co-hosting and not leaving in that section
Starting point is 01:23:49 that he really, I hope, cut. Thank you to Patreon listeners' Slack group for their questions. And of course, thank you for listening. You can always contact us at show at embedded.fm or at the contact link on embedded FM. And now a quote to leave you with from Rosa Parks. I had no idea that history was being made. I was just tired of giving up.
