Embedded - 59: Vision for Simple Minds

Episode Date: July 10, 2014

Craig Sullender of ChipSight joined Elecia and Christopher to talk about machine, computer, and embedded vision.

- Craig's $20 Vision System – IoTcam
- Lighting control demos
- Slides on embedded CV technology
- Peep, a camera for your door's peephole (soon to be on Kickstarter)
- Lattice MachXO2
- O'Reilly's Practical Computer Vision with SimpleCV

Transcript
Starting point is 00:00:00 Welcome to Embedded, the show for people who love gadgets. I'm Elecia White, here with Christopher White. This week we will be speaking with Craig Sullender about computer vision. Hi Craig, thank you for joining us. Hi, well it's good to be here. Can you tell us about yourself? Yes, so I'm an electrical engineer. And early on, I was a baker and I made cookies and bread and pastries and worked with different musicians in Austin at the bakery.
Starting point is 00:00:40 And so I ended up building effects and things for them, which brought me back into engineering. Bands like Meat Joy and Two Nice Girls. And then later on, I went and did a tech program and worked as a technician for a while, and then went back to UT, finished engineering, and worked on the space station for a while. And while I was working there, I was working for a guy that was doing a video modem for telephones. And so during the day, I was using very expensive equipment in a very expensive lab with like laser protective curtains around all the the high voltage equipment and uh to to watch things like relays opening and closing then at
Starting point is 00:01:32 night I was doing this really high tech stuff with a $500 scope. So it was a funny time. His demos were in strip clubs in Chicago, and so to demo the video modem, you'd dial up a strip club and see who's on stage. I have to admit, from cookie engineer to... Strip club video modems. I'm baffled. Your career is far more interesting than mine. And then for a while I did welding. I went to ACC, which if anyone's in Austin, they should think of taking classes at ACC.
Starting point is 00:02:12 It's fantastic. And the metal school there is amazing. The teachers are the best. So I'd say baking was the best job I ever had, and the metal program at ACC were the best teachers I've ever had. And then I worked as a welder for a while, and I would show up early, like 7 in the morning, to beat the traffic, and I'd stay out in the parking lot waiting for the place to open. I was working on a paper for pixel processing for a conference, and that got accepted.
Starting point is 00:02:43 So then I quit the welding and kind of got back into image processing, went and presented the paper, got a job working on image processing for cell phones, and worked for a few startups. And then recently what I've done is mostly working on patents on this technology I came up with and meeting with semiconductor companies. And now I'm changing gears again into helping startups
Starting point is 00:03:10 with cameras and image processing. So what are the ideas you've been patenting? What's happened in computer vision and why it's sort of a strange mixture of potential and not really much in the way of fulfilling that potential as far as seeing it a lot of products is that it it's a little hard for people to make out even what computer vision is. And I've been in meetings with experts, like people that were running companies that do compression
Starting point is 00:03:54 that just got bought by another company, and they bring those guys in to the meeting as their experts. And we'd be talking about computer vision, and the compression guys would go, why do we need computer vision? We already have compression. So one of the things I learned was always start at the beginning. And when people tell you they already know what you're doing, don't believe it.
Starting point is 00:04:15 Just start at the beginning. Well, that actually brings up, maybe we should start at the beginning. Embedded vision is a big, probably overloaded term. What is embedded vision to you? Let's look at it like this. We'll call computer vision everything where you connect a camera and do some processing and where you want to see objects or track objects or recognize objects, do some kind of detection like that where you don't necessarily need to transmit the image, but some information about the image.
Starting point is 00:04:55 So we'll call that computer vision and let that be the broad category. Is there a difference between computer vision and machine vision? Yeah, and mainly because of the market. So in the market, I just use the word consumer vision for the area I'm in. And that's why I want to be down below $20 for some kind of vision function or even a complete module, vision module, hardware and all. And then embedded vision is wherever, just what it says. I mean, it's whenever the thing is operating independently and it's in a piece of equipment. Machine vision is at the top as far as the markets
Starting point is 00:05:36 go. So if you buy market research on machine vision, it's specifically in the industrial area for manufacturing. It's like $100,000, $500,000, multimillion-dollar systems. And if you look at the volume, the volume of sales is like hundreds or something a year. So really distinct. And it's really confusing for people because when you work with business people, if they latch on to the idea of machine vision and you're talking about consumer vision, things can get very confusing. Yeah, so machine vision is at the top. But here's the problem
Starting point is 00:06:12 with the marketing reports is that when you look at marketing to prove there's a market for something, the markets are defined by places people have already taken. So if someone's already there, they call it a market. If nobody's there yet in which you have a lot of potential to be there, they don't see it as a market because they can't get a report on it. So it's kind of self-defeating, but that's the way the business and investor side of it works. So if you come up with something new and you start saying, you know, there's going to be a market for low-end consumer vision products and these functions that are needed in $5 devices instead of $5,000 devices, they just won't see it.
Starting point is 00:06:59 Creating a market or finding one that isn't tapped yet is the risky part. So it depends on the risk tolerance of the people you're talking to, right? Yeah, which is really what separates people. So you either are willing to take the risk to create something new, or you get a job. So sub $20 things, that's not self-driving cars and robotics and manufacturing. What can it do? Is it motion detection, gesture detection? In itself, it can easily do motion. You could probably get it to do gesture because the approach I came up with, it's called Chipsight, does data grouping without extra frame buffers, without a very high-powered processor.
Starting point is 00:07:51 So I've done it on small processors. Data grouping, is that like the face recognition stuff where it tells you where the eyes are? Yes, but data grouping is below that. So we're still talking about pixel processing where the eyes are? Yes, but data grouping is below that. So we're still talking about pixel processing where the pixels first come from the camera. So at the very front end, you have the camera, pixels, and that's just pure data. And that data means nothing to a computer.
Starting point is 00:08:16 It's kind of unfair to push data at that bandwidth onto a general purpose processor that has to look at it one instruction at a time and do the simplest thing like say this pixel to the left is similar to this pixel to the right so why don't we just group them together oh and that's and now now we're back to compression again sort of so what i've done in instead of calling it compression, is it's more along the lines of segmentation where you're looking for the larger areas of interest in an image. That makes sense. So what application with putting out the product myself at first, I'm just going for the simplest applications I can find like
Starting point is 00:09:10 lighting controls. So with lighting, the PIR sensors, the passive infrared sensors have sort of given out on their usefulness and all the building automation people really are asking for something better and they'd like to use the CMOS cameras because you can get CMOS cameras for a dollar. Unfortunately, sorry, go ahead. The PIR sensors are those things that like turn on my porch light and then at twilight they flicker for a while because they can't decide whether they're on or off. Okay, just want to make sure. Exactly, and you run into problems if something's not moving. Like I have a friend that works in an office with the PR detectors for the motion sensing to turn the lights on and off.
Starting point is 00:09:49 And if he's the last one working there at night, he has to stand up and wave his arms every half hour. Yeah, I remember doing that. to decrease wasted energy worldwide is that as outdoors and over distance, the PIR sensors just don't work. They don't work over temperature. They don't work well when it's real hot or real cold, and they don't work well over distance. So if you want to control lighting in a campus or on a street, you really need to, the best approach, least costly approach,
Starting point is 00:10:24 and most flexible approach would be to use a camera. That makes sense. I have a friend, Ikana, she was on the show like a year ago and she's been building a camera system for capturing photos of animals at night. And she tried using the PIR sensor to trigger the camera, and that hasn't been entirely successful. I know she's running Raspberry Pi and definitely can code things, but what?
Starting point is 00:10:56 She should be able to do it with a Pi and that camera. That's a nice setup. And maybe she should hook up just an IR emitter. Yeah, she's got that. An IR camera just came out for the Pi. The Pi NoIR. It's pretty neat. I have one.
Starting point is 00:11:15 And I guess my problem with the Pi right now is that it's six frames per second if you're running through Python, which isn't nearly good enough. Right. And sometimes it is. So sometimes all you need is one frame a second. Sometimes you want 10.
Starting point is 00:11:31 Sometimes you want 15. And where it gets interesting is where you can get a real advantage out of doing more in hardware in the front end is, say, a tracking problem that is very difficult at 15 frames a second might be simple at 60 frames a second. So you could take a tracking problem where there's some ambivalence about which object is which after a certain amount of time
Starting point is 00:11:59 and decrease the amount of time between frames and all of a sudden it's easy to tell which optic is which. And so you've really simplified the tracking algorithm just by running faster in the front end. And that's what you've been working on is the front end pieces. Yes. And there's a couple of reasons I did that. One, because it was interesting, it was challenging,
Starting point is 00:12:23 it hadn't been done, and the system I was working on had, I think, 14 frame buffers. It was insane. And so I do it with no frame buffers, the data grouping part, and I can run it at whatever speed the camera is running at, at any resolution, in probably about, maybe, I'm guessing, 10,000 gates or so. So if ChipSight was made into a peripheral, the microcontroller you buy from Microchip or someone could do a lot more with maybe 10 cents worth of gates added to it. So going back to your description of the original system, what's the purpose of having so many frame buffers? It can get complex to do much processing on an image.
Starting point is 00:13:14 It was a pipeline process, and so they were passing it along the way. And yeah, it was interesting. That's about a half second of latency on a normal camera or something. Yeah, I don't remember. It was expensive. It was a big board. I mean, the amount of data throughput is the problem with cameras. You just get a ton of data.
Starting point is 00:13:40 You want information, and what you're getting is data. And it's a good way to look at the architecture of really a lot of different systems. But let's look at a vision system where you break it into blocks. And if the first block is the camera, it's outputting information from the sensor. But to the next block over, it looks like data. So to us, that just looks like pixel data. To the camera, it's output, it's information. The next block, that's data.
Starting point is 00:14:06 The next block outputs what it considers to be information. The next block up, let's say it's the application, is in taking what looks like data, like let's say features or something. And then finally it says, look, that's a basketball in the middle of a court. Okay, so the basic idea is to prune early. At each step, you're converting data to information. Yeah, so you prune. You're trying to prune data as much as possible before it gets to the slow processing. Yes, and make sense of it in as economical a way as you can.
Starting point is 00:14:39 And so it means you're looking at a system from a viewpoint of cost, really. Instead of we build general purpose processors and we want you to buy the fastest one we sell. And so you should use this library of vision functions on our fastest processor. That makes sense. Yes. I mean, isn't that why OpenCV exists? Because Intel said if enough people are doing computer vision on our processors, then they'll have to buy more expensive processors? Yeah, exactly. No, we're really roped into
Starting point is 00:15:19 general purpose processors. And it's good for developing your system and it's just when you need to build a million of them you you need to rethink the algorithms and really look into them which is nice i mean it's a challenge it's a great engineering challenge to look at what's really needed in a system and design around it. But I mentioned that about the architecture because I want people to look at computer vision as an architecture approach that's really understandable and not look at it and go, it's so ad hoc.
Starting point is 00:15:59 It's such a bag of parts. I really don't even want to look at it. And someone with a PhD has to come do this. So, another part of Chipsight is that it's standardizable. So, if you use data grouping as your front-end main idea, and if you can get a lot of mileage out of it, then you can use it to build applications on. So, someone that does software can take this front end and always know what the data structure is they'll be getting, what type of data they'll be getting. So they'll be interacting with the front end instead of having to design the hardware, design image processing from scratch. It actually didn't occur to me to try to design image processing from scratch
Starting point is 00:16:47 I read about the simple CV and I was pretty excited about all the different features you get for what feels like free I mean there's edges and blobs and the face detection algorithm worked right out of the box. Yes, one of the people on that was Nathan Ustendorp, and he's been really nice and talks to me once in a while. He's very kind, and he's fun to talk with.
Starting point is 00:17:16 So that's another good example of something you could use really either way if you have the processing to support it or to use it for development. But as I don't want to have a high-cost general processor all the time, my options are... It really depends if you want to go to volume for a consumer product. So in some cases, people don't care. Like if you're doing a camera for an intersection to know when there's cars waiting for a light, to turn the lights to green and red, they cost $10,000 each. And so you can afford some processing and you're better off maybe using a library of some kind. If you want 100 cameras per block to do some other kind of
Starting point is 00:18:01 basic detection, then you probably want a different approach so you can get down to, you know, $50 a camera. So it's kind of a shift in thinking of, am I going to do this high-end, you know, C3PO kind of vision? Is that really what I need? Or can I use something really simple and use it everywhere, like all over the home or all down the hallways of a business, that kind of idea. So how far, just to bring up some specifics, you talk about Chipsight being able to drive vision to lower end processors and cheap processors. What are we talking about in terms of low end? I mean, what's the realistic, you know, simplest application and the lowest end processor that you're targeting?
Starting point is 00:18:49 I did one demo that's on the chipsite.com with an 8051, and that didn't use Chipsite, but it did a vision function, and that was an 8051, and it ran at 60 frames a second on a VGA camera. So the input was a VGA camera. What was the output? It was the locations of blobs for a specific colored object. Okay, and we should define blobs. I know I started it.
Starting point is 00:19:18 Just a region, just an area of neighboring pixels of the same color, let's say. Often with a threshold, so it's nearly the same. And that's one way to do facial recognition is because faces have different colors than their surroundings often. Yeah. And, you know, an interesting thing on some of the facial detection is that across all cultures and races, the face color tends to have the same hue.
Starting point is 00:19:52 And so there are some parts of the color that you don't have to make adjustments for, for different people. So that suggests that face detection works better on a color camera than a black and white, for example? Yes. So back to the 8051, what was the application for it? You know, I did it to see if I could do it.
Starting point is 00:20:15 Okay. And part of the trick was synchronizing the processor and the camera. So they were working, they were locked. In fact, I did it with a PIC a long time ago. When PIC first came out with the first 8-pin processor, I sampled an analog camera and did the same thing, just locked it to the clock of the camera. And so everything that happened was totally deterministic.
Starting point is 00:20:46 I mean, it was running in complete lockstep with the video. That's pretty impressive because I think we were kind of alluding to this earlier, but people have a notion of computer vision being intensely processing expensive and very difficult and requiring gigahertz processors and lots of ram and then you're you're demonstrating that if you can pare your problem down to exactly what you need to do and be clever about it you don't need all of that you don't need almost anything there you go well that's sort of what embedded systems are right is taking the general purpose and cutting out everything you don't need. But you do, you said be clever about it.
Starting point is 00:21:28 There's a pretty wide range of difference in processing power to what people expect. But really, in the embedded world, as embedded engineers, it sort of goes with it, I think, that we should care about what's on the inside. Definitely. More recently, on your website, you have the IoT cam, which is, you have it as the $20 vision system. Tell me about that. Right, now that one, I went back and did chipside and software.
Starting point is 00:21:58 So that one's in the firmware of that ST part. It's an ARM4. Sorry, an M4. So it's got the floating point. Yeah. I don't think I've used it yet, but what I use it for is it had lots of on-chip RAM. And so the microcontrollers,
Starting point is 00:22:22 even at the low end, because that comes in some fairly inexpensive versions, are getting pretty capable. But I noticed you had a mention on your blog about the idea of hardware versus software. No, you meant your history in your past and doing optimization, and you thought it may not be useful anymore, but it still is in the embedded world because now you wanted to do the products for health monitoring in their own limited devices. I mean, that is where the optimization comes back.
Starting point is 00:22:57 I mean, the processors get bigger and you feel like, oh, I've got all this space now and all these cycles. I could just burn them. Linux, put Linux on it. And then somebody says, oh, I've got all this space now and all these cycles. I could just burn them. Linux. Put Linux on it. And then somebody says, put Linux on it. And you're like, oh, now my processor isn't fast enough. Put a bird on it.
Starting point is 00:23:11 Sorry. Yeah. So, and people go back and forth between hardware versions like ASICs or FPGAs or CPLDs, which are also becoming lower cost and more capable, versus the general purpose processor. So it's a little hard to explain a philosophy that goes one way or another because it's like maybe it would just go back and forth. One year you might want to go one way, one year you might go another. But the reality is that people are using these things and people are developing these things.
Starting point is 00:23:52 So using a general purpose processor is sometimes like having a Swiss Army knife that, yes, it does have the Phillips head screwdriver on it, but sometimes you really just want the Phillips head screwdriver itself. And that's what works best and not the Swiss army knife. So, so sometimes you want to do the custom algorithm in gates on a CPLD or an FPGA. And, and sometimes you might want to go to a general purpose processor. So looking at the architecture, I prefer to use the general purpose processor for the higher end part of the app, like let's say for the application itself. So the application is the part that knows what it's being used for. It knows what time of day it is. It maybe knows where it's located. It knows it's supposed to be looking for the
Starting point is 00:24:40 basketball or whatever. And everything before that doesn't really need to know anything, and hopefully it will be more like parallel processes that are a little less expensive. And so this little camera board, because you've chosen to go software, you've opted to have the flexibility of a more general processor. Are you working in an FPGA version? Yes, I did a version in a Mach XO2 first, and that's on the side also.
Starting point is 00:25:14 That was, I think, year before last. And I love the Mach XO2 parts. They're really, you get a lot for the money so you know six dollars or so you get a whole lot of logic and again some nice blocks of ram so you can really do a lot with it and it's um what is the mock x02 it's a it's a cpld i tend to just call them all FPGAs, but it's from Lattice. And so it's a Lattice Mach X02, and I recommend it. In fact, they have a breakout board right now that has the largest, has the smallest part and the largest part on it, whichever one you choose, and it's like $25.
Starting point is 00:26:00 Okay, and that's Mach as in speed of sound Mach. Yes, and so it's a great way to get started with doing a product with an FPGA type device. Plus it's single chip, it's got a clock built into it, it's got a voltage regulator built into it. So you give it one voltage, no clock, hook a camera up to it and you can do things. Ooh, evaluation kits under $29. Yeah. Must not just click buy. Okay.
Starting point is 00:26:32 The problem I ran into with them was, what were the tools? And that sort of hampered my development of that product. That was kind of a manufacturing prototype. And that slowed me down a lot with it but but uh you know i don't know maybe i was doing something wrong who knows oh yeah the tools never catch up in time so are you building the iot cam and why is it iot oh for internet of things and uh so so after these last few years of telling people, no, you could use it for this and this and this and this, and having them just sort of look at me strangely. Now, you know, the IoT is making a case for itself and people are going crazy over it and they think it's going to be the biggest market ever.
Starting point is 00:27:22 Does it have any connectivity, Wi-Fi or whatnot? Now what I've been building is just as a peripheral device so I just I put on a UART, I2C port, SPI port and USB that's what's on that chip so those were easy to put on and so whatever system you do you could connect it but I am working on a product for a Kickstarter, and I'm adding a Wi-Fi module to it. Right. You mentioned the Kickstarter project, and it's a peephole camera? Like for a door? Yeah.
Starting point is 00:27:59 So you attach it to the peephole over your front door, and it goes on the inside of the door inside your house, looking out the peephole. your front door, and it goes on the inside of the door inside your house, looking out the peephole. It's battery-operated and wireless. If somebody knocks on your door, it sends a photo of them to your phone. That's pretty cool. Is it going to do some processing to make sure it gets a good image of the face? That's a good image of the face? That's a good question.
Starting point is 00:28:25 So now, see, part of the problem is I need to reduce the processor enough that it's not using too much of the battery power, but I also want enough processing there to do things like what you just said. And so you are going battery. Are we talking AAs here? No, lithium polymer. How am I going to recharge it? It's attached to my door. You have to, yeah, you have to take it off. It'll be attached with magnets and a bracket. Cool. Yeah, it's a very nice design the guys at Peep came up with. Very cool.
Starting point is 00:29:07 And their Kickstarter isn't live yet. That's coming. Yeah, it should be in the next month or two. Another reason I'm excited about being part of that project is with watching all the Internet of Things products that have come out in the last year. And they really seem like pet rocks. I mean, they're like things that you have to love them to make them into a product. And once you stop loving them, they're not a product anymore. And so the peep is exactly the opposite. It doesn't bother you. It's event driven. So when there's an event, it shows you a photo and that's
Starting point is 00:29:44 it. There's nothing to do it doesn't demand a lot of love from the customer attention or anything like that it's um yeah it's just pretty exciting it's a great way for us to get into that market of the internet of things and to do it in a way that is really useful to people and not just look what we can do with our Bluetooth module, give us money. So this one's going to be Wi-Fi so that you don't need a gateway or anything. It'll go straight to your wireless in your house and jump off to the internet. And then I can see it if I'm not home. Exactly.
Starting point is 00:30:28 And, okay, so that one doesn't have a lot of computer vision to it, does it? I mean, you said you weren't. No, it doesn't really need it for the first pass. So this is an example where pixels are what you want. So in this case, it's like the event puts it into context. The event turns that image into information. It's an event where someone knocks on your door. So you know that someone, do you see what I mean? In other words, the image is telling you, here's who knocked at your door. So that is the information you want. Fair enough. So back to the vision processing, where do you think people should get started if they're building a small system for themselves?
Starting point is 00:31:16 Well, they built the Peep prototype. They won a hackathon with it using a Raspberry Pi and a camera and the Raspberry Pi camera. So that's a great system to start with. Have you played much with the Raspberry Pi camera? No, no I have not. You've been playing it this week and exploring it, right? Yeah, I've been doing a lot with it this week. And the closer I get to the hardware hardware the faster it is and the better it is but i was kind of hoping i just wanted to poke at the high level things and explore some of my ideas
Starting point is 00:31:52 so i'm a little sad actually that i'm having to actually work yeah so that's starting to feel like work that's the trick when you're developing something is i think you need the processing power to be able to play right without having to think hard about optimization first and so this is the kind of problem that almost encourages you to optimize first which is not what you really want to do when you're first developing your ideas no you really you learn a lot about the guts of it and that can come in handy later on. And embedded people really need to know the low end of things if they're going to design things for markets where the cost needs to be reduced. So it'll be in her bag of tricks later on that she knows how to optimize some of those lower level things.
Starting point is 00:32:42 And knowing some of the higher level things are possible and what they're called. I mean, even knowing what an algorithm is called sometimes is enough to get started. But the fun part is definitely the higher level and building on other people's stuff just to try it out. I mean, this peep camera, yeah, I could totally figure out how to make that.
Starting point is 00:33:08 I mean, I probably could do it from the parts on my desk. That's right. I have an electric imp here and a Raspberry Pi and a camera. I'm sure underneath all of this stuff there is a battery subsystem. My desk has gotten a little out of control. So I took one of the imps and connected that first camera to it a couple of years ago and sent the imp guy a photo of it, and I call it the pimp cam. Sorry, that was the whole point of the story.
Starting point is 00:33:48 So which camera? The $20 camera? The one before it was a CPLD. It would have probably been about a $20 thing also. It's a nice board, though. I chose the largest packaged chip I could for the back of it. So when you look at the back of the board, the PCB area is completely
Starting point is 00:34:08 taken up by the IC. But the electric amp doesn't pass a lot of data. No, that's the great thing about working with information is it's low bandwidth. Not every application needs high bandwidth data or information. I mean, sometimes you just, let's say if you had a robotics application that was doing warehousing. First, it needs to find a box on a shelf, and that can be pretty rough information, low bandwidth. And then it might need to get close and read a label. And so
Starting point is 00:34:48 for just a moment there, it needs a lot of more data to get the details to read the label, but it doesn't need that all the time. So it's good to have your front end be flexible in that case, which is what I do with ChipSight is you adjust how much the level of detail that you want to receive the shapes and locations. Well, that goes back to my friend's cameras. What you really want is to have very low bandwidth until something is moving. And then you figure out if that something is just a leaf. And then you take the good picture that nobody's ever seen of the amazing little lizard or whatever yeah so so the
Starting point is 00:35:30 load on the imp there would be would would run along pretty well i mean it wouldn't even be stressing it in in between times when it saw something move it wouldn't need to say much at all and then it might just say hey we saw something move, it wouldn't need to say much at all. And then it might just say, hey, we saw something move, and that doesn't take much. And then to upload one image, you might take 20 seconds, let's say. And so you wouldn't be doing video, but you'd be doing enough to do what you just described. That'd be neat.
Starting point is 00:36:01 That's another way of doing your architecture is it's not only pruning data for every frame or whatever. That'd be neat. That's another way of doing your architecture. It's not only pruning data for every frame or whatever as you go up the block diagram. It's also doing it temporally and saying, okay, we only want to care about certain kinds of events. So let's push the intelligence to something that can determine what kind of event it is to save us later down the pipeline. You know, that's perfect. I mean, you could look at the system in cost or you can look at the system in bandwidth. And you can design from there.
Starting point is 00:36:37 I mean, those are really helpful things to restrict your design. And mainly what I've worked from in the past is restrictions. I mean, when I was designing this, when I was doing the image processing for cell phones, I was calling one of the customers, and I'd say, okay, you know, to implement, we were doing some noise filter for cell phone cameras. And they say, well, how much memory does it need? And I said, 200 lines, you know, like enough told 200 lines of an image. They said, no, that's too much. I'd go back and work on it. It was really a bad idea to have the engineer calling on the customer. It made me crazy.
Starting point is 00:37:11 And then I'd come back, I would go, okay, 20 lines. They'd go, no, that's too much. And I'd go back and work on it again. And that's the approach I took with this was to say, well, what if not only you have restricted resources, what if you don't have any memory? You know, what if you only have, you know, the stack or enough to buffer a couple of lines?
Starting point is 00:37:31 Then what do you do? And it really helped the design. So sometimes, you know, not having all the money in the world to use on your design is an advantage. That's probably another reason you get better ideas coming from outside of large companies than you do from inside of large companies. Definitely having no constraints on your resources is not inspirational sometimes. Oh, yeah. Constraints are good for getting things done and designed.
Starting point is 00:38:01 I know it's hard to figure. You could probably write a book just about that, right? Of course Speaking of books or maybe blogs or tutorials I think winnowing the information and understanding that is a good place to get started but there's more What advice would you give to somebody who has a computer vision problem and wants to start to learn the industry standards or to
Starting point is 00:38:34 learn how to go about solving it? So, just read a lot, you know, and read a lot online. And I look at academic papers also. Do you have any specific blogs or books that you'd suggest? You know, I did have a list, but I haven't updated it lately. And people tend to change. But it's fun to find them. Because the people you find that are doing the image processing projects and blogs, might then go on to do their own products and their own software and their own Kickstarters and things like that. So it's fun and you can reach out to them and communicate. And, you know, if you gain an understanding of something, you can write about that.
Starting point is 00:39:16 So I wrote just a couple of articles about color. And it was funny because people were so opinionated about color, how to use color and how to do color transforms. It's like it was like I'd lit a match to gasoline or something. And so it's fun. But no, I don't have any right off the top of my head right now that I would send people to. I mean, right now, the best things I've heard have been about the Raspberry Pi and the camera. So, you know, that's where I would start and see what people are doing with it. I did the Getting Started with Raspberry Pi book
Starting point is 00:39:55 from O'Reilly and then switched into the something about simple CV book. And I liked both of those, but then I found out that Python was just too slow for what I wanted to do. So that was annoying. But you can use Python on top of C. So you kind of want to translate the slow sections into C and are even more into, well, let's just leave it at C.
Starting point is 00:40:22 Well, once I, I mean, having used simple CV, then there's open CV, which is parallel. And the simple CV, well, I guess it's obvious it's simpler. But that was a good place to get started for me. And now that I'm starting to play in this e-source code and thinking about, well, I could use open CV as as a library or I could try to hack something together that just solves my problem. But it was a good introduction to know what resources are available out there. Well, the approach I'm taking now is to try to keep everything as basic as possible in what I'm doing so that when I give someone a board and the API,
Starting point is 00:41:07 there doesn't have to be a lot of description. So when I work with color right now, I stay in the RGB domain with it so that it's like you could teach people about red, green, and blue and work from there without ever going through any kind of complex transform and for what's going on. So I would like to do a book, something like Vision for Simple Minds or something like that. I don't know and uh and and you know supply these uh low cost modules to the you know high
Starting point is 00:42:10 school robotics groups and and uh embedded classes and universities and high schools and let people learn that way so they can just start writing code on top of it and kind of have a good intuitive feel for it so that they never start with that problem of you know my god this area is just too confusing and people really don't have it together here yet i mean the advantage is is there is that there's a lot of opportunity in computer vision right now it's done pretty crudely and it's so there's not a lot of distance from the camera to like the best people can do with it. It's just been kind of held back for a long time. So that gives us the opportunity to do something that allows people to move into it and expand really quickly and, really quickly and hopefully break through the cost barrier and complexity barrier of it
Starting point is 00:43:10 so that some kind of architecture just makes sense, that lets it all move forward. It's not so stuck on the speed of these general purpose processors and old ad hoc groups of algorithms and things like that. If we can do it right, it will seem very magical. I mean, right now embedded systems so much depend on touch and buttons. But the Leap gesture recognition thing was pretty cool. The Connect got, I mean, I still think that's pretty amazing.
Starting point is 00:43:47 And the more we can use the vision the way humans do to get a whole big picture of our environment, I think it will be, I think we can do really neat things with embedded systems. I think we have no idea what's going to happen. No, seriously. I mean, our thinking is just, I feel like you do, that it's going to be great. But at this point, it's just impossible to see. I mean, even when we talk about the Internet of Things, it's so much at the beginning, and we're still thinking so much from the past that we have not even come close to what it is going to become. We can't even see it from here. So at some point, you know, hopefully we're going to change our thinking about it and design these systems so that they take care of themselves.
Starting point is 00:44:44 They don't need a lot of human interaction. And we can go about doing what we do, and we're not in the middle of the machine. And I mean, things like cell phones are great, or mobile devices are great, but they're total attention whores. You know, they're just, they can't live without us, you know, or we can't live without them. So, you know, hopefully we'll get into something else where we are free of the devices and the devices are free of us. And it's more of a, you know, a built-in service. Like you walk into your house, the correct things happen for the AC and the lighting, and maybe you can even gesture to talk to someone on the phone, but it's not this thing that is just totally dependent on your attention.
Starting point is 00:45:39 That makes sense. But I can't say what it is. You see, I really think we're going to get somewhere that we can't really envision yet. Was that intended to be punny? Going back to color transforms, what do you have? I mean, I can think of going from RGB to black and white as a transform. What other transforms do you mean? Or is that not even in the right realm? Well, that's a good example right there. So one of the people I worked in that taught me a lot about color and image processing was Dr. Al Edgar. And he had come from the startup called, I think it was Applied Science Fiction.
Starting point is 00:46:23 Is that right? I may have it wrong, but anyway, they were bought by Kodak and they did things like removing spots and noise from images. And so it was just a complete education for me working with him and learning about color. And so some of the things people do with, especially at the low end, you know, you can do a full on correct conversion of color to black and white, or you might just use the green channel because it's the one we're most sensitive to. And the same for other conversions that you do where you don't want to do the floating point or fractions. So you can do things with, instead of RGB, you can use RGGB.
Starting point is 00:47:09 We're using green twice. And then you're still in the realm of kind of binary math, where you could multiply and divide by two. And so you could do RGG, R plus G plus G plus B, and divide by four, which is all easily done with shifts and also have a representation of black and white. How important is having a good math background
Starting point is 00:47:36 for all of this? I mean, FFTs and convolutions are for edge finding. But is, I mean... Well, and wavelets are another big topic in extracting features and things. And you mentioned high school students, and that, those don't, I mean, are they just going to have push buttons?
Starting point is 00:47:56 Well, you can find edges by pixel differencing. Right. So you're just subtracting the left pixel from the right pixel, or a group of four pixels, you know, subtract four the left pixel from the right pixel or a group of four pixels subtract four ways from the one in the center so there's different ways to do a derivative as you look across the image
Starting point is 00:48:14 and those are the kind of things that would lend themselves really well to hardware like FPGAs yes so you can cut a lot of corners with algorithms to hardware like FPGAs. Yes. So you can cut a lot of corners with algorithms, and what you're giving up is robustness. You're giving up really beautiful results. Let's say if you did a noise filter using the full-on best world-class algorithm,
Starting point is 00:48:42 let's call that 100%, you get a really beautiful result where you can still see all the details in the image, but otherwise the areas of noise seem suppressed. Or you could do some simpler method in hardware that's really efficient and may not be intended for human consumption. It may be intended to help out the computer vision part and not the human imaging part. And let's say you do 80% as well as far as the quality of the result, but your one-tenth is complex. You've got something that is fairly inexpensive to implement.
Starting point is 00:49:25 I feel like that trade-off is common. There you go. And that's my kind of rule of thumb. It's like if I can get within 80% of the high-end algorithm and do it in a way that fits an embedded system, then I'll do it. So have you heard or have you played with it Connects? No but I've
Starting point is 00:49:50 watched what's going on with it and I've visited with the guys here in town that are doing some of the 3D like
Starting point is 00:49:57 architecture room scanning and volume scanning products with it and so I've kept up with it a little bit. I mean, because it's a camera, and then it's got,
Starting point is 00:50:09 I don't know what the rangefinder is. Is it a laser rangefinder? No, I think it's got two cameras. So it does some sort of parallax thing to do. Well, no, the data you get from it is a camera, and then the RGB plus, and then the depth. And you think it's from two cameras. The first version used structured light
Starting point is 00:50:28 and it had an IR LED with a holographic filter on it that puts out, it looks like just a cloud of random spots, right? Different sizes and different locations. It just looks like, you can find them online. There's videos on YouTube where they show the infrared view of a room illuminated with a Kinect. It's very cool. And so one of the cameras is detecting those IR spots. And somehow from that array of spots, it comes up with a depth map.
Starting point is 00:50:59 And they do some math processing on there. But they also had a lot of research behind it. I forget who all was involved. There was some university involved. It's a really good low-cost implementation. If you look at the teardown of it, it's very cool. But it has a lot behind it. It has a lot of research behind it a lot of uh database help
Starting point is 00:51:27 behind it a lot of learning behind it so it's not the kind of thing you or i would be able to start with if we were doing something like that and so you know i would tend to go with buying one of the time of flight cameras or you know using two cameras or something like that, or you get your 3D clues some other way. Well, and the Kinect is very processor-intensive. Is it? You know, I don't think I looked at the processor on it. I wanted to hook it up to a Raspberry Pi, and everybody said,
Starting point is 00:52:01 no, it's not going to happen. Well, there has to be some output on it. I'll bet there's some kind of output you could do on it that you could use in a Raspberry Pi. So then you'd have like this really nice 3D detector hooked up to your Raspberry Pi. Yeah, I think there are a lot of people who want that and the internet is still crying.
Starting point is 00:52:50 As we talked before the show, you mentioned that lighting and LEDs are incredibly important to computer vision. And then the lighting people were some of the first ones to contact me about using Chipsight. Because here's the problem with almost any OEM, be it lighting or anything else, is that when they decide they want to use a camera, they see these $1 cameras in the world. And they go, man, we should use that camera for a product. And then they start looking around for the processing for it, and they generally end up talking to TI because that's who does a lot of image processing in their products. And immediately they're screwed because TI is going to go great and show them a board that has $500 worth of parts on it. In the end, it's like a $500 thing. And the lowest I've seen with a product on the market using a TI part was I think $150 and it didn't include the camera.
Starting point is 00:53:40 So it's just hard to do the brute force method to get much, even in the way of simple functions, from a general purpose processor. But then it's a dead end for the OEM. They've got nowhere else to turn because that's it. They talk to the people that are seen as the people that should know. And, you know, but from the semiconductor company's point of view, why would they offer something less expensive? They don't see it as a market. So we're kind of stuck.
Starting point is 00:54:12 The OEM is stuck, helpless, going, well, you know, this thing has to be $20 or less, so we just can't build it. And then once in a while, they find me. And then what they need in lighting is something that could conceivably go per fixture, like a sensor per fixture, or even just replace the PIRs in a hallway or an office and do a better job of detecting motion and luminance. So one of the new requirements in California is that as more lighting comes into the room, as more daylight comes into the room, you should turn down your lights to save energy. And so I think the OSHA recommendation is like, I forget, let's say 500 lux or something on the work surface. And so with a camera, you can adjust the lights perfectly.
Starting point is 00:55:10 The demo I have on chipsight.com shows the demo with the light meter and turning the lights up and down. And it can just nail it at 500 lux all the time, regardless of what other lighting is changing in the room. So as the sunlight comes on, you can turn your lights down. And it's pretty simple, and it's not completely a simple idea for a camera to do, but it's doable. It's doable at low cost, and that's what they need. You can imagine extending that to changing the color temperature of the lights, you know, to
Starting point is 00:55:46 enhance mood or keep you from staying awake too much. A lot of the architectural lighting, they want to do different colors. It depends on what your Facebook status was. Our time of day. You know, seriously, people like to, like, yellow
Starting point is 00:56:04 the lights as they dim so that it looks more like incandescent lighting. Yeah, and you often change your color profile on your computer. Yeah, I have a little gadget on my computer that, as it gets closer to sunset, takes the blue out of the screen image. I love it. Yeah, I use that. I love it. Remember when you tried to do some video editing at that? Yeah, no, I was doing some photo editing, and I'd forgotten that that was on and realized I'd screwed everything up. Everything was kind of yellow.
Starting point is 00:56:38 So when you introduced yourself, you mentioned patents and that Chipsight, that's a lot of what Chipsight does. Yes. Now, patents were interesting because they really showed me how much communication is involved. So if you think about doing an embedded system or you think about doing something that you've created that's a new application for an embedded system your job kind of switches from being technical to being about communication communication can be almost the most important thing really probably the most important thing you do. And patents can help with that. They can help you say exactly, you
Starting point is 00:57:31 know, what you've done, where it applies, how it compares against the old ideas, and break it down into a language that can be read and translated and understood in different parts of the world amongst people with different that may be lay people as far as technology is concerned. So it's like a whole other form of communication. It's amazing that it works, but you know, people can pick up a patent and read the claims language or read the summary and sort of get an idea of what you've done and what it's for and the boundaries of it.
Starting point is 00:58:09 But they get such an awful reputation. Especially software patents, which are being hammered right now. Yeah, I mean, you know, companies with money can really screw things up. And it just becomes a survival thing for them. And so instead of wanting to bring some new idea to the world that helps people, they are just fighting to stay alive, which for some reason they think is all the time, no matter how much money they're making. And so they just misuse it. It's just like throwing too much money at it, too many lawyers at it. And I don't know if the system is broken, but it does not favor
Starting point is 00:58:55 the unfunded person. And you've done some of these patents on your own. I mean, your company paid for them and your company is you. Yeah, it's pretty brutal. Have they been worth it? No, not yet. And I don't know if I would do it if I started over again. I can't say that today I would do these patents again. The good thing right now is that they're still perceived as valuable by investors and other people. They
Starting point is 00:59:28 still perceive it as like something almost tangible. The baseball trading card theory. Yes. And it is intellectual property. So it says that this intellectual thing I came up with is property and has value. And I've marked out the boundaries of it so it's like a little beachfront property of a couple of acres that I own yeah and then you have to hope that global warming doesn't drown out the supreme court aren't those the same no that's right well I um I'm feeling under the weather, actually, so I'm going to call this and ask you if there are any last thoughts you'd like to leave us with.
Starting point is 01:00:15 Yeah, if you love doing embedded engineering, just don't let your interactions with people kind of diminish your love of it because along the way you have to deal with different companies and different groups. And especially if you're doing contract work like you guys are doing, you run into different attitudes with engineers in those companies and managers. So, you know, I hope that people can keep their love of what they're doing throughout their careers and not get too beat down by it. But either way you go, whether you get a job in a company or kind of exist outside of it and do more creative things, you have to deal with that. You have to find a way to protect this thing you love doing. Ah, the joy of engineering.
Starting point is 01:01:10 Yes, it's great. Thank you so much for joining us. Thank you. My guest has been Craig Soelander, owner of Chipsite. And thank you to Christopher White for co-hosting and for producing. And, of course, thank you for listening. If you have any questions or comment, hit the contact link at embedded.fm or email us at show at embedded.fm.
Starting point is 01:01:33 And this week, our final quote, I'm stealing it straight off of Craig's website. It's from Helen Keller. The only thing worse than being blind is having sight but no vision.
