Embedded - 161: Magenta Doesn't Exist

Episode Date: July 19, 2016

Kat Scott (@kscottz) gave us an introduction to computer vision. She co-authored the O'Reilly Python book Practical Computer Vision with SimpleCV: The Simple Way to Make Technology See. The book's website is SimpleCV.org. Kat also suggested looking at the samples in the OpenCV GitHub repo. To integrate computer vision into a robot or manufacturing system, Kat mentioned ROS (Robot Operating System, ROS.org). Buzzfeed had an article about Snapchat filters. Kat works at Planet. And they are still hiring.

Transcript
Starting point is 00:00:00 Welcome to Embedded. I am Elecia White with Christopher White, and I'm excited to have Kat Scott on to talk about computer vision. Oh, already? There's usually some sort of lead-in. Hi, Kat. Welcome to the show. Hi. Kat just complimented us on our professionalism. Now here we are. That lasted. Kat, can you tell us about yourself? Oh, well, so I'm a senior software engineer at Planet Labs
Starting point is 00:00:33 right now. I do quite a bit of computer vision work. I primarily work in Python, occasionally C, C++. Prior to Planet, I've started a couple different startups doing robotics and manufacturing systems. And then I started my career actually being a defense contractor doing small business R&D contracts in Michigan. And that was quite an interesting job. That's where I kind of learned how to do robotics and computer vision and that sort of thing. You also co-authored a book. I also co-authored a book. It's Practical Computer Vision with SimpleCV. It's been out for, what, four or five years now? So SimpleCV is basically a sort of extension wrapper
Starting point is 00:01:15 on the OpenCV Python bindings. And what it allows you to do is to basically take something that may be an abstruse, you know, hundred lines of code to get working and do it in a single line of code or a couple lines of code and makes things a lot faster to prototype, which is, I think, the great advantage of doing computer vision in Python is the ability to prototype rapidly and sort of figure out the right way to do something before you deploy it. It's usually, I find computer vision is more of a science
Starting point is 00:01:48 where you have to go and try a bunch of experiments until you find something that works precisely. And that means that your prototyping time is the most valuable time that you spend. You're spending all your time doing research, developing, trying stuff. And the faster you can do that, the faster you can get things done. Cool. Okay. Mostly I'm going to ask you about computer vision because it's an area that I always want to try more in and I'm hesitant. But before we do that, we have the lightning round where we ask you questions and want short
Starting point is 00:02:18 answers. And if we are behaving ourselves, we won't ask you for additional information. That never happens. So first, favorite electrical component. Favorite electrical component. Hmm. That's interesting. There's such a line of sensors, right? There's so many sensors. You know, it's your basic CCD. I mean, that's how I get my bread and butter. I mean, maybe it's a CMOS, but usually it's a CCD. I mean, that's how I get my bread and butter. I mean, maybe it's a sea moss, but usually it's a CCD. All right. I mean, that's a very generic answer, I know.
Starting point is 00:02:48 No, that's less generic than some others. Dinosaurs. Do you like dinosaurs? Pro-dinosaur. Would you like dinosaurs to exist? How do they not exist already? In their old form. Sure.
Starting point is 00:03:04 Which ones? Ooh. Let's stick to the herbivores and maybe some of the little flying ones. The little flying ones. That seems safe. Yeah. Like the California brown pelicans. Yeah, well, they were like raptor, or they were like the ones that would fly that were about that size.
Starting point is 00:03:21 They had the sharp beaks, right? And they'd dive down in the water and they'd stab fish. It was pretty cool. Or so we think. Or so we think. Science, technology, engineering, or math? Engineering. What is the most exciting sci-fi concept
Starting point is 00:03:38 that is likely to become reality in our lifetime? Utopian or dystopian? Depends on your mood uh i i i guess dystopian isn't very exciting oh well it's more terrifying well it's been it's been a really rough week so dystopians uh it may not be the way to go this week we should be positive about things i i think the utopian nature i think you know a lot of the bioscience stuff is really changing and i think probably within my lifetime we'll get rid of most of the cancers i think i think that's pretty exciting that is exciting uh what is most important to your job a whiteboard a soldering iron or a keyboard mouse keyboard mouse?
Starting point is 00:04:25 Keyboard mouse. Mainly keyboard. What is your least favorite planet? Least favorite planet? Venus. What is your preferred programming language? Python. What language should be taught first in university courses?
Starting point is 00:04:42 Oh, Python. Okay, so Python all the way. Yeah. What's your favorite text editor?, Python. Okay, so Python all the way. Yeah. What's your favorite text editor? Say Python. Emacs. Oh, I was just trying to keep the Python going. What's your favorite type of snake?
Starting point is 00:04:54 You get that? That was the right thing. Ooh, favorite type of snake. The correct answer is Python. That's not a correct answer. I'll go with Python. I mean python should I be more specific on that? because snakes are cool yes you should be more specific on
Starting point is 00:05:10 what is your favorite snake now I have to be specific about my favorite type of snake and I'm drawing a blank on snakes toilet snake? that one's gotten me out of trouble what did you want to be when you grew up and don't answer snake? I can't? Okay.
Starting point is 00:05:30 Actually, when I was very little, I wanted to be a neuroscientist. And that actually extended almost into college for a while. I went through a year period in college where I thought that was what I was going to do. And then I took chemistry. Hacking, making, tinkering, or engineering uh half engineering half hacking i'm done oh okay um then what is a workday like for you this one can be longer of course uh well workday you know i think it's not very standard i i'm quite actually planet is is fantastic um i really get about on a normal day i get about seven hours of coding in all day you know coding and then discussing and
Starting point is 00:06:12 thinking through things um so i don't do a lot of meetings or managing or anything like that it's very much like come in i have objectives i really like having a day-to-day where it is every day the software that you write does something more than it did the day before, and you can have a discrete sort of accomplishment every day. So lately I work a lot on doing image recognition. So at Planet we have lots and lots and lots of stuff coming down. Pretty soon the entire earth every day is being imaged and we can't process all of it and we can process all of it but we can't we shouldn't because a lot of its clouds phenomenally like 40 50 percent of the cloud the world is covered in clouds every day and you can't really see anything and that information's not really useful and so we shouldn't process it so I'm sort of working on how to weed out all of this scenery
Starting point is 00:07:05 that is not particularly usable or sellable. So this is Planet. Patrick is at Planet. He was on the show recently and we talked about satellites image the Earth every day. Formerly Planet Labs, now Planet. And of course they're still hiring. Yeah, we're always hiring. How long have you been there? I've only been there about three or four months.
Starting point is 00:07:28 Yeah, because I thought you got the job after we scheduled this, or you told me about the job after we scheduled this. Yeah, yeah, yeah. Why would you... It seems like taking the clouds out in post-processing is less efficient than just not taking pictures when they're in clouds uh yeah generally but you have a constrained environment on the satellite and you the way you sort of address these problems right is that you assume that every bit of data is probably sellable or usable and you're always trying to aim for the most data to go through, even if it is not correct.
Starting point is 00:08:10 So you want to process things even though they are clouds because if you lean the other way, you're losing stuff that you can sell. And still, computation is pretty cheap. So since the satellites are constrained, we can't do quite a good enough job. I mean, there's stuff we're working on right now. But we do it all on the ground. And so it's sort of like there is a team that handles getting stuff off the satellites and making sure, you know, here's the data. It comes down.
Starting point is 00:08:37 Here is another group that deals solely with, like, image quality issues as they relate to color correction and how sharp the images are and that sort of thing and then my group sort of is how do we only process things that are useful that we can actually line up to the earth and then we actually do that sort of rectification and then once it's sort of rectified then it is ready to either be used in a mosaic product or sold to the customers and the word rectification here is sort of stapling it to the ground. Yeah. So I like to say like all rectification, most rectification requires that you have a map already. So we already know what the earth looks like through various companies and governmental organizations. You have these base maps. And what we do is we say,
Starting point is 00:09:22 oh, well, here's this like little corner of a road over here and here's this little uh manhole cover you know something that you can see and we we try to line all those up with the thing that we just got and you do that very very precisely to like a sub-pixel level and you say you take these two maps and you kind of like if you think about two paper maps sliding them over each other until they lined up perfectly that's basically what we do and then we do some stuff to correct for oh like there's mountains and this thing isn't actually flat it's actually curvy and round and and funky world yeah and so you have to go correct for all that too okay so computer vision as a whole what can you tell me about it
Starting point is 00:09:59 so i i well so so i like to i like to phrase these things as computer vision is like the absolute worst way of replicating like replicating light for humans to understand it and i say that because like computer vision is right this notion of processing images to get data out and the images that we generally take are really just bad representations of light in the world and like if you actually draw out the system right and i've done this when i do talks like you start out there's a thing and there's a light source like the sun and you have all these frequencies of light coming up the sun like just this huge giant pile of frequencies and they come down and they hit this object and the object absorbs some of them and reflects some of them and they bounce into a piece of glass and
Starting point is 00:10:55 those colors and all of that stuff bounces around in the glass hits a ccd or a CMOS sensor which has well actually before it does that it goes through what's called a bare filter, which is little tiny chunks of red and green and blue. It might also go through an IR filter. So it divides it up and already like weeds out most of the frequencies. Then you're just left with like basically a little response around a certain peak of red, a little response around a certain peak of green and a little bit of blue. Hits the CCD. That CCD has some sort of response curve to it gets digitized processed
Starting point is 00:11:27 beat up stored as bits and then those bits get moved around recreated reprocessed put on a screen that may not represent the colors that actually came in then bounces off from the screen into your eye the process repeats and then it gets weirdly interpreted in your brain and this whole like long chain of events is like the worst possible way of representing what's in the world right there's so many levels of just processing filtering weird stuff going on and yet it all works like i can show you a picture of a cat and somehow we still intuit that it's a cat and like how that actually works is just incredible and then kind of magical yeah and then we're supposed to sort of process it and then like
Starting point is 00:12:10 have a machine say oh yeah that's a cat because cats all look the same yeah i mean well well it it helps right because the other side of it is that you just have so much data i mean when you when you look at like how much data is an image, it's books worth, effectively, of bits. You have 8 bits for red, 8 bits for green, 8 bits for blue, or more. And then you have 1 megapixel, 10 megapixel, 20 megapixels of that. And it's a pile of data. So luckily, just in that giant mass, you can sift and throw away a bunch of it and eventually get to an answer. So another way I like to think about computer vision is it's the process of
Starting point is 00:12:49 throwing away as many bits as possible, as fast as possible, to arrive at the one bit or the two bits that you actually need, which is like a classification or yes or no, or a number that is a measurement of something in the world. Okay, so how do people get into computer vision? If they're, say, embedded software hardware engineers or software engineers, computer vision is sort of specialized. How do you go from college classes to this? Well, you know, so I didn't, my background is actually, I did my undergrad in EE and computer engineering. And I really didn't do any computer vision in undergrad. And I got my first job, and it was for this crazy, awesome, small, it was literally a mom-and-pop defense contract.
Starting point is 00:13:34 It was a husband-and-wife team that ran this 50-person R&D shop in Michigan. And they basically said, here is the list of all the government sbr grants that come out every quarter sbir is the small business innovative research yeah grant okay and if you apply like if you put together a grant proposal and you win it that's basically your pile of money to go do what you want and so i started looking through and i'm like okay what are the cool projects because i only like the one thing i try to do with work is I only try to do cool and interesting, hard things. And so I went through that list,
Starting point is 00:14:08 and I found, you know, the things I thought cool. And everything I found that I thought was really, really cool was related to sort of taking images and understanding them in the world. And I had no idea how to do any of it. But I could sit down and read papers and, like, read through tutorials and just sort of
Starting point is 00:14:25 work and work through it and try to understand break problems apart it's just a general engineering principle it's like well how do I solve this problem well to do this I need to find these lines or I need to find like I'll have what can I say about the system that I'm looking at well it'll be a camera outside which means it'll be light and sunny. So I need something that's invariant to illumination. And I need to look for people in a cluttered outdoor scene. Well, what does the research look like for that? And then I go read through like the research papers and I say, oh, this one does this, this one does this, this one does this. And you sort of say, okay, well, that's sort of what the state of the world is. And what can I take from each of these? And what do I, i you know after you do that a few times what can i take from my prior knowledge
Starting point is 00:15:07 of how to do stuff and like what sets of technologies work really really well and i started out doing this all in c++ and it was hard it was really hard it's frustrating like the notion that you have to have a lot of grit to be a good um to really get good at something that's sort of hard. It takes time and it takes practice. My post this week for our embedded blog is titled Resilience is a Skill. Yeah. So yeah, I totally agree with you there. So that's how you got into it. And then you ended up getting a master's degree, partially in computer vision?
Starting point is 00:15:42 Yeah, it was like computer vision and robotics and machine learning because they're all sort of interrelated i think to a certain extent but how would you recommend somebody now i i again i think it's it's starting to read right so i think probably the best set of tools out there i mean there are some really um there's some really great teachers out there uh now there's sort of various blogs and stuff that people are showing tutorials. Like if you want to get, if you want to do something right now, if you want to like sit down at your computer and learn something like the
Starting point is 00:16:14 open, the open CV examples folder, especially the Python folder where it's like, there's probably 50 programs in there. Right. And you just open them up. You look at the guts inside, you run them, you tweak the guts inside,
Starting point is 00:16:26 you see what happens, you run them, and you keep doing that. And I think that, like if you have a basic knowledge of Python, that is probably the quickest, dirtiest way to start getting your hands dirty. And for that, you just need a webcam and a computer, right? Yeah, you may not even want a webcam. You could probably do it off still images and videos, too.
Starting point is 00:16:48 Webcam makes it a lot more fun if you're going to do like face recognition or mustachination yeah stashination you take a picture of someone i know what you mean and they i just it puts a mustache on them yeah i just you don't think that's a word well it's only a word so much as like snapchat makes like rainbow a nation and other crazy crazy i mean that's a little slightly different technique but it's based on the initial very similar approach okay so i can go through and do this on my computer and then a raspberry pi would be not too far from that no no, no different at all. Just slower. I mean, but if you want things to run faster with computer vision, generally you just make your images smaller.
Starting point is 00:17:32 It's a really handy trick. Things are too slow, make the image smaller. Yeah, and that's, you can, I mean, sometimes the camera will make the image smaller for you. Yeah, it's much, if you can solve, you know, never send software to do hardware's job as much as you can get away with it. I like that. I like that. You know, if the camera will do it, then you should talk the camera to do it. You know, similarly, if you're trying to make something robust, I generally very much advocate for controlling situations. Like people assume that there's magic, a lot of magic bullets out there, which are that all of a sudden,
Starting point is 00:18:08 like I can write an arbitrary computer vision algorithm that works really, really well outside in five lines of code. And the truth of the matter is, is no, no, you can't. But if you want to solve a problem, well,
Starting point is 00:18:19 like a sort of industrial computer vision where you want to recognize a lot of things and, and measure things and look at things consistently, you have to have a consistent environment. You need to constrain the environment a lot. Reading through your book, one of the things that struck me that I hadn't quite realized,
Starting point is 00:18:37 having played just a little bit with some computer vision, is that lighting is something you shouldn't solve in software. If you can have consistent light, that is a physical property that will make your software so much simpler. Well, if you think about signal processing, you want good signal. And sometimes people forget that images are signals. It's a picture.
Starting point is 00:18:58 I should be able to manipulate it in Photoshop or whatever to bring out detail. Well, no, if you have a bad source, you have bad data. Well, people, there's a weird sort of notion that, you know, we understand color, not light. And that's part of the problem. You know, there's so many color hacks, like the fact that magenta doesn't exist. Like magenta is this color that we see on screens
Starting point is 00:19:19 that doesn't, the rainbow, like if you look from red to blue, it doesn't loop back over and mix red and blue again to make this magenta color. It's just a magical thing that we intuit. And it's a very, like you have to remember that light and color are different things. You can't really replicate color well without understanding the light that it's coming from. I'm torn between going down that path and being so confused because last week one of our guests was named Magenta. And so I'm just sort of baffled here.
Starting point is 00:19:52 Well, I want to go back to something you said about magic because I've had limited experience with computer vision in my working life. And the one real serious thing that I tried to do was at a medical startup where we were doing imaging. We were imaging the inside of arteries and it was sort of the same kind of cross-sectional images you get from ultrasound in that they were very low contrast, features tended to be soft and, you know, not have hard edges. And the desire was to be able to pick out all of these features and segment it in these complicated ways to say oh that's calcium that's you know a plaque that's some other disease and
Starting point is 00:20:32 the technical executives were just flabbergasted that we couldn't just do this yeah well there's really like this gets to the very common set of problems. And the first is, you know, A, these problems are hard. They're not like, not everything is a hard boundary. It's a very softy, morphic, like setting up good definitions for what something is, is very hard. You have to be very sort of rigorous in how you set up a problem. Like what is the definition of the thing I want to find? Is it, what does that look like precisely and then the following is like when you represent that to people this gets back to
Starting point is 00:21:09 color too i was actually um i went out with a friend of mine who works at a microscopy company and they were talking about imaging tissue samples and you know looking how do you you know if you're doing some sort of analysis on something and you want to show a region, if you show it in true color, like you actually show what you found, things don't pop. People don't see it. But then when you try to use an artificial color space, and if you don't do it correctly, what does it mean for an image to pop? What makes it very easy
Starting point is 00:21:45 to see and that's something that's that's very difficult so it's not just defining the problem because like somehow latently in your brain like a doctor looking at this thing knows what it looks like but like how do you take this thing that is intuitive and describe it rigorously enough that you can kind of make it into a binary decision. Yeah, and it's like a different class of difficulty, too, because you mentioned seeing a cat and recognizing a cat. Okay, we all know how that works. But if you have something where it takes weeks of training to be able to discern features, okay, that's probably a harder problem to get a computer to do as well, because it's hard for a human to do. Yeah, I generally define don't believe that I can't make a machine do anything that I can't do myself. Right.
Starting point is 00:22:29 And, you know, there's certain things, like, computers can see in, like, different parts of the spectrum, or, like, we can get cameras that see in different parts of the spectrum, but then I remap that to something I can see, which is always, like, a really, I find that one of the most interesting problems right now is that how because we can get like these i used to work with these floor cameras these forward looking infrared cameras sort of night vision but super night vision like heat seeking so you can see temperatures or things in the ultraviolet or just like way out different bands in the spectrum that we don't see the not the red green and blue that we're used to and how you remap those such that people can understand them when it's something that we just don't have the capability. Like if you look at IR images, like IR photography is a different form of photography. You can go buy this IR sensitive film and you go look outside, all the tree leaves are white because that's how they look under IR. And it messes with your brain because our brains are so tied to the sensors that we have that you're just like tree doesn't look like that tree looks like this and it just it messes with things and like how you how do you remap those things such that people can understand this new forms of information
Starting point is 00:23:34 is really kind of an interesting like both science problem and a psychology problem well there are animals that can see ultraviolet and can see polarization and we can't and i've always wanted to see that i mean it's like okay now i want to be i want to see how a butterfly sees well the other the other thing about it is we we define sight as like such a interest like there's this very macro huge wide field of view whereas plants actually see like plants have photoreceptors that see different wavelengths that's how they sort of know like when it's fall or when it's spring they actually see the changes in the elimination and what time it is and everything out there is seeing in different ways that are optimized for its like sort of local application so you know and that kind of gets back
Starting point is 00:24:20 to solving some of these cv problems where people assume that if we take a because it works well for us kind of that if we take an rgb camera and we apply it to a problem that we should be able to fix it when optimally like um the case that comes to mind is i i had a buddy in grad school who was working on something um about sorting recycling and like the different plastic types and generally all plastic to us kind of looks more or less like white or clear plastic you put you look at a couple different wavelengths or a couple a little bit of polarization information all of a sudden that stuff just pops and the types of plastics are like super discernible and you can sort them exceptionally easily and so if you sort of try not
Starting point is 00:25:00 to think like a human but think about understanding the problem and solving it. You can actually make the solution way easier. That's got to be a huge key point to doing good computer vision projects, is both to not expect the computer to do any more than you can, but also it has different sensors, so it can't do more discernment, but it can have different inputs. Yeah, you can, I mean, some cameras can have like better, you know, we can change the depth of field, we can constrain the color of light, we can put filters in front of the lens itself that filter out different wavelengths of light.
Starting point is 00:25:41 You can adjust the light such that things appear darker or lighter depending on like coming in at a blink angles versus backlighting versus top lighting lighting is just incredibly important for particularly like manufacturing tasks well that's one of the things i really wanted to ask you about i have done a raspberry pi project where i got open cv running and i got to blob detection which was was very cool, and face detection, which was all just one line of Python at a time. It was very encapsulated. I didn't have to work very hard, really. And now I think about, well, what if I need to do a manufacturing line, and I want to use computer vision on that?
Starting point is 00:26:20 Can I really expect to be able to? I mean, yes you you can um so the world of manufacturing is sort of broken into vendors that specifically vend things um some of which are like very simple turnkey solutions right like if you just need to read a barcode that that is a turnkey solution you shouldn't roll that from scratch yeah uh that's a thing now if you have something much more complex um i'm thinking quality control where i have a screen that does stuff and i need to make sure my screen is good yeah before i ship it yeah and i think the way that you approach most of these problems is as far as i found is that it actually starts with collecting a lot of data right because if you one of the the places that is really, really bad to be
Starting point is 00:27:06 in computer vision is to be in what I call the optometrist office, where you are changing stuff, trying to develop, and you're saying- Is this A better or B better? A better or B better? Anybody who has glasses has done this. Yeah. And you sit there and you're like, you're changing a parameter or tweaking something. The first step is always to develop data. And you sit and you have a data set and you're like, you're changing a parameter or tweaking something. The first step is always to develop data. And you sit and you have a data set and you have an objective measure of what you want to accomplish. And once you have that objective measure and you set up a test environment where you don't have to deal with, like, say, an assembly line, it's just sitting there, you run across it. You run across the data with whatever algorithm you're working on.
Starting point is 00:27:44 You say, I achieved an accuracy of you you get really into stats you say i achieved an accuracy of 96.4 and then you tweak something it says 96.5 or 96 or just eight because you really screwed it up and and sometimes you can actually just um that's where like computer vision sort of overlaps with things like numerical methods and stats and all this sort of math background. And machine learning here, because you're going to talk about false positives and false negatives and true positives and true negatives and your F1 score and all these things. You end up with all these boxes to put things in and you have to figure out your tolerance of, it's okay to be wrong sometimes. It has to be okay to be wrong sometimes. Absolutely.
Starting point is 00:28:23 And people don't believe you when you say it's going to be wrong. Because even humans don't do most of these tests 100% of the time. And is it better to have it be wrong more? But in one, yeah. So you have to do a lot of this planning and specification of how good does it actually have to be and when somebody says it has to work every time then you multiply your uh planned budget by about a hundred yeah like so i'm doing a million well yeah i'm doing a um a lot of sort of deep learning things right now and i i generally suggest that like well we've gotten to a certain level and i i kind of believe that like as a rule of thumb i don't
Starting point is 00:29:02 know if this is exactly the number but to give you an idea i think that like after the decimal place each a single another nine or another decimal place of accuracy is probably going to cost you another order of magnitude of data yes because what you're doing is you're actually running through your data to find the cases where things break the common case generally if you if you do a bunch of machine learning you'll get the common case very quickly it's the corner cases like we all know in writing any bit of, it's the corner cases that are going to get you. And you have to collect those corner cases. They're like Pokemon. You just got to go and find them. Yeah, we might've talked about Pokemon before the show. But there's also clean data versus data that's not so good. My example for that is if I was training a voice learning system and I took all of my trained data in the morning and then I also took data that I wanted it to refuse in the afternoon.
Starting point is 00:29:57 Like I had Chris record in the afternoon and I wanted to train to my voice. If there were birds singing outside when I was doing it but not when he was doing it it may train on the wrong thing so you have to choose your features intelligently you can't just let the machine go crazy and say garbage in garbage out and and well the other thing is like we like in a lot of these projects use a turk right you use somebody and you say some poor person that like has to sit in front of a computer and just click click click click click to say like this is thing a this is thing b this is thing c so we know very very well that um humans ability to do those sorts of tasks like that's why we want machines is that people can't
Starting point is 00:30:37 do that well for very long you can do it really well for an hour and then you need to go take a break and have a cup of coffee and if you have people generate that data for you, there's always going to be problems with it. That's why you see things like some of these captchas, right? It's not like the captcha where it says, click on three images of pizza in this image or something. Those are usually going through people multiple times. That same image will probably hit, I don't know, three, five, ten people. Because no one person, if you're doing I don't know, three, five, 10 people, because no
Starting point is 00:31:05 one person, like if you're doing that stuff a lot, people kind of disagree. Like even, you know, like even at Planet, like looking at clouds, it's like, what exactly defines a cloudy, cloudy image that we don't want to, we don't want to sell to customers? Because it can't just be white, that may be snow. Yeah, well, and yes, exactly yes exactly but there's other there's other bits of that too where it's like if i could make a way to calculate a number that says like this image is 97 cloudy i've solved the problem and right because i said this this image is 97 cloudy uh and if you define it that way like if you don't have that definition it's still a subjective thing because that's what we're really trying to find we're trying to find the subjective measure for
Starting point is 00:31:50 the subjective thing and that's that's sort of where things are interesting all right computer vision i'm sorry you have me so distracted into machine learning and data and i totally but computer vision i'm going to be focused uh yeah we're out of focus that's part of it too uh if i wanted to build a manufacturing test bed for quality control um and say i i used a computer um i didn't try to do a Raspberry Pi, although it would be very different. And I want to look at a screen. What are my steps? Assuming I don't have your background, that I'm pretty much coming at this without computer vision. Okay, so the first step is you need to find a camera.
Starting point is 00:32:37 And you need to figure out what your illumination situation is. And you probably want to specify the problem. So you have some problem that you want to solve. You want to sit down and very rigorously define what success looks like. Maybe get an idea of how often things will happen. You know, say if it's a widget, you want to say, toss all your broken widgets in a box for two months, all the rejects, because we're going to run those through every single day. Because those are things that you really want to catch you're going to have millions of good widgets so you you first set up defining your problem like this is what i want to find this is how i know i will have found it this is roughly what we think it'll perform at then you have to go and specify before you even do the software side of things it's really you should pick the right tool for a job, which means finding the optimal lens and set of filters.
Starting point is 00:33:27 So do I need RGB? Do I need to work in near IR? Can I just get away with a grayscale image? Do I need like a high bit depth? Because this is something where there's not a lot of contrast. So try to figure that out. And fewer bits is better. I mean, because a smaller image is easier to deal with well it was
Starting point is 00:33:45 just easier to deal with um you know a higher more bit depth can if you have less contrast can pull stuff out a lot easier you know like i said it's all about throwing away it's about throwing away bits as fast as possible but if you need those particular bits then you just need them don't throw them away too fast yeah and so first sort of figuring out a lens camera system like what frequency does the camera have to operate that should it be running over usb can i do i want it to run over gigi there's all these different protocols you know is it going to vary in temperature between negative 22 degrees and 120 degrees or is it just going to be 70 degrees all day and you know will i have to deal with stuff like lens fogging or anything like that what does the lighting environment look like
Starting point is 00:34:33 will i want to put lights in to drown everything out if so how long will they last where should i position them how are they going to get powered that sort of thing so once you sort of deal with those hardware externalities then really hard problems and we haven't even opened Python yet. Yeah, you haven't even opened Python. So then, depending on who your vendor is, you may be writing your own drivers. Or not writing your own drivers, but they might have like a C API, right? Yes. Yes, they do. Camera link. Oh, jeez. See? So they may or may not they may or may not be supported in linux they may only run on windows xp uh you you don't know you know vendors like pick a vendor who has really
Starting point is 00:35:20 good support uh for all of these things you know things. It's horrible to get in a situation where you have to buy the cheapest camera and it has the worst support. There may be an open source Python wrapper to it. There may not be. You may need to use some sort of... You might have to write your own Python wrapper to actually get it at the camera library.
Starting point is 00:35:39 You may not. That's all vendor-specific. I've found recently on a couple of projects, I really, robot operating system, if you're going to like build this on Ubuntu, I think there's going to be some Mac and Windows support actually fairly soon. But if you want to build these, if you want to get access to most of these sort of professional machine vision cameras or even stuff like a DLSR or something like that. Robot operating system, ROS, has really, really good support for getting at these cameras,
Starting point is 00:36:13 like interfacing with the cameras and then putting them on a unified bus structure and presenting you with a nice set of Python tools for interacting with the images, grabbing them, changing camera parameters, and seeing what's going on. So I've used that in a few prior projects, and it's saved me an absolutely ton of time. But so you're basically going to figure out how to get the data and tell the camera what you want it to do.
Starting point is 00:36:39 And that's usually a pretty tedious step. I was actually talking to somebody yesterday who was trying to build a camera interface driver and she was not happy about it. It was really rough. See, we shouldn't have to rebuild these things all the time. It goes back to my previous rant in earlier shows, but I'm just tired of I want to do machine vision. Well, you better
Starting point is 00:36:55 learn to write a Linux device driver. But you can start with Raspberry Pi and their camera and it may not be everything you need, but it may be enough. Yeah. So then you have an image. Let's just assume now you have an image.
Starting point is 00:37:12 You have a JPEG or PNG or something raw, whatever. Then it is a matter of figuring out what you want to do with it. And that can be as simple. You can do IPython. I have actually been doing a lot of my prototyping. Depending on what it is, a lot of it goes into Jupyter Notebooks first. I just want to see Jupyterism. It's now sort of more cross-platform where you can work in other languages,
Starting point is 00:37:36 but it started out as IPython Notebooks, and it's sort of grown into this inclusive community of different languages. But the way to sort of think about it is in the browser you can run code uh visualize code very similar to matlab where you have like matplotlib i've been using bokeh lately which is just builds these beautiful impressive plots that people love um that are nice and interactive and you can save them in their like javascript html and like you can keep all your documentation in there about what you're thinking as you go along you can actually see like process like the newest versions it's incredible you can actually see like how much of a resource you're consuming you can farm stuff out to multiple computers it's like the sweetest
Starting point is 00:38:17 development environment i'm not sure she answered the correct thing when she said her favorite editor was Emacs. I do a lot of work in Emacs. So, Jupyter is an IDE for Python and other languages. Yeah. I have PyCharm and I have Anaconda, which Anaconda isn't really an environment, but PyCharm is sort of an environment. Is that similar? I don't know what you produce.
Starting point is 00:38:45 I'm trying to figure out where it is. Yeah, I mean, so it's, I think the primary difference is, like, PyCharm, I believe, is an IDE that is actually, like, you know, compiled and runs on Mac. It's, like, running Visual Studio, whereas IPython is more,
Starting point is 00:39:00 more or less, like, an access to a Python shell in your browser. Yeah, okay. With like a lot of sugar on top of it to make things render and be pretty and be saveable. Little macros, yeah. Oh, and you have like access to your documentation. It's the cat's pajamas.
Starting point is 00:39:17 Okay, cool. Learned something about Python. So, and then it's really, you know, the process of solving these problems is really, it's really specific to, to what is going on. And I, I no longer pontificate about like a particular approach to generally doing stuff. I have hunches, but for any computer vision test, and this is part of the reason I like doing stuff in Python so much is that you set up your metric, right?
Starting point is 00:39:43 You say, this is the thing i want to do this is well i know it'll be done and then you throw a bunch of stuff at the wall like you have some inclinations given prior problems you say like oh i think i can just do you know i do a little bit of contrast adjusting i do a threshold i find the connected component and i measure the center of this thing. I think that'll work, but you don't know. Like you just start throwing stuff at the wall and tweaking.
Starting point is 00:40:10 And that's where that number, getting an objective measure of what you want to do really, really helps you. Because then it's like, instead of it gets you away from this optometrist, that's better, worse, better,
Starting point is 00:40:18 worse. Cause you can just be in your basement. If you have enough sort of gumption, you can just be down there trying to make things better indefinitely or ever. And if you have a number, then you at least know like, oh, if I get it to 95% tomorrow, I just have to spend tomorrow getting it to 96. And I have a time when I go home. And 95% of something. You can't just have it be better. 95% good. It has to be 95% accurate over
Starting point is 00:40:44 these data sets or over these problem definitions. Yeah, I've had the 95% better and it's like, better in what way? But what you said suggests that you do have to have enough knowledge to understand some of the basic kind of operations, like you thresholding and and connect you know finding connected segments and things so there's like a handful of things like that that you kind of piece together to yeah to get your result i mean for an analysis test like that that is going to be um you know you're what i almost like to call like sort of early it's like first gen normal it's image processing right it basically takes things from 1d and extends them to 2d so it's you know thresholding values above a certain level throwing away values uh basically looking for discontinuities um which are effectively edges
Starting point is 00:41:41 right we're looking for corners corners Corners are exceptionally important in most computer vision tasks because they allow you to sort of position things. You know, and a lot of that stuff, especially for, you know, these sort of old school problems, it's like find a bunch of points, fit a line to those points or fit an arbitrary function to those points
Starting point is 00:42:03 or take the outside and measure across various vectors across something i mean it's a lot of geometry really in your book uh there were terms and it seems like some of these terms are related to what you're saying now um for example dilation and erosion and i'm i was about to define them but if i was smart i would say could you define those so so these are what we call basic morphological operations um so when you have an image right you have each pixel has eight bits of red eight bits of green eight bits of blue and and that's great for representing stuff but it's not really good at throwing away bits. And so if you want to throw away bits, you basically say, is this a bit I care about or a bit I don't care about? And you can usually do
Starting point is 00:42:53 that about brightness. So a thing we're thinking about is, I guess, before we even talk about R, G, and B, there's sort of color spaces, right? So you have a notion of red, green, and blue, but you also have a notion of dark and light, like bright things, dark things. And so you can map between red, green, blue, and bright and dark, and there's all these different color spaces. HSV in particular is pretty handy. And so what a threshold is, is you say, hey, I have bright regions and dark regions. Give me all of the pixels that fall in between this brightness and this brightness, or between this color red and this color red. And you label them as all being, say, ones, as being white, and everything else is zero. So once you have that pile of 1, 1, 1, 1, 0, 0, 0 on a grid, you basically say, well, if I have a white pixel, a black pixel, and a white pixel on one column, and then another column where I have a white pixel, white pixel, white pixel, so I have basically a reverse letter C is what I'm thinking.
Starting point is 00:43:58 So a dilation would say, well, if you have that letter C, one of the operations is to take that black chunk that's in the center of C and change it white. And so this has the effect of sort of, like if you have a blob, right? A bunch of these pixels, all contiguous. It makes the blob grow. You're dilating it. You're making it bigger. So if you're next to a black pixel, or if you're surrounded by white pixels and you're a black pixel, then go white yeah okay and erosion is similar like if you have like a one little pixel sticking out of like so if you have like a row of five white pixels and then another row on top of that where the center one there's just one white pixel on top of that an erosion operation will get rid of that white one and what you're basically trying to do is you're trying to manipulate this blob to make things sort of separate or combined into contiguous regions
Starting point is 00:44:46 is usually what you're going after. And so the example in the book was tools on a pegboard. And by dilating and eroding, they could remove the dots of the pegboard and still keep the outline of the tools because the tool outlines were pretty consistent over that those operations yeah but there are other operations um you said you mentioned brightness and turning up the brightness and i i've done that in gimp i mean i know how to do that on my phone yeah but i did not realize that one way to think about brightness was just adding the image to itself or multiplying by more than one. And that was brightness.
Starting point is 00:45:30 That is the mathematical operation. So what other mathematical operations lead to these words that I already know what they mean? Okay. So, you know, there's a whole subset of these things right like when you deal with binary images right addition right goes back to sort of being logical or right so if you have white and white you have two whites they're still going to be 255 if you have one white one black it's still going to be 255 like so addition of subtraction have that same similar effect um let's say when you start talking about edges right
Starting point is 00:46:12 an edge is really where you're going and you're looking for a derivative i mean if you want to think about it more simply you have two adjacent pixels one may be say like zero and one may be 255 and you subtract them from each other and you put it back in the original pixel and you say oh this is 255 but the next so let me re-explain that a little bit you have zero zero zero 255 255 255 and you go zero zero zero and then you flip to the 255 well that's an edge because it's like the subtraction is gonna be 255 and you go to the next one and it's zero and so it goes away and so all of a sudden if you look at this as a line it's like the subtraction is going to be 255. And you go to the next one, and it's zero. And so it goes away. And so all of a sudden, if you look at this as a line, it's like boop. And that's an edge.
Starting point is 00:46:49 And that's how you start finding edges. And you find two sort of edges in each direction. You all of a sudden have like a corner. And those can be super handy. There was also subtracting two images to show motion. And that makes sense now. But if you'd asked me how to how to look for motion i don't know that taking two images and just subtracting them would have been my first yeah and well it's a it's a it works to a certain extent um you know if you have a really solid
Starting point is 00:47:22 background right what is a green screen, right? Other than like I have this giant field of green. Now you can do certain things by changing color spaces and make it better than it was. But like the rudimentary version is just a subtraction. And everything after that is just getting more and more fancy math and doing more and more sort of processing afterwards. So the green screen works by having all the green in the background and then you video
Starting point is 00:47:48 the person in front of it and then you subtract everything that's that color green or close to that color green. And then in that space that is now totally blank, it's not black or white or red or green or anything. It is transparent. It's alpha. Yeah. Now you can put in a background image and keep the foreground image, the weather casting. Yeah, and if you do it correctly, you can also kind of keep shadows if you want them
Starting point is 00:48:14 and keep illumination highlights and stuff like that because you can kind of tease out the hue, the color from the illumination, the brightness. So you don't just subtract every green in there you subtract every green to some extent well you subtract the the hue part of it like so imagine like subtracting the green part of it but leaving the red part of it but you work in a different color space so you're instead of subtracting you subtract the green part of it but you leave the brightness part of it and then you can reapply that brightness to another thing. One of the interesting things I think about computer vision is actually it's the reverse problem of computer graphics,
Starting point is 00:48:55 which is a really interesting way to think about it. In one, you're trying to project through a camera and recreate it. In the other, you're trying to take things in from a camera and build the 3D model. And that's always just... The interplay between those two is actually kind of interesting because most cameras, when we actually talk about where all this imagery comes from, it's because we're watching TV all the time. We're making films, and some of those films are actually 3D generated. Deconstruction of graphics.
Starting point is 00:49:23 She makes graphics sound easy.'s just math yeah shears and warps what are those and and going back to it's just math i bet they're connected yeah i so uh you know like a sheer like a general like perspective warping right is this Is this sort of, hey, I have like corners on a billboard and I want to, it's like, say you were playing your GIMP example, right? You have your friend holding a book or his, you know, their iPad, and you want to go and take an image that you found on the web of something, you know, you want to show him having an iPad, but you want to put a cat on it. And you take the image of the cat and you grab your tool in GIMP to take the corners for that cat image. And you say, here's one corner for the iPad. Here's the other corner of the iPad. Here's the other corner of the iPad. Here's the other
Starting point is 00:50:11 corner of the iPad. And you warp and move that image into so it fits on the iPad screen. And so that's just a general change of perspective. Now, some of the warping stuff, right? When you say, if you have a Mac, you open up like the camera app and there's all these weird warps that, you know, make your nose look big and your mouth look small. Oh, the circus funhouse things. Yeah, and those are just sort of mathematical transformations where you say, I'm going to take this pixel right here
Starting point is 00:50:39 and map it out to contain, like to the adjacent pixels in some sort of circular way or some round way or some based on some sort of function, right? It's just a definition of a function to remap from the normal pixels to a new set of pixels. I remember thinking in college, I have no idea why anybody would use this stupid matrix math stuff. I was so wrong. So very wrong. Are there things people, are there barriers to getting into computer vision? Things like understanding matrix math? program i you know i think that's a that's a huge barrier um you know the math side of it you know to a certain extent i actually find learning applied math much much easier than learning like abstract math and figuring out where to apply it like if i you know and this is where actually computer graphics i think comes in a more. If you sit down and want to learn basic, like very basic elementary
Starting point is 00:51:50 linear algebra, it's, it's not exciting stuff. At least I don't think so. But if you, if you go and want to hang out with a bunch of kids and you say, I'm going to teach you how to construct a matrix to do crazy rotations in your video games. So that you click on this thing it rotates in this these different ways and then gets big and gets small again at the same time like i'll show you how to do the math to do that and it's just like oh it's just this you know you take this thing over here you take this thing over here and you take these three matrices and you wrote you put them all together and that's how like and you see you can decompose them so that one's the rotation one's the scaling and one's the change in position if you teach people like that and they can see it it's it's way easier to teach like that i mean i think that's sort of one of the reasons i
Starting point is 00:52:35 got into computer vision and into graphics and these sorts of things is that you get this feedback loop where it's not just like numbers on a screen it's actually something you see and you you tweak it a little bit and you get really good feedback and it's not just like oh the you know i move the text box in my web page or something it's it's like a real huge image that you can do really cool stuff with yeah and it it changes both how you see the world and how you can interact with the world i mean if you can make a robot that really can tell that something's broken, well, then you can make a robot that can fix things. It's hard to fix something if you can't see it, and that's where computer vision really starts to shine.
Starting point is 00:53:18 So what sorts of things are you doing for Planet? Well, right now it really is recognition, and it's it's starting out on on the very basics of like um so when the satellites are up and they're imaging we have a reasonable estimate of where they are um but we don't know exactly so sometimes like you you might you know you might overshoot and get a little bit of water before you hit the coast or something like that and you know my job is to basically filter out things like clouds and water and find scenes that are very, very interesting and then find things that customers really, really want in those images. And help us figure and develop tools to understand those images.
Starting point is 00:53:59 And especially now, what's really, really interesting is seeing that data every day. Like, you know, it used to be that satellite imagery was this sort of thing where, well, we'll get an image now, and then we might get an image in six months, or we might get another in a year. And with Planet, within the next year, we will be getting images of parts of the globe every single day. And you're going to have to throw away some of that. Well, you're going to... Not actually throw it away, but... Well, the government requires you to delete all the UFO ones, right?
Starting point is 00:54:35 Yeah. So that's most of your work. Actually, I can't comment on that. But you said that throwing away the pixels is a huge part of the job, the ones you don't care about. And so having this huge amount of data every single day, you're going to have to throw away a lot. We save all of it that comes down. We're actually, I believe, contractually obligated to store all of the raw data. But generally, especially now, some people really care about the imagery. Some people really want to see it; other people really just care about the number. Like, they want to know,
Starting point is 00:55:11 is this field growing, you know, how much water is this consuming, how much more should I put on it? They want very concise answers to very difficult problems, and we're throwing it away by actually getting them that answer. And those are the sorts of things that I'm really excited to start tackling. Yeah, because they don't care what the image looks like if they can get, okay, tell me how to irrigate this. Because that's what I care about. I don't care about your computer vision. I care about water. They want a number that is something like, you know, how many cars were in the parking lot of the mall the day after Thanksgiving?
Starting point is 00:55:44 And how does that correlate with sales in that mall? And give me those two numbers, because I can really do something with those. It's great that I can see this and I can get a relative guess, but get me something that's even slightly more accurate than a relative guess. This is going to be a weird world.
Starting point is 00:56:00 Going to be? Oh, it's already weird. I mean, Pokemon. Let's see, I guess, uh, Planet is hiring. They're not doing a contest, because they let me have Kat instead, and embedded fm at planet.com, go look, please, they're nice folks, space is awesome. So, back to the computer vision and the book. Why is a barcode scanner one of the last examples in the book? That doesn't seem hard. It's black and white, it's lines. Why is that one of the hardest things to do? You said to buy it, don't make it, which I totally understand, but if I'm going through the examples in the book, why is that hard? It's not hard to do a triv... And this is a camera-based barcode scanner,
Starting point is 00:56:46 not the scanning laser one-dimensional one. Right? Yeah, it's a camera-based one. It's not... So, there are packages that will do this for you, like Zebra Crossing will do this for you. Somebody has already solved this. I have also written these things from scratch.
Starting point is 00:57:05 It is easy to make it work. It is not easy to make it work well. I mean, it's like almost all engineering problems. It's easy to get a prototype that kind of works, right? It's all right. To get it to work robustly, for any sort of application that you would want, right? That's very, very difficult. Like your barcode scanner: even now when you go to the grocery store, right, it still screws up maybe one in 20 times, and that's annoying, right? Like, the clerk's sitting there like, oh, I gotta scan it again, and you're just waiting, and then they're
Starting point is 00:57:35 wiping it off, they're wiping it off, and then they're on to the voodoo gods to make it work. Yeah. And it's a very, very difficult problem, because there are so many different... you know, you have this camera, right, that's probably fixed, and then you have this thing that you can rotate in all these different ways and it can still kind of be seen. You can kind of see the barcode, but you may not see the barcode, and the projection of the barcode in the camera could be really, really weird. So, you know, you definitely want the common case to be flat, right across the thing, but you don't always get that. And so making it robust to all these different ways that you can turn the barcode and manipulate it and warp it and shear it is very, very difficult.
Starting point is 00:58:23 But then on top of that, there's lots of different barcode formats too. Yeah. That didn't seem as hard a problem. But the warping and shearing, and dealing with lighting effects: if the barcode is half in light and half isn't, you can't use the same threshold to say what's light and dark. Yeah. And, I mean, there are techniques to improve on that. This is also why, for certain things, well, you may actually want more bit depth, because you just can't get that data out
Starting point is 00:58:45 unless you actually have a little bit more range at the dark side.
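One common trick for that half-lit case is to threshold each pixel against its local neighborhood instead of one global cutoff; a minimal sketch with OpenCV, using a made-up file name and made-up tuning values:

```python
import cv2

# Hypothetical input: a grayscale photo of a barcode, half in shadow.
gray = cv2.imread("barcode.png", cv2.IMREAD_GRAYSCALE)

# A single global threshold fails when lighting varies across the image.
_, global_bw = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)

# Adaptive thresholding compares each pixel to the mean of a small
# neighborhood (here 31x31 pixels, offset by 10), so lit and shadowed
# regions each get a sensible local cutoff.
adaptive_bw = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 31, 10)

cv2.imwrite("global.png", global_bw)
cv2.imwrite("adaptive.png", adaptive_bw)
```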
Starting point is 00:59:06 And is that a similar problem to putting known features around a room, if you want to do position detection, or if you're trying to do some robotic thing where, okay, I want to go over here and find this position, well, I'll put a thing that looks like a barcode. Like the QR codes. Can you make that an easier problem than the general barcode case, I guess, is my question. It's effectively the same problem. Yeah. Okay. Um, yeah, and you kind of intuit why you do this. You have to understand the logic of why barcodes are, in some ways, the shape they are, and why you use those for understanding position. So if you stare at a white wall, like if I were to sit you in an infinitely large room with a giant white wall and you're just kind of floating in space, if you move away from that white wall, you don't really know how far away you are. And if you move to the left, you don't know whether you moved to the left or the right or up or down, because you have no way... I mean, you might feel it in your inner ear, but you don't know how far. If I put a single line,
Starting point is 00:59:55 like if I had a giant pencil on this infinite wall and I drew a line, you could tell if I moved you... if it's a horizontal line, you could tell if I moved you up or down relative to the line. But if I move you right or left along that line, you can't tell how far you've moved. Now, if I put another line perpendicular to that, all of a sudden you actually have a frame of reference, and you can tell if I move you left, right, up, or down. And so, you know, having corners, and having lots of corners, and being able to sort of see those shifts in movement, is what helps you do these sorts of problems. I'm trying to speak sort of generally about these things.
Starting point is 01:00:30 Because you need these corners to then basically do a de-warping, to get it to sort of the square thing that you want. You can then figure out what's the transform to get to the projection that we had, and since I've warped this back to a sort of 2D space, I can then read the barcode very, very easily, because I'm looking for basically the relative widths of the different lines. Okay.
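OpenCV's bundled QR reader does exactly this find-the-corners, de-warp, then decode pipeline; a minimal sketch, assuming a recent OpenCV build and a made-up file name:

```python
import cv2

# Hypothetical input image containing a QR code seen at an angle.
img = cv2.imread("scene.jpg")

# Recent OpenCV builds include a QR reader that finds the corner
# patterns, de-warps the code back to a flat square, and decodes it.
detector = cv2.QRCodeDetector()
text, corners, rectified = detector.detectAndDecode(img)

if corners is not None:
    print("corner points in the image:", corners.reshape(-1, 2))
print("decoded payload:", text)
```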
Starting point is 01:00:58 So is making a line-following robot still one of those intro things you do when you're learning about computer vision and robots, or is there some other new nifty thing that everybody does? Well, I feel like I have never actually made a line-following robot, truth be told. I think generally the easiest, most approachable thing is actually to go use these QR codes or these AR tags. The primary one is AR toolkit, and this is just a package in ROS now that you can get, and it's super easy to use.
Starting point is 01:01:29 And you can actually, you know, once you know the size of the barcode, you can actually reconstruct the camera transform. So you can say, this thing is here and my camera's here and it's moved like this, as long as you're within certain fields of view. You can't go completely oblique to it, and like,
Starting point is 01:01:44 then it doesn't work. But I feel like if I had to suggest one very, very early, very, very useful approach, I think that's sort of the modern equivalent. If you have a camera and you have a Roomba and you can put up those barcodes, you're getting to the point of actually being able to develop a map and understand how your robot moves around the world.
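Recovering that camera transform from a marker of known size is a perspective-n-point problem; here is a rough sketch with OpenCV's solvePnP, where the tag size, corner pixel locations, and camera intrinsics are all invented placeholders (a real system would get the corners from a tag detector and the intrinsics from calibration):

```python
import cv2
import numpy as np

# 3D corners of a 10 cm square tag, in the tag's own coordinate frame (meters).
tag_size = 0.10
obj_pts = np.array([[0, 0, 0],
                    [tag_size, 0, 0],
                    [tag_size, tag_size, 0],
                    [0, tag_size, 0]], dtype=np.float32)

# Where those corners were detected in the image; placeholder pixel values
# that a marker detector would normally supply.
img_pts = np.array([[310, 220],
                    [420, 225],
                    [415, 335],
                    [305, 330]], dtype=np.float32)

# Camera intrinsics from calibration; again placeholders.
K = np.array([[800, 0, 320],
              [0, 800, 240],
              [0, 0, 1]], dtype=np.float32)
dist = np.zeros(5)  # assume no lens distortion for the sketch

# Solve for the tag's rotation and translation relative to the camera.
ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
print("tag position in camera frame (m):", tvec.ravel())
```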
Starting point is 01:02:18 My next question is going to be, what are the easy and fun things to do that will get people over the hurdle of just trying out computer vision for the first time? But I think you just answered that. Yeah. I mean, I also feel like, uh, at least if you're coming from, say, a web dev background, I think the world really needs more sort of websites where you submit an image and some degree of processing happens on it. Like mustache-ifying, or other sorts of things. There's this new app right now that sort of takes some of the neural nets trained on classical art and then reprojects the image. Oh, yeah. Like, I mean, all of these things. Deep Dreamer.
Starting point is 01:02:43 Deep Dream. And like, I mean, as stupid as it is, right? Like the Snapchat sort of face filters, right? It's not that big of a step to go from, hey, I processed an image and picked out the color, or, you know, put a mustache on it, to doing something like that. It's just slightly more work, but you have all the tools right there. So, as somebody who doesn't use Snapchat, I know, I use everything else, um, this is when you put mustaches on people, but
Starting point is 01:03:18 there are other things. You open your mouth and a rainbow comes out, and your eyes get big, and... All right, this is sort of like when they're doing the, uh, Pixar movies, and they make the faces based on actors' faces and they move, like you can almost see who the actor is underneath. Yeah. And they're doing that sort of real-time, with cameras, without putting dots all over your face? Yeah. Oh. Well, and, you know, these are all sort of chunks of known technology that are all put together, right? Like, we've had a basic face detector. Like, we can see full-on front faces, and it's called the Viola-Jones detector.
Starting point is 01:04:00 It's a very classic sort of thing, and you can find faces very, very quickly. But then you have to do that every image, and that's computationally expensive. But you have lots of images coming through time. So how do you track a face? Well, you take something like a Kalman filter, or some of these other tracking sort of filtering technologies, and you say, okay, well now I can kind of constrain where I'm looking for that face. Now that you have the face all the time, um, I forget the name of the particular technique, but you have this basically 3D mesh of your face, right, that has the points at the corners of the eyes and your nose. And what you're trying to do is you're trying to take the lines on that mesh
Starting point is 01:04:34 and, uh, adjust them such that they align with the sort of, um, lines that you can find visually. So like the lower edge of your eyelid and the upper line of your eyelid, your eyebrows, right? You're trying to find... you know, there generally appear to be dark lines, and so you're trying to fit that mesh to those. Once you have that in 3D, then you can say, oh, reproject this image, this mustache or whatever you want, onto it. And we've taken each bit of that sort of pipeline and sped it up and optimized it. And that's what's going on there.
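The detection stage of that pipeline is easy to try on its own; a minimal sketch using one of the Viola-Jones style Haar cascade models that ship with OpenCV (the image path is made up, and the cascade location assumes the opencv-python wheel):

```python
import cv2

# Load a bundled Viola-Jones style face model; with the opencv-python
# wheel, the XML cascade files live under cv2.data.haarcascades.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("people.jpg")            # hypothetical input photo
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Each detection is an (x, y, width, height) box around a frontal face.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", img)
```

A real filter would then hand those boxes to a tracker (the Kalman-filter step described above) so the full detector doesn't have to run on every frame.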
Starting point is 01:05:09 So I think there's a really good BuzzFeed article on how this actually all happens and how they do it step by step. And it's really a good read. Let's see if I can find that for you. I will link that in the show notes. But now I'm just so full of the idea that I should be on Snapchat and I can be my own cartoon.
Starting point is 01:05:26 No, no, no, no. ...and using these existing toolsets, versus implementing something like OpenCV or other toolkits for other platforms, versus inventing brand-new stuff. These seem like very different things. How much do you use MATLAB? NumPy all the way, man. She said it. Or NumPy.
Starting point is 01:06:10 Nobody's going to respond to... no, I, I had somebody say a phrase I loved the other day: the NumPy space princess. Ah, sorry. Um, so, you know, it's really, it's like a practical problem versus a research and development problem, and it really depends on what you're looking for. And, you know, as you get a stronger and better understanding of the problem space, then you very much do sort of start walking into, I'm going to write something from scratch. But a lot of, you know, a lot of the more interesting things are built out of tools that we already had. Like, if you look at most audio processing, right, it's not like you're going to go re-implement the FFT library every time you want to do audio processing. Similarly, if you need a feature detector, you're not going to re-implement that. I mean, enough graduate students have written enough papers and built
Starting point is 01:07:15 enough code that at this point, it'd be a waste of your time to go and do that. You don't really need to do that. And so a lot of things are sort of building up on more and more complex things that we've already built, right? I think any sort of development at this point is sort of standing on the shoulders of giants who've been doing this work for years and years and years. But you do kind of have to learn what's underneath, because you can shoot yourself in the foot if you don't understand it a little bit, and occasionally you may have to re-implement it. But by and large, if you're working on a large Windows, Linux, Mac system, you're going to just pull from a library that does the basis of what you're doing.
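Pulling a feature detector off the shelf looks like this in practice; a minimal sketch with OpenCV's ORB detector and matcher, using made-up image paths:

```python
import cv2

# Hypothetical pair of images we want to match against each other.
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# ORB is one of the off-the-shelf detectors that ships with OpenCV;
# no need to re-implement the detector or descriptor yourself.
orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching of the binary descriptors with Hamming distance.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print("found", len(matches), "matches")
```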
Starting point is 01:08:23 Cool. All right, I'm about out of questions. Christopher, do you have anything? Do I have anything? Well, I'm not actually out of questions. I just... okay, we're about out of time. I admit it. Fine. Do you have anything? Anything at all? I have nothing. I have nothing. Okay, then I guess I do have one more here that I would like to know. Do you have anything to say about Tempo Automation? They're great. Go send boards to them. Okay, so cool. What was raising venture capital like?
Starting point is 01:08:54 It's super fun. Go send boards to them. It's an interesting world. Being an engineer, it's not exactly what I do. I just kind of explain the hard technical parts and why it's important. You know, you have to choose if venture capital is something that's right for you. I think the biggest sort of parting thing I can say about that is: understand exactly what you're getting yourself into. If only we could. That's tough advice to follow. Well, do you have any final thoughts you'd like to leave us with? I would say, for a final thought, you know, images are probably some of the most dense bits of information that you can put on the internet, so be very careful with where you put them.
Starting point is 01:09:49 My guest has been Catherine Scott, co-author of Practical Computer Vision with SimpleCV, The Simple Way to Make Technology See. And she's a senior software engineer at Planet. Thank you for being here. Thanks. Thank you also to Christopher for producing and co-hosting. And of course, thank you for listening. Hit the contact link or email
Starting point is 01:10:11 show at embedded.fm if you'd like to say hello. If you'd like to apply to Planet, use the embedded fm at planet.com email so we get credit. I don't know what the credits lead to, but you know, I want them. Our own satellite. Oh, that'd be cool. Our quote this week comes from Muhammad Ali. Champions aren't made in gyms. Champions are made from something they have deep inside of them. A desire, a dream, a vision. They have to have the skill and the will.
Starting point is 01:10:45 But the will must be stronger than the skill. Embedded FM is an independently produced radio show that focuses on the many aspects of engineering. It is a production of Logical Elegance, an embedded software consulting company in California. If there are advertisements in the show, we did not put them there and do not receive any revenue from them. At this time, our sole sponsor remains Logical Elegance.
