Embedded - 253: We’ll Pay Them in Fun
Episode Date: July 13, 2018

We spoke with Kathleen Tuite (@kaflurbaleen) about augmented reality, computer vision, games with a purpose, and meetups. Kathleen's personal site (filled with many interesting projects we didn't talk about) is SuperFireTruck.com. Kathleen works for GrokStyle, a company that lets you find furniture you like based on what you see. GrokStyle is used in the augmented reality try-it-at-home IKEA Place app.

A Theory of Fun for Game Design by Raph Koster

Flow: The Psychology of Optimal Experience by Mihaly Csikszentmihalyi

Language translating/learning app and online game: Duolingo

TensorFlow in JavaScript

HCOMP 2018: Human Computation conference with keynote by Zooniverse's Lucy Fortson (no video for that yet but we hope)
Transcript
Hello, this is Embedded.
I am Elecia White, here with Christopher White,
and this week, here also with Kathleen Tuite.
We're going to dive into computer vision, augmented reality games, and meetups.
Hi, Kathleen. Thanks for joining us.
Hello, I'm happy to be here.
Could you tell us about yourself as though we met at a technical conference?
Okay, my name is Kathleen Tuite, as you just said, and I am currently a software engineer at a computer vision AI company called GrokStyle. My background, from this current job and past projects, involves computer vision, game design, crowdsourcing, human-computer interaction, all those things kind of wrapped up together. And what I really like doing in general is taking interesting computer vision systems and building interactive things around them that people can actually use and play with.
Yes, we have so much to talk about.
First, we have Lightning Round. You've heard the show, so you know
the goal is fast and snappy.
Do you want to start? Okay, Christopher is shaking his head, which goes over well in podcast land.
Minecraft or Pokémon Go?
I like both of them a lot.
Favorite OpenCV function?
I don't like OpenCV as much... Favorite OpenCV function... One thing I do like about OpenCV: I like just reading in an image, and then also displaying it.
So the ones that read images and the ones that show stuff, those are probably my two favorites, just so I know what's going on and that I can get started and build something on top of that.
Favorite computer vision library, then?
So when I was a grad student, I did a lot of stuff with this tool, this system called Bundler, which is a structure-from-motion pipeline. So I'm pretty fond... I have a love-hate relationship with Bundler. And now there's a Python version of it called OpenSfM that is run by this company called Mapillary.
OpenSfM. Okay.
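For anyone following along at home, those two favorites are roughly this pair of OpenCV calls; a minimal sketch, with a placeholder file name:

```python
import cv2

# Read an image from disk ("windowsill.jpg" is a placeholder name)
img = cv2.imread("windowsill.jpg")
if img is None:
    raise FileNotFoundError("couldn't read windowsill.jpg")

# Display it in a window and wait for a keypress before closing
cv2.imshow("what the computer sees", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```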
Is it my turn or yours?
It's my turn. Oh, go, go, go.
Favorite VR game?
I went to a capstone demo, the Sammy Showcase at UC Santa Cruz, with a bunch of games that students have been working on this past year.
And I played this ghost cat VR game where you're a little ghost cat, and you have another buddy ghost cat, and you have to stay near each other, because they're your source of illumination.
And then you have to jump around and get to other places.
And so, that ghost cat VR game.
That no one else can get?
No one else... I mean, they're trying to make a research platform out of it, so maybe you can play it soon. I don't know. That's the only recent thing that I can think of.
A tip you think everyone should know?
Stop writing bugs. Just stop it.
Okay, so a long time ago I was programming stuff with my husband, my boyfriend at the time, and I was just kind of being sloppy with what I was doing. And I'd write some code and then I'd run it.
And then I'd, like, read the error and, like, oh, I spelled that thing wrong.
And it was just, like, this really slow process.
And he was, like, just stop writing these bugs.
I was, like, okay, I'm going to try.
I'm going to be more mindful about this and just, like, go a little bit slower.
And think, like, I'm a human.
I can do this as best I can and, like, try to just not write the bugs in the first place.
And then whatever errors do come up,
they're still there for me to figure out,
but the really basic ones, I can kind of just try not to do that.
Yes, I know what you mean.
That's what monkey coding is for me, when I do it,
where I'm just like, oh, I'm just going to type at this until it works.
Yeah.
And I'm not going to sit there and think about how it should work and how I can get from where it doesn't work to where it does work.
I'm just going to keep incrementing this variable until the timeout is the right length.
Yeah, yeah, just like stumbling through it.
And that works sometimes.
Thinking about it actually is kind of better.
Actually better in so many ways.
Stop writing bugs. That's great advice.
Computer vision.
When people say computer vision, what do they mean?
So I would say computer vision is the ability for a computer that's gotten some sensor, some picture of the real world...
Maybe it's a picture from a normal RGB camera. Maybe it's a depth sensor, some more enhanced picture. The computer's ability to make sense of that and understand what is going on in that scene, whether it's recognizing the objects in the scene, or the activity that's happening, or just more information about what the scene really represents. A facial expression that someone has, or who the identity of a person is. Those are all computer vision things: the ability to understand things about the real world from a picture, or a movie, or something kind of like a picture, like a depth image.
I like the way you put it, because it is about the computer, not just acquiring the data,
but being able to do something.
You said understand, which computers don't usually do, but it's that level.
It's an intelligence.
Yeah, to make sense of it enough at some, whatever understanding level is possible,
that then you can actually use that in some other system.
You did graduate research.
And I know I'm not going to get the word right.
Photogrammetry?
Photogrammetry.
Photogrammetry.
I missed the R.
Okay.
What is that?
So photogrammetry is the ability to get a bunch of images of one thing all from different angles and kind of come up with
the 3D structure of the item that all the cameras are looking at and also the pose of each of the
cameras, how they relate to one another and to the object itself. And so if I go and I take
a picture of the Eiffel Tower and then I take another picture of the Eiffel Tower, and then I take another picture of the Eiffel Tower, you can build the Eiffel Tower in 3D from my photos?
Yeah, yeah.
But two isn't enough?
Two pictures can get you started, if they're close enough that they're seeing roughly the same view, but they're also spread out enough that they're not exactly on top of each other. And you can get some kind of 3D information from them being split apart. You could know where the pictures are, where the Eiffel Tower is. But if you want to go further and get a fuller 3D model of the Eiffel Tower, you'd want many pictures of it from many different angles.
And that might be enough to fill in the actual structure of that object. Although the Eiffel Tower has a lot of cross braces and things where you can see through it, and that will probably be a little bit challenging for the computer to make sense of.
How about the Washington Monument?
The Washington Monument... that doesn't have enough detail, right? And it kind of looks the same from all four sides, the tall one. The canonical examples of this, where structure-from-motion photogrammetry sort of became a thing that other people started really running with, was this Photo Tourism project at UW, where they took a bunch of photos from Flickr of popular tourist places like Notre Dame Cathedral and Trevi Fountain in Rome, and used those photos. So those are places where there's enough texture and structure, but it's kind of this continuous surface that you can't see through, unlike the Eiffel Tower.
So it's better if you have something that has a lot of detail,
but not see-through and not repeating detail.
Right.
This is a lot of caveats.
It definitely is, yeah.
Okay.
And then, so the way I think it happens, like my mental image is you have picture one,
and then you take picture two,
and you try to map up all of the same points.
And then you take picture three, and you map up all the same points on picture one and picture two.
But then I'm kind of lost.
I mean, I know you can use convolution to map up points that are the same,
but what happens after that?
Is that even right?
That is totally the first step: getting two images, or more images, or pairs of images in a whole big collection of images, and figuring out what all the interesting points in these images are. Like, this point was seen in these ten images, and it was at these pixel coordinates of these images over here. And you just have a whole bunch of data of those correspondences.
And then you throw it into something called bundle adjustment, and that will figure out the 3D positioning of where all those points should be in 3D space, and where the cameras should be, like what pose they should have, based on all these camera pinhole math equations.
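For reference, the "camera pinhole math" here is the standard projection model (textbook form, nothing specific to Bundler): a homogeneous 3D point $\mathbf{X}$ lands at pixel $\mathbf{x}$ via

$$\mathbf{x} \sim K\,[R \mid \mathbf{t}]\,\mathbf{X}, \qquad K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $R$ and $\mathbf{t}$ are the camera's pose and $K$ holds its internal parameters (focal lengths and principal point). Bundle adjustment tweaks all the $R$, $\mathbf{t}$, and $\mathbf{X}$ values at once so these projections line up with the observed pixels.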
Okay, we're going to ask you about that too.
So don't get comfortable with me skipping that.
But even that first step,
are you using the RGB images or are you trying to find vertices? What kind of algorithms do you use to even find the points? And then what does this bundle thing do?
So the algorithm to find the points initially... SIFT is a good one. And I know, I think your typing robot uses the same SIFT feature points to figure some stuff out.
It does, it does.
But when I did it, I just used OpenCV, and it magically worked.
I have no idea what the algorithm was. That was part of when I was trying to figure out where the keys were.
And I had a perfect image of a keyboard.
And then I had my current camera image of the keyboard.
And it was SIFT and FLANN and homography.
And I just typed them in.
And wow, it just found them.
And I did nothing.
Even when I changed the lighting, it was pretty good.
So what does it do?
So to break it down a bit more,
SIFT stands for Scale Invariant Feature Transform.
Yeah, transform sounds good.
And basically, for a computer to start understanding an image, it looks for distinctive points. Say there's a building, and there's a windowsill, and the part where the outside of the windowsill comes together at an angle is on top of a brick facade or something, so that the sill and the brick are different colors and the light is casting shadows in a certain way. That particular corner of that building might have, or will have, a distinctive look. And the SIFT feature of that particular point would capture something about the colors there, but more importantly the edges: what angle they're at, and how strong, how edgy, how cornery they are. And the scale-invariant part of SIFT means that if you have a picture of that windowsill up close, and you have another one that's maybe far away and maybe rotated a little bit, that particular piece of those two images will still look very similar. It will have a descriptive way that the computer can represent it, so that it can tell that they're the same point.
Okay, okay. So now we found all of these point correspondences.
Correspondences, yeah. I mean, they start out as just feature points, points of interest. These are little corners or things that a computer can say: I know what that is, I know where it is. Versus on a plain blank wall, there's nothing special about a pixel in the middle of that space; it could be anywhere. And then when you have multiple images, like two images that both have SIFT points, and you kind of figure out the correspondence between them, that's when the correspondence part comes in.
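Sketching that in OpenCV, since it comes up again with the typing robot later: detect SIFT points in two images, match them with FLANN, and estimate a homography from the good matches. The file names and the 0.7 ratio threshold are conventional placeholder choices, not anything from the episode.

```python
import cv2
import numpy as np

img1 = cv2.imread("eiffel_1.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file names
img2 = cv2.imread("eiffel_2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, desc1 = sift.detectAndCompute(img1, None)  # keypoints + 128-dim descriptors
kp2, desc2 = sift.detectAndCompute(img2, None)

# FLANN does fast approximate nearest-neighbor matching on the descriptors
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
matches = flann.knnMatch(desc1, desc2, k=2)

# Lowe's ratio test: keep a match only if it's clearly better than the runner-up
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC-estimated homography: the matrix mapping one view onto the other
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```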
And so I can sort of understand with two images, because that's kind of how my eyes work. It's 3D vision.
And if my eyes were further apart,
if I had a really big head, I would be able to see 3D vision
further away. But right now, after about 10 feet, everything's kind of
flat.
I know there's actual math that would tell me how far it is.
But realistically, I'm pretty cross-eyed.
So 10 feet is really about it for me.
Don't play basketball with me.
And so when you have two photos taken apart, far apart, then you can get more depth.
Yes.
But my eyes work because they're always in the same place.
They always have the same distance between them.
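The geometry behind that intuition is the standard stereo relation (textbook math, not from the episode): for two views separated by baseline $B$ with focal length $f$, a point whose image shifts by disparity $d$ between the views sits at depth

$$Z = \frac{fB}{d}$$

so a wider baseline produces measurable disparity, and therefore usable depth, at longer ranges.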
It seems like a chicken and an egg problem that you can find these points and you can find the 3D-ness of it, but you also find where they are.
How do you... which one's the chicken and which one's the egg, and which one comes first?
So you're totally right that our eyes, where our brains have calibrated the fact that these eyes that we have are always in the same relative position to one another... And I think 3D reconstruction techniques from two images have existed for a while, and they started out with: we need to calibrate these two cameras relative to each other first. Like, they're going to be mounted on some piece of hardware and they're never going to change. And if some intern bonks them, then they have to go recalibrate the whole thing.
Yeah. Yeah, I remember doing that, yeah.
And they have these, like, calibration checkerboards that you can set up.
And there's probably some OpenCV function for, like, look at this checkerboard and figure it out.
Like, figure out what the camera is.
There totally is.
Yeah.
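That checkerboard routine looks roughly like this in OpenCV (board dimensions and the file pattern are assumptions for the sketch):

```python
import glob

import cv2
import numpy as np

pattern = (9, 6)  # inner corners per checkerboard row and column (assumed)

# Ideal 3D corner positions on the flat board, in board-square units
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points, img_size = [], [], None
for path in glob.glob("calib_*.jpg"):  # placeholder file pattern
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Solves for the intrinsic matrix K and the lens-distortion coefficients
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img_size, None, None)
```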
So, getting from two cameras where you've calibrated them already, and also you have to calibrate like the internal, like lens distortion and all of that of a camera.
And that's where the checkerboards come in.
But having more cameras, yeah, you need to figure out what the 3D structure of the points I'm looking at is, which will help me figure out where the cameras are. And you also need to figure out where the cameras are to figure out where the 3D points that you're looking at are. And what this bundle adjustment technique is... well, I guess you had mentioned homography, or alluded to it. Homography is like an initialization step: there's two cameras and they're looking at the same thing.
And if that thing is like a planar surface, it's kind of understanding the relationship between those two cameras.
Yes, in my typing robot, I have the keyboard, the perfect keyboard. And then I have my scene of however I put the camera up today. And then the homography: I take a picture, and it maps the escape key onto the escape key and the space key onto the space key. And then it gives me the matrix that I can use to transform from my perfect-keyboard world to my current-image world. And so that matrix I can then just use to transfer coordinates between them.
Right. So if you have all these pictures that tourists took of the Eiffel Tower, you can look at the pairs of cameras, and look at the SIFT correspondence points that you found between them, and kind of estimate a homography. Like, what is that matrix that says how this one camera moves to become the other camera? And it might not be perfect, because of the points that you're looking at in the world, or there's maybe stuff you don't have enough information about yet; you don't know what the internal camera parameters are for that particular camera. But you can get some initial guess. And then what bundle adjustment does is take all of
these initial guesses of how all these cameras and points and tracks of points seen by multiple cameras fit together,
and it kind of comes up with an optimization
that solves for both of those things at the same time.
So it takes all of the correspondence points for each pair,
and then it minimizes the error for all of them.
Yeah.
And so if you end up with a bogus pair,
like on my keyboard if I was mapping A to Q,
if I took a bunch of pictures, it would eventually toss that one because nobody else agreed with it.
Yeah, it might toss it.
Or it might be like, I think this is right, and it might just be wrong.
And then it skews everything.
Yeah.
So in this project that I worked on in grad school called Photo City,
which was a game for having people take lots of photos of buildings and make 3D models,
I saw a lot of this 3D reconstruction stuff gone wrong, where a person would take photos, and the wall of the building would grow, but then it would just curve off into the ground. Or the model would just totally flip out and fall apart, because this bundle adjustment, this effort to kind of figure out cleanly where everything goes, would just get really confused. Or sometimes there would be itsy-bitsy, teeny-tiny, upside-down versions of a model that were really close, because the computer was like, this makes sense, to make a tiny version of this building here. It kind of looks the same as having one that's really far away.
Yeah, I mean, you get a discoloration in a building that has bricks, and then you end up with the small discoloration of the bricks, and it can't tell the difference, because scale invariance.
Yeah.
Computers, man.
They mess up sometimes.
When you do the minimization problem
of finding all the matrices,
which gives you the 3D aspect, that's when you can start figuring out where the people are.
Because you can backtrack.
Once you're confident that these points are in this space, you can backtrack to where the camera person must have been.
It's doing both at the same time and kind of going back and forth between optimizing where
the points are and optimizing where the cameras or the people holding the cameras must have been.
And you can say, I have a pretty good guess of where the 3D points out in the world are.
But if I wiggle the cameras around a little bit, then we'll come up with a better configuration
that minimizes that error even more
And the error that we're trying to minimize is: do these points in the world project back onto the right pixel coordinates of the image, or are they off? We're trying to sort of get everything to make sense across all these different pictures.
And in the end, this is a massive linear algebra problem.
Yeah, pretty much.
That's weird.
I mean, it sounds like you put photos in,
you get locations and 3D out,
and so it sounds so smart,
but in the end, it's just massive amounts
of A plus B, X plus C, Y.
Yeah, yeah. It's totally like magic that this is possible, but it's also totally not magic.
It's just like just a bunch of math.
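That "bunch of math" is, at its core, nonlinear least squares on reprojection error. A toy sketch of the objective (real systems like Bundler exploit the sparsity of the problem; this dense version, with a single shared focal length assumed, is just to show the shape of it):

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def project(point3d, rvec, tvec, f):
    """Pinhole projection: rotate/translate into the camera frame, divide by depth."""
    R, _ = cv2.Rodrigues(rvec)
    cam = R @ point3d + tvec
    return f * cam[:2] / cam[2]

def residuals(params, n_cams, n_pts, f, cam_idx, pt_idx, observed):
    """Reprojection error for every (camera, point) observation."""
    cams = params[:n_cams * 6].reshape(n_cams, 6)  # rvec (3) + tvec (3) per camera
    pts = params[n_cams * 6:].reshape(n_pts, 3)    # one xyz per 3D point
    proj = np.array([
        project(pts[j], cams[i, :3], cams[i, 3:], f)
        for i, j in zip(cam_idx, pt_idx)
    ])
    return (proj - observed).ravel()  # bundle adjustment drives this toward zero

# least_squares wiggles all the cameras and all the points together:
# result = least_squares(residuals, initial_guess,
#                        args=(n_cams, n_pts, f, cam_idx, pt_idx, observed))
```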
It used to be whenever we were doing computer vision stuff or machine vision or whatever we were calling it, there was the requirement that things be lit very brightly.
That went away.
Why did that go away?
How did that go away?
That was like the core thing with object identification and location.
When did lighting stop mattering?
Or does it still and I'm just using better tools?
There can be a number of things involved.
Lighting still matters, but SIFT is pretty good at matching things even when the lighting is a bit different.
Another big thing might be that the quality of cameras that we have is better now.
Like the webcam that you have or the camera on your phone or the camera that's built into your
laptop, those can, they can work better in like lower lighting, crappy lighting. They will also
just take clearer pictures. So I imagine that it was more critical in the past
because having like the cameras just like couldn't see very well. And so you really had to
make it easy for the cameras. And then a third aspect is that we have a bunch of data online
taken by cameras. And so there's a lot more that we can do with, say, crappy cameras, not-very-good cameras. We can learn more from all of this data that's available, so we can kind of compensate for the fact that the lighting might not be as good, because we've seen enough examples of something with not-very-good lighting that we can still understand what it's supposed to be.
It's interesting that it's the camera technology
that is one of the drivers.
I hadn't really...
It's probably the application, too,
because if you're doing a manufacturing thing,
you want everything to be exactly the same all the time.
So, okay, we have good lighting and we know the lux and everything.
Every time, just know the circumstances don't change.
Whereas for a more general vision application,
you might be taking pictures anywhere.
And so you have to be able to adapt.
Yeah.
If you don't have to be able to adapt, then it's easier, right?
Yeah, yeah.
Like, because this technology of taking a picture and adding to a model, or taking a picture and recognizing some object in it, is working well enough, those are getting into the hands of consumers. You're totally right that now people want to use that in a wider variety of applications. So it's kind of pushing the limits: we need to work on making this better, we need to work on making it still figure out what it's doing, even if it's some random person taking a picture in their dark living room.
And I think that has gone back to the manufacturing areas, that even there you don't need the bright lights, because we've learned to adjust to people taking pictures. It's cheaper not to have to do that.
You can use consumer-level stuff, yeah.
Yeah.
Okay, so at the end of taking a bunch of pictures,
you get a bunch of points on your Notre Dame
or your Eiffel Tower,
although we agreed that was kind of iffy.
And then you get the location of the people.
Which one is more important, and what do you do with it then?
I mean, part of me is like, oh, this is a surveillance thing.
I should never take another photo in my life.
The locations of the cameras... there's probably more information there, because you can understand where the people were who were taking these pictures, where they were standing, where people can go. The points themselves, there might not be enough of them to really do something. Like the points on Notre Dame, or the points on the Eiffel Tower: it's kind of like, okay, now we have a crummy point cloud of this place, and we could just get our 3D model of that object another way. But then, to know where all the humans were standing... There's a project that was a follow-up to this Photo Tourism project, of looking at where people walk in the world when they're taking pictures of things. And they made a little map of people walking into the Pantheon, and where most people took photos. And you could see that you'd walk in and kind of go to the right, and lots of people would take photos right when they got in, of the ceiling and other stuff, and then they'd walk around, and the amount of photos that they took kind of trailed off collectively, because people just got it out of the way at the beginning. And I went to the Pantheon in Rome, and I was like, I've never been in this building, but I know what to expect, where people are going to flow in this space and where everyone's going to be taking pictures. And sure enough, you go inside and you're routed around to the right, in like a counterclockwise direction, and all these tourists are pointing at the ceiling in the beginning and not so much at the end.
Museums could use this to figure out which artworks are getting the most attention.
I mean, I guess just the number of pictures taken of each artwork, but where people stand,
there are a lot of times where how the crowd moves is an interesting problem.
But that was not what I asked you there for.
Now I totally want to talk about that.
Building the 3D models.
That was what you were doing.
You were taking the point clouds and making 3D models, right?
Yeah.
I mean, I was building this game,
this crowdsourcing platform
around this structure for motion system
where people could be empowered to go take pictures of wherever
and make 3D models of wherever.
So in some sense, it was about getting the 3D models,
but it was also about just like,
how do we get an entirely new kind of data that doesn't exist online already?
But that data does exist online.
Not really. Like, we have a bunch of pictures of the front of all these fancy tourist buildings, but we don't have enough around the side. People aren't going to be walking down some alley taking a bunch of pictures on their vacation, unless they're playing Photo City, or they're doing some other crowdsourced street view thing, like Mapillary, which I mentioned before. But the data, it's not there. There's gaps in what people have taken just of their own accord and posted online.
this is something that i have heard you speak on some,
that the data we have for so many things is,
I mean, biased, even visually,
but biased in all kinds of ways with gaps.
And you want to gamify filling in the gaps.
Yes.
That's cool. Weird. Strange. Cool. How do you convince humans that they should help their robot overlords get more data and understand the world around them better?
So that there can be better applications built for humans to use in our daily lives.
Can you give me examples of gamification of this sort of thing?
Oh, there's like two tangents here. One part is about gamification, and one part is about how applications are built on data. AI applications: there's data out there, and then people try to use it, and it works for some things, but it doesn't work for other things. And there needs to be more data that directly relates to what a person is trying to do. And because there's some system, some human trying to do something, and an AI system isn't working for them, or it works sometimes, maybe that can turn into a fun game. Like, what is the computer good at knowing? What is it not good at knowing? How can I stump the computer?
So, an example of things that may not be called games, but they're kind of game-like: a couple of years ago, there was this How-Old robot, an age-guessing thing that Microsoft put out, where you uploaded a picture of your face, and it found the face in the image, and then it estimated an age for that. And people had a lot of fun with it, because it would either have some really accurate response
or it would have some really hilariously wrong response,
like, oh, this picture of Gandalf says he's like 99 years old,
like, ha ha ha, or this picture of me
like says I'm way younger than I actually am, how flattering,
or kind of funny things like that.
People found ways to play with it and figure out all its limitations and what its capabilities were.
And they kind of had this communication around it.
Last week we talked to Katie Malone about AI and one of the
things we talked about was fooling
the AI and the
Labrador puppies and
the chicken image.
The fried chicken.
Where the AI is confused
as to which things are dogs. And there's a whole
set of dogs or not.
Like chihuahuas that look like blueberry muffins.
I loved those.
Although when I told the chihuahua owner that their dog was a cute blueberry muffin, they
totally didn't get it.
Oh, man.
Yeah.
Okay.
So there's the fun aspect of making fun of the computer.
And also trying to help it along. Like, oh, I want to help teach you to do better. And if we can kind of elevate what computers are capable of, then there might be areas where we are suddenly more powerful, more capable, because now we have these better-trained tools at our disposal.
Okay, so there's the aspect of wanting to train, slash, one-up the AIs, and then there's straight-up gamification. That's where you compete with other people to provide the AI with more information.
Yes.
So there's a history of gamification, especially regarding data collection. There's a series of games, or there's a genre, called GWAPs, or games with a purpose.
GWAPs, really? That's how we're going to pronounce that?
Yeah, I thought it was gee-waps.
Gee-waps, okay. Games with a purpose. Okay.
And I actually... I've built games with a purpose, but I also am highly critical of games with a purpose and gamification.
And when it's done shallowly and when it's like, oh, we'll just sprinkle points and leaderboards and badges on top of something to try to get people to do this task for us for free.
We'll pay them in fun.
And sometimes it's not fun.
The game wasn't designed very well.
It doesn't make sense to be a game.
There's many cases where maybe you should just build some task on Mechanical Turk
and pay people fairly to do that task
instead of trying to go in this roundabout game way.
Okay, so you're ambivalent about gamification, and I totally understand that. What would make it be done well? I mean, what are the hallmarks of actual fun?
So, okay, there's a book by Raph Koster called A Theory of Fun for Game Design, and one of the ideas of that book is that learning is what makes games fun. There's some picture in the book, it has lots of pictures, it's got kittens rolling around, and it says the young of all species play. Kids and kittens and puppies are playing, but they're learning a ton as they're playing. And one thing that almost basically every game has is: you're learning the mechanics of that game, you're learning the rules, you're learning the system. And you start out not knowing that game, but that game will help you gain the skills that you need to do more interesting things in that game.
And this also fits into this theory of flow by this guy with a name that I can't pronounce.
It's like Chixiameth.
It has a lot of C's and Z's and H's and stuff in it, and I can look it up later.
But this idea that...
Wow, that is a lot of...
Mihaly?
Csikszentmihalyi.
Yeah, flow, the psychology of optimal experience.
Okay.
Sorry, go ahead.
Okay.
Yeah. I'm glad you all tried to pronounce that.
I didn't do any good job.
So, in a lot of more basic gamification, there might not be anything interesting that the person is learning, or there's not any skill that they're trying to practice or get better at. And I think that's when I get kind of suspicious and judgmental. I'm like, how is this fun if the person isn't learning something here? Maybe they're learning to game your weird gamification system instead of actually doing the task that you want them to do. So: having skill, having something that a person is learning over time, that they're getting better at, that they're interested in getting better at.
You're making me judge the games I play so hard right now.
Oh, games for games' sake are a different category, right?
Well, my
I have been playing a game on
my fitness thing
that now I'm judging
very badly.
I like
the idea of learning in games.
It makes sense
to me. I mean, when you think about Minecraft, that was all about learning.
Yeah.
It was all about learning the world and learning how the rules worked and even then learning more about how to make things in it that you wanted elsewhere. And as I think about some of the other even silly games I play,
like Threes, which I think is 2048 and other places,
but there are times when I'm still learning the rules
on this game that I have played for so long.
Because it's like, okay, I think right now this is what's going to happen.
And whether it does or doesn't.
Yeah.
Okay.
I totally get the learning.
Now, can they teach me useful things?
Yeah, totally.
Well, so one of the original games with a purpose was this game called, I'm almost going to call it Duolingo, but I'm getting to that.
It was called the ESP game, and it was a data collection game of two random people
on the internet are shown the same picture, and they can't talk to each other, but they have
to come up with the same words to describe that image. And if they match what the other person is
saying, then that becomes a label for that image. So two people will see a picture of like sheep in a green field and so
they'll type sheep green field sky clouds and some of them may type like or something yeah and
another person will be like well i didn't type butts because i wasn't thinking that i was thinking
of the sheep and so the ones that match up yeah like idyllic they will those will become the labels for that image and that uh that had this game mechanic of like am i gonna
figure out the words to describe this that another human will also come up with the same words um
yeah if you're sitting there identifying the sheep species in latin that may not be what the
other human does yeah you may not you may be
right but you may not be winning points exactly so so you won't go with those labels you'll find
the ones that are more common and shared um and this this game was by this guy luis van on and
because it was like making image labeling fun through a game,
it kicked off this whole series of other games with a purpose.
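The matching mechanic she describes is simple enough to sketch (a toy illustration of the idea, not the actual ESP Game code):

```python
def esp_round(labels_a, labels_b, taboo=()):
    """Return agreed-upon labels for an image from two players' independent guesses.

    Toy version of the ESP Game mechanic: a word becomes a label only when
    both players typed it, and previously earned "taboo" words don't count.
    """
    a = {w.strip().lower() for w in labels_a}
    b = {w.strip().lower() for w in labels_b}
    return (a & b) - set(taboo)

# Player one types "sheep green field sky"; player two types "sheep field clouds".
print(esp_round(["sheep", "green", "field", "sky"],
                ["sheep", "field", "clouds"]))   # -> {'sheep', 'field'}
```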
And then other people kind of... they didn't get the mechanics quite right. I don't know. Some things that came after, I just felt like they weren't good games. Like, the mechanics of the game didn't match whatever the purpose was trying to do.
You just can't throw points at people.
No, you have to give them more than that.
Yeah. At least a little bit more. I mean, points... sometimes they work enough that people keep trying it. They're like, oh, I do like to see my name on a leaderboard. But not everyone is like that. And there really needs to be something deeper, where the person, by playing the game, is actually contributing to whatever the underlying scientific or data cleanup purpose is. Otherwise they may just be racking up points but not actually helping you out.
It sounds like, to properly design a game, you actually need to have some psychological understanding
to know what motivates people.
And also, if you just do a naive thing, like you're saying with points,
you can end up with these holes, like you said,
where the game goes off in a different direction
and people figure out ways to game the system
and you don't get the data you want.
Yeah, exactly.
Like, having the mechanics aligned with the underlying purpose is super important. But you asked about, can I learn useful things from these games? And what Luis von Ahn is doing now, probably other things too, but one of his main things is this app called Duolingo, for learning new languages. It's not a straight-up game, but it has a lot of elements of a game, like ramping you up in a very gradual way. And the idea of Duolingo in the first place was: there's a bunch of text on the internet, we need to translate more of the internet, wouldn't it be great if we had that? And this was before automated translation techniques were good enough to use. So we need humans to do the translation, but maybe people aren't skilled in translating between, you know, obscure language one and obscure language two, or even English and some other language, not necessarily obscure, any pairs of languages. And so this idea of: maybe we can just teach people new languages, and then they can start to help translate stuff on the internet.
Yeah, I can totally see this working.
Because for me, it would be probably English and Spanish or English and French.
And you could give me an English phrase with an idiom in it,
and I would have to go figure out how to say that in Spanish
in a way that represented the idiom part of it, as well as maybe the words part of it.
And that would force me to go learn more Spanish, which is something I always want to do.
And it would help other people that if multiple people translated it similarly,
then you can start saying, oh, this is probably a reasonable translation.
Yeah, exactly. And then by being in this process where you're learning a little bit of new
skills and then applying them, you'll be able to translate more, more effectively,
and you'll just kind of grow and grow and grow in what you know and what you're able to do.
And even if you presented me with, these are five things other people said, which of these is right? You could do that, and I would play and learn and not care so much about just points. It would be about fun.
So now Duolingo is a free, sometimes ad-supported app that you can use to learn new languages.
And I don't know how much the translating stuff on the internet plays into it anymore, but it's this accessible language learning tool that seems really great.
Especially compared to like, pay $500 for Rosetta Stone or something.
Yeah, we don't need to talk about that.
I want to switch topics entirely
because you are part of this company that is weird and cool
and I have trouble explaining it because I get lost in AR and furniture.
And can you explain what GrokStyle is?
Yes, totally.
So GrokStyle is the company that I currently work for.
We do visual search for furniture and home decor and sort of expanding to AI for retail in general.
And what our core visual search technology does is allows you to take a picture of a piece of furniture, like some chair that you like at your friend's house, and identify what what it is. Or we can go beyond that to understand all of the products in designer showroom images and chair and this coffee table and this rug and these would all actually look nice together and
you don't have to worry about not having that stylistic judgment yourself if you don't actually
have that and that seems hard that does seem hard so there's it's just it's just math and data and linear algebra.
It's just math.
Okay, so I go to a friend's house, I take a picture, and their, I don't know, 15th century throne that I have taken a picture of, it then tries to find a similar throne that can be purchased now at some major retailer.
So it says, oh yeah, if you get this at Target, it's really similar.
And so you have to have a huge database of existing furniture. You're not just like, I'm taking this picture and then I'm going out to the internet and searching.
You have to already know a lot about furniture.
Right, yeah. We have our own huge internal database of photos of furniture, millions of products, millions of scenes of ways that people have used this product in the real world. And we have learned this understanding of visual style.
Some way for anyone that takes a new picture of something,
for us to project that into some style embedding
and look up what's nearby,
what products are similar to this thing.
So if I take a picture of a mission-style couch, which is a very specific style, you would be able to say, oh yeah, you might want a chair and this style of end table?
We're working on the recommendations part. For now, we have a mobile app where you take a picture of your mission-style couch and we'll find more of those.
More mission-style couches, for different prices, from different places.
Yeah.
And how do you identify mission style?
How do you identify the style of what you're looking at?
Is this part of finding terms, search terms?
We are...
Tags?
The core of this is visual understanding.
So just from tons and tons of images of couches of different styles, we'll identify, like, these are the ones that look closest to this one.
And then we can look at the associated metadata to see what the names of the nearby matches are, or what styles might be tagged on those already. But it starts from the visual path.
When we talked about SIFT, and how the Eiffel Tower isn't really a good candidate because it has holes and because it has repeats... chairs?
So in this case, we're just doing deep learning on tons and tons of images, and SIFT isn't involved. SIFT is a feature that a human would say, I'm going to use SIFT in this pipeline. And I've done some other computer vision stuff with faces, where I was like, we're going to match faces by comparing SIFT features across faces.
And I have to decide, like, I'm going to use SIFT, I'm going to look at these regions of the image, I've got to get all my faces lined up first.
But in this deep learning era, we can say, here's a bunch of images of all these things, and I'll tell you how they're similar and how they're different.
And the computer can figure out what features and what internal representations are most useful, most discriminative for its purposes.
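In spirit, and this is a generic deep-embedding retrieval sketch rather than GrokStyle's actual system, that lookup stage can be as simple as cosine similarity in the learned embedding space:

```python
import numpy as np

def find_similar(query_embedding, catalog_embeddings, k=5):
    """Nearest-neighbor lookup in a learned style-embedding space.

    query_embedding: (d,) vector from some trained network for the user's photo.
    catalog_embeddings: (n, d) matrix of precomputed product embeddings.
    Returns indices of the k most similar products by cosine similarity.
    """
    q = query_embedding / np.linalg.norm(query_embedding)
    c = catalog_embeddings / np.linalg.norm(catalog_embeddings, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity against every product
    return np.argsort(-scores)[:k]      # indices of the closest matches
```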
Does it have multiple stages?
Does it figure out it's a chair before it figures out what kind of chair?
And figure out chair versus couch versus table?
Our system does predict what category something is. So yeah, it'll say, I'm pretty sure this is a chair, so then it will go look up chairs, instead of looking across the entire database of everything that we have, because it would be more computationally optimal to say, okay, this is a chair, now let's go into the chair subcategory and finish looking up: is it a 1916 chair, a postmodern chair?
Right.
Another thing we can do, though, is we can say, you took a picture of this wicker chair, and we know it's a chair, but if we start looking for tables that are nearby instead, we might find, like, wicker, some other aspect that's stylistically similar but in a different category. So our learned style embedding does kind of cluster objects: even if they're in different categories but they're still visually, stylistically similar, it will kind of still put them together.
I should have asked you what your favorite machine
learning language was. Keras?
TensorFlow? Straight math?
We're using several of these machine learning libraries, rolling our own in certain cases, and using Python to strap it all together.
All right.
Wow.
IKEA.
Tell me about IKEA.
Okay.
So GrokStyle is this visual search service provider, and IKEA is one of our big public clients right now,
where they have an augmented reality app called IKEA Place.
And within that app, you can access a visual search feature,
and that is powered by us.
And so I go to my friend's house.
I see a chair I like.
I take a picture of it.
I say, you know what I want? I want this chair in my house.
So I go home and I go to the IKEA app, and then I say search, and it says your chair is something that
has weird letter O's, and then it just plops it into my...
Yeah, yeah. So you could be at your friend's house, use the IKEA Place app to search there, and say, I'm going to figure out what this chair is. And it'll be like, oh, this is the POÄNG chair, or something else that you might struggle to remember and type in later, especially with all the accents. And you can favorite it in the app from there, and then bring it home, and then place it into your home and see: oh, I like how this fits, I'm going to consider buying this.
Even though their chair may not be an IKEA chair.
Right. It's going to find whatever's similar, because that's what GrokStyle does.
Yeah, if you take a picture of that cool throne that they have, it'll find the closest IKEA throne-like item.
How does it deal with size? I mean, it's just one picture. There's no 3D. How do I know it isn't a six-foot-by-ten-foot chair, as opposed to a normal-size chair? Is that the future?
No. It will find...
If you take a picture of a chair and there also happens to be miniature versions of that chair,
we might still find the little mini one.
And Amazon sells tiny little...
That's right.
Dollhouse chair.
Yeah.
We can't tell if you're taking a picture of a dollhouse chair.
This is not what I was thinking.
This doesn't fit in my space at all
But once you're in AR, those models are all true scale, true to life. And with the current capabilities of AR, moving your phone around in your space and looking at what's in your space, that does estimate what size your space is and what the scale of everything is.
So that if you put like a three foot tall chair or something out there, it will actually be the appropriate size and you can measure things.
So the AR part is okay.
It's just that I can take a picture of a doll chair...
Yeah.
Or a giant chair, and it will find the most similar. But it will then be normal size, because the AR will show me what size.
Yeah. And I do have a little IKEA chair on my desk. I should do the demo of: take a picture of the dollhouse chair, and then place the full-size one in my space.
Okay, I should ask you more about IKEA, but we're almost out of time and I wanted one more thing.
You started a Santa Cruz PyLadies meetup.
Yes.
Why?
So, Santa Cruz is like close to Silicon Valley, but not directly in it.
Close and yet so far.
Yeah.
And I wanted to meet more developers, more technical people, especially women.
I was like, they must be here in Santa Cruz somewhere, but I don't know where.
I don't know where they are.
I need this community around me.
So I started this PyLadies chapter in Santa Cruz to bring people together and it's worked out really
well so far. How much does it cost to be the person who organizes all this? I mean, is this
expensive? It is not terribly expensive. I work out of a co-working space called Next Space in Santa Cruz, and
they have rooms, conference rooms, and they allow me to host PyLadies for free because it brings
people from outside of Next Space into the space. So that would probably be the hugest cost
otherwise, just getting space. Although you could probably get companies to sponsor it as well.
And then on top of that,
there's like meetup fees for meetup.com.
But I think I can get a grant
from the Python Foundation to help pay for those.
And they're not that much.
It's like 40 or 80 bucks a year.
And then there's food and snacks,
but sort of been like figuring that out
over the last few months of how much food we need.
And people like yourself bring snacks as well.
So it's sort of community supported right now.
And one of the reasons that I wanted to have this meetup in the first place was I went to some of the other meetups. There's
some JavaScript meetup at a bar and there was a lot of dudes there. And I took my two-year-old
daughter with me. So there were two of us women, but it was like I had to bring my own extra female that I had made.
And so it is limited to women, or people who are...?
People who identify as ladies, as pie ladies.
I mean, it's open to anyone that would feel comfortable in that space. Although if you are a man,
we request that you come as a guest of another person in attendance.
And do you spend a lot of time organizing it?
I should probably spend a little bit more time
finding more people to give talks and stuff,
but not too much, no.
So it's not that big of a cost, it's not that big of an effort, but you do get a fair amount out of it. What do you get out of it? I mean, I didn't know there was a vi game, but...
Yeah. So, ten or so people show up to the meetings, and we have them every two weeks. And it alternates between whether it's a project night, where people come and work on projects together or we just talk about all kinds of things, or a speaker night, where someone presents. Just being in a space with other women and other tech people in the area, and seeing what other people are working on, and sharing ideas, and just getting excited about things. It just brings warm fuzzies to my heart.
I enjoy it, and I'm glad you started it, because it is hard to find a good technical community. And many of our meetups do tend to meet in bars, and I'm unlikely to go to a bar to meet people, just because it's not where I want to talk, because I can't hear anything.
Yeah, it's hard to get into the nitty-gritty technical details sometimes if it's dark and loud and you don't have a computer around. And you don't get to really know what other people are passionate about and what they're excited about, and how that can sort of rub off on you and get you really excited about something. But if you're in a sort of more collaborative space or environment... I'd love to have a longer PyLadies meetup sometime, like a little dev-house-style PyLadies Saturday morning.
Yeah, we could.
And I thought it was interesting that one of the presenters then went to go to a job interview and
was asked a question that was basically from her presentation.
Yeah.
And it was funny because, since it is every two weeks, it's only every four weeks that there's a presenter.
It's pretty easy to sign up on the presenter list, let me tell you.
But it is good practice.
Yeah. I mostly wanted to ask you about it
because I want to encourage people who have this idea
that it doesn't have to be a lot of effort,
and sometimes it doesn't work.
I mean, there's a decent chance that it may, in five years,
just be you and me looking at each other going,
well, maybe this has run its course
Yeah, which is also fine.
Yeah, but for those five years, or however long it exists, it can be all kinds of great opportunities.
I'm meeting new people, people who have sent me to other meetups, which were then way too crowded. But yeah, it's neat.
And there's two women there that run a Python study group in Felton. So they're on top of, like, we're just going to do this thing for ourselves.
Yeah. So if you're out there thinking, gosh, I wish there were other people that I could talk to, whether it's PyLadies or JavaDev...
The space is the hardest part, but if you can find a space,
even if it's a coffee shop that has a back room,
it might be worth it.
It might be worth it to try it.
And $40 or $80, yeah, that's a lot to try it,
but how much do you spend on conferences? This is like a year-long conference, one hour at a time.
And those fees are only for meetup.com.
Which is kind of the easiest way.
Yeah, it has made it very easy.
And people have found the PyLadies meetup through meetup.com.
But if that was a cost or something,
maybe there's more organic ways to advertise
and just get people
together that you want to share your technical interests with.
Yeah. I found a writing group on Nextdoor, of all places. So it's all kinds of stuff.
All right. We have kept you for quite a while, given... oh, we have so much more that we could talk about.
We do.
We totally do, which just means that you can come back.
And since you're local, come back.
Yeah, that'll be easy.
Chris, do you have any?
I was wondering if you had advice for people who want to get into this whole space,
either if they're in college or hobbyists or people who are professionals who want to change
to something. I mean, what's the right path to start learning about this whole space? Because it seems like a lot of different things.
Yeah, which part of the space? The computer vision part, the building-interactive-systems-that-people-can-play-with part, the game design part?
I guess the computer vision part, yeah.
Because it's a popular thing right now,
there are a lot of tools coming out,
including tools for making your own models and using them.
So I think TensorFlow is being ported to JavaScript, trying to make it as easy as possible for people that might be in a web programming language to get access to these tools, and then build things that are running in other people's browsers, so they're the easiest possible thing to share. I think, personally, going that route, where you are using JavaScript-type things where you can make something small and share it with your friends, and your friends will be like, wow, that's so cool... that will give you a ton of encouragement to keep going. And then I think with JavaScript you can look around and see how other people are doing this, because you can maybe get access to the code a little bit more easily.
So, doing it in a social kind of way.
Yeah. I mean, there's a lot of good social benefits to being able to share, especially if you're just getting started and trying to figure it out.
Cool. What about getting started in games with a purpose?
This morning there were tweets from this human computation conference called HCOMP,
which is happening in Zurich right now. And I think there's a keynote from the people doing Zooniverse,
which is a platform for all these different citizen science projects.
And some of them may not be game-flavored at all,
but they're probably game-flavored ones
or ones that could be more engaging
if they were more game-like
and helping ramp people up and learn things.
Zooniverse is the citizen science place
that does Galaxy Zoo,
where you can identify different galaxies or different features in pictures. And there are projects with cameras that are out in the wild, where animals will walk by and a motion sensor will trigger and the camera will take a picture, and then citizen science people have to go and tag those to say, there's actually an animal here: it's a fox, it's a bunny, it's a deer, it's an elephant. And so I think there's lots of these that are out there, ones that you can go find and participate in. And then Zooniverse has a platform for making more of those.
So if you have an interest in working on the building of those tools, the building of those projects, I'm sure there's space for that as well, whatever your passion is, or even getting involved with the existing ones.
Do you have any thoughts you'd like to leave us with?
Last brief thought on augmented reality.
Visual search is going to be a big part of that,
understanding what your environment has in it already
so you can do more meaningful, more intelligent augmented reality.
Our guest has been Kathleen Tuite, computer vision expert and software engineer at GrokStyle.
If you'd like to join us at PyLadies in Santa Cruz, there will be a link to the meetup in the show notes.
And if you're not local to Santa Cruz, there are lots of PyLadies and lots of meetups.
Check around. It's worth it.
Thank you for being with us, Kathleen. Thank you for having me.
Thank you to Vicky Tuite for introducing me to Kathleen and for producing her.
Thank you to Christopher for producing and co-hosting this show. And thank you for listening.
You can always contact us at show at embedded.fm or hit the contact link on embedded.fm.
Thank you to Exploding Lemur for his help
with questions this week. If you'd like to find out about guests and ask questions early,
support us on Patreon. Now a quote to leave you with from Douglas Engelbart.
In 20 or 30 years, you'll be able to hold in your hand as much computing knowledge as exists now in
the whole city or even the whole world.
I don't know when he said that, but I bet it's still true.
Embedded is an independently produced radio show that focuses on the many aspects of engineering.
It is a production of Logical Elegance, an embedded software consulting company in California.
If there are advertisements in the show, we did not put them there and do not receive money from
them. At this time, our sponsors are Logical Elegance and listeners like you.