Embedded - 253: We’ll Pay Them in Fun (Repeat)
Episode Date: January 1, 2021
We spoke with Kathleen Tuite (@kaflurbaleen) about augmented reality, computer vision, games with a purpose, and meetups. Kathleen's personal site (filled with many interesting projects we didn't talk about) is SuperFireTruck.com. Her graduate work was in using photogrammetry to build models. Kathleen works for GrokStyle, a company that lets you find furniture you like based on what you see. GrokStyle is used in the augmented reality try-it-at-home IKEA Place app.
Theory of Fun for Game Design by Raph Koster
Flow: The Psychology of Optimal Experience by Mihaly Csikszentmihalyi
Language translating/learning app and online game: Duolingo
TensorFlow in JavaScript
HCOMP 2018: Human Computation conference with keynote by Zooniverse's Lucy Fortson (no video for that yet, but we hope)
Transcript
Hello, this is Embedded.
I am Elecia White, here with Christopher White,
and this week, here also with Kathleen Tuite.
We're going to dive into computer vision, augmented reality games, and meetups.
Hi, Kathleen. Thanks for joining us.
Hello, I'm happy to be here.
Could you tell us about yourself as though we met at a technical conference?
Okay. My name is Kathleen Tuite, as you just said. And I am currently a software engineer at a computer vision AI company called GrokStyle. My background, from this current job and past projects, involves computer vision, game design, crowdsourcing, human-computer interaction, all those things kind of wrapped up together. And what I really like doing in general is taking interesting computer vision systems and building interactive things around them that people can actually use and play with.
Yes, we have so much to talk about.
Yes.
First, we have lightning round.
You've heard the show, so you know that the goal is fast and snappy.
Okay.
Do you want to start?
Okay, Christopher is shaking his head, which goes over ever so well in podcast land. Minecraft or Pokemon Go?
I like both of them a lot.
Favorite OpenCV function?
I don't like OpenCV as much. One thing I do like about OpenCV is just reading in an image and then also displaying it. So the ones that read images and the ones that show stuff, those are probably my two favorites, just so I know what's going on and that I can get started and build something on top of that.
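For readers following along at home, the read-and-display calls being described look roughly like this; the filename is a placeholder, not from the show.

```python
# A minimal sketch of reading an image in and displaying it with OpenCV.
import cv2

img = cv2.imread("photo.jpg")               # load an image from disk (BGR order)
if img is None:
    raise FileNotFoundError("photo.jpg not found or unreadable")
cv2.imshow("what the computer sees", img)   # open a window showing the image
cv2.waitKey(0)                              # wait for a keypress before closing
cv2.destroyAllWindows()
```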
Favorite computer vision library, then?
So when I was a grad student, I did a lot of stuff with this tool, this system called Bundler, which is a structure from motion pipeline. So I'm pretty fond, I have a love-hate relationship with Bundler. And now there's a Python version of it called OpenSfM that is run by this company called Mapillary.
OpenSfM. Okay.
Is it my turn or yours?
It's my turn. Oh, go, go, go.
Favorite VR game?
I went to a capstone demo, the Sammy Showcase at UC Santa Cruz, which had
a bunch of games that students have been working on this past year.
And I played this like ghost cat VR game where you're a little ghost cat
and you have another buddy ghost cat and you have to stay near each other
because they're your source of illumination.
And then you have to like jump around and get to other places.
And so that ghost cat VR game.
That no one else can play?
No one else... I mean, they're trying to make a research platform out of it, so maybe you can play it soon. I don't know. That's the only recent thing that I can think of.
Tip everyone should know?
Stop writing bugs.
Just stop it.
Okay.
So, a long time ago, I was programming stuff with my husband, my boyfriend at the time. And I was just kind of being sloppy with what I was doing. I'd write some code, and then I'd run it, and then I'd read the error and go, oh, I spelled that thing wrong. And it was just this really slow process.
And he was like, just stop writing these bugs.
I was, like, okay, I'm going to try.
I'm going to be more mindful about this and just, like, go a little bit slower.
And think, like, I'm a human.
I can do this as best I can and, like, try to just not write the bugs in the first place.
And then whatever errors do come up are,
they're still there for me to figure out,
but the really basic ones, I can kind of just try not to do that.
Yes, I know what you mean.
That's what monkey coding is for me when I do it, where I'm just like, oh, I'm just going to type at this until it works.
Yeah.
And I'm not going to sit there and think about how it should work and how I can get from where it doesn't work to where it does work.
I'm just going to keep incrementing this variable until the timeout is the right length.
Yeah, yeah, just like stumbling through it.
And like, that works sometimes, but thinking about it actually is kind of better, like actually better in so many ways. Yeah.
Okay, so stop writing bugs. That's great advice.
Computer vision. Yes. When people say computer vision, what do they mean?
So I would say computer vision is the ability for a computer that's gotten some sensor, some picture of the real world.
Maybe it's like a picture from a normal RGB camera.
Maybe it's a depth sensor, some more enhanced picture. It's the computer's ability to make sense of that and understand what is going on in that scene, whether it's recognizing the objects in the scene, or the activity that's happening, or just more information about what the scene really represents. A facial expression that someone has, or the identity of a person, those are all computer vision things: the ability to understand things about the real world from a picture or a movie or something kind of like a picture, like a depth image.
I like the way you put it, because it is about the computer, not just acquiring the data,
but being able to do something, you said understand, which
computers don't usually do, but it's that level. It's an intelligence.
Yeah. To make sense of it enough, at whatever understanding level is possible, that then you can actually use that in some other system.
You did graduate research.
And I know I'm not going to get the word right.
Photogrammetry?
Photogrammetry.
Photogrammetry.
I missed the R.
Okay.
What is that?
So photogrammetry is the ability to get a bunch of images of one thing all from different angles and kind of come up with
the 3D structure of the item that all the cameras are looking at and also the pose of each of the
cameras, how they relate to one another and to the object themselves. And so if I go and I take
a picture of the Eiffel Tower and then I take another picture of the Eiffel Tower, you can build the Eiffel Tower in 3D from my photos?
Yeah, yeah.
But two isn't enough?
Two might be enough, if they're close enough that they're seeing roughly the same view, but they're also spread out enough that they're not exactly on top of each other, and you can get some kind of 3D information from them being split apart. You could know where the pictures are, where the Eiffel Tower is, but if
you want to go further and get like a fuller 3D model of the Eiffel Tower, you'd want many pictures
of it from many different angles,
and that might be enough to fill in the actual structure of that object. Although,
the Eiffel Tower has a lot of cross braces and things where you can see through it, and
that will probably be a little bit challenging for the computer to make sense of.
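To make the core step concrete: once you know (or have estimated) where two cameras are, each matched point can be triangulated into 3D. Here's a toy sketch with made-up camera poses and pixel coordinates; real photogrammetry pipelines have to estimate the poses too, as discussed below.

```python
# Triangulate 3D points from two views with known camera poses.
import cv2
import numpy as np

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics

# Camera 1 at the origin; camera 2 translated one unit to the right.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

# Pixel coordinates of the same two points seen in both images (2 x N).
pts1 = np.array([[300.0, 350.0], [200.0, 240.0]])
pts2 = np.array([[100.0, 150.0], [200.0, 240.0]])

points_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4 x N homogeneous
points_3d = (points_h[:3] / points_h[3]).T            # divide out w
print(points_3d)   # 3D positions relative to camera 1
```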
Okay, how about the Washington Monument?
The Washington Monument, that has not enough detail, right?
Yeah. And it kind of looks the same from all four sides, the tall one. The canonical example of this, where structure from motion photogrammetry sort of became a thing that other people started really running with, was this Photo Tourism project at UW, where they took a bunch of photos from Flickr of popular tourist places, like Notre Dame Cathedral and Trevi Fountain in Rome, and used those photos. So those are places where there's enough texture and structure there,
but it is kind of this continuous surface that you can't see through, unlike the Eiffel Tower.
So it's better if you have something that has a lot of detail,
but not see-through and not repeating detail.
Right.
This is a lot of caveats.
It definitely is, yeah.
Okay.
And then, so the way I think it happens, like my mental image, is you have picture one, and then you take picture two, and you try to map up all of the same points.
And then you take picture three, and you map up all the same points
on picture one and picture two. But then I'm kind of lost. I mean, I know you can use convolution
to map up points that are the same, but how do you, what happens after that? Is that even right?
That is totally the first step: getting two images, or more images, or pairs of images in a whole big collection of images, and figuring out what all the interesting points in these images are. This point was seen in these ten images, and it was in these pixel coordinates of these images over here. And you just have a whole bunch of data of those correspondences, and then you throw it into something called bundle adjustment, and that will figure out the 3D positioning of where all those points should be in 3D space, and where the cameras should be, what pose they should have, based on all these pinhole camera math equations.
Okay, we're going to ask you about that too.
Don't get comfortable with me skipping that.
But even that first step, are you using the RGB images
or are you trying to find vertices? What kind of algorithms do you use to even find the points, and then what does this bundle thing do?
So the algorithm to find the points initially, SIFT is a good one. And I think your typing robot uses these same SIFT feature points to figure some stuff out.
It does. It does.
But when I did it, I just used OpenCV, and it magically worked.
I have no idea what the algorithm was. That was part of when I was trying to figure out where the keys were.
And I had a perfect image of a keyboard.
And then I had my current camera image of the keyboard.
And it was SIFT and FLANN and homography.
And I just typed them in and, wow, it just found them.
And I did nothing.
Even when I changed the lighting, it was pretty good.
So what does it do?
So to break it down a bit more, SIFT stands for Scale Invariant Feature Transform.
Yeah, transform sounds good.
Say there's a building, and there's a windowsill, and the part where the outside of the windowsill comes together in an angle, and it's on top of, like, there's a brick facade or something, so that the sill and the brick are different colors, and the light is casting shadows in a certain way. That particular corner of that building might have, or will have, a distinctive look. And the SIFT feature of that particular point would capture something about the colors there, but more importantly, the edges: what angle they're at, how strong they are, how edgy they are, how cornery they are. And the scale invariant part of SIFT means that if you have a picture of that windowsill up close, and you have another one that's maybe far away, and maybe it's rotated a little bit, that particular piece of those two images will still look very similar. It will have a descriptive way that the computer can represent it, so that it can tell that they're the same point.
Okay, okay. So now we found all of these correspondence points.
Correspondence points, yeah. I mean, they start out as just, these are feature points, these are points of interest, these are little corners or things that a computer can say, I know what that is, I know where it is. Versus on a plain blank wall, there's nothing special about a pixel in the middle of that space; it could be anywhere. And then when you have multiple images, like two images that both have SIFT points, and you kind of figure out the correspondence between them, that's when the correspondence part comes in.
And so I can sort of see with two images, because that's kind of how my eyes work.
It's 3D vision.
Right.
And if my eyes were further apart, you know, if I had a really big head, I would be able
to see 3D vision further away.
But right now, after about 10 feet, everything's kind of flat.
I know there's actual math that would tell me how far it is.
But realistically, I'm pretty cross-eyed.
So 10 feet is really about it for me.
Don't play basketball with me.
And so when you have two photos taken apart, far apart, then you can get more depth.
Yes.
But my eyes work because they're always in the same place.
They always have the same distance between them.
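A quick back-of-the-envelope version of this point, with made-up numbers: depth from two views degrades with the square of distance and improves with a wider baseline, which is why a bigger "head" sees 3D further away.

```python
# Stereo depth rule of thumb: disparity = f * baseline / depth,
# so depth error ~ depth^2 * matching_error / (f * baseline).
focal_px = 800.0        # focal length in pixels (assumed)
match_error_px = 0.5    # how precisely a point can be matched between views

for baseline_m in (0.065, 0.5):           # human eyes vs. two spread-out photos
    for depth_m in (1.0, 3.0, 10.0):
        disparity = focal_px * baseline_m / depth_m
        depth_err = depth_m**2 * match_error_px / (focal_px * baseline_m)
        print(f"baseline {baseline_m} m, depth {depth_m} m: "
              f"disparity {disparity:.1f} px, depth error ±{depth_err:.2f} m")
```

With an eye-width baseline, the error at ten meters comes out near a meter, which matches the "everything past ten feet looks flat" experience.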
It seems like a chicken and an egg problem that you can find these points
and you can find the 3D-ness of it, but you also find where they are.
Which one's the chicken and which one's the egg, and which one comes first?
So you're totally right that our eyes, well, our brains have calibrated the fact that these eyes that we have are always in the same relative position to one another. And I think 3D reconstruction techniques from two images have existed for a while, and they started out with, we need to calibrate these two cameras relative to each other first. They're going to be mounted on some piece of hardware, and they're never going to change. And if some intern bonks them, then they have to go recalibrate the whole thing.
Yeah. Yeah, I remember doing that.
And they have these, like, calibration checkerboards that you can set up.
And there's probably some OpenCV function for, like, look at this checkerboard and figure it out.
Like, figure out what the camera is.
There totally is.
Yeah.
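There totally is; the checkerboard calibration being described looks roughly like this in OpenCV. The board size and filenames here are placeholders.

```python
# Calibrate a camera from several photos of a printed checkerboard.
import cv2
import numpy as np
import glob

board = (9, 6)  # inner corners per row/column of the checkerboard
# 3D coordinates of the corners in the board's own plane (z = 0).
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calib_*.jpg"):
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Solves for the camera matrix (focal length, center) and lens distortion.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("camera matrix:\n", mtx)
print("distortion coefficients:", dist.ravel())
```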
So getting from two cameras where you've calibrated them already, and also you have to calibrate the internal lens distortion and all of that of a camera. The points I'm looking at will help me figure out where the cameras are, and you also need to figure out where the cameras are to figure out where the 3D points that you're looking at are. And what this bundle adjustment technique is, well, I guess, you had mentioned homography, or alluded to it. Homography is like an initialization step: there's two cameras, and they're looking at the same thing. And if that thing is like a planar surface, it's kind of understanding the relationship between those two cameras.
Yes, in my typing robot, I have the keyboard, the perfect keyboard.
And then I have my scene of however I put the camera up today.
And then the homography, I take a picture and it like maps the escape key onto the escape key and
the space key onto the space key. And then it gives me the matrix that I can use to transform from my perfect keyboard world to my current image world.
And so that matrix I can then just use to transfer coordinates between them.
Right.
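A rough sketch of that SIFT, FLANN, and homography pipeline in OpenCV, for anyone who wants to try it; filenames and the reference pixel are placeholders, and SIFT_create needs a reasonably recent OpenCV (4.4 or later).

```python
# Map a "perfect" reference keyboard image onto a live camera image.
import cv2
import numpy as np

ref = cv2.imread("keyboard_reference.png", cv2.IMREAD_GRAYSCALE)
cam = cv2.imread("keyboard_camera.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(ref, None)
kp_cam, des_cam = sift.detectAndCompute(cam, None)

# FLANN finds, for each reference descriptor, its two nearest camera descriptors.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
matches = flann.knnMatch(des_ref, des_cam, k=2)

# Lowe's ratio test: keep matches clearly better than the runner-up.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_cam[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC throws out the bogus correspondence pairs.
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Now any reference coordinate maps into the camera view.
esc_ref = np.float32([[[30, 20]]])          # made-up reference pixel
esc_cam = cv2.perspectiveTransform(esc_ref, H)
print("that key is at", esc_cam.ravel())
```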
So if you have all these pictures that tourists took of the Eiffel Tower,
you can look at the pairs of cameras and look at the SIFT correspondence points
that you found between them and kind of estimate a homography by like, what is that matrix that
says how this one camera moves to become the other camera? And it might not be perfect, because with the points that you're looking at in the world, there just may be stuff where you don't have enough information yet. You don't know what the internal camera parameters are for that particular camera. But you can get some initial guess, and then what bundle adjustment does is take all of these initial guesses of how all these cameras and points and tracks of points seen by multiple cameras fit together, and it kind of comes up with an optimization that solves for both of those things at the same time.
So it takes all of the correspondence points for each pair,
and then it minimizes the error for all of them.
Yeah.
And so if you end up with a bogus pair,
like on my keyboard if I was mapping A to Q, it would, if I took a bunch of pictures, it would eventually toss that one because nobody else agreed with it.
Yeah, it might toss it, or it might be like, I think this is right. And it might just be wrong.
And then it skews everything.
Yeah. So in this project that I worked on in grad school, called PhotoCity, which was a game for having people take lots of photos of buildings and make 3D models, I saw a lot of this 3D reconstruction stuff gone wrong, where a person would take photos and the wall of the building would grow, but then it would just curve off into the ground, or the model would just totally flip out and fall apart, because this bundle adjustment, this effort to kind of figure out cleanly where everything goes, would just get really confused. Or sometimes there would be itsy-bitsy, teeny-tiny, upside-down versions of a model that were really close, because the computer was like, it makes sense to make a tiny version of this building here; it kind of looks the same as having one that's really far away.
Yeah, I mean, you get a discoloration in a building that has bricks, and then you end up with the small discoloration of the bricks, and it can't tell the difference, because scale invariant.
Yeah.
Computers, man.
They mess up sometimes.
When you do the minimization problem
of finding all the matrices, which
gives you the 3D aspect, that's when you can start figuring out where the people are.
Because you can backtrack.
Once you're confident that these points are in this space, you can backtrack to where the camera person must have been.
It's doing both at the same time and kind of going back and forth between optimizing where
the points are and optimizing where the cameras or the people holding the cameras must have been.
And you can say, I have a pretty good guess of where the 3D points out in the world are.
But if I wiggle the cameras around a little bit, then we'll come up with a better configuration
that minimizes that error even more
and um and that the error that we're trying to minimize is like do these points in the world do
they project back onto the right pixel coordinate of the image or are they they off we're trying to
sort of get everything to make sense um across all these different pictures and And in the end, this is a massive linear algebra problem.
Yeah, pretty much.
That's weird.
I mean, it sounds like you put photos in,
you get locations and 3D out,
and so it sounds so smart,
but in the end, it's just like massive amounts of A plus Bx plus Cy.
Yeah, yeah. It's totally like magic that this is possible, but it's also totally not magic.
It's just like just a bunch of math.
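For the curious, here is that "bunch of math" in miniature: a toy bundle adjustment that jointly nudges a camera position and the 3D points to minimize reprojection error. This is a sketch with synthetic data and translation-only cameras; real bundle adjusters handle rotations and exploit the problem's huge sparsity.

```python
# Jointly optimize camera positions and 3D points against observed pixels.
import numpy as np
from scipy.optimize import least_squares

f = 800.0  # assumed focal length in pixels; cameras only translate here

def project(point3d, cam_t):
    # pinhole projection of a 3D point through a camera at position cam_t
    p = point3d - cam_t
    return f * p[:2] / p[2]

# Synthetic ground truth: obs[i, j] = point j as seen by camera i.
true_cams = np.array([[0.0, 0, 0], [1.0, 0, 0]])
true_pts = np.array([[0.0, 0, 4], [0.5, 0.2, 5], [-0.4, 0.1, 6]])
obs = np.array([[project(p, c) for p in true_pts] for c in true_cams])

def residuals(params):
    cam1 = params[:3]                    # camera 0 is pinned at the origin
    pts = params[3:].reshape(3, 3)
    cams = [np.zeros(3), cam1]
    # reprojection error: does each point land on the right pixel?
    r = [project(pts[j], cams[i]) - obs[i, j]
         for i in range(2) for j in range(3)]
    return np.concatenate(r)

# Noisy initial guesses; least_squares "wiggles" everything until it fits.
x0 = np.concatenate([[0.9, 0.1, 0.0], (true_pts + 0.3).ravel()])
sol = least_squares(residuals, x0)
print("recovered camera 1:", sol.x[:3])
print("recovered points:\n", sol.x[3:].reshape(3, 3))
```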
It used to be whenever we were doing computer vision stuff or machine vision or whatever we were calling it, there was the requirement that things be lit very brightly.
That went away.
Why did that go away?
How did that go away?
That was like the core thing with object identification and location.
When did lighting stop mattering?
Or does it still, and I'm just using better tools?
There can be a number of things involved. Lighting still matters, but SIFT is pretty good at matching things even when the lighting is a bit different. Another big thing might be that the quality of cameras that we have is better now.
Like the webcam that you have or the camera on your phone or the camera that's built into your
laptop, those can, they can work better in like lower lighting, crappy lighting. They will also
just take clearer pictures. So I imagine that it was more critical in the past, because the cameras just couldn't see very well, and so you really had to make it easy for the cameras. And then a third aspect is that we have a bunch of data online taken by cameras, even crappy cameras, not very good cameras. And we can learn more from all of this data that's available, so we can kind of compensate for the fact that the lighting might not be as good, because we've seen enough examples of something with not very good lighting that we can still understand what it's supposed to be.
It's interesting that it's the camera technology
that is one of the drivers.
I hadn't really...
Well, that's probably the application too
because if you're doing like a manufacturing thing,
you want everything to be exactly the same all the time.
So, okay, we have good lighting, and we know the lux and everything every time. The circumstances don't change. Whereas for a more general vision application, you might be taking pictures anywhere, and so you have to be able to adapt.
Yeah, if you don't have to be able to adapt, then it's easier, right?
Yeah, yeah. Because this technology of taking a picture and adding to a model, or taking a picture and recognizing some object in it, is working well enough, and those are getting into the hands of consumers, you're totally right that now people want to use that in a wider variety of applications. So it's kind of pushing the limits of, we need to work on making this better. We need to work on making it still figure out what it's doing, even if it's some random person taking a picture in their dark living room.
And I think that has gone back to the manufacturing areas, that even there you don't need the bright lights, because we've learned to adjust to people taking crummy pictures.
It's cheaper not to have to do that.
You can use consumer-level stuff, yeah.
Yeah.
Okay, so at the end of taking a bunch of pictures,
you get a bunch of points on your Notre Dame
or your Eiffel Tower,
although we agreed that was kind of iffy.
And then you get the location of the people
which one is more important
and what do you do with it then?
I mean part of me is like
oh this is a surveillance thing
I should never take another photo in my life
The locations of the cameras, there's probably more information there, because you can understand where the people were who were taking these pictures, where they were standing, where people can go. The points themselves, there might not be enough of them to really do something. Like the points on Notre Dame or the points on the Eiffel Tower, it's kind of like, okay, now we have a crummy point cloud of this place, and we could just get our 3D model of that object another way. But then to know where all the humans were standing, there's
a project that was like a follow-up to
this photo tourism project of looking where people walk in the world when they're taking
pictures of things. And they like made a little map of people walking into the Pantheon and where
most people took photos. And you could see that you'd like walk in and you kind of go to the right
and lots of people would take photos right when they got in of like the ceiling and other stuff and then
they'd walk around and the amount of photos that they took kind of trailed off collectively because
people just got it out of the way at the beginning and uh and i went to the pantheon in rome and i
was like i've never been in this building but i i know what to expect where people are going to flow
in this space and where everyone's going to be taking pictures.
And sure enough, you go inside and you're routed around to the right in a counterclockwise position,
and all these tourists are pointing at the ceiling in the beginning and not so much at the end.
Museums could use this to figure out which artworks are getting the most attention.
I mean, I guess just
the number of pictures taken of each artwork, but where people stand, there are a lot of times where
how the crowd moves is an interesting problem. But that was not, that's not what I asked you
there for. Now I totally want to talk about that.
Building the 3D models.
That was what you were doing.
You were taking the point clouds and making 3D models, right?
Yeah.
I mean, I was building this game,
this crowdsourcing platform
around this structure from motion system, where people could be empowered to go take pictures wherever and make 3D models of wherever. So in some sense it was about getting the 3D models, but it was also about, how do we get an entirely new kind of data that doesn't exist online already?
But that data does exist online.
Not really. We have a bunch of pictures of the front of all these fancy tourist buildings, but we don't have enough around the side. People aren't going to be walking down some alley taking a bunch of pictures on their vacation, unless they're playing PhotoCity, or they're doing some other crowdsourced street view thing, like Mapillary, which I mentioned before. But the data, it's not there. There's gaps in what people have taken just of their own accord and posted online.
This is something that I have heard you speak on some, that the data we have for so many
things is, I mean, biased, even visually, but biased in all kinds of ways with gaps.
And you want to gamify filling in the gaps.
Yes.
That's cool. Weird, strange, cool. How do you convince humans that they should help their robot overlords get more data and understand the world around them better?
There can be better applications built for humans to use in our daily lives. Let me think of examples of gamification of this sort of thing. Oh, there's two tangents here. One part is about gamification, and one part is about applications built on data, AI applications. There's data out there, and then people try to use it,
and it works for some things,
but it doesn't work for other things.
And there needs to be more data
that directly relates
to what a person is trying to do.
And because there's some system
of some like human
trying to do something
and an AI system
isn't working for them
or it works sometimes,
maybe that can turn into a fun game.
Like, what is the computer good at knowing? What is it not good at knowing? How can I stump the computer? So an example of things that may not be called games, but they're kind of game-like: a couple of years ago there was this How-Old robot, an age-guessing thing that Microsoft put out, where you uploaded a picture of your face, and it found the face in the picture and guessed your age. Sometimes it would have an accurate response, or it would have some really hilariously wrong response.
Like, oh, this picture of Gandalf says he's like 99 years old.
Like, ha ha ha.
Or, or this picture of me, um, like says I'm, I'm way younger than I actually am.
How flattering.
Or kind of funny things like that.
People found ways to play with it and figure out all its limitations and what its capabilities were.
And they kind of had this communication around it.
Last week we talked to Katie Malone about AI and one of the
things we talked about was fooling
the AI and the
Labrador puppies and
the chicken image.
The fried chicken.
Where the AI is confused
as to which things are dogs. And there's a whole
set of dogs or not.
Like chihuahuas that look
like blueberry muffins.
I loved those.
Although when I told the chihuahua owner that their dog was a cute blueberry muffin, they totally didn't get it.
Oh, man.
Yeah, okay.
So there's the fun aspect of making fun of the computer.
And also trying to help it along, like, I want to help teach you to do better. And if we can kind of elevate what computers are capable of, then there might be areas where we are suddenly more powerful, because now we have these better trained tools at our disposal.
Okay. So there's the aspect of wanting to train slash one up the AIs,
and then there's straight up gamification.
Yeah.
That's where you compete with other people to provide the AI with more information.
Yes.
So there's like a history of gamification,
especially regarding data collection.
There's a, there's a series of games or there's a genre called GWAPs or games with a
purpose.
GWAPs, really? That's how we're going to pronounce that?
Yeah, I thought it was gee-waps.
Gee-waps. Okay. Games with a purpose. Okay.
And I actually, I've built games with a purpose, but I also am highly critical of games with a purpose and gamification.
And when it's done shallowly and when it's like, oh, we'll just sprinkle points and leaderboards and badges on top of something to try to get people to do this task for us, like for free, we'll pay them in fun.
And sometimes it's not fun. The game wasn't designed very well; it doesn't make sense to be a game. There are many cases where maybe you should just build some task on Mechanical Turk and pay people fairly to do that task, instead of trying to go in this roundabout game way.
Okay, so you're ambivalent about gamification.
And I totally understand that.
What would make it be done well?
I mean, what are the hallmarks of actual fun?
So, okay, there's a book by Raph Koster called A Theory of Fun, and one of the ideas of that book is that learning is what makes games fun. There's some picture in the book, it has lots of pictures, it's got kittens rolling around, and it says the young of all species play. Kids and kittens and puppies are playing, but they're learning a ton as they're playing. And I think a thing that basically every game has is, you're learning the mechanics of that game. You're learning the rules, you're learning the system. You start out not knowing that game, but that game will help you gain the skills that you need to do more interesting things in that game. And this also fits into this theory of flow, by this guy with a name that I can't pronounce. It has a lot of C's and Z's and H's and stuff in it.
And I can like look it up later.
But this idea that.
Wow, that is a lot of.
Mihaly.
Csikszentmihalyi.
Yeah.
Flow, the psychology of optimal experience.
Okay.
Sorry, go ahead.
Okay.
Yeah. I'm glad you all tried to pronounce that.
I didn't do a very good job.
So in a lot of more basic gamification, there might not be anything interesting that the person is learning, or there's not any skill that they're trying to practice or get better at. And I think that's when I get kind of suspicious and judgmental. I'm like, how is this fun if the person isn't learning something here? Maybe they're learning to game your weird gamification system instead of actually doing the task that you want them to do.
So having
skill, having something that a person is
learning over time that they're getting better at,
that they're interested in getting better at,
and also...
You're making me
judge the games I play
so hard right now.
Games for game's sake
are a different category, right?
Well, I have been playing a game on my fitness thing that now I'm judging very badly.
I like the idea of learning in games.
It makes sense to me.
I mean, when you think about Minecraft, that was all about learning.
Yeah.
It was all about learning the world and learning how the rules worked and even then learning more about how to make things in it that you wanted elsewhere. And as I think about some of the other even silly games I play,
like Threes, which I think is 2048 and other places,
but there are times when I'm still learning the rules
on this game that I have played for so long.
Because it's like, okay, I think right now this is what's going to happen.
And whether it does or doesn't.
Yeah.
Okay.
I totally get the learning.
Now, can they teach me useful things?
Yeah, totally.
Well, so one of the original games with a purpose was this game called, I'm almost going to call it Duolingo, but I'm getting to that.
It was called the ESP game, and it was a data collection game of two random people
on the internet are shown the same picture, and they can't talk to each other, but they have
to come up with the same words to describe that image. And if they match what the other person is
saying, then that becomes a label for that image. So two people will see a picture of, like, sheep in a green field, and so they'll type sheep, green, field, sky, clouds. And some of them may type butts or something.
Yeah.
And another person will be like, why? I didn't type butts, because I wasn't thinking that. I was thinking of the sheep. And so the ones that match up,
Yeah, like idyllic.
those will become the labels for that image. And that had this game mechanic of, am I going to
figure out the words to describe this, that another human will also come up with the same words.
Yeah, if you're sitting there identifying the sheep species in Latin,
that may not be what the other human does.
You may be right, but you may not be winning points.
Exactly. So you won't go with those labels; you'll find the ones that are more common and shared. And this game was by this guy Luis von Ahn, and
because it was like making image labeling fun through a game,
it kicked off this whole series of other games with a purpose.
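The ESP game's matching mechanic, reduced to a few lines; the scoring here is made up for illustration.

```python
# Two players tag the same image independently; the overlap becomes the labels.
player_a = {"sheep", "green", "field", "sky", "clouds", "ovis aries"}
player_b = {"sheep", "field", "clouds", "idyllic"}

labels = player_a & player_b   # only words both humans agreed on survive
score = 10 * len(labels)       # hypothetical scoring: each match earns points
print(labels, score)           # {'sheep', 'field', 'clouds'} 30
```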
And then other people kind of, they didn't get the mechanics quite right, in a way that, I don't know, some things that came after, I just felt like they weren't good games. The mechanics of the game didn't match whatever the purpose was trying to do.
You just can't throw points at people.
No, you have to give them more than that.
Yeah, at least a little bit more. I mean, points sometimes work enough that people keep trying it. They're like, oh, I do like to see my name on a leaderboard. But not everyone is like that, and there really needs to be something deeper, where the person, by playing the game, is actually contributing to whatever the underlying scientific or data cleanup purpose is. Otherwise, they may just be racking up points but not actually helping you out.
It sounds like to properly design a game, you actually need to have some psychological understanding, to know what motivates people. And also, if you just do a naive thing, like you're saying, with points, you can end up with these holes, like you said, where the game goes off in a different direction and people figure out ways to game the system.
Yeah.
And you don't get the data you want.
Yeah, exactly. Having the mechanics aligned with the underlying purpose is super important. But you asked about, can I learn useful things from these games? And what Luis von Ahn is doing now, probably other
things, but one of his main things is this app called Duolingo, for learning new languages. It's not a straight-up game, but it has a lot of elements of a game, like ramping you up in a very gradual way. And the idea of Duolingo in the first place was, there's a bunch of text on the internet; we need to translate more of the internet. Wouldn't it be great if we had that? And this was before automated translation techniques were good enough to use. So we need humans to do the translation, but maybe people aren't skilled in translating between English, or, you know, obscure language one and obscure language two, or even English and some other obscure language, and maybe not obscure, but any pairs of languages. And so this idea of, maybe we can just teach people new languages, and then they can start to help translate stuff on the internet.
Yeah, I can totally see this working.
Because for me, it would be probably English and Spanish or English and French.
And you could give me an English phrase with an idiom in it,
and I would have to go figure out how to say that in Spanish
in a way that represented the idiom part of it, as well as maybe the words part of it.
And that would force me to go learn more Spanish, which is something I always want to do. And it would help other people, in that if multiple people translated it similarly, then you can start saying, oh, this is probably a reasonable translation.
Yeah, exactly. And then by being in this process where you're learning a little bit of
new skills and then applying them, you'll be able to translate more, more effectively,
and you'll just kind of grow and grow and grow in what you know and what you're able to do.
And even if you presented me with, these are five things other people said, which of these is right? You could do that, and I would play and learn
and not care so much about just points.
It would be about fun.
Right.
And learning.
And learning.
All right.
So now like Duolingo is like a free, sometimes ad-supported app that you can use to learn new languages.
And I don't know how much the translating stuff on the internet plays into it anymore,
but it's this accessible language learning tool that seems really great,
especially compared to pay $500 for Rosetta Stone or something.
Yeah, we don't need to talk about that.
I want to switch topics entirely
because you are part of this company that is weird and cool
and I have trouble explaining it
because I get lost in AR and furniture.
And can you explain what GrokStyle is?
Yes, totally. So GrokStyle is the company that I currently work for. We do visual search for furniture and home decor, and we're sort of expanding to AI for retail in general. And what our core visual search technology does is allow you to take a picture of a piece of furniture, like some chair that you like at your friend's house, and identify what that product is, either because you want to buy it or you want to know exactly what it is. Or we can go beyond that, to understand all of the products in designer showroom images and know what things go together, and then recommend either stylistically similar options or complementary options. Like, you want to buy this sofa; maybe you could also buy this chair and this coffee table and this rug, and these would all actually look nice together, and you don't have to worry about not having that stylistic judgment yourself, if you don't actually have that.
And that seems hard.
It does seem hard.
So it's just math and data and linear algebra?
It's just math.
Okay. So I go to a friend's house, I take a picture of their, I don't know, 15th century throne. It then tries to find a similar throne that can be purchased now at some major retailer? So it says, oh yeah, if you get this at Target, it's really similar.
And so you have to have a huge database of existing furniture. You're not just, I'm taking this picture and then I'm going out to the internet and searching. You have to already know a lot about furniture.
Right. Yeah. We have our own huge internal database of photos of furniture,
all like millions of products, millions of scenes of like ways that people have used this product
in the real world. And we have learned this, this like understanding of visual style.
Some way for anyone that takes a new picture of something,
for us to project that into some style embedding
and look up what's nearby,
what products are similar to this thing.
If I take a picture of a mission-style couch,
which is a very specific style,
you would be able to say,
oh, yeah, you might want a chair and this style of end table.
We're working on the recommendations part.
For now, we have a mobile app where we could take a picture
of your mission-style couch and we'll find more of those.
More mission-style couches for different prices from different places.
Yeah.
And how do you identify mission style?
How do you identify the style of what you're looking at?
Is this part of finding terms, search terms?
We are...
Tags?
The core of this is visual understanding. So just from tons and tons of images of couches of different styles, we'll identify, these are the ones that look closest to this one. And then we can look at the associated metadata to see what the names of the nearby matches are, or what styles might be tagged on those already. But it starts from the visual path.
When we talked about SIFT, and how the Eiffel Tower isn't really a good candidate because it has holes and because it has repeats... chairs?
So in this case, we're just doing deep learning
on tons and tons of images,
and SIFT isn't involved.
SIFT is a feature that a human would say,
I'm going to use SIFT in this pipeline.
And I've done some other computer vision stuff with faces
where I was like, we're going to match faces by comparing SIFT features across faces.
And I have to decide, I'm going to use SIFT, I'm going to look at these regions of the image, I've got to get all my faces lined up first.
But in this deep learning era, we can say, here's a bunch of images of all these things and I'll tell you how they're similar and how they're different.
And the computer can figure out what features and what internal representations are most useful, most discriminative for its purposes.
Does it have multiple stages?
Does it figure out it's a chair before it figures out what kind of chair?
And figure out chair versus couch versus table?
Our system does predict what category something is. So yeah, it'll say, I'm pretty sure this is a chair, so then it will go look up chairs, instead of looking across the entire database of everything that we have, because it would be more computationally optimal to say, okay, this is a chair. Now let's go into the chair subcategory and finish looking up, is it a 1916 chair or a postmodern chair?
Right. Another thing we can do, though, is we can say, you took a picture of this wicker chair, and we know it's a chair, but if we start looking for tables that are nearby instead, we might find wicker, some other aspect that's stylistically similar but in a different category. So our learned style embedding does kind of cluster objects even if they are different categories; if they're still visually, stylistically similar, it will kind of still put them together.
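A sketch of what embedding-based visual search of this shape can look like; the random vectors here stand in for a trained network's embeddings, since GrokStyle's actual models are their own.

```python
# Nearest-neighbor search in a style embedding, optionally category-filtered.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in catalog: N products, each with a D-dim style embedding
# (in reality these come from a trained network) plus a category tag.
N, D = 1000, 128
catalog_vecs = rng.normal(size=(N, D))
catalog_cats = rng.choice(["chair", "sofa", "table"], size=N)

def visual_search(query_vec, category=None, k=5):
    # cosine similarity between the query embedding and every catalog item
    sims = catalog_vecs @ query_vec / (
        np.linalg.norm(catalog_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    if category is not None:                 # "pretty sure this is a chair,
        sims[catalog_cats != category] = -np.inf  # so only search chairs"
    return np.argsort(-sims)[:k]             # indices of the k best matches

query = rng.normal(size=D)    # would come from embedding your photo
print(visual_search(query, category="chair"))
```

Dropping the category filter is what lets a wicker chair surface stylistically similar wicker tables: nearby in the embedding, different category.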
I should have asked you what your favorite machine learning language was.
Keras? TensorFlow?
Straight math?
We're, you know, using several different of these machine learning libraries and rolling our own in certain cases and using Python to strap it all together.
All right.
Wow.
Ikea.
Tell me about Ikea.
Okay. So GrokStyle is this visual search service provider, and IKEA is one of our big public clients right now,
where they have an augmented reality app called IKEA Place.
And within that app, you can access a visual search feature,
and that is powered by us.
And so I go to my friend's house.
I see a chair I like.
I take a picture of it.
I say, you know what I want?
I want this chair in my house. So I go home, and I go to the IKEA app, and then I say search, and it says your chair is something that has weird letter O's, and then it just plops it into my...
Yeah, yeah. So you could be at your friend's house, use the IKEA Place app to search there, and say, I'm going to figure out what this chair is. And I'll be like, oh, this is the POÄNG chair, this is something else that you might struggle to remember and type in later, especially with all the accents. And you can favorite it in the app from there, and then bring it home, and then place it into your home and see, oh, I like how this fits. I'm going to consider buying this.
Even though their chair may not be an IKEA chair, right? It's going to find whatever's similar, because that's what GrokStyle does.
Yeah. If you take a picture of that cool throne that they have, it will find the closest IKEA throne-like item.
How does it deal with size? I mean, it's just one picture. There's no 3D. How do I know it isn't a six foot by ten foot chair, as opposed to a normal size chair? Is that the future?
No. Like, we will find, if you take a picture of a chair and there also happens to be miniature versions of that chair, we might still find the little mini one.
And, like, Amazon sells, like, tiny little.
That's funny.
Dollhouse chair.
Yeah.
Like, we can't tell if you're taking a picture of a dollhouse chair.
This is not what I was thinking.
This doesn't fit in my space at all
But once you're in AR, those models are all true scale, true to life. And with the current capabilities of AR, moving your phone around in your space and looking at what's in your space, that does estimate what size your space is and what the scale of everything is. So that if
you put like a three foot tall chair or something out there, it will actually be the appropriate
size and you can measure things. So the AR part is okay. It's just that I can take a picture of a doll chair
or a giant chair and it will find the most similar, but it will then be normal size because
the AR will show me what size it is. Yeah. And I do have a little like Ikea chair on my desk. I
should do the demo of like, take a picture of the dollhouse chair and then place the full size one
in my space.
Okay, I should ask you more about Ikea, but we're almost out of time and I wanted one more thing.
You started a Santa Cruz PyLadies meetup.
Yes.
Why?
So, Santa Cruz is like close to Silicon Valley, but not directly in it.
Close and yet so far. Yeah.
And I wanted to meet more developers, more technical people, especially women.
I was like, they must be here in Santa Cruz somewhere, but I don't know where.
I don't know where they are.
I need this community around me.
So I started this PyLadies chapter in Santa Cruz to bring people together.
And it's worked out really well so far.
How much does it cost to be the person who organizes all this?
I mean, is this expensive?
It is not terribly expensive.
I work out of a co-working space called NextSpace in Santa Cruz, and they have rooms, like conference rooms, and they allow me to host PyLadies for free, because it brings people from outside of NextSpace into the space. So that would probably be the hugest cost otherwise, just getting space. I need a good space.
You can probably get companies to sponsor it as well.
And then on top of that, there's like meetup fees for meetup.com.
But I think I can get a grant from the Python Foundation to help pay for those.
And they're not that much.
It's like 40 or 80 bucks a year.
And then there's food and snacks.
But I've sort of been figuring that out over the last few months, how much food we need.
And people like yourself bring snacks as well.
So it's sort of community supported right now.
And one of the reasons that I wanted to have this meetup in the first place was I went to some of the other meetups. There's
some JavaScript meetup at a bar and there was a lot of dudes there. And I took my two-year-old
daughter with me. So there was two of us women, but it was like I had to bring my own extra female that I had made.
And so it is limited to women, or people who...?
People who identify as ladies, as PyLadies. I mean, it's open to anyone that would feel comfortable in that space, although if you are a man, we request that you come as a guest of another person in attendance.
And do you spend a lot of time organizing it?
I should probably spend a little bit more time finding more people to give talks and stuff, but not too much, no.
So it's not that big of a cost. It's not that big of an effort,
but you do get a fair amount out of it.
Yeah. What do you get out of it? I mean, I didn't know there was a vi game, but yeah.
So, like 10 or so people show up to the meetings, and we have them every two weeks.
And it alternates between a project night, where people come and work on projects together or we just talk about all kinds of things together, and a speaker night, where someone presents. And I get to connect with other women and other tech people in the area, and see what other people are working on, and share ideas, and just get excited about things. It just brings warm fuzzies to my heart.
I enjoy it and I'm glad you started it
because it is hard to find a good technical community
and many of our meetups do tend to meet in bars and I'm unlikely to go to
a bar to meet people just because it's not where I want to talk because I can't hear anything.
Yeah, it's hard to get into the nitty gritty technical details sometimes if it's dark and loud and you don't have a
computer around and you don't get to like really know what other people are passionate about and
what they're excited about and how that can sort of rub off on you and get you really excited about
something. But if you're in a sort of more collaborative space or environment... I'd love to have a longer PyLadies meetup sometime, like a little DevHouse-style PyLadies, Saturday morning.
Yeah, we could.
was asked a question that was basically from her presentation.
Yeah.
And it was funny because, well, it is every two weeks, or every four weeks there's a presenter.
It's pretty easy to sign up on the presenter list, let me tell you.
But it is good practice.
Yeah.
I mostly wanted to ask you about it because I want to encourage people who have this idea that it doesn't have to be a lot of effort.
And sometimes it doesn't work.
I mean, there's a decent chance that it may in five years just be you and me looking at each other going, well, maybe this has run its course.
Yeah.
Which is also fine.
Yeah.
But for those five years or whatever that it exists, like, it can be all kinds of great opportunities.
I'm meeting new people, people who have sent me to other meetups, which were then way too crowded.
But, yeah, it's neat. And there's two women there that run a Python study group in Felton.
Yeah. So they're on top of, we're just going to do this thing for ourselves.
Yeah. So if you're out there thinking, gosh, I wish there were other people that I could talk to, whether it's PyLadies or JavaDev...
The space is the hardest part, but if you can find a space, even if it's a coffee shop that has a back room, it might be worth it.
It might be worth it to try it.
And $40 or $80, yeah, that's a lot to try it, but how much do you spend on conferences?
This is like a year-long conference, one hour at a time.
And those fees are only for meetup.com.
Which is kind of the easiest way.
Yeah, it has made it very easy.
And people have found the PyLadies meetup through meetup.com.
If the cost is a concern, maybe there's more organic ways to advertise and just get people together that you want to share your technical interests with.
Yeah. I found a writing group on Nextdoor, of all places. So it's all kinds of stuff.
Yeah. All right. We have kept you for quite a while, given...
Oh, we have so much more that we could talk about.
We do. We totally do.
Which just means that you can come back. And since you're local, come back, that'll be easy.
Do you have any... I was wondering if you had advice for people who want to get into this
whole space, either if they're in college, or hobbyists, or people who are professionals who want to change to something. I mean, what's the right path to start learning about this whole space? Because it seems like a lot of different things.
Which part of the space? The computer vision part, the building interactive systems that people can play with part, the game design part?
I guess the computer vision part, yeah.
There are, there's like, because it's a popular thing right now,
there are a lot of tools coming out,
including tools for making your own models and using them.
So I think TensorFlow is being ported to JavaScript, trying to make it as easy as possible for people that might be in a web programming language to get access to these tools, and then build things that are running in other people's browsers, so they're the easiest possible thing to share. I think, personally, going that route, where you are using JavaScript-type things, where you can make something small and share it with your friends, and your friends will be like, wow, that's so cool, that will just give you a ton of encouragement to keep going. And then I think with JavaScript you can look around and see how other people are doing this, because you can maybe get access to the code a little bit more easily. So, I don't know, doing it in a social kind of way.
Yeah, I mean, there's a lot of good social benefits to being able to share, especially if you're just getting started and trying to figure it out. Cool. What about getting started in games
with a purpose?
This morning there were tweets from this human computation conference called HCOMP, which is happening in Zurich right now. And I think there's a keynote from the people doing Zooniverse, which is a platform for all these different citizen science projects. And some of them may not be game-flavored at all, but there are probably game-flavored ones, or ones that could be more engaging if they were sort of more game-like, helping ramp people up and learn things.
Zooniverse is the citizen science place that does Galaxy Zoo, where you can identify different galaxies or different features in pictures.
Yeah.
And they have a bunch of other projects, too, like looking at pictures that camera traps have taken. Like cameras that are out in the wild, where animals will walk by and a motion sensor will trigger and the camera will take a picture. And then citizen science people have to go and tag those to say, there's actually an animal here. It's a fox, it's a bunny, it's a deer, it's an elephant. And so I think there's lots of these that are out there, ones that you can go find and participate in. And then I like Zooniverse as a platform for making more of those. So if you have an interest in kind of working on the building of those tools, the building of those projects, I'm sure there's space for that as well, whatever your passion is, or even getting involved with the existing ones.
Do you have any thoughts you'd like to leave us with?
Last brief thought on augmented reality.
Visual search is going to be a big part of that,
understanding what your environment has in it already
so you can do more meaningful, more intelligent augmented reality.
Our guest has been Kathleen Tuite, computer vision expert and software engineer at GrokStyle.
If you'd like to join us at PyLadies in Santa Cruz,
there will be a link to the meetup in the show notes.
And if you're not local to Santa Cruz,
there are lots of PyLadies and lots of meetups.
Check around. It's worth it.
Thank you for being with us, Kathleen.
Thank you for having me.
Thank you to Vicky Tuite for introducing me to Kathleen, and for producing her. Thank you to Christopher for producing and co-hosting this show.
And thank you for listening. You can always contact us at show at embedded.fm or hit the
contact link on embedded.fm. Thank you to Exploding Lemur for his help
with questions this week. If you'd like to find out about guests and ask questions early,
support us on Patreon. Now a quote to leave you with from Douglas Engelbart.
In 20 or 30 years, you'll be able to hold in your hand as much computing knowledge as exists now in
the whole city or even the whole world.
I don't know when he said that, but I bet it's still true.
Embedded is an independently produced radio show that focuses on the many aspects of engineering.
It is a production of Logical Elegance, an embedded software consulting company in California.
If there are advertisements in the show, we did not put them there and do not receive money from them.