Embedded - 253: We’ll Pay Them in Fun (Repeat)
Episode Date: January 1, 2021
We spoke with Kathleen Tuite (@kaflurbaleen) about augmented reality, computer vision, games with a purpose, and meetups. Kathleen's personal site (filled with many interesting projects we didn't talk about) is SuperFireTruck.com. Her graduate work was in using photogrammetry to build models. Kathleen works for GrokStyle, a company that lets you find furniture you like based on what you see. GrokStyle is used in the augmented reality try-it-at-home IKEA Place app.
Theory of Fun for Game Design by Raph Koster
Flow: The Psychology of Optimal Experience by Mihaly Csikszentmihalyi
Language translating/learning app and online game: Duolingo
TensorFlow in JavaScript
HCOMP 2018: Human Computation conference with keynote by Zooniverse's Lucy Fortson (no video for that yet, but we hope)
Transcript
Hello, this is Embedded.
I am Elecia White, here with Christopher White,
and this week, here also with Kathleen Tuite.
We're going to dive into computer vision, augmented reality games, and meetups.
Hi, Kathleen. Thanks for joining us.
Hello, I'm happy to be here.
Could you tell us about yourself as though we met at a technical conference?
Okay. My name is Kathleen Tuite, as you just said. And I am currently a software engineer at a computer vision AI company called GrokStyle. My background, from this current job and past projects, involves computer vision, game design, crowdsourcing, human-computer interaction, all those things kind of wrapped up together. And what I really like doing in general is taking interesting computer vision systems and building interactive things around them that people can actually use and play with.
Yes, we have so much to talk about.
Yes.
First, we have lightning round.
You've heard the show, so you know that the goal is fast and snappy.
Okay.
Do you want to start?
Okay, Christopher is shaking his head, which goes over ever so well in podcast land. Minecraft or Pokemon Go?
I like both of them a lot.
Favorite OpenCV function?
I don't like OpenCV as much. One thing I do like about OpenCV is just reading in an image and then also displaying it. So the ones that read images and the ones that show stuff, those are probably my two favorites, just so I know what's going on and that I can get started and build something on top of that.
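For readers following along at home, the read-and-display calls being described look roughly like this; the filename is a placeholder, not from the show.

```python
# A minimal sketch of reading an image in and displaying it with OpenCV.
import cv2

img = cv2.imread("photo.jpg")               # load an image from disk (BGR order)
if img is None:
    raise FileNotFoundError("photo.jpg not found or unreadable")
cv2.imshow("what the computer sees", img)   # open a window showing the image
cv2.waitKey(0)                              # wait for a keypress before closing
cv2.destroyAllWindows()
```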
Favorite computer vision library, then?
So when I was a grad student, I did a lot of stuff with this tool, this system called Bundler, which is a structure from motion pipeline. So I'm pretty fond, I have a love-hate relationship with Bundler. And now there's a Python version of it called OpenSfM that is run by this company called Mapillary.
OpenSfM. Okay.
Is it my turn or yours?
It's my turn. Oh, go, go, go.
Favorite VR game?
I went to a capstone demo, the Sammy Showcase at UC Santa Cruz, which had
a bunch of games that students have been working on this past year.
And I played this like ghost cat VR game where you're a little ghost cat
and you have another buddy ghost cat and you have to stay near each other
because they're your source of illumination.
And then you have to like jump around and get to other places.
And so that ghost cat VR game.
That no one else can play?
No one else... I mean, they're trying to make a research platform out of it, so maybe you can play it soon. I don't know. That's the only recent thing that I can think of.
Tip everyone should know?
Stop writing bugs.
Just stop it.
Okay.
So, a long time ago, I was programming stuff with my husband, my boyfriend at the time. And I was just kind of being sloppy with what I was doing. I'd write some code, and then I'd run it, and then I'd read the error and go, oh, I spelled that thing wrong. And it was just this really slow process.
And he was like, just stop writing these bugs.
I was, like, okay, I'm going to try.
I'm going to be more mindful about this and just, like, go a little bit slower.
And think, like, I'm a human.
I can do this as best I can and, like, try to just not write the bugs in the first place.
And then whatever errors do come up are,
they're still there for me to figure out,
but the really basic ones, I can kind of just try not to do that.
Yes, I know what you mean.
That's what monkey coding is for me when I do it, where I'm just like, oh, I'm just going to type at this until it works.
Yeah.
And I'm not going to sit there and think about how it should work and how I can get from where it doesn't work to where it does work.
I'm just going to keep incrementing this variable until the timeout is the right length.
Yeah, yeah, just like stumbling through it.
And like, that works sometimes, but thinking about it actually is kind of better, like actually better in so many ways. Yeah.
Okay, so stop writing bugs. That's great advice.
Computer vision. Yes. When people say computer vision, what do they mean?
So I would say computer vision is the ability for a computer that's gotten some sensor, some picture of the real world.
Maybe it's like a picture from a normal RGB camera.
Maybe it's a depth sensor, some more enhanced picture. It's the computer's ability to make sense of that and understand what is going on in that scene, whether it's recognizing the objects in the scene, or the activity that's happening, or just more information about what the scene really represents. A facial expression that someone has, or the identity of a person, those are all computer vision things: the ability to understand things about the real world from a picture or a movie or something kind of like a picture, like a depth image.
I like the way you put it, because it is about the computer, not just acquiring the data,
but being able to do something, you said understand, which
computers don't usually do, but it's that level. It's an intelligence.
Yeah. To make sense of it enough, at whatever understanding level is possible, that then you can actually use that in some other system.
You did graduate research.
And I know I'm not going to get the word right.
Photogrammetry?
Photogrammetry.
Photogrammetry.
I missed the R.
Okay.
What is that?
So photogrammetry is the ability to get a bunch of images of one thing all from different angles and kind of come up with
the 3D structure of the item that all the cameras are looking at and also the pose of each of the
cameras, how they relate to one another and to the object themselves. And so if I go and I take
a picture of the Eiffel Tower and then I take another picture of the Eiffel Tower, you can build the Eiffel Tower in 3D from my photos?
Yeah, yeah.
But two isn't enough?
Two might be enough, if they're close enough that they're seeing roughly the same view, but they're also spread out enough that they're not exactly on top of each other, and you can get some kind of 3D information from them being split apart. You could know where the pictures are, where the Eiffel Tower is, but if
you want to go further and get like a fuller 3D model of the Eiffel Tower, you'd want many pictures
of it from many different angles,
and that might be enough to fill in the actual structure of that object. Although,
the Eiffel Tower has a lot of cross braces and things where you can see through it, and
that will probably be a little bit challenging for the computer to make sense of.
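To make the core step concrete: once you know (or have estimated) where two cameras are, each matched point can be triangulated into 3D. Here's a toy sketch with made-up camera poses and pixel coordinates; real photogrammetry pipelines have to estimate the poses too, as discussed below.

```python
# Triangulate 3D points from two views with known camera poses.
import cv2
import numpy as np

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics

# Camera 1 at the origin; camera 2 translated one unit to the right.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

# Pixel coordinates of the same two points seen in both images (2 x N).
pts1 = np.array([[300.0, 350.0], [200.0, 240.0]])
pts2 = np.array([[100.0, 150.0], [200.0, 240.0]])

points_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4 x N homogeneous
points_3d = (points_h[:3] / points_h[3]).T            # divide out w
print(points_3d)   # 3D positions relative to camera 1
```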
Okay, how about the Washington Monument?
The Washington Monument, that has not enough detail, right?
Yeah. And it kind of looks the same from all four sides, the tall one. The canonical example of this, where structure from motion photogrammetry sort of became a thing that other people started really running with, was this Photo Tourism project at UW, where they took a bunch of photos from Flickr of popular tourist places, like Notre Dame Cathedral and Trevi Fountain in Rome, and used those photos. So those are places where there's enough texture and structure there,
but it is kind of this continuous surface that you can't see through, unlike the Eiffel Tower.
So it's better if you have something that has a lot of detail,
but not see-through and not repeating detail.
Right.
This is a lot of caveats.
It definitely is, yeah.
Okay.
And then, so the way I think it happens, like my mental image, is you have picture one, and then you take picture two, and you try to map up all of the same points.
And then you take picture three, and you map up all the same points
on picture one and picture two. But then I'm kind of lost. I mean, I know you can use convolution
to map up points that are the same, but how do you, what happens after that? Is that even right?
That is totally the first step: getting two images, or more images, or pairs of images in a whole big collection of images, and figuring out what all the interesting points in these images are. This point was seen in these ten images, and it was in these pixel coordinates of these images over here. And you just have a whole bunch of data of those correspondences, and then you throw it into something called bundle adjustment, and that will figure out the 3D positioning of where all those points should be in 3D space, and where the cameras should be, what pose they should have, based on all these pinhole camera math equations.
Okay, we're going to ask you about that too.
Don't get comfortable with me skipping that.
But even that first step, are you using the RGB images
or are you trying to find vertices? What kind of algorithms do you use to even find the points, and then what does this bundle thing do?
So the algorithm to find the points initially, SIFT is a good one. And I think your typing robot uses these same SIFT feature points to figure some stuff out.
It does. It does.
But when I did it, I just used OpenCV, and it magically worked.
I have no idea what the algorithm was. That was part of when I was trying to figure out where the keys were.
And I had a perfect image of a keyboard.
And then I had my current camera image of the keyboard.
And it was SIFT and FLANN and homography.
And I just typed them in and, wow, it just found them.
And I did nothing.
Even when I changed the lighting, it was pretty good.
So what does it do?
So to break it down a bit more, SIFT stands for Scale Invariant Feature Transform.
Yeah, transform sounds good.
Say there's a building, and there's a windowsill, and the part where the outside of the windowsill comes together in an angle, and it's on top of, like, there's a brick facade or something, so that the sill and the brick are different colors, and the light is casting shadows in a certain way. That particular corner of that building might have, or will have, a distinctive look. And the SIFT feature of that particular point would capture something about the colors there, but more importantly, the edges: what angle they're at, how strong they are, how edgy they are, how cornery they are. And the scale invariant part of SIFT means that if you have a picture of that windowsill up close, and you have another one that's maybe far away, and maybe it's rotated a little bit, that particular piece of those two images will still look very similar. It will have a descriptive way that the computer can represent it, so that it can tell that they're the same point.
Okay, okay. So now we found all of these correspondence points.
Correspondence points, yeah. I mean, they start out as just, these are feature points, these are points of interest, these are little corners or things that a computer can say, I know what that is, I know where it is. Versus on a plain blank wall, there's nothing special about a pixel in the middle of that space; it could be anywhere. And then when you have multiple images, like two images that both have SIFT points, and you kind of figure out the correspondence between them, that's when the correspondence part comes in.
And so I can sort of see with two images, because that's kind of how my eyes work.
It's 3D vision.
Right.
And if my eyes were further apart, you know, if I had a really big head, I would be able
to see 3D vision further away.
But right now, after about 10 feet, everything's kind of flat.
I know there's actual math that would tell me how far it is.
But realistically, I'm pretty cross-eyed.
So 10 feet is really about it for me.
Don't play basketball with me.
And so when you have two photos taken apart, far apart, then you can get more depth.
Yes.
But my eyes work because they're always in the same place.
They always have the same distance between them.
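A quick back-of-the-envelope version of this point, with made-up numbers: depth from two views degrades with the square of distance and improves with a wider baseline, which is why a bigger "head" sees 3D further away.

```python
# Stereo depth rule of thumb: disparity = f * baseline / depth,
# so depth error ~ depth^2 * matching_error / (f * baseline).
focal_px = 800.0        # focal length in pixels (assumed)
match_error_px = 0.5    # how precisely a point can be matched between views

for baseline_m in (0.065, 0.5):           # human eyes vs. two spread-out photos
    for depth_m in (1.0, 3.0, 10.0):
        disparity = focal_px * baseline_m / depth_m
        depth_err = depth_m**2 * match_error_px / (focal_px * baseline_m)
        print(f"baseline {baseline_m} m, depth {depth_m} m: "
              f"disparity {disparity:.1f} px, depth error ±{depth_err:.2f} m")
```

With an eye-width baseline, the error at ten meters comes out near a meter, which matches the "everything past ten feet looks flat" experience.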
It seems like a chicken and an egg problem that you can find these points
and you can find the 3D-ness of it, but you also find where they are.
Which one's the chicken and which one's the egg, and which one comes first?
So you're totally right that our eyes, well, our brains have calibrated the fact that these eyes that we have are always in the same relative position to one another. And I think 3D reconstruction techniques from two images have existed for a while, and they started out with, we need to calibrate these two cameras relative to each other first. They're going to be mounted on some piece of hardware, and they're never going to change. And if some intern bonks them, then they have to go recalibrate the whole thing.
Yeah. Yeah, I remember doing that.
And they have these, like, calibration checkerboards that you can set up.
And there's probably some OpenCV function for, like, look at this checkerboard and figure it out.
Like, figure out what the camera is.
There totally is.
Yeah.
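There totally is; the checkerboard calibration being described looks roughly like this in OpenCV. The board size and filenames here are placeholders.

```python
# Calibrate a camera from several photos of a printed checkerboard.
import cv2
import numpy as np
import glob

board = (9, 6)  # inner corners per row/column of the checkerboard
# 3D coordinates of the corners in the board's own plane (z = 0).
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calib_*.jpg"):
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Solves for the camera matrix (focal length, center) and lens distortion.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("camera matrix:\n", mtx)
print("distortion coefficients:", dist.ravel())
```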
So getting from two cameras where you've calibrated them already, and also you have to calibrate the internal lens distortion and all of that of a camera. The points I'm looking at will help me figure out where the cameras are, and you also need to figure out where the cameras are to figure out where the 3D points that you're looking at are. And what this bundle adjustment technique is, well, I guess, you had mentioned homography, or alluded to it. Homography is like an initialization step: there's two cameras, and they're looking at the same thing. And if that thing is like a planar surface, it's kind of understanding the relationship between those two cameras.
Yes, in my typing robot, I have the keyboard, the perfect keyboard.
And then I have my scene of however I put the camera up today.
And then the homography, I take a picture and it like maps the escape key onto the escape key and
the space key onto the space key. And then it gives me the matrix that I can use to transform from my perfect keyboard world to my current image world.
And so that matrix I can then just use to transfer coordinates between them.
Right.
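A rough sketch of that SIFT, FLANN, and homography pipeline in OpenCV, for anyone who wants to try it; filenames and the reference pixel are placeholders, and SIFT_create needs a reasonably recent OpenCV (4.4 or later).

```python
# Map a "perfect" reference keyboard image onto a live camera image.
import cv2
import numpy as np

ref = cv2.imread("keyboard_reference.png", cv2.IMREAD_GRAYSCALE)
cam = cv2.imread("keyboard_camera.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(ref, None)
kp_cam, des_cam = sift.detectAndCompute(cam, None)

# FLANN finds, for each reference descriptor, its two nearest camera descriptors.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
matches = flann.knnMatch(des_ref, des_cam, k=2)

# Lowe's ratio test: keep matches clearly better than the runner-up.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_cam[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC throws out the bogus correspondence pairs.
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Now any reference coordinate maps into the camera view.
esc_ref = np.float32([[[30, 20]]])          # made-up reference pixel
esc_cam = cv2.perspectiveTransform(esc_ref, H)
print("that key is at", esc_cam.ravel())
```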
So if you have all these pictures that tourists took of the Eiffel Tower,
you can look at the pairs of cameras and look at the SIFT correspondence points
that you found between them and kind of estimate a homography by like, what is that matrix that
says how this one camera moves to become the other camera? And it might not be perfect, because with the points that you're looking at in the world, there just may be stuff where you don't have enough information yet. You don't know what the internal camera parameters are for that particular camera. But you can get some initial guess, and then what bundle adjustment does is take all of these initial guesses of how all these cameras and points and tracks of points seen by multiple cameras fit together, and it kind of comes up with an optimization that solves for both of those things at the same time.
So it takes all of the correspondence points for each pair,
and then it minimizes the error for all of them.
Yeah.
And so if you end up with a bogus pair,
like on my keyboard if I was mapping A to Q, it would, if I took a bunch of pictures, it would eventually toss that one because nobody else agreed with it.
Yeah, it might toss it, or it might be like, I think this is right. And it might just be wrong.
And then it skews everything.
Yeah. So in this project that I worked on in grad school, called PhotoCity, which was a game for having people take lots of photos of buildings and make 3D models, I saw a lot of this 3D reconstruction stuff gone wrong, where a person would take photos and the wall of the building would grow, but then it would just curve off into the ground, or the model would just totally flip out and fall apart, because this bundle adjustment, this effort to kind of figure out cleanly where everything goes, would just get really confused. Or sometimes there would be itsy-bitsy, teeny-tiny, upside-down versions of a model that were really close, because the computer was like, it makes sense to make a tiny version of this building here; it kind of looks the same as having one that's really far away.
Yeah, I mean, you get a discoloration in a building that has bricks, and then you end up with the small discoloration of the bricks, and it can't tell the difference, because scale invariant.
Yeah.
Computers, man.
They mess up sometimes.
When you do the minimization problem
of finding all the matrices, which
gives you the 3D aspect, that's when you can start figuring out where the people are.
Because you can backtrack.
Once you're confident that these points are in this space, you can backtrack to where the camera person must have been.
It's doing both at the same time and kind of going back and forth between optimizing where
the points are and optimizing where the cameras or the people holding the cameras must have been.
And you can say, I have a pretty good guess of where the 3D points out in the world are.
But if I wiggle the cameras around a little bit, then we'll come up with a better configuration
that minimizes that error even more
and um and that the error that we're trying to minimize is like do these points in the world do
they project back onto the right pixel coordinate of the image or are they they off we're trying to
sort of get everything to make sense um across all these different pictures and And in the end, this is a massive linear algebra problem.
Yeah, pretty much.
That's weird.
I mean, it sounds like you put photos in,
you get locations and 3D out,
and so it sounds so smart,
but in the end, it's just like massive amounts of A plus Bx plus Cy.
Yeah, yeah. It's totally like magic that this is possible, but it's also totally not magic.
It's just like just a bunch of math.
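For the curious, here is that "bunch of math" in miniature: a toy bundle adjustment that jointly nudges a camera position and the 3D points to minimize reprojection error. This is a sketch with synthetic data and translation-only cameras; real bundle adjusters handle rotations and exploit the problem's huge sparsity.

```python
# Jointly optimize camera positions and 3D points against observed pixels.
import numpy as np
from scipy.optimize import least_squares

f = 800.0  # assumed focal length in pixels; cameras only translate here

def project(point3d, cam_t):
    # pinhole projection of a 3D point through a camera at position cam_t
    p = point3d - cam_t
    return f * p[:2] / p[2]

# Synthetic ground truth: obs[i, j] = point j as seen by camera i.
true_cams = np.array([[0.0, 0, 0], [1.0, 0, 0]])
true_pts = np.array([[0.0, 0, 4], [0.5, 0.2, 5], [-0.4, 0.1, 6]])
obs = np.array([[project(p, c) for p in true_pts] for c in true_cams])

def residuals(params):
    cam1 = params[:3]                    # camera 0 is pinned at the origin
    pts = params[3:].reshape(3, 3)
    cams = [np.zeros(3), cam1]
    # reprojection error: does each point land on the right pixel?
    r = [project(pts[j], cams[i]) - obs[i, j]
         for i in range(2) for j in range(3)]
    return np.concatenate(r)

# Noisy initial guesses; least_squares "wiggles" everything until it fits.
x0 = np.concatenate([[0.9, 0.1, 0.0], (true_pts + 0.3).ravel()])
sol = least_squares(residuals, x0)
print("recovered camera 1:", sol.x[:3])
print("recovered points:\n", sol.x[3:].reshape(3, 3))
```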
It used to be whenever we were doing computer vision stuff or machine vision or whatever we were calling it, there was the requirement that things be lit very brightly.
That went away.
Why did that go away?
How did that go away?
That was like the core thing with object identification and location.
When did lighting stop mattering?
Or does it still, and I'm just using better tools?
There can be a number of things involved. Lighting still matters, but SIFT is pretty good at matching things even when the lighting is a bit different. Another big thing might be that the quality of cameras that we have is better now.
Like the webcam that you have or the camera on your phone or the camera that's built into your
laptop, those can, they can work better in like lower lighting, crappy lighting. They will also
just take clearer pictures. So I imagine that it was more critical in the past, because the cameras just couldn't see very well, and so you really had to make it easy for the cameras. And then a third aspect is that we have a bunch of data online taken by cameras, even crappy cameras, not very good cameras. And we can learn more from all of this data that's available, so we can kind of compensate for the fact that the lighting might not be as good, because we've seen enough examples of something with not very good lighting that we can still understand what it's supposed to be.
It's interesting that it's the camera technology
that is one of the drivers.
I hadn't really...
Well, that's probably the application too
because if you're doing like a manufacturing thing,
you want everything to be exactly the same all the time.
So, okay, we have good lighting, and we know the lux and everything every time. The circumstances don't change. Whereas for a more general vision application, you might be taking pictures anywhere, and so you have to be able to adapt.
Yeah, if you don't have to be able to adapt, then it's easier, right?
Yeah, yeah. Because this technology of taking a picture and adding to a model, or taking a picture and recognizing some object in it, is working well enough, and those are getting into the hands of consumers, you're totally right that now people want to use that in a wider variety of applications. So it's kind of pushing the limits of, we need to work on making this better. We need to work on making it still figure out what it's doing, even if it's some random person taking a picture in their dark living room.
And I think that has gone back to the manufacturing areas, that even there you don't need the bright lights, because we've learned to adjust to people taking crummy pictures.
It's cheaper not to have to do that.
You can use consumer-level stuff, yeah.
Yeah.
Okay, so at the end of taking a bunch of pictures,
you get a bunch of points on your Notre Dame
or your Eiffel Tower,
although we agreed that was kind of iffy.
And then you get the location of the people
which one is more important
and what do you do with it then?
I mean part of me is like
oh this is a surveillance thing
I should never take another photo in my life
The locations of the cameras, there's probably more information there, because you can understand where the people were who were taking these pictures, where they were standing, where people can go. The points themselves, there might not be enough of them to really do something. Like the points on Notre Dame or the points on the Eiffel Tower, it's kind of like, okay, now we have a crummy point cloud of this place, and we could just get our 3D model of that object another way. But then to know where all the humans were standing, there's
a project that was like a follow-up to
this photo tourism project of looking where people walk in the world when they're taking
pictures of things. And they like made a little map of people walking into the Pantheon and where
most people took photos. And you could see that you'd like walk in and you kind of go to the right
and lots of people would take photos right when they got in of like the ceiling and other stuff and then
they'd walk around and the amount of photos that they took kind of trailed off collectively because
people just got it out of the way at the beginning and uh and i went to the pantheon in rome and i
was like i've never been in this building but i i know what to expect where people are going to flow
in this space and where everyone's going to be taking pictures.
And sure enough, you go inside and you're routed around to the right in a counterclockwise position,
and all these tourists are pointing at the ceiling in the beginning and not so much at the end.
Museums could use this to figure out which artworks are getting the most attention.
I mean, I guess just
the number of pictures taken of each artwork, but where people stand, there are a lot of times where
how the crowd moves is an interesting problem. But that was not, that's not what I asked you
there for. Now I totally want to talk about that.
Building the 3D models.
That was what you were doing.
You were taking the point clouds and making 3D models, right?
Yeah.
I mean, I was building this game,
this crowdsourcing platform
around this structure from motion system, where people could be empowered to go take pictures wherever and make 3D models of wherever. So in some sense it was about getting the 3D models, but it was also about, how do we get an entirely new kind of data that doesn't exist online already?
But that data does exist online.
Not really. We have a bunch of pictures of the front of all these fancy tourist buildings, but we don't have enough around the side. People aren't going to be walking down some alley taking a bunch of pictures on their vacation, unless they're playing PhotoCity, or they're doing some other crowdsourced street view thing, like Mapillary, which I mentioned before. But the data, it's not there. There's gaps in what people have taken just of their own accord and posted online.
This is something that I have heard you speak on some, that the data we have for so many
things is, I mean, biased, even visually, but biased in all kinds of ways with gaps.
And you want to gamify filling in the gaps.
Yes.
That's cool. Weird, strange, cool. How do you convince humans that they should help their robot overlords get more data and understand the world around them better?
There can be better applications built for humans to use in our daily lives. Let me think of examples of gamification of this sort of thing. Oh, there's two tangents here. One part is about gamification, and one part is about applications built on data, AI applications. There's data out there, and then people try to use it,
and it works for some things,
but it doesn't work for other things.
And there needs to be more data
that directly relates
to what a person is trying to do.
And because there's some system
of some like human
trying to do something
and an AI system
isn't working for them
or it works sometimes,
maybe that can turn into a fun game.
Like, what is the computer good at knowing? What is it not good at knowing? How can I stump the computer? So an example of things that may not be called games, but they're kind of game-like: a couple of years ago there was this How-Old robot, an age-guessing thing that Microsoft put out, where you uploaded a picture of your face, and it found the face in the picture and guessed your age. Sometimes it would have an accurate response, or it would have some really hilariously wrong response.
Like, oh, this picture of Gandalf says he's like 99 years old.
Like, ha ha ha.
Or, or this picture of me, um, like says I'm, I'm way younger than I actually am.
How flattering.
Or kind of funny things like that.
People found ways to play with it and figure out all its limitations and what its capabilities were.
And they kind of had this communication around it.
Last week we talked to Katie Malone about AI and one of the
things we talked about was fooling
the AI and the
Labrador puppies and
the chicken image.
The fried chicken.
Where the AI is confused
as to which things are dogs. And there's a whole
set of dogs or not.
Like chihuahuas that look
like blueberry muffins.
I loved those.
Although when I told the chihuahua owner that their dog was a cute blueberry muffin, they totally didn't get it.
Oh, man.
Yeah, okay.
So there's the fun aspect of making fun of the computer.
And also trying to help it along, like, I want to help teach you to do better. And if we can kind of elevate what computers are capable of, then there might be areas where we are suddenly more powerful, because now we have these better trained tools at our disposal.
Okay. So there's the aspect of wanting to train slash one up the AIs,
and then there's straight up gamification.
Yeah.
That's where you compete with other people to provide the AI with more information.
Yes.
So there's like a history of gamification,
especially regarding data collection.
There's a, there's a series of games or there's a genre called GWAPs or games with a
purpose.
GWAPs, really? That's how we're going to pronounce that?
Yeah, I thought it was gee-waps.
Gee-waps. Okay. Games with a purpose. Okay.
And I actually, I've built games with a purpose, but I also am highly critical of games with a purpose and gamification.
And when it's done shallowly and when it's like, oh, we'll just sprinkle points and leaderboards and badges on top of something to try to get people to do this task for us, like for free, we'll pay them in fun.
And sometimes it's not fun. The game wasn't designed very well; it doesn't make sense to be a game. There are many cases where maybe you should just build some task on Mechanical Turk and pay people fairly to do that task, instead of trying to go in this roundabout game way.
Okay, so you're ambivalent about gamification.
And I totally understand that.
What would make it be done well?
I mean, what are the hallmarks of actual fun?
So, okay, there's a book by Raph Koster called A Theory of Fun, and one of the ideas of that book is that learning is what makes games fun. There's some picture in the book, it has lots of pictures, it's got kittens rolling around, and it says the young of all species play. Kids and kittens and puppies are playing, but they're learning a ton as they're playing. And I think a thing that basically every game has is, you're learning the mechanics of that game. You're learning the rules, you're learning the system. You start out not knowing that game, but that game will help you gain the skills that you need to do more interesting things in that game. And this also fits into this theory of flow, by this guy with a name that I can't pronounce. It has a lot of C's and Z's and H's and stuff in it.
And I can like look it up later.
But this idea that.
Wow, that is a lot of.
Mihaly.
Csikszentmihalyi.
Yeah.
Flow, the psychology of optimal experience.
Okay.
Sorry, go ahead.
Okay.
Yeah. I'm glad you all tried to pronounce that.
I didn't do a very good job.
So in a lot of more basic gamification, there might not be anything interesting that the person is learning, or there's not any skill that they're trying to practice or get better at. And I think that's when I get kind of suspicious and judgmental. I'm like, how is this fun if the person isn't learning something here? Maybe they're learning to game your weird gamification system instead of actually doing the task that you want them to do.
So having
skill, having something that a person is
learning over time that they're getting better at,
that they're interested in getting better at,
and also...
You're making me
judge the games I play
so hard right now.
Games for game's sake
are a different category, right?
Well, I have been playing a game on my fitness thing that now I'm judging very badly.
I like the idea of learning in games.
It makes sense to me.
I mean, when you think about Minecraft, that was all about learning.
Yeah.
It was all about learning the world and learning how the rules worked and even then learning more about how to make things in it that you wanted elsewhere. And as I think about some of the other even silly games I play,
like Threes, which I think is 2048 and other places,
but there are times when I'm still learning the rules
on this game that I have played for so long.
Because it's like, okay, I think right now this is what's going to happen.
And whether it does or doesn't.
Yeah.
Okay.
I totally get the learning.
Now, can they teach me useful things?
Yeah, totally.
Well, so one of the original games with a purpose was this game called, I'm almost going to call it Duolingo, but I'm getting to that.
It was called the ESP game, and it was a data collection game of two random people
on the internet are shown the same picture, and they can't talk to each other, but they have
to come up with the same words to describe that image. And if they match what the other person is
saying, then that becomes a label for that image. So two people will see a picture of, like, sheep in a green field, and so they'll type sheep, green, field, sky, clouds. And some of them may type butts or something.
Yeah.
And another person will be like, why? I didn't type butts, because I wasn't thinking that. I was thinking of the sheep. And so the ones that match up,
Yeah, like idyllic.
those will become the labels for that image. And that had this game mechanic of, am I going to
figure out the words to describe this, that another human will also come up with the same words.
Yeah, if you're sitting there identifying the sheep species in Latin,
that may not be what the other human does.
You may be right, but you may not be winning points.
Exactly. So you won't go with those labels; you'll find the ones that are more common and shared. And this game was by this guy Luis von Ahn, and
because it was like making image labeling fun through a game,
it kicked off this whole series of other games with a purpose.
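The ESP game's matching mechanic, reduced to a few lines; the scoring here is made up for illustration.

```python
# Two players tag the same image independently; the overlap becomes the labels.
player_a = {"sheep", "green", "field", "sky", "clouds", "ovis aries"}
player_b = {"sheep", "field", "clouds", "idyllic"}

labels = player_a & player_b   # only words both humans agreed on survive
score = 10 * len(labels)       # hypothetical scoring: each match earns points
print(labels, score)           # {'sheep', 'field', 'clouds'} 30
```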
And then other people kind of, they didn't get the mechanics quite right, in a way that, I don't know, some things that came after, I just felt like they weren't good games. The mechanics of the game didn't match whatever the purpose was trying to do.
You just can't throw points at people.
No, you have to give them more than that.
Yeah, at least a little bit more. I mean, points sometimes work enough that people keep trying it. They're like, oh, I do like to see my name on a leaderboard. But not everyone is like that, and there really needs to be something deeper, where the person, by playing the game, is actually contributing to whatever the underlying scientific or data cleanup purpose is. Otherwise, they may just be racking up points but not actually helping you out.
It sounds like to properly design a game, you actually need to have some psychological understanding, to know what motivates people. And also, if you just do a naive thing, like you're saying, with points, you can end up with these holes, like you said, where the game goes off in a different direction and people figure out ways to game the system.
Yeah.
And you don't get the data you want.
Yeah, exactly. Having the mechanics aligned with the underlying purpose is super important. But you asked about, can I learn useful things from these games? And what Luis von Ahn is doing now, probably other
things, but one of his main things is this app called Duolingo, for learning new languages. It's not a straight-up game, but it has a lot of elements of a game, like ramping you up in a very gradual way. And the idea of Duolingo in the first place was, there's a bunch of text on the internet; we need to translate more of the internet. Wouldn't it be great if we had that? And this was before automated translation techniques were good enough to use. So we need humans to do the translation, but maybe people aren't skilled in translating between English, or, you know, obscure language one and obscure language two, or even English and some other obscure language, and maybe not obscure, but any pairs of languages. And so this idea of, maybe we can just teach people new languages, and then they can start to help translate stuff on the internet.
Yeah, I can totally see this working.
Because for me, it would be probably English and Spanish or English and French.
And you could give me an English phrase with an idiom in it,
and I would have to go figure out how to say that in Spanish
in a way that represented the idiom part of it, as well as maybe the words part of it.
And that would force me to go learn more Spanish, which is something I always want to do. And it would help other people, in that if multiple people translated it similarly, then you can start saying, oh, this is probably a reasonable translation.
Yeah, exactly. And then by being in this process where you're learning a little bit of
new skills and then applying them, you'll be able to translate more, more effectively,
and you'll just kind of grow and grow and grow in what you know and what you're able to do.
And even if you presented me with, these are five things other people said, which of these is right? You could do that, and I would play and learn
and not care so much about just points.
It would be about fun.
Right.
And learning.
And learning.
All right.
So now like Duolingo is like a free, sometimes ad-supported app that you can use to learn new languages.
And I don't know how much the translating stuff on the internet plays into it anymore,
but it's this accessible language learning tool that seems really great,
especially compared to pay $500 for Rosetta Stone or something.
Yeah, we don't need to talk about that.
I want to switch topics entirely
because you are part of this company that is weird and cool
and I have trouble explaining it
because I get lost in AR and furniture.
And can you explain what GrokStyle is?
Yes, totally. So GrokStyle is the company that I currently work for. We do visual search for furniture and home decor, and we're sort of expanding to AI for retail in general. And what our core visual search technology does is allow you to take a picture of a piece of furniture, like some chair that you like at your friend's house, and identify what that product is, either because you want to buy it or you want to know exactly what it is. Or we can go beyond that, to understand all of the products in designer showroom images and know what things go together, and then recommend either stylistically similar options or complementary options. Like, you want to buy this sofa; maybe you could also buy this chair and this coffee table and this rug, and these would all actually look nice together, and you don't have to worry about not having that stylistic judgment yourself, if you don't actually have that.
And that seems hard.
It does seem hard.
So it's just math and data and linear algebra?
It's just math.
Okay. So I go to a friend's house, I take a picture of their, I don't know, 15th century throne. It then tries to find a similar throne that can be purchased now at some major retailer? So it says, oh yeah, if you get this at Target, it's really similar.
And so you have to have a huge database of existing furniture. You're not just, I'm taking this picture and then I'm going out to the internet and searching. You have to already know a lot about furniture.
Right. Yeah. We have our own huge internal database of photos of furniture,
all like millions of products, millions of scenes of like ways that people have used this product
in the real world. And we have learned this, this like understanding of visual style.
Some way for anyone that takes a new picture of something,
for us to project that into some style embedding
and look up what's nearby,
what products are similar to this thing.
If I take a picture of a mission-style couch,
which is a very specific style,
you would be able to say,
oh, yeah, you might want a chair and this style of end table.
We're working on the recommendations part.
For now, we have a mobile app where we could take a picture
of your mission-style couch and we'll find more of those.
More mission-style couches for different prices from different places.
Yeah.
And how do you identify mission style?
How do you identify the style of what you're looking at?
Is this part of finding terms, search terms?
We are...
Tags?
The core of this is visual understanding. So just from tons and tons of images of couches of different styles, we'll identify, these are the ones that look closest to this one. And then we can look at the associated metadata to see what the names of the nearby matches are, or what styles might be tagged on those already. But it starts from the visual path.
When we talked about SIFT, and how the Eiffel Tower isn't really a good candidate because it has holes and because it has repeats... chairs?
So in this case, we're just doing deep learning
on tons and tons of images,
and SIFT isn't involved.
SIFT is a feature that a human would say,
I'm going to use SIFT in this pipeline.
And I've done some other computer vision stuff with faces
where I was like, we're going to match faces by comparing SIFT features across faces.
And I have to decide, I'm going to use SIFT, I'm going to look at these regions of the image, I've got to get all my faces lined up first.
But in this deep learning era, we can say, here's a bunch of images of all these things and I'll tell you how they're similar and how they're different.
And the computer can figure out what features and what internal representations are most useful, most discriminative for its purposes.
Does it have multiple stages?
Does it figure out it's a chair before it figures out what kind of chair?
And figure out chair versus couch versus table?
Our system does predict what category something is. So yeah, it'll say, I'm pretty sure this is a chair, so then it will go look up chairs, instead of looking across the entire database of everything that we have, because it would be more computationally optimal to say, okay, this is a chair. Now let's go into the chair subcategory and finish looking up, is it a 1916 chair or a postmodern chair?
Right. Another thing we can do, though, is we can say, you took a picture of this wicker chair, and we know it's a chair, but if we start looking for tables that are nearby instead, we might find wicker, some other aspect that's stylistically similar but in a different category. So our learned style embedding does kind of cluster objects even if they are different categories; if they're still visually, stylistically similar, it will kind of still put them together.
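A sketch of what embedding-based visual search of this shape can look like; the random vectors here stand in for a trained network's embeddings, since GrokStyle's actual models are their own.

```python
# Nearest-neighbor search in a style embedding, optionally category-filtered.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in catalog: N products, each with a D-dim style embedding
# (in reality these come from a trained network) plus a category tag.
N, D = 1000, 128
catalog_vecs = rng.normal(size=(N, D))
catalog_cats = rng.choice(["chair", "sofa", "table"], size=N)

def visual_search(query_vec, category=None, k=5):
    # cosine similarity between the query embedding and every catalog item
    sims = catalog_vecs @ query_vec / (
        np.linalg.norm(catalog_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    if category is not None:                 # "pretty sure this is a chair,
        sims[catalog_cats != category] = -np.inf  # so only search chairs"
    return np.argsort(-sims)[:k]             # indices of the k best matches

query = rng.normal(size=D)    # would come from embedding your photo
print(visual_search(query, category="chair"))
```

Dropping the category filter is what lets a wicker chair surface stylistically similar wicker tables: nearby in the embedding, different category.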
I should have asked you what your favorite machine learning language was.
Keras? TensorFlow?
Straight math?
We're, you know, using several different of these machine learning libraries and rolling our own in certain cases and using Python to strap it all together.
All right.
Wow.
Ikea.
Tell me about Ikea.
Okay. So GrokStyle is this visual search service provider, and IKEA is one of our big public clients right now,
where they have an augmented reality app called IKEA Place.
And within that app, you can access a visual search feature,
and that is powered by us.
And so I go to my friend's house.
I see a chair I like.
I take a picture of it.
I say, you know what I want?
I want this chair in my house. So I go home, and I go to the IKEA app, and then I say search, and it says your chair is something that has weird letter O's, and then it just plops it into my...
Yeah, yeah. So you could be at your friend's house, use the IKEA Place app to search there, and say, I'm going to figure out what this chair is. And I'll be like, oh, this is the POÄNG chair, this is something else that you might struggle to remember and type in later, especially with all the accents. And you can favorite it in the app from there, and then bring it home, and then place it into your home and see, oh, I like how this fits. I'm going to consider buying this.
Even though their chair may not be an IKEA chair, right? It's going to find whatever's similar, because that's what GrokStyle does.
Yeah. If you take a picture of that cool throne that they have, it will find the closest IKEA throne-like item.
How does it deal with size? I mean, it's just one picture. There's no 3D. How do I know it isn't a six foot by ten foot chair, as opposed to a normal size chair? Is that the future?
No. Like, we will find, if you take a picture of a chair and there also happens to be miniature versions of that chair, we might still find the little mini one.
And, like, Amazon sells, like, tiny little.
That's funny.
Dollhouse chair.
Yeah.
Like, we can't tell if you're taking a picture of a dollhouse chair.
This is not what I was thinking.
This doesn't fit in my space at all
But once you're in AR, those models are all true scale, true to life. And with the current capabilities of AR, moving your phone around in your space and looking at what's in your space, that does estimate what size your space is and what the scale of everything is. So that if
you put like a three foot tall chair or something out there, it will actually be the appropriate
size and you can measure things. So the AR part is okay. It's just that I can take a picture of a doll chair
or a giant chair and it will find the most similar, but it will then be normal size because
the AR will show me what size it is. Yeah. And I do have a little like Ikea chair on my desk. I
should do the demo of like, take a picture of the dollhouse chair and then place the full size one
in my space.
Okay, I should ask you more about Ikea, but we're almost out of time and I wanted one more thing.
You started a Santa Cruz PyLadies meetup.
Yes.
Why?
So, Santa Cruz is like close to Silicon Valley, but not directly in it.
Close and yet so far. Yeah.
And I wanted to meet more developers, more technical people, especially women.
I was like, they must be here in Santa Cruz somewhere, but I don't know where.
I don't know where they are.
I need this community around me.
So I started this PyLadies chapter in Santa Cruz to bring people together.
And it's worked out really well so far.
How much does it cost to be the person who organizes all this?
I mean, is this expensive?
It is not terribly expensive.
I work out of a co-working space called NextSpace in Santa Cruz, and they have rooms, like conference rooms, and they allow me to host PyLadies for free, because it brings people from outside of NextSpace into the space. So that would probably be the hugest cost otherwise, just getting space. I need a good space.
You can probably get companies to sponsor it as well.
And then on top of that, there's like meetup fees for meetup.com.
But I think I can get a grant from the Python Foundation to help pay for those.
And they're not that much.
It's like 40 or 80 bucks a year.
And then there's food and snacks.
But I've sort of been figuring that out over the last few months, how much food we need.
And people like yourself bring snacks as well.
So it's sort of community supported right now.
And one of the reasons that I wanted to have this meetup in the first place was I went to some of the other meetups. There's
some JavaScript meetup at a bar and there was a lot of dudes there. And I took my two-year-old
daughter with me. So there was two of us women, but it was like I had to bring my own extra female that I had made.
And so it is limited to women, or people who...?
People who identify as ladies, as PyLadies. I mean, it's open to anyone that would feel comfortable in that space, although if you are a man, we request that you come as a guest of another person in attendance.
And do you spend a lot of time organizing it?
I should probably spend a little bit more time finding more people to give talks and stuff, but not too much, no.
So it's not that big of a cost. It's not that big of an effort,
but you do get a fair amount out of it.
Yeah. What do you get out of it? I mean, I didn't know there was a vi game, but yeah.
So, like 10 or so people show up to the meetings, and we have them every two weeks.
And it alternates between a project night, where people come and work on projects together or we just talk about all kinds of things together, and a speaker night, where someone presents. And I get to connect with other women and other tech people in the area, and see what other people are working on, and share ideas, and just get excited about things. It just brings warm fuzzies to my heart.
I enjoy it and I'm glad you started it
because it is hard to find a good technical community
and many of our meetups do tend to meet in bars and I'm unlikely to go to
a bar to meet people just because it's not where I want to talk because I can't hear anything.
Yeah, it's hard to get into the nitty gritty technical details sometimes if it's dark and loud and you don't have a
computer around and you don't get to like really know what other people are passionate about and
what they're excited about and how that can sort of rub off on you and get you really excited about
something. But if you're in a sort of more collaborative space or environment... I'd love to have a longer PyLadies meetup sometime, like a little DevHouse-style PyLadies, Saturday morning.
Yeah, we could.
was asked a question that was basically from her presentation.
Yeah.
And it was funny because, well, it is every two weeks, or every four weeks there's a presenter.
It's pretty easy to sign up on the presenter list, let me tell you.
But it is good practice.
Yeah.
I mostly wanted to ask you about it because I want to encourage people who have this idea that it doesn't have to be a lot of effort.
And sometimes it doesn't work.
I mean, there's a decent chance that it may in five years just be you and me looking at each other going, well, maybe this has run its course.
Yeah.
Which is also fine.
Yeah.
But for those five years or whatever that it exists, like, it can be all kinds of great opportunities.
I'm meeting new people, people who have sent me to other meetups, which were then way too crowded.
But, yeah, it's neat. And there's two women there that run a Python study group in Felton.
Yeah. So they're on top of, we're just going to do this thing for ourselves.
Yeah. So if you're out there thinking, gosh, I wish there were other people that I could talk to, whether it's PyLadies or JavaDev...
The space is the hardest part, but if you can find a space, even if it's a coffee shop that has a back room, it might be worth it.
It might be worth it to try it.
And $40 or $80, yeah, that's a lot to try it, but how much do you spend on conferences?
This is like a year-long conference, one hour at a time.
And those fees are only for meetup.com.
Which is kind of the easiest way.
Yeah, it has made it very easy.
And people have found the PyLadies meetup through meetup.com.
If the cost is a concern, maybe there's more organic ways to advertise and just get people together that you want to share your technical interests with.
Yeah. I found a writing group on Nextdoor, of all places. So it's all kinds of stuff.
Yeah. All right. We have kept you for quite a while, given...
Oh, we have so much more that we could talk about.
We do. We totally do.
Which just means that you can come back. And since you're local, come back, that'll be easy.
Do you have any... I was wondering if you had advice for people who want to get into this
whole space, either if they're in college, or hobbyists, or people who are professionals who want to change to something. I mean, what's the right path to start learning about this whole space? Because it seems like a lot of different things.
Which part of the space? The computer vision part, the building interactive systems that people can play with part, the game design part?
I guess the computer vision part, yeah.
There are, there's like, because it's a popular thing right now,
there are a lot of tools coming out,
including tools for making your own models and using them.
So I think TensorFlow is being ported to JavaScript, trying to make it as easy as possible for people that might be in a web programming language to get access to these tools, and then build things that are running in other people's browsers, so they're the easiest possible thing to share. I think, personally, going that route, where you are using JavaScript-type things, where you can make something small and share it with your friends, and your friends will be like, wow, that's so cool, that will just give you a ton of encouragement to keep going. And then I think with JavaScript you can look around and see how other people are doing this, because you can maybe get access to the code a little bit more easily. So, I don't know, doing it in a social kind of way.
Yeah, I mean, there's a lot of good social benefits to being able to share, especially if you're just getting started and trying to figure it out. Cool. What about getting started in games
with a purpose?
This morning there were tweets from this human computation conference called HCOMP, which is happening in Zurich right now. And I think there's a keynote from the people doing Zooniverse, which is a platform for all these different citizen science projects. And some of them may not be game-flavored at all, but there are probably game-flavored ones, or ones that could be more engaging if they were sort of more game-like, helping ramp people up and learn things.
Zooniverse is the citizen science place that does Galaxy Zoo, where you can identify different galaxies or different features in pictures.
Yeah.
And they have a bunch of other projects, too, like looking at pictures that camera traps have taken. Like cameras that are out in the wild, where animals will walk by and a motion sensor will trigger and the camera will take a picture. And then citizen science people have to go and tag those to say, there's actually an animal here. It's a fox, it's a bunny, it's a deer, it's an elephant. And so I think there's lots of these that are out there, ones that you can go find and participate in. And then I like Zooniverse as a platform for making more of those. So if you have an interest in kind of working on the building of those tools, the building of those projects, I'm sure there's space for that as well, whatever your passion is, or even getting involved with the existing ones.
Do you have any thoughts you'd like to leave us with?
Last brief thought on augmented reality.
Visual search is going to be a big part of that,
understanding what your environment has in it already
so you can do more meaningful, more intelligent augmented reality.
Our guest has been Kathleen Tuite, computer vision expert and software engineer at GrokStyle.
If you'd like to join us at PyLadies in Santa Cruz,
there will be a link to the meetup in the show notes.
And if you're not local to Santa Cruz,
there are lots of PyLadies and lots of meetups.
Check around. It's worth it.
Thank you for being with us, Kathleen.
Thank you for having me.
Thank you to Vicky Tuite for introducing me to Kathleen, and for producing her. Thank you to Christopher for producing and co-hosting this show.
And thank you for listening. You can always contact us at show at embedded.fm or hit the
contact link on embedded.fm. Thank you to Exploding Lemur for his help
with questions this week. If you'd like to find out about guests and ask questions early,
support us on Patreon. Now a quote to leave you with from Douglas Engelbart.
In 20 or 30 years, you'll be able to hold in your hand as much computing knowledge as exists now in
the whole city or even the whole world.
I don't know when he said that, but I bet it's still true.
Embedded is an independently produced radio show that focuses on the many aspects of engineering.
It is a production of Logical Elegance, an embedded software consulting company in California.
If there are advertisements in the show, we did not put them there and do not receive money from them.