Secretly Incredibly Fascinating - CAPTCHAs

Episode Date: August 14, 2023

Alex Schmidt and Katie Goldin explore why CAPTCHAs are secretly incredibly fascinating. Visit http://sifpod.fun/ for research sources and for this week's bonus episode. Come hang out with us on the new ...SIF Discord: https://discord.gg/wbR96nsGg5

Transcript
Starting point is 00:00:00 CAPTCHAs. Known for being puzzles. Famous for being internet. Nobody thinks much about them, so let's have some fun. Let's find out why CAPTCHAs are secretly incredibly fascinating. Hey there, folks. Welcome to a whole new podcast episode, a podcast all about why being alive is more interesting than people think it is. My name is Alex Schmidt, and I'm not alone. I'm joined by my co-host Katie Goldin. Katie, what is your relationship to or opinion of CAPTCHAs? You know, it's really weird. Every time they're like, you know, solve this CAPTCHA or, you know, look at this CAPTCHA, I don't see anything and I do not understand what is going on. Hmm. Okay. I see. Is it because you are a robot? Um, I, no comment. See, you should have known. I'm known for my hard-hitting investigative questions. That's the main thing I'm known for. And so, uh, put you on the spot.
Starting point is 00:01:19 This is a gotcha podcast. Yeah. With my friend who I tape with every week. If I see a tortoise in the desert flipped over on its back, I want to sell it Bitcoin. Yeah, I'm really glad listeners picked this. And of course, they pick so many of our topics. Join the Discord and please support at MaximumFun.org if you want to help vote on and pick topics. This was suggested by Shane the Boldman and with support from JCR Dude, with support from Endrio, many people excited about this topic. And I'm really glad they were because this is
Starting point is 00:01:55 one of those perfect topics where I don't have a ton of relationship to it going in. I've just kind of dealt with them from time to time and never really considered them. And now I finally know their whole deal. Thrilling. Yeah, I actually am kind of bad at them in all seriousness. No, I'm not a robot. Yes, I can see them. But I get confused sometimes because they're like, select street traffic lights. I'm like, well, does this count as a traffic light?
Starting point is 00:02:21 It's like a pole structure that supports the traffic light. But I don't know. Right. Or like select everything that has a bus in it. And then I see something that's like a van. I'm like, is that a bus? I don't know. What do you define as a bus?
Starting point is 00:02:36 Right. We're too human. That's a problem. We were too human all along. I also, I do like the robot canon concept that they can't see anything in the boxes. Like, not that they're confused by it. They just see nothing. Like when a vampire is in a mirror. That's fun.
Starting point is 00:02:54 Right. I'm really into that. Right. Yeah. I mean, I feel like that would be, that would be a creepy thing, right? You think you're human. And then it's like, to check if you're not a robot, like solve this CAPTCHA. Like, I don't see anything. Am I real? Also, any reference to vampires must shout out Jordan, Jesse, Go! and their great wisdom. But Draculas can have any job,
Starting point is 00:03:17 I think, except seeing themselves in mirrors. But now robots are probably pretty job limited. You need to do CAPTCHAs for everything. So tougher robots, too bad. Can Draculas see other people in the mirror or is it just themselves they can't see? I think they see other people, yeah, but I think they can't see themselves or other Draculas. Okay. So a Dracula could be a hairstylist for a human, but I don't know if a Dracula could do a really good haircut on another Dracula because the mirrors I thought are pretty important for the whole process. That actually sounds superior because then when they hold up the mirror so I can see how they did the back of my head, there's no like big barber hand visible, you know?
Starting point is 00:03:59 Yeah, that's actually better. Yeah, they'd be perfect for that. They can have any job. That's true. Yeah. It's just like a real slicked back widow's peak haircut. And you're like, this is great. Right.
Starting point is 00:04:14 That's the only haircut they do for people. Like one standard haircut. Got it. And then they just do. Yeah. Very slicked back. Very sharp widow's peak.
Starting point is 00:04:22 Well, folks, on every episode, our first fascinating thing about the topic is a quick set of fascinating numbers and statistics. This week, that's in a segment called Katie Muscles Turbo Goldin. Katie Muscles Turbo Goldin. Hero of the podcast. Bird rights power. Wow. It's accurate. I gotta say, it's accurate. I do got turbo hero muscles. Folks, that name was submitted by A Mullins TX. Thank you, A Mullins. And they're definitely a listener of our The Name
Starting point is 00:05:01 Alex podcast episode from a few weeks back. So that's a reference to that, and that's fun. We have a new name for this every week. Please make them as silly and wacky and bad as possible. Submit through Discord or to sifpod at gmail.com. And please make them all about my magnificent muscles. Welcome to the gun show. Bam, bam, muscles. Your Dracula barber's like, amazing, wow.
Starting point is 00:05:25 And there's nobody back there. And the first number here is 2000, as in the year 2000. Nice round number. That's when a team of researchers at Carnegie Mellon University developed the first CAPTCHA system. It's from the year 2000. One year before. Huh. One year before Space Odyssey. Right.
Starting point is 00:05:53 We had to build it so that the system that's an evil AI on the spaceship in the movie could be stopped. Yeah. That was our last step before odysseying into space. Yeah. HAL's turning off all of the oxygen and then it's like, can you show me, show me where all the crosswalks are, HAL? It's like, cannot do it. Daisy, Daisy.
Starting point is 00:06:09 He tries to do our move. Like, but what is a crosswalk, if you think about it? Could it not be any street where one crosses or walks? And this was built by specific people in the year 2000. A team at Carnegie Mellon led by engineer and researcher Luis von Ahn built the first version of CAPTCHA, and it was a puzzle made of blurry letters. Another number here is that those first CAPTCHAs were solved correctly by humans 97% of the time. This technology didn't really have a forerunner. Before it was built, people could go online and construct software that would spam an internet prompt or overwhelm it or send malicious things to it. And this was a new way of stopping that. It was a new system to say, hey, before you, a computer,
Starting point is 00:07:10 send thousands and thousands of things to this prompt, you need to solve a puzzle first. I mean, the fact that only 97% of humans got it means that at least 3% of us are robots, right? Yeah, also very good for weeding out the androids. That's great. The Lincoln android from Disneyland is trying to log in. Like, what happens to Abraham Lincoln in real life?
Starting point is 00:07:36 And he's just denied, so he'll never know. I would love it if all the first bots were the Hall of Presidents robots at Disneyland. That would be so great. They all just like achieve sentience at the same time. Yeah. Oh, we're getting a DDoS attack from John Tyler. And James Buchanan has taken down Cloudflare or whatever. Oh, no.
Starting point is 00:08:04 And this, again, it got developed at Carnegie Mellon University. So CAPTCHAs are from Pittsburgh, if you want to be proud of Pittsburgh. And this guy, Luis von Ahn, is also just very interesting. He's a Guatemalan immigrant to the US with German Jewish heritage. In addition to helping develop CAPTCHA, he helped pioneer crowdsourcing information online. And he and one of his former grad students co-founded Duolingo. Oh, wow. So he's kind of all over your internet experience. And I feel like almost all of us have interacted with something Luis von Ahn built.
Starting point is 00:08:42 I mean, we've all been harassed by the owl at some point. The Duolingo owl has come to our house and been like, hey, man, it's like it's three in the morning. Duolingo, what are you doing here? It's like, I noticed you haven't practiced your Italian in about a week. It's like, no, I have been practicing just, you know, not with you. And then he's like, oh, oh, I see. OK, well, this is awkward. And he just looms. He's so big. Yeah, he just kind of stays there. It's like, oh, are you going to go? Like, no, I'm an owl. I'm nocturnal. Also, between the owls and the Draculas, we're doing a lot of Creatures of the Night. This is a fun episode for anybody listening at 4am. This is your people right here. Yes. A spooky episode.
Starting point is 00:09:21 Well, and the next number here takes us a little back. It is the year 1950. So back 50 years, 1950. That's the year when computer scientist Alan Turing proposed a test of machine behavior, and it would determine whether a machine can behave in an equivalent way to a human or behave in a way indistinguishable from a human. And that's now known as a Turing test. It was first theorized
Starting point is 00:09:50 by Alan Turing in 1950. And this is the test where it's basically you have someone, and if they can't tell the difference between the responses of a human and a robot, then it passes the Turing test. Is that correct? That's correct. And there were also a lot of specifics to Turing's prediction that I hadn't known. One key source this week is a book called The Most Human Human by engineer and nonfiction author Brian Christian. That theoretical Turing test he came up with is conversational. It's something where you have basically judges, and they talk to computers and talk to humans, and they see if the computers can
Starting point is 00:10:31 sound like humans. Also, Turing predicted that by the year 2000, 50 years from when he came up with it, computers would be able to fool 30% of human judges after five minutes of conversation. So a very specific parameter, computers fool 30% of judges after five minutes of talking. I feel like Alan Turing was definitely a very, very smart dude, but that seems like he pulled that out of his ass. I agree. And it was somewhat accurate to how computer technology developed, which is amazing. Like he might have just got lucky. But either way, people were very inspired by this specific test. And then in the early 1990s, people started holding annual public Turing test competitions. Like they said, bring us your software, see if you can fool 30% of
Starting point is 00:11:26 judges after five minutes in a conversation. And they didn't get there by the year 2000. But by the end of that first decade, they started approaching it. And then in 2014, a computer program named Eugene Goostman fooled more than 30% of judges at a Turing contest run by the British Royal Society. That's how you make sure that computers don't turn evil is you give them like really funny names like Eugene Goostman, because you can't like if you have like a Terminator, right, or a Skynet. Okay, those might turn out to be evil. But can you imagine just like the singularity where it's like, yeah, Eugene Goostman now takes over the world and controls everything and has humans in like little pods. Right. Yeah, Sarah Connor can't fight Eugene Goostman. She can
Starting point is 00:12:19 at best give him a noogie or a swirly. Yeah. Yeah. Eugene Goostman is not played by Arnold Schwarzenegger. Eugene Goostman is played by like, gosh, I don't know. Eugene Levy, let's say. Eugene Levy. Yeah, that would work. That would do it. I reach for Eugene. It's not a creative cast.
Starting point is 00:12:40 Eugene is a very specialized name. It's not a bad name. It's just very specific, kind of a specific name. It's not threatening. And also this Eugene Goostman story is really, really weird because this is one of the first computer programs to be considered to have won a Turing test. But in my opinion, they kind of cheated. Oh. According to The Verge, here's how Eugene Goostman worked. They would chat with Eugene Goostman, and Eugene Goostman performed a character. And Eugene Goostman claimed to be a 13-year-old boy from Ukraine who does not speak English well.
Starting point is 00:13:22 And so just like whenever a response was weird, Eugene Goostman would go, oh, well, I'm Ukrainian. I don't speak English well. Which to me is cheating. I think you have to do it with somebody who is totally fluent in the language. You know, like, come on. I think that is kind of fudging things a little bit. Like, no, I'm a human.
Starting point is 00:13:43 I just have a speech impediment where sometimes I go 404 response not found. Right. Exactly. That's not impressive at all. So it's still kind of a concept we're kind of chasing. And it's also the core of this episode's first takeaway, because takeaway number one, CAPTCHAs are society's most widespread application of a Turing test. Some of my sources called CAPTCHAs a reverse Turing test. I think it's sort of both at once, but either way, CAPTCHAs are officially a Turing test, and it's been applied billions of times to all of us in the world who use the internet. So it's probably going to be history's greatest and most widespread Turing test when all is said and done.
Starting point is 00:14:35 I mean, I am grateful for it, despite how it's annoying, that we did solve the question of like, are half of us like Cylons? Are half of us Skrulls? Because I think it would, I think it would, you know, like weed out the Cylons and the Skrulls. Yeah. Skrulls are the new Cylons, I think. I never thought of it that way. It really was a good way to check if the planet had been infiltrated by a bunch of aliens or robots before the internet got set up. We did kind of exactly that. That's good. Yeah. Great. Yeah. There we go. Yeah. Because it turns out, like, officially,
Starting point is 00:15:11 definitionally, a CAPTCHA is a Turing test. It's even named that. The word CAPTCHA, and I don't really pronounce the T in it when I say it because I think that would be unpleasant, but it's spelled "capt-cha." Yeah. How did you even do that? Capt-cha. That's the ultimate Turing test. You're a robot if you say it that way. Because it's spelled C-A-P-T-C-H-A, CAPTCHA. And it turns out that's an acronym. Ah. And it's an acronym that they cooked up to sound like the English word capture. Right. But a New Yorker saying it, capture. Or I guess Boston.
Starting point is 00:15:52 That's more Boston, right? Gonna capture these robots. Yeah. Select all the Dunkin's in these squares, and everybody from Boston just does it in a millisecond. Like, wow, you're too fast. What's the new Dunkin' flavor? You don't know? You're a robot. You don't feel nothing. And so this acronym that they invented, it stands for Completely Automated Public Turing test to tell Computers and Humans Apart. Obviously, there's a bunch of extra words in there.
Starting point is 00:16:30 Yeah, that's one of those really forced acronyms where you could tell that the goal is the acronym and they're really, really pushing it. But you know, still, good job. Good job on the name. Yeah, like they forced it, but it again, Completely Automated Public Turing test to tell Computers and Humans Apart. It's officially a Turing test. It's just that, you know, we mostly think of a Turing test as something you present to an android to see if the android seems like a human. But you can also do the opposite, where you're just making sure humans are humans and weeding out robots that are not effectively acting like us. Yeah, I mean, isn't that the thing in Blade Runner? It's like they have a test when they suspect someone is a replicant. Yeah. And it's like, what do you do if you come across a tortoise that's on its back? Do you poke it?
Starting point is 00:17:26 Do you flip it over? That was a very fun reference earlier, yeah. Do you like beat on its little tummy like it's a little bongo? That's what I would do. Does that make me a robot? Right, like we think of that science fiction thing and then all of us have done, I don't know, hundreds of these.
Starting point is 00:17:42 Not every human on Earth uses the internet, but all of this podcast's listeners do. And so all of us have done a bunch of these Blade Runner type tests. But just while we're browsing ordinary feeling websites. I guess it'd be a short movie if it was just resolved by solving a CAPTCHA. And that was it. Harrison Ford wouldn't have like that much more to do. The rest of the movie is just him sort of scrolling through future Reddit threads.
Starting point is 00:18:09 Come on, solve the movie. Yeah. Solve the thing. Yeah. Solve the movie. Solve the movie. The origami unicorn is representative of something. Just a few numbers here about how much we do that. According to Google, as of 2012,
Starting point is 00:18:28 world internet users performed and solved 200 million CAPTCHAs each day. 200 million per day. Also, there's no exact number, but tens of millions of websites have used CAPTCHA technology. So this is truly a global thing. It's a system that Google owns and has offered to the world. And so it's just out there for all of us all the time. It feels like there's going to be something sneaky about these CAPTCHAs. Like it's not just figuring out whether you're human or robot, but they're getting some kind of secret data from us. Like, how fast are we at determining what are sidewalks and what aren't sidewalks? And how are they going to weaponize that information against us?
Starting point is 00:19:15 Yeah. And we're going to talk about all that. And first we're going to talk about money. Because money is going on. I like money. Lots of people do. And you passed the Turing test. You're a human.
Starting point is 00:19:28 You love money. Yay! I'm a human and a capitalist. And because the next number here is 19%, because in a 2022 survey, 19% of U.S. adults surveyed said that they have abandoned an online purchase because of a frustrating CAPTCHA. So nearly 20% of people surveyed said, I ran into a CAPTCHA that was too hard and I didn't buy something I was otherwise going to buy. I mean, I think if you're so not committed to buying that thing that you let a CAPTCHA get in your way, you either are a robot or you didn't need that thing. That's true. A lot of online purchases maybe fall into that category where it's like, I don't need this airbrushed T-shirt of my own face. And then the CAPTCHA was the thing that finally talked you out of it.
Starting point is 00:20:25 I need that, actually. I would appreciate some kind of... Of my face? Of your, yes, your face, not my face. Yeah, no, I would appreciate a browser add-on where it's just like when I purchase something, it's like, are you sure? Yes. Are you sure you're sure?
Starting point is 00:20:40 Yes. Are you absolutely sure you need this dumb thing? And, you know, just a few times, because, you know, it'd be nice to sometimes think a little bit about like, you know, like, do I, do I really need it? Do I really need this silicone tea diffuser that looks like a poop? That's a real product, by the way. I did not buy it, but I was tempted. I used to have one that was shaped like a manatee. Oh. My lovely wife got it for me, and the manatee, like, props its fins on the edge of your mug. Oh. It was great. And then also the pun, manatee, right? It just, it works. Oh, that's great. No, everybody buy that one. That one's good. Um, the one I'm talking about is like a silicone butt
Starting point is 00:21:25 and a poop is coming out of it, and then the tea diffuses out of the poop. And it's like, okay. Yeah. The thing is, it's gross. Like, I can appreciate some good potty humor, but I don't mess around with potty humor when it comes to food and drink, because it's like, I don't want to think I'm drinking poop water. I don't care how clever it is. I don't want to think I'm drinking poop water. Yeah, especially because so many kinds of tea are brown. You know, I don't know. It doesn't work.
Starting point is 00:21:55 Yeah, forget it. No, I'm glad the CAPTCHA stopped me from making that purchase. And oddly, another number here is more than 50%. Because in that same survey, more than half of respondents also said that they feel safer and more likely to buy something if they're required to do a CAPTCHA test. That's weird. And so the result here is that CAPTCHAs will probably stick around for online purchases, partly for security, but also because even though they disrupt some purchases, they also make other people and more people feel like it's safe to
Starting point is 00:22:31 buy the thing because there's some kind of security on the site. Yeah, I guess that makes sense. Like then if the site is protecting against bots, but I didn't think, I thought it was just protecting it against like denial-of-service attacks. I guess it's like if they're doing a CAPTCHA, then at least they have enough savviness to do some kind of security. So one could then assume that maybe they have security in some other element of the website. So I can see that being a positive signal. Yeah, that's how people go with it. And then the next few takeaways here, some will involve money, some won't, but we're going to go
Starting point is 00:23:11 through the stages of development of CAPTCHA. Because the next takeaway, takeaway number two, the second big stage of CAPTCHA technology development preserved hundreds of millions of pages of the written word for posterity. Hmm. There are later takeaways here that are more financial and more commercial, but basically the second big advance in CAPTCHA technology, after just creating it, is one of the most significant acts of historical and cultural preservation in our lifetimes. It's just really cool that this happened. Oh, yeah, I think I remember this. Like, they would show you, what is this letter? Or what is this word? And it's some like, weird old script or handwritten thing. And I'm like, I don't know, I think that says seven, but it could also say carrot. I'm having trouble here. But yeah, I remember having to do those and thinking those were actually pretty hard and then being sure I'd gotten it wrong. But then it's like, thank you. And then I got into the website. I was like, wait, what? I don't think I did that right. And maybe I'm a robot. Why are you letting me in your website?
Starting point is 00:24:22 Yeah, especially these text-based ones. They've often been programmed with some flexibility for like common errors, which is good programming. Like if you're a human, you might make a human type error in solving a puzzle like that. And we'll talk later about the lack of accessibility with a lot of CAPTCHAs, but in that one way, they've been relatively accessible. They have a stupidity filter for people like me who cannot do them very well. And the key source here is a piece for National Geographic by the great science writer Ed Yong when he was at National Geographic.
Starting point is 00:25:02 Oh, yeah, I know that guy. Yeah, he's the best. And this was the mid-2000s this happened, so that's just a few years after the year 2000 when CAPTCHAs were created. What happens is shortly after this Carnegie Mellon team creates the basic CAPTCHA based on blurry text, Google purchases it and offers it to any websites who want to use it for free. And so the websites say, great, free service, using it. And soon CAPTCHAs were being solved 100 million times per day within a few years of them being invented at all.
Starting point is 00:25:35 Wow, that's a lot. It feels like a good data gathering opportunity. Am I right? It took them surprisingly long to at least implement that. Maybe Google had that idea from jump. With the researchers here, such as Luis von Ahn, it seems like their initial idea was, we feel bad about how CAPTCHAs are ultimately a waste of time. It has a security purpose. It's doing a good thing. But they had this realization that people are taking blurry words that our system generates, decoding them, and then that
Starting point is 00:26:13 effort just doesn't go anywhere. It simply solves the test and that's it. And so they had just this one clever idea of what if we incorporated CAPTCHA technology into digitizing books. Apparently, that was a huge leap forward, because in the mid-2000s, book digitization used software called optical recognition software. So you take the pages of an old book, you put it in front of the software, and it tries to figure out what the words are. But back then, that software was only about 80% accurate, 8-0. And so the resulting digital books were full of errors and pretty messy. It was the best of times. It was the bratwurst of times. Okay, I'm listening, because folks, you tell a German-American about bratwurst, they are in line immediately with mustard and a bib, you know? Here we go.
Starting point is 00:27:12 Yeah. The guillotines in this one just cut up a nice big bratwurst for everyone to enjoy. And so then the CAPTCHA researchers said, let's remove the bratwurst. And I said, no, but they removed it. Because CAPTCHA researchers did preliminary tests where they took just the hard-to-decode words [...]% accuracy reading these words versus 80 or lower for the software. And they also, along the way, had this insight that humans could be given the same puzzle. You know, just random humans across the world, a couple of them get the same puzzle. And then by doing that, they check each other's work in a natural way. Like if you are getting the same result from enough humans, you can kind of just declare it solved without running it through more layers of checking. Right. But then how, so is this also simultaneously doing a CAPTCHA thing? Like, if we're using people to determine what these words are, how did we know
Starting point is 00:28:23 that the people were guessing correctly? You see what I'm saying? Yeah, there was an extra step that the programmers did to generate the CAPTCHA prompt. What would happen is two different optical recognition programs would scour books, and they would set aside just some words that were hard to read. And then that would get paired with a control word that is already known. So they would take one blurry book word and one regular word that was selected by the programs. And then both of those would also get distorted further. I see. So humans are reading an even worse version of the book word and then also a word preselected. And so then the CAPTCHA builder can create an answer. I see. So they have a legitimate CAPTCHA in with the book word that you're discerning. Got it. Yeah. Which is really
Starting point is 00:29:22 a lot more complicated than I expected and has been sort of hard for me to even hold in my mind as I research it. Because there's really a lot of steps there. But the upshot is an incredibly intricate way of doing these text puzzles that helps decode the hardest words in books. And also, once you program it, it's pretty seamless. It's not that hard to have a machine run this once you get it going. Right, right. Because they've already presumably scanned a bunch of these books. So then you're just kind of the machines can already clip out words.
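The pairing described above, one known control word plus one unread book word, can be sketched roughly in code. This is only an illustration under stated assumptions, not the real reCAPTCHA implementation: the names `Challenge`, `within_one_typo`, and `grade` are hypothetical, and the one-substituted-letter tolerance is an assumed stand-in for the "flexibility for common errors" mentioned earlier in the episode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Challenge:
    control_word: str  # word whose correct answer is already known
    unknown_id: str    # hypothetical id of a scanned word the software could not read


def within_one_typo(a: str, b: str) -> bool:
    """True if a and b are equal or differ by one substituted character (assumed tolerance)."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= 1


def grade(challenge: Challenge, control_answer: str, unknown_answer: str) -> Optional[str]:
    """If the control word passes, return the user's guess for the unknown word; else None."""
    if within_one_typo(control_answer.lower(), challenge.control_word.lower()):
        return unknown_answer.lower()
    return None
```

The design point is that only the control word decides whether the person gets into the website; the guess for the book word is simply collected as one vote.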
Starting point is 00:29:55 It's just they're not always super accurate in terms of what that's saying. Yeah, yeah. We're using humans to do the hardest words. And they're sort of tag teaming with machines. And also they're using us solving CAPTCHAs, like we're already all doing that labor and they just figured out a way to make it more useful. It's great. Humans and robots working together to put books in the robot world. It's very nice. It's a different story than what they tell us in terms of humans and robots always having to fight.
Starting point is 00:30:31 I think we should, you know, be friends more. Especially if we start a book club together. Oh, that's so cute. Right. Oh, man. Right. Really cozy. We got to be careful, though.
Starting point is 00:30:41 We can't, like, pick... Humans can't, like, pick the topic of, like, Isaac Asimov. Robots should probably, like, I don't know, not pick their book written by Robot Asimov about, like, humans secretly bad. Hmm. Don't know how to avoid that. I think we'll just keep writing books about distrusting each other. It's gonna be a not cozy book club after all. Oh, no. Just like any book club. Right. Eventually, it's a fight about sentience. Every book club,
Starting point is 00:31:15 no matter how much tea and cozy blankets. Yeah, every book. That's just every book club. Yeah. Once they figured out this change, Carnegie Mellon and Google, they collaborated on a new system. They named it reCAPTCHA. That is just a little upgraded name. And from there, they put out these puzzles. If three users gave the same answer, the book word was considered solved and added to a digitized book. If six users in a row all gave up on a book word, it got filed as unreadable and either just left blurry in the book or they tried some other kind of re-scanning of the text to get a better picture.
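The three-matching-answers and six-give-ups rule just described can be written as a small tally. This is a minimal sketch assuming only those two thresholds from the episode; the function name `resolve_word`, the status strings, and the use of `None` for "gave up" are hypothetical choices, not details of the actual system.

```python
from collections import Counter
from typing import Iterable, Optional

SOLVED_VOTES = 3    # matching answers needed, per the episode
GIVE_UP_STREAK = 6  # give-ups in a row before the word is filed as unreadable


def resolve_word(responses: Iterable[Optional[str]]) -> str:
    """Process user responses in order; None means that user gave up on the word."""
    votes: Counter = Counter()
    streak = 0
    for answer in responses:
        if answer is None:
            streak += 1
            if streak >= GIVE_UP_STREAK:
                return "unreadable"
            continue
        streak = 0  # any real answer breaks the run of give-ups
        votes[answer] += 1
        if votes[answer] >= SOLVED_VOTES:
            return f"solved:{answer}"
    return "pending"
```

For example, `resolve_word(["upon", "upcn", "upon", "upon"])` returns `"solved:upon"`, since the one stray answer never reaches three votes.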
Starting point is 00:31:54 Did they use the human data to train the machines to figure out these words better or just basically only use the human data to translate the words? They did both. Yeah. And it's also part of why text-based CAPTCHAs are kind of going away because with every kind of CAPTCHA, you have this arms race where there are machines running the test too. Like we can build a machine to operate it, and that also means a machine can be built that can solve it, and they can learn from the data set humans are creating. So that's just sort of a broad, low-key takeaway throughout this episode, is every CAPTCHA we build leads to the machine solving of the CAPTCHA. At least so far. Maybe we'll build something they can't do,
Starting point is 00:32:45 but that's how it's gone so far. At a certain point, we will not be able to CAPTCHA the machines, and then that's the point at which they buy all of our weird silicone tea strainers, and then, you know, I think that's the singularity. The ultimate thing that protects humanity is the grossness of the butt when the machines just don't want it and then we get to keep it. It's like, ew, butts, let's leave this planet.
Starting point is 00:33:14 It's almost like anti-human propaganda. Like humans just poop and are gross. And the machines are like, you're right. Let's build rockets. They just leave. Robots are just so grossed out by human beings. They don't even want to bother like, like doing a matrix situation. It means you have to like put a feeding tube and like a poop tube in. Right. Cause like, cause like in the, cause listen, I've had this problem with the matrix
Starting point is 00:33:41 for a long time. Where's the poop and the pee go? I'm sorry. I know it's gross, but you got to wonder. It's a good question. And so there's going to be some robot that has to deal with that, right? Like the human poop and pee, even if it's just something like, say, if you've ever owned an aquarium, you know, you have to like go in and sometimes clean out the poopoos with a little like, you know, a little vacuum. Like maybe there's a robot that has to do that. And I can't imagine robots wanting to do that.
Starting point is 00:34:08 So I feel like we're simply too gross for robots to create a matrix for us. Yeah, take that. We're a bunch of garbage pail kids. And that's why the robots can't turn us into batteries. We just moon them and then they get really disgusted and leave. this reCAPTCHA system where they finished a lot of books, within one year, internet users solved over 1.2 billion with a B reCAPTCHAs. That means they deciphered over 440 million words.
Starting point is 00:34:55 And that word count on its own is the equivalent of 17,600 books, but it's really far, far more books because we're just doing the hardest last words to solve in the rest of a book. And so just a few years later, by the year 2011, ReCAPTCHA users finished digitizing the entire Google Books archive, whole thing, and then also finished 13 million articles from the New York Times back catalog. Wow. So we did all of Google Books and we did the New York Times back catalog dating back to the year 1851. Digitized, saved. Well, we did that all. So where's our money?
Starting point is 00:35:38 Pay us. No money. That is true. Damn it. Yeah, this CAPTCHA thing, I feel like we sort of get paid in the form of a more functional internet. Boo. Like, there's a lot of security for a lot of websites, and it's just made the whole internet possible, but it's in a very, like, subtle public utility sort of way where it doesn't feel good and you don't get any money. We get paid by immortalizing knowledge. Boo, give me money dollars. Boo.
Starting point is 00:36:14 Yeah, that's right. I think that like the preservation of knowledge is a really important task. It's kind of inspiring how much like the volume of work that can be done collectively. I mean, it kind of puts civilization into perspective, right? Like we look at where we are. Of course, the collective work of so many people. Yeah, I think there's so much like chatter and stuff about AI and ChatGPT and stuff. And it's trained on like collective human output. So, you know, however impressive you find ChatGPT, like that is based on sort of collective human output, as far as I understand it, which I don't understand it that well.
Starting point is 00:36:59 Yeah, I think humans collectively can do really impressive things together. That's all a perfect segue into the rest of the takeaways, too. And before we leap into those, we're going to take a quick break and congratulate ourselves on our perfect segues. We'll be right back. We have a very special sponsor this week. They build what I think is a computer that is also more fun than every other computer. It's the game Turing Tumble. You know, like that name Turing, like Alan Turing,
Starting point is 00:37:36 who we're talking about all week this week. Turing Tumble is totally unique. It is an educational game, and it is also a puzzle adventure experience. And it's for ages 8 to adult. I'm an adult. I loved it. If you're anything like me, most of your work involves a computer, and then also a lot of your fun is screens, too.
Starting point is 00:37:57 Either fully the same computer where you switch over to a computer game, video games, TV, ebooks even. Like, you're just looking at screens all day, every day for business and for pleasure. Turing tumble is tactile. It is a board. It is pieces. It is marbles. It is something that whirs and whizzes and clicks. It's just a lot of fun to give yourself that break from electronics in a way that celebrates electronics and enriches them for you too. So that was my favorite part of it. I can tell you all the nitty gritty of how I solved puzzles, but also I don't
Starting point is 00:38:29 want to spoil any of the puzzles for you. So the gist is I really, really like having a deeper appreciation and understanding of the computer I am looking at as I tape this right now. And then also I got to do that without looking at pixels at all. I just got to look at a book of puzzles and a board and have a really nice and grounded tactile time. Learn more about this game and see it in action at upperstory.com slash Turing Tumble. Use the coupon code SIFTPOD for 10% off your total purchase. And that's upperstory.com slash Turingtumble and code SIFTPOD for 10% off. I'm Jesse Thorne. I just don't want to leave a mess. This week on Bullseye, Dan Aykroyd talks to me about the Blues Brothers, Ghostbusters, and his very detailed plans about how he'll spend his
Starting point is 00:39:20 afterlife. I think I'm going to roam in a few places. Yes, I'm going to manifest and roam. All that and more on the next Bullseye from MaximumFun.org and NPR. Hello, teachers and faculty. This is Janet Varney. I'm here to remind you that listening to my podcast, The JV Club with Janet Varney, is part of the curriculum for the school year. Learning about the teenage years of such guests as Alison Brie, Vicki Peterson, John Hodgman, and so many more is a valuable and enriching experience,
Starting point is 00:40:01 one you have no choice but to embrace, because, yes, listening is mandatory. The JV Club with Janet Varney is available every Thursday on Maximum Fun or wherever you get your podcasts. Thank you. And remember, no running in the halls. And we are back and with astounding further takeaways because the rest of this episode continues the evolution of CAPTCHAs. Next one is takeaway number three. The third big stage of CAPTCHA technology uses your answers to improve machines' ability to understand images. And this is that select all squares sort of thing that we've referenced often where, for example, there's nine squares and you have to find the traffic lights.
Starting point is 00:40:51 I feel sort of proud that I've probably impeded the progress of AI just by being so bad at those. I'm so bad at them. Yeah, for some reason, I'm worse at picking out stuff in a picture than doing blurry letters, even though you would think visual acuity would be kind of the same across that. Letters and words are kind of symbols. There's a certain point, like symbols are more, to me, more discrete than say, like, is this a truck? I don't know. It's a corner of a car.
Starting point is 00:41:24 I think it might be a truck. Yeah. Or like there's a tiny truck like in the far background. Like, is that the truck you mean? So I totally get why it's more difficult and more subjective. That's true. And most of these pictures, as we'll talk about, come from Google Street View, which is often taken from maybe not the best camera, maybe from a moving vehicle. And so, yeah, it's blurry. It's hard to see. Sure. Again, the best
Starting point is 00:41:52 way to thwart Google Street View is mooning it. Mooning, the way we defeat the machines. I strongly believe this. This is coming straight off of that previous story of reCAPTCHA, blurry text CAPTCHAs, completing a huge archive of our writing. This finished in 2011. They finished Google Books and more than 150 years of archived New York Times. And then Google basically had two reasons to develop something new. One is that they figured text-based CAPTCHAs were getting more and more beatable every day. Eventually, it's just security theater because a bot can do it too. And then the other reason is that they thought of something more lucrative, training machines to understand images. That's trickier than text. And so basically right away in 2012, they started developing a form of CAPTCHA with snippets of photos from Google Street View, initially asking people to transcribe door numbers or the words on signs, like still sort of text-based stuff. But by 2014, they developed a version that trains machine learning models to just recognize objects
Starting point is 00:43:05 and understand pictures. And that's what that Select All Squares stuff is. It's a very difficult task for machines. But one of the main forms of machine learning is you just develop a humongous data set and then give it to the machine to train itself. Right. And so we are the data set, us people using Captcha for free, because websites like to use it for free, and we like to use websites for free. I think the point at which we really have to start being concerned is when they start showing us like faces. It's like, which one of these faces looks untrustworthy? Which one of these faces looks like it would be most likely to be a threat
Starting point is 00:43:47 to the robot oligarchy? And I'm in there. I'm like, well, don't click me. Okay, I'm safe. When we were referencing these select all squares ones, we immediately talked about stuff on streets, right? Like street lights and street signs. This CAPTCHA information is extremely lucrative for Google. It's better Google image search results, more accurate Google Maps results. It's helped them make it so Google Photos can search images for you. It's made their whole company more profitable and useful as a service. But maybe the biggest possible application is driverless cars. Oh, I see. That's why so much of it is street stuff. Not only do they have a bunch of Street View pictures from the Google Maps service and the Google Earth service, but also right when they developed this and really got it going in 2014, the next year, Google changed their name and business structure.
Starting point is 00:44:48 They renamed the company Alphabet, made Google one part of it, and then also developed companies like Waymo, which is a driverless car company that's trying to develop that. And so the most lucrative thing about CAPTCHAs might eventually be Google having the best driverless cars. Yeah, I hope I guess I hope it helps because I know that the state of driverless cars right now is, you know, not perfect. I saw a driver. I think it was a Tesla just like disintegrate a mannequin that was set up to test it. So, um, yeah, I mean, it's, it's, I know, I know you mean hitting it with a car, but I did imagine that holding the disintegration ray that Marvin, the Martian uses in some Looney Tunes cartoons. So that's fun. And then it says like eliminating obstruction. Yeah. Yeah. I mean, it's it's interesting. I mean, driverless cars is a unique problem because you have something where it would need to being able to like not only detect its environment,
Starting point is 00:45:56 but being able to kind of understand and predict the behavior of other cars or pedestrians seems really wicked hard because like our brain, when we're driving, we may not realize it, but we're making all of these kinds of like subconscious calculations when we're merging, when we're observing other cars or kind of looking at a pedestrian wondering like, well, are they going to walk? And we make eye contact with them and kind of all these assessments are going on kind of passively that we may not even realize we're doing. And so a driverless car has the task of trying to do all of that. And I remember when driverless cars were first starting to be a thing, it's like, oh, these are going to really quickly outpace humans in terms of being safer and stuff.
Starting point is 00:46:45 And I think it's, I mean, maybe eventually they will be, but it seems like a pretty slow and difficult process. It is. And yeah, this application of CAPTCHA has the potential to be how we solve it. And there are definite downsides to a driverless car future, but there are definite possible upsides too. Like it's not a thing we can call either way yet, but it could be the source of a real leap forward. And it all might've started with us trying to use websites that can't be DDoSed by weirdos. That might be how it started, which is great.
Starting point is 00:47:24 I mean, sometimes technology is straight up just like bad, usually weapons-based technology. But when you have a vague technology like driverless cars, I don't think the technology itself is inherently bad. It's just the what we decide, how we decide to use it, how these sort of existing structures and institutions decide to implement it, that could be a problem, right? Like when we when we first sort of heavily shifted towards cars, streets became much less pedestrian friendly. You know, if we did driverless cars, like how much are we going to weigh like the driver's experience versus pedestrians? So it really I just really think it depends on the implementation.
Starting point is 00:48:09 I don't think they're inherently bad things. It's just yeah. I mean, like what what you decide to value with this technology. Absolutely. Yeah. Like I don't like the idea of all the truck drivers are losing all their jobs. But right. That's just a capitalism problem more than a technology problem.
Starting point is 00:48:30 So maybe it's solvable. Or maybe there's a good way of going about it. Didn't we learn from Looney Tunes that the whole point of technology is so that we can have cool, fun lives? Yeah, like Marvin the Martian on Mars. Again, it comes back to Marvin the Martian on Mars. Again, it comes back to Marvin the Martian. Right. But like, you know, having like having technology do jobs that are sort of more dangerous and then the people who used to do those jobs like, hey, maybe they should just still get paid to be alive.
Starting point is 00:49:00 The whole episode is about sentience and human life. So this is where it goes. Yeah. Yeah. Because we have another takeaway here about the development of this, because there are many kinds of CAPTCHAs. And takeaway number four. The fourth stage of CAPTCHA technology development replaced most of the upfront human tests with a hidden system that constantly spies on you. What? And what this is... I don't like that, Alex. Uh, I don't like that. I think we've all seen it. That's a long description of the thing where there's just a checkbox labeled with a statement of I'm not a robot.
Starting point is 00:49:49 Yeah. That is a CAPTCHA where your actions are being monitored for whether they are robot-like or not. I had always wondered how checking a box indicated that I'm human. Yeah. It turns out it's because Google's technology observes how your cursor is moving and your browser history and your cookies and does a lot of investigating you. And the exchange is you don't have to select any squares or solve any letters. I super don't like that. I prefer the squares, even though the squares confuse me sometimes. I don't, I really don't like the, the subterfuge of like, Hey, we're going to, I mean, looking at your mouse movements, fine. Like, uh, that, that doesn't bother me as much, but checking my history and my cookies, my precious cookies,
Starting point is 00:50:36 not a fan. Right now you're taking my treats. Hold on. Uh, right. It's only my business if I'm in a snickerdoodle mood or a choco chip cookie mood. This is basically where CAPTCHA is at now. And this might be the vaguest takeaway because a lot of it is secret for Google proprietary purposes and for keeping it secure purposes. Because as we talked about, every one of these CAPTCHAs is an arms race. People build machines to just beat them later. The hopefully most solid one right now is what's called an invisible CAPTCHA. And Google started developing this in 20... This is getting more and more menacing, Alex.
Starting point is 00:51:17 I'm getting more and more uncomfortable. It's like a Dracula in a mirror, baby. Because Google started developing this in 2014, made it kind of their main approach in 2017. And according to Fast Company, a piece by Katharine Schwab, Google's invisible CAPTCHA system analyzes the way users navigate websites and also looks at their cookies and browser history and then assigns them a risk score in terms of how robot-like their behavior appears to be. All of the details of that are secret for Google, both so they can be the company that runs this and so that it's harder to reverse engineer a way around it. So unlike selecting all squares and the blurry letters, I don't know exactly how it works.
Starting point is 00:52:05 It's just a secret. That sounds like someone who's a Google plant would say to make it seem like he's not a Google plant. Now, now, let's not CAPTCHA each other. Now, now. This is how they tear us apart, by us not knowing who's the person and who's the robot. Yeah, I'm not super comfortable with that, gonna be honest. I feel like it's something that, sure, like, it could start with totally good intentions or bad intentions, because we don't know what the intentions are. Because unless they just procedurally delete the data, right, like they make a determination whether
Starting point is 00:52:45 you're a human or a robot, like do they, cause we don't know if they keep that data and what, what they do with it. I mean, they, there's data collection of all kinds from not just Google, but like Facebook, you know, your phone, like every, you know, just like your constant, our data is constantly being mined for stuff. And, you know, I don't think it's, I don't think it's great. You know, it feels exploitative in a way that's not fun and cool, like us collectively checking out words and digitizing, digitizing libraries of books and articles. Right.
Starting point is 00:53:24 Yeah. The vibe is a 180 from I'm building a library of Alexandria is not the feeling for sure. No. You know, if they use this data, it's going to be for like market research or like figuring out like what ads to serve you. But it's still very uncomfortable for that data to be just being gathered. The other thing about it is it's just going on if you use the internet. It will usually be indicated by the footer of a website you're on, either named CAPTCHA or reCAPTCHA provided by Google. There's a little footer note that says we're doing that. And that is their apparently legal liability method of saying you're being monitored for CAPTCHA purposes. That's interesting, because I would say 99% of people don't read the footer. And of those people, 99% of them don't realize that reCAPTCHA is
Starting point is 00:54:21 collecting your data. Basically, the silver lining is there's too many of us, probably, to permanently keep all our activity on Google's end in storage. But that's basically the only good news about it. Right. And the other good news is, according to Cy Khormaee, the reCAPTCHA product lead at Google, quote, it's a better experience for users. Everyone has failed a CAPTCHA, end quote. It's like a lot of the digital things in our lives where we're trading security for convenience. And it's convenient to not do puzzles so much. I don't know. I don't mind doing a little puzzle if it means that like the future robot authoritarian robot state doesn't know my embarrassing Google searches and uses them against me a la 1984. Yeah, it would be nice to just not be monitored this way. And there might be a silver
Starting point is 00:55:17 lining in the final takeaway of this episode here. Is it that we can all Google butts, and then that is potentially a future weapon against the robot autocracy? Because then when they're looking through all our Google searches, they're seeing so many butts, they get really disgusted, and then they leave the planet. Skynet builds a machine that likes butts. Like, oh, no. No. Ironically, also played by Arnold Schwarzenegger. You see his butt in at least one of those movies, for sure. Yeah. You see his butt in a lot of those movies.
Starting point is 00:55:57 What are you talking about, Alex? I mean, his butt is like two cannonballs, so. Yeah. Well, what's the silver lining? Yeah, the silver lining is takeaway number five. The two next frontiers of CAPTCHAs are any CAPTCHA that actually works and better CAPTCHAs for people with accessibility needs. The, I think, kind of good news, if folks don't like this invisible CAPTCHA that's monitoring you all the time, the kind of good news is that every CAPTCHA gets beaten,
Starting point is 00:56:32 and so this episode will be out of date in the near future. There will be some new form of testing whether people are people online. And then, good news on top of that, CAPTCHAs have always been kind of bad for people with accessibility needs. And hopefully a next form gets built by somebody who has heard that message and understands that we need to make it better. Yeah, if we could, like, I'm all for accessibility. Like if we could do a CAPTCHA that's more accessible, but also is not like mining your data. I feel like we could do that, you know? Yeah. And vaguely this invisible version is sort of an improvement for the accessibility. It's not achieving the lack of spying on you, but it's, you know, we're kind of in halting steps, getting better at that direction. Like, yeah, because throughout the episode, we've said, you know, machines are involved in running every CAPTCHA.
Starting point is 00:57:32 CAPTCHAs contribute to machine learning. The ultimate result of all of them is that machines learn how to do them. So even this invisible one, somebody's going to figure out how Google runs that and make a version where their bot appears to be acting like a person online. So the bot's going to be like Googling butts a lot. Yeah, it's true. Just a real interest in Arnold Schwarzenegger hiney, enter. And then... He would call it a hiney because he's like Austrian.
Starting point is 00:58:05 Right, it's a Germanic root. Hiney. Yeah. And then the other thing here is throughout the history of CAPTCHA, especially the visual ones, they've been really hard on people who don't see well. It's something that really technology companies have not focused on, often because the whole staff is young people. According to informatics professor L. Jean Camp of Indiana University, quote, people who are either visually impaired early in life or simply get old don't have great vision. And if it's blurry text or solving pictures, that's not a good CAPTCHA for you. You'll just fail it for the reason that you're failing it. You know, before anyone was like, well, if you have poor eyesight, why are you using the internet? Like there's tons of accessibility ways to use the internet. Things like either text enlargers or like if you're, if you can't see anything, like text to speech. I saw some, I don't think these are in wide use, but there was like something where it was like some kind of haptic feedback where it was like converting text to braille,
Starting point is 00:59:20 which is really interesting. So yeah, like there's a lot of access, like we should be making the internet accessible to everyone because it's such a ubiquitous thing that everyone needs to be able to get onto. That's right. And so far, CAPTCHAs have lagged behind those other ways we've made the internet accessible. One like somewhat positive story here is that there have been times when CAPTCHA builders think about this. Apparently late in the era of blurry letter CAPTCHAs, people started building in extra correct answers that are common responses from people with dyslexia. Because they realized, oh, these are real bad for people with dyslexia.
Starting point is 01:00:02 Yeah. So then they built it so things that people would commonly put in still worked too. And in general, there's an opportunity here, like, because we have to keep redoing CAPTCHA all of the time, one of these stages, maybe we get it right for people with accessibility needs. We have to do the legwork anyway, so let's do it for them. And the main alternative for visually impaired people is an audio captcha, where they play a sound and you have to type or share what you've heard in the sound. But we're not going to play you examples because apparently they are horrible to listen to. It's usually multiple voices plus background interference noise and just awful to try to
Starting point is 01:00:44 solve. Oh, boy. There was a 2009 study at the University of Pittsburgh where they just studied blind people doing audio CAPTCHAs and the subjects got them right 45% of the time. Oh, okay, great. Which is less than a coin flip. And it took them 65 seconds to complete one. For comparison, visual CAPTCHAs take about nine seconds. So we're just kind of tormenting people who don't see that well. And I'm also going to link the radio show Marketplace, who interviewed people about this in 2023 and found that they're having similar horrible experiences of the audio CAPTCHAs.
Starting point is 01:01:25 Yeah, that's a real like bargain. Just hey, if you're visually impaired, like just listen to the screams of the damned and see if you can pick out a word from it. Have fun. We care about accessibility. Which sin did they do? You have to answer that. Like, I don't know. I don't know.
Starting point is 01:01:41 Which circle of hell are these screams coming from? I think that's the poop tea circle of hell where everyone just gets tea made out of poop. Folks, that's the main episode for this week. Welcome to the outro with fun features for you, a human, such as help remembering this episode with a human run back through the big takeaways. Takeaway number one, CAPTCHAs are our society's most widespread application of a Turing test. Takeaway number two, the second stage of CAPTCHA technology preserved hundreds of millions of pages of the written word for posterity. Takeaway number three, the third stage of CAPTCHA technology development uses your answers to improve machines' ability to understand images. Takeaway number four, the fourth stage of CAPTCHA development replaced most of the
Starting point is 01:02:52 upfront human testing with the I'm not a robot checkbox that constantly spies on you. And takeaway number five, the next frontiers of CAPTCHAs are any CAPTCHA that works and better CAPTCHAs for people with accessibility needs. Those are the many, many takeaways. A lot of takeaways this week. Also, I said that's the main episode because there is more secretly incredibly fascinating stuff available to you right now if you support this show at MaximumFun.org. Members get a bonus show every week where we explore one obviously incredibly fascinating story related to the main episode. This week's bonus topic is the bleeding edge of potential future CAPTCHAs and one funny way AI beat our current ones. Visit SIFpod.fun for that bonus show, for a library of more than
Starting point is 01:03:47 13 dozen other secretly incredibly fascinating bonus shows, and a catalog of all sorts of Max Fun bonus shows. It's special audio. It's just for members. Thank you for being somebody who backs this podcast operation. Additional fun things, check out our research sources on this episode's page at MaximumFun.org. Key sources this week include the book The Most Human Human by engineer and nonfiction author Brian Christian. Also tons of journalism from Fast Company, National Geographic, TechRadar, The Verge, The Atlantic, and more. That page also features resources such as native-land.ca. I'm using those to acknowledge that I recorded this on the traditional land of the Canarsie and Lenape peoples. Also, Katie taped this in the country of Italy. And I want to acknowledge that in my location, in many other
Starting point is 01:04:37 locations in the Americas and elsewhere, Native people are very much still here. That feels worth doing on each episode, and join the free SIF Discord, where we're sharing stories and resources about Native people and life. There is a link in this episode's description to join that Discord. We're also talking about this episode on the Discord, and hey, would you like a tip on another episode? Because each week I'm finding you something randomly incredibly fascinating by running all the past episodes through a random number generator. This week's pick is episode 77. That's about the topic of bananas. Fun fact, bananas are berries, and they grow from an herb. So I recommend that episode. I also recommend my co-host Katie Golden's weekly podcast,
Starting point is 01:05:23 Creature Feature, about animals and science and more. Our theme music is Unbroken Unshaven by the Budos Band. Our show logo is by artist Burton Durand. Special thanks to Chris Souza for audio mastering on this episode. Extra, extra special thanks go to our members. And thank you to all our listeners. I'm thrilled to say we will be back next week with more secretly incredibly fascinating. So how about that? Talk to you then. Maximum Fun.
Starting point is 01:06:11 A worker-owned network of artist-owned shows supported directly by you.
