StarTalk Radio - Cosmic Queries – Algorithms and Data, with Hannah Fry

Episode Date: June 22, 2020

What is an algorithm? How do you interpret large amounts of data? Neil deGrasse Tyson and comic co-host Chuck Nice answer fan-submitted Cosmic Queries exploring algorithms and big data alongside mathematician and author Hannah Fry, PhD. NOTE: StarTalk+ Patrons and All-Access subscribers can watch or listen to this entire episode commercial-free here: https://www.startalkradio.net/show/cosmic-queries-algorithms-and-data-with-hannah-fry/ Thanks to our Patrons Dan McGowan, Sullivan S Paulson, Zerman Whitley, Solomon Nadaf, Eric Justin Morales, Matthew Iskander, and Cody Stanley for supporting us this week. Photo Credit: Storyblocks. Subscribe to SiriusXM Podcasts+ on Apple Podcasts to listen to new episodes ad-free and a whole week early.

Transcript
Starting point is 00:00:00 Welcome to StarTalk, your place in the universe where science and pop culture collide. StarTalk begins right now. This is StarTalk. I'm your host, Neil deGrasse Tyson, your personal astrophysicist. And this is a Cosmic Queries edition. It's becoming a fan favorite, and when I do Cosmic Queries, you know Chuck Nice can't be far behind. Chuck. Hey, what's happening, Neil? Dude, welcome back in the house. Yeah, yeah, well, yeah, virtually in the house. Virtually in the Coronaverse house. That's right. We're all in the same coronavirus house. Right. Today we're doing Cosmic Queries on algorithms and data. Algorithms and data?
You've got to love me some algorithms and data, because nothing happens... I do not. I'm not a big fan. Not a fan of either. I mean, I'm a fan of data, both the actual information kind and the android from Star Trek. I'm a fan of data. Algorithms, not so much. Oh, yeah, I forgot. We had a whole entity named Data, an android, basically.
Starting point is 00:01:14 So what we have here is we've invited into studio today data. We have an expert on algorithms and data, a mathematician, Hannah Fry, who's dialing in from the UK. Hannah, welcome. Thank you very much. You know, Chuck, you're not the only person who hates the word algorithms. I was at a tech conference and I was just chatting to this guy and I'm like, I think it's a word that makes about 85% of people want to
Starting point is 00:01:46 gouge out their own eyes. He agreed with me. He said, yeah, but it does make the remaining 15% of people mildly aroused. So I know what I'm in. Mildly aroused, that's because Al is only mildly sexy.
Mr. Go-Rhythm. Oh, Al Go-Rhythm. Yeah, there you go. Yeah, exactly. So let me get your full bio here. You're Associate Professor of Mathematics, University College London. And you co-host a BBC radio show, Radio 4, because BBC has very stove-piped channels. And it's The Curious Cases of Rutherford and Fry.
Starting point is 00:02:27 Wow. Now, you have to do some splaining on that one. Author of the book recently released, just last year, Hello World, Being Human in the Age of Algorithms. So you're the person for this Cosmic Queries. I mean, I know a thing or two. I'm not going to big myself up too much, but I've dabbled. Hannah, if I may, can I just make you feel a little more at home digitally
as we cross the great pond? Here we go. Here we go. Oh, BBC News time. Five o'clock GMT. There you go. That was actually quite good. Did you like that? Yeah, that was very good. Very good. He's good. He can get a job, I'm sure. And the Brits show off that they have the prime meridian. So yes, exactly, universal time, Greenwich Mean. You know, in fact, I live in Greenwich;
Starting point is 00:03:21 the prime meridian is about 100 metres from my house. Oh, do you feel it, though? Do you feel it? What's quite nice is you go for a little walk around. There's like a peninsula. And along the prime meridian, as it goes across the water,
Starting point is 00:03:38 there's just an arrow that says here. Oh, no, I've forgotten the circumference of the earth now. 16,000 kilometers, something like that? No, no, way bigger than that. People once thought it was 16,000 in ancient Greece. So we've got to update you on that. I made a guess, and I made a guess with the wrong person. She's a fan of history, that's all.
Starting point is 00:04:00 It says back to here, which I quite like. We're about 50,000 kilometers, 40 to 50,000 kilometers. Well, I was within a factor of 10. I know it to miles and you're the guys who gave us miles, so it's 25,000 miles is what that is. But I'm just delighted you live near the prime meridian and you can get a little vibe from it when you take a stroll. So tell me about data.
Starting point is 00:04:26 What's the state of data today relative to, like, when any of us were kids or even before there were computers? The word data, of course, predates computers. So what's going on today? Well, in some ways, I mean, in a lot of ways, not that much has changed. I mean, it's still a case of, you know, people collecting statistics about humans, about how we behave, about what we do, using kind of mathematical techniques to analyse it and trying to infer scientifically everything that you can about
Starting point is 00:04:55 our behaviour from it. So, you know, this is like a subject that has a long history going back to the 50s and the 60s. I think that what's really changed is just the volume of data. I mean, you don't need me to tell you just the incredible amounts of data that is collected on us. But I think for me, it's not just the amount of data that's directly collated about us. It's the things that you can infer from that data. The guesses that you can make about people that you wouldn't necessarily realize that you could do. So a really nice example is, what I'm talking about here is, I was talking to a chief data officer for supermarkets, a chain of supermarkets in the UK.
Starting point is 00:05:37 And this supermarket, they have access to everything that everyone buys. So they know what's in your basket. They know your weekly shop. But they also sell home insurance, right? So they can tell who makes higher claims on their home insurance and compare it to what people are buying. And they realized in their data that you can tell that people who claim less on their home insurance are people who tend to cook food at home, which is kind of like, oh, okay. I mean, once you hear it, it's like, well, that sort of makes sense, I guess, right?
Starting point is 00:06:11 If you're a very house-proud person and you're spending ages creating a meal from scratch, then you're not going to let your kids play football in the house, right? It's like kind of the groups connect. But the question is, how do you decide who's a home cook? Anyway, it turns out that there is one item that's like the strongest indicator of all. There's this one item that's in your basket that's the biggest giveaway that you are a home cook more than any other. And it is- Frozen pizza. I'm going with frozen pizza.
Starting point is 00:06:36 I mean, it's half kind of as cooking, I guess. Want to guess? Do you want to guess, Neil? Let me see. I like to cook at home. And see, I cook a certain type of food, but I can't cook without olive oil. How about anchovy paste?
Oh, yeah. I think it is. Or tomato paste. Something that's so base-level kitchen preparation. So it's actually a little bit more, I guess... it's fresh fennel. It's fresh fennel. Wow. I think it's kind of nice. And I mean, I think that's right. Like, you just wouldn't... if you're buying fresh fennel, you must be a home cook. No, you're only cooking at home. That is the only purpose for fresh fennel. I mean, seriously, nobody's saying
Starting point is 00:07:21 like, oh, I have to pick up some fresh fennel. I need to shine my shoes. Exactly. So I started buying fresh fennel to see if my home insurance prices would start going down. But as yet, nothing. Whoa, okay. So that's one thing where the access people have to who and what you are when the data is consumer-based data. So I guess we get that.
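To make the supermarket example concrete, here is a minimal sketch, in Python, of how you might hunt for a "giveaway" item: compare average claims between customers who do and don't buy each item, then rank the gaps. Every basket, item name, and number below is invented purely for illustration; a real analysis would use millions of baskets and proper statistics.

```python
from statistics import mean

# Toy shopping-basket data: entirely made up, just to illustrate the idea.
# Each customer has a set of items bought and an average yearly insurance claim.
customers = [
    {"items": {"fresh fennel", "olive oil", "flour"},  "claim": 40},
    {"items": {"frozen pizza", "soda"},                "claim": 220},
    {"items": {"fresh fennel", "tomato paste"},        "claim": 55},
    {"items": {"frozen pizza", "chips", "soda"},       "claim": 180},
    {"items": {"olive oil", "anchovy paste", "flour"}, "claim": 90},
    {"items": {"soda", "chips"},                       "claim": 200},
]

all_items = set().union(*(c["items"] for c in customers))

def claim_gap(item):
    """Average claim of non-buyers minus buyers of `item`; a bigger gap means a stronger 'home cook' signal."""
    buyers = [c["claim"] for c in customers if item in c["items"]]
    others = [c["claim"] for c in customers if item not in c["items"]]
    if not buyers or not others:
        return 0.0
    return mean(others) - mean(buyers)

# Rank items by how strongly buying them goes with lower claims.
for item in sorted(all_items, key=claim_gap, reverse=True):
    print(f"{item:15s} gap = {claim_gap(item):6.1f}")
```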
And a lot of people fear that or are angered by it. But before we land there more securely, let me just comment that as the volume of data has grown, because computers are obtaining data constantly, hasn't the computing ability to analyze data risen with it, so that we're not really feeling the stress of being smothered in data that we might have feared a few decades ago? Yeah, well, as analysts, you may be not feeling the fear of being smothered. Okay, so I first heard this from John Allen Paulos. Now, this is mid-90s, late 90s.
Starting point is 00:08:19 What he said was, the internet is the world's biggest library. The problem is all the books are scattered on the floor. And I said, wow, there's a brilliant analogy. But at the time, there wasn't a Google search engine or any kind of way to organize that information. Within a few years, there were search engines. So the books were no longer on the floor. Who knows where the hell they are now,
Starting point is 00:08:41 but they're not on the floor. Wow. I really like that. I really like that. I really like that. Yeah, it was clever, but it doesn't apply today because we can get through the data. And my people, we have very big telescopes getting a lot of data on the universe. Half of the effort in prepping for that telescope is the data pipeline to handle the data. So are you, can you say that you're awash in data? Are you on top of the situation? Well, okay.
Starting point is 00:09:07 Are we on top of the situation? Okay, so definitely. The amount of grunt that you have now, I mean, you just didn't have access to before. You can handle vast data sets with just incredibly intricate detail about everything from the cosmos to human behavior that you just weren't able to before.
Starting point is 00:09:25 But the thing is that I think the biggest challenge when it comes to data isn't so much about volume, but more about quality. And the thing is, is that anyone who has worked with data knows that cleaning data is like so much the battle. You get like this, this massive, great, big wallop of data. You're like, brilliant. It's going to be so good. I can't wait to like dive in. And then you realize just how long it takes to make it,
Starting point is 00:09:56 to get it into a shape where it does anything that you want it to. Let me think of an example of something that I can give you that will be. I can tell you in my field, we just call it reducing the data. If I showed you raw data from the universe, you'd say, what the hell is that? Where's my pretty Hubble picture? Well, if you knew what the hell happened between the photon hitting the telescope and the photo that ended up in the press, you'd be, you might be shocked. No, you wouldn't be, Hannah, but others might.
Starting point is 00:10:23 So you have an example. I'm trying to think. Sorry, forgive me. There must be one that's really funny and quite stupid. Well, that might come out in the Q&A part. That's the one we want, definitely. Yeah, I know. You definitely want one where something's just really stupid.
Starting point is 00:10:37 There's definitely got to be one. Another quick thing about your bio. Tell me about your BBC4 radio show. Oh, okay. So this is a show that I host with the geneticist Adam Rutherford. And we've been going for about, I think we're recording our 16th series now. Whoa. So the idea is that people send us in questions and we go out and investigate them.
Starting point is 00:11:03 And initially we, because Radio 4 is like, it's kind of the posh channel. Okay. It's very highbrow. It's very, there's no music. It's all, it's where the politicians go and it's where they have like these deep intellectual debates. I mean, they have like, you know,
programs on philosophy on it, right? It's the very highbrow stuff. So initially we wanted our program to be very highbrow too, and to be very serious, like we're very serious scientists. But we discovered quite quickly that actually what works better is if you just basically muck around. And as a result of that, the questions that have been coming in have been from families with younger kids, and they end up being the best questions. So we had a question that I really liked, which was, what's the tiniest dinosaur? Which is a question that was asked of us by an eight-year-old. Seems like a really trivial, silly question that you can just dismiss with a quick Google search, but it actually unfolds this whole thing of, how do you define size? How do you define dinosaur? You know, this whole world underneath it. So yeah, that's really what it's morphed into, just this wonderful playground where all the annoying questions that kids want to ask their parents, they send them in to us instead. So to the benefit, I think, of the listeners. So it kind of undoes some of their poshitude. It does. It does. It's definitely not posh. Let's see if we can lead off with a question here.
Chuck, we got a first question on algorithms and data for one of the world's experts on those subjects. Sure thing. Let's start off. We always start off with a Patreon patron, because they give us money. So let's go to TJ Monroe from Patreon, who says, Dr. Tyson and Dr. Fry, the two best radio voices in science. Oh, thank you. Well, wait, so how is Rutherford's voice in your show? It's rubbish. Yours is much better. Oh, okay, so we've got to do our own show then. All right, work that out. He says, can you walk through the process of creating a predictive algorithm for something like the path of a lightning bolt or ocean currents?
Starting point is 00:13:16 Now, one is a lot easier than the other. Yeah, sure. One repeats itself. I'm sorry, go ahead. You say that, Chuck, because you're an expert on this. Well, yes, you know, so Neil, in my spare time. But what I like, that's a great question. What I like about it is there are two things that are highly sort of, you know, oceans can be turbulent.
Starting point is 00:13:41 You have storms and things, but there is a prevailing thing. Right. You don't know where lightning is going to strike, but you know it's going to strike somewhere over there. Yeah. So I love that question. So, Hannah, has that reduced itself to algorithms at this point or not? So I haven't seen an algorithm for predicting lightning strikes, but I'm just thinking through how you could do it.
So certainly there are going to be certain things that go into it, as you say, right? Like there are certain days when you can look out in the sky and be pretty confident that no lightning strikes are going to happen, and other days where you can be fairly confident that they will. So there are certain things that you can measure in the atmosphere, the atmospheric pressure, the humidity, all of those kinds of things, that you could plug into a system that could help you predict the likelihood of a lightning strike. But the exact path of it, I mean, Neil, you probably know more about this than me, but I would say that the exact path of it is going to be very difficult to deduce precisely where it will end up.
Starting point is 00:14:33 It's easy, yeah. Would you call it an algorithm if you are checking the atmospheric pressure and the humidity and the size of the clouds and the moisture? If those are just inputs to something that calculates, does that come under your category of algorithm as well? Yeah, I think so. I think so.
Starting point is 00:14:51 I mean, obviously there are different kinds of algorithm and, you know, the artificial intelligence is one that gets a lot of attention and the algorithms that deal with sort of data on the internet is another. But I think anything that is taking something from the real world, like a recipe, right? Like taking something from the real world, like a recipe, right?
Starting point is 00:15:07 Like taking ingredients from the real world, doing something with it, and then spitting out some kind of answer. I think, for me, that counts as an algorithm. I mean, you know, technically, if you stop and ask someone for directions, if you're in your car and you stop and ask someone for directions and they say, go down there, go that way, that way, that way, I mean, technically, they're giving you an algorithm. Right, right.
Starting point is 00:15:25 So, okay. So, algorithm is a very wide catch basin then for accounting for things that you want to predict or understand. So, with lightning, you'll only discharge a cloud if the buildup in charge is very different, either from one cloud to another cloud or between the cloud and the ground. And so, like you said, Hannah, if you measure the humidity,
you can check to see what is the propensity of electricity to cross humid air versus dry air and look at a threshold for that. And you say, when it hits this threshold, it's going. And how much of algorithms is also a thresholding phenomenon? Oh, lots and lots. So I think, yeah, that's the difference. Sometimes you are trying to predict exactly what's going to happen; you're trying to predict, let's say, if a ball is rolling down a hill, exactly where it's going to end up, that kind of thing. And sometimes you're just saying, as you said, what is the probability that this might happen, and at what point do you set this threshold to say okay? Like, for instance, the example that I gave you earlier about the fennel thing, right? It's not that you buy fennel, therefore you're definitely not going to claim on your home insurance. It's you buy fennel, therefore you're likely to be a home cook, therefore you're likely to do this. And all the way through those, you're going to be setting thresholds where you say, if it tips over this, then we assume you're a home cook. If it tips over this, we assume you're whatever. Interesting. Okay. So
that's an important fact here, because it's not the one piece of data that tells you what everything is. It's the one piece of data that might put you over the edge of that conclusion. Is that a way to think about that? Yeah, I think that's right. When you're dealing with uncertainty, there are very few things, especially when you're handling data to do with human behavior, that are cold, hard facts, right? You're rarely dealing in absolutes. So when you're handling uncertainty, the only way that you can possibly convert uncertainty into a yes-no answer is by saying, here's the line; if we cross it, we'll assume it's a yes. Got it.
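A tiny sketch of that thresholding idea in Python. The probabilities and the 0.7 cut-off below are made up; the point is only that a chain of uncertain inferences gets turned into yes-or-no calls by choosing lines to cross.

```python
def classify(probability, threshold=0.7):
    """Convert an uncertain estimate into a yes/no call by thresholding it."""
    return "yes" if probability >= threshold else "no"

# Chained, threshold-style inference in the spirit of the fennel example
# (all numbers invented for illustration):
p_home_cook_given_fennel = 0.85   # buys fennel -> probably a home cook
p_low_claims_given_cook = 0.65    # home cook -> probably lower insurance claims

for prob, label in [(p_home_cook_given_fennel, "home cook?"),
                    (p_low_claims_given_cook, "low claims?")]:
    print(f"{label:12s} p = {prob:.2f} -> {classify(prob)}")
```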
Starting point is 00:17:43 we'll assume it's a yes. Got it. And the ocean currents, those, like you said, Chuck, those have prevailing, they're not catastrophic the way a lightning bolt is. So presumably, that doesn't have this kind of thresholding. No, true. But I mean, ocean currents, there are, you know, much, they're sort of very sophisticated equations that can describe fluid flow, right? So they're still not absolute, especially when you're dealing with turbulence, there's still a lot of probability and randomness and chaos, right?
Starting point is 00:18:15 That's involved in all of that. But you can say with more, it's not a thresholding problem. You're, as you say, right? You're like, you can say with more certainty where things are going to be and how they're going to be moving. I was going to say, there's also connection
Starting point is 00:18:28 when you're dealing with ocean currents as opposed to lightning bolts. Ocean currents are all connected because it's not one ocean. It's an entire oceanic system that happens on the globe, whereas lightning bolts are isolated incidents. So that's that, there's that too. That's very true. Although if you ask a mathematician, a mathematician who studied the ocean,
Starting point is 00:18:52 I mean, they assume it's two-dimensional. So, I think it's a bit lost then. Yeah, Chuck, you start subtracting dimensions to make the problem easier to solve. Whether or not your answer is correct at the end. Elegant there, right? Elegant. Yeah, we'll get back to that. Let's take our first break.
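For anyone who wants to see them, the sophisticated fluid-flow equations presumably being alluded to here are the Navier-Stokes equations, which for an incompressible fluid can be written as

$$\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u} = -\frac{1}{\rho}\nabla p + \nu\,\nabla^{2}\mathbf{u} + \mathbf{g}, \qquad \nabla\cdot\mathbf{u} = 0,$$

where $\mathbf{u}$ is the velocity field, $p$ the pressure, $\rho$ the density, $\nu$ the kinematic viscosity, and $\mathbf{g}$ any body force such as gravity. Real ocean models layer rotation, temperature, and salinity on top of this, and turbulence keeps the solutions chaotic, which is the probability-and-randomness point made above.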
Starting point is 00:19:11 And when we come back, more Cosmic Queries with data mathematician Hannah Fry when we return. We're back. StarTalk Cosmic Queries. Algorithms and data. Yeah, I said it. Algorithms and data. I'm so sorry when you say it. Chuck Nice, co-host.
Starting point is 00:19:58 Tweeting at Chuck Nice Comic. Thank you, sir. And we have as our special guest, Professor of Mathematics at University College London, Hannah Fry. I'm an associate professor. You just gave me a promotion there. Thank you.
Starting point is 00:20:10 I will do that. You take that to the bank and to your department chair. Hannah, how would you like to be a chancellor? I want to be sir. That's what I want. Oh, there you go. So Chuck, you got another question about data? Sure, sure, sure.
Let's go back to Patreon and let's go to Shawshank Submarinian, who says, Neil, hello, and Hannah, hello. I would like to know how impactful solving the P versus NP problem would be with respect to our capabilities of understanding the universe. Excuse me, the P versus NP problem?
Are we all fluent in P? Yes, exactly. If you have consumed copious amounts of liquid, the P versus NP problem becomes quite the conundrum. What is the proximity to your water closet? So, Hannah, what is P versus NP? And is that a real outstanding problem? Yeah, yeah, totally.
Starting point is 00:21:18 So this is one of the millennium math problems. So if you solve this, Shawshank, what was his name? Shawshank? Shawshank Submarinian. Okay. He's showing off, by the way, with this question. He's showing off.
Starting point is 00:21:30 If you solve this problem, you win a million dollars. So, I mean, it's kind of maybe the change to the university bigger, but definitely a big change to your life. Okay. So let's say that I gave you a gigantic Sudoku puzzle
Starting point is 00:21:43 and asked you to solve it. I mean, like a really massive one, not just like nine square, a massive, massive one and asked you to solve it. It would take you forever to solve it. But if I said, here's a solved one, I want you to check if it's right. Actually, that's a much easier problem, right?
Starting point is 00:22:00 Even though they could be the same Sudoku puzzle, filling it in in the first place is much, much, much harder than just checking that it's right. So there are some... Wait, wait, wait. You say a solved puzzle. Yeah.
Starting point is 00:22:12 That would imply it's right. So you mean a filled-in puzzle. Okay. You're right. My language is sloppy. I take it back. A filled-in puzzle. Okay.
So sometimes, where you've got like a blank Sudoku grid, effectively, or the analogy in computers, if it's very easy to check that the answer is right, sometimes you can use that as a loophole to get you to the answer very quickly, right? Because you can just generate answers and check if they're right, rather than grunt through the entire process. So the question is, is that always the case? If you can check that an answer is right quickly, much quicker than you can solve the problem in the first place, is that always the case? Do really hard problems that you've got to grunt through always have quick solutions, or not?
And the reason why this has repercussions, the reason why this has potential impact on our understanding of the universe, is that an awful lot of the algorithms that we use to try and understand gigantic systems, and I'm sure that this is certainly true in a lot of cosmology, a lot of them have to use very clever workarounds to account for the fact that some problems are just really hard. Some problems you kind of have to grunt through to find the answer. So there are a whole host of different algorithms that exist to try and make that grunting process easier. But if it were the case that actually all these difficult problems do have an easy, quick solution, I mean, if you could suddenly reduce the amount of computational time that you spend on a problem, that would have a dramatic, dramatic effect on the number of things that you could compute.
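A sketch of the asymmetry in the Sudoku example, in Python: the checking half is quick and mechanical, while producing a filled-in grid in the first place can require an enormous search. Whether the hard half always hides a fast shortcut is exactly the open P versus NP question; the function below only illustrates the easy-to-verify half.

```python
def valid_sudoku(grid):
    """Check a filled-in 9x9 grid (list of 9 lists of 9 ints): every row, column,
    and 3x3 box must contain the digits 1-9 exactly once. Verifying is fast;
    producing the filled-in grid in the first place is the hard part."""
    groups = [list(row) for row in grid]                                    # rows
    groups += [[grid[r][c] for r in range(9)] for c in range(9)]            # columns
    groups += [[grid[r][c] for r in range(b_r, b_r + 3) for c in range(b_c, b_c + 3)]
               for b_r in range(0, 9, 3) for b_c in range(0, 9, 3)]         # 3x3 boxes
    return all(sorted(g) == list(range(1, 10)) for g in groups)
```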
Starting point is 00:23:41 you know, if you could suddenly reduce the amount of computational time that you spend on a problem, I mean, that would have a dramatic, dramatic effect on the number of things that you could compute. So I want to, Chuck, I'm going to show off in front of Hannah, so give me a moment here. Go do your thing. So did the four-color map problem,
So, the four-color map problem. I was around and in college when that was solved, and it was considered inelegant, because someone put an algorithm on it and just grunted through it. But they solved it, and no one else had solved it before. So in principle, then, that implies that it was easier to solve it that way than by any analytic way. Is that a fair analogy here or not? Is it? Yeah, I mean, people got very upset about that one, didn't they? Yes, I remember that. Yeah, because normally when you do a proof,
Starting point is 00:24:30 you write it down and it's these elegant statements of logic. It fits on the back of an envelope. Yeah, right, exactly, yeah. So at the risk of not being a part of the four-color map, the four-color map parade. Good point. The four-color map love fest. What the hell is the four-color map problem? Okay.
Starting point is 00:24:54 You know when you get like a map of the states, you know, the United States, and you've got like all of the different states, whatever, and you want to color it in? Yeah. The question is, can you color it in with four colors so that no two states next to each other share the same color, right?
Turns out that you can. The question is, is there, so the four-color problem is, let me get this right, actually. You might remember more than me, but does any map exist for which you cannot color it in with four colors? Yeah, that's the way I'm thinking. What is the minimum number of colors that you need to color any map? I think we knew that four colors was the right answer. We just didn't have a proof of it. And so it was intractable until somebody, again, I went to college so long ago, computers were new to the world of math. Punch cards.
Almost certainly. I mean, it was the 70s, right? It was the 70s. Yes, it was. Back when I was in college. And like a steam-powered handle. Yeah. So, and it was proven, but only through this, by checking the answer, not by proving the answer. That feels like what you just described. Right.
Starting point is 00:25:59 but only through this, by checking the answer, not by proving the answer. That feels like what you just described. Right. And so I'll give you an example from my field. Hannah, I think this is an example. We have people looking for galaxies that are very, very low in their surface brightness. Like you would scan by it and you wouldn't even know it was there.
Starting point is 00:26:26 Well, how do you look for them if they don't reveal themselves? Well, the ones we have found, we know what their light profile looks like. And so what we can do is set up a filter that goes out and tries to match the light in the sky to that filter. And when you get a slight increase in a match, there's a galaxy.
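Here is a rough sketch of that filter-matching idea in Python, on a made-up one-dimensional signal rather than a real image; actual low-surface-brightness searches work on two-dimensional data with far more careful calibration and noise handling.

```python
import numpy as np

rng = np.random.default_rng(42)

# Fake 1-D "sky": noise plus one faint, broad bump standing in for a dim galaxy.
template = np.exp(-0.5 * (np.arange(-25, 26) / 8.0) ** 2)   # expected light profile
sky = rng.normal(0.0, 0.4, 500)
sky[240:291] += template                                    # buried, low-contrast signal

# Matched filter: correlate the data with a zero-mean, unit-norm copy of the template.
kernel = template - template.mean()
kernel /= np.linalg.norm(kernel)
score = np.correlate(sky - sky.mean(), kernel, mode="same")

# The correlation peaks where the data best matches the expected profile,
# i.e. near index 265, the centre of the injected bump.
print("strongest match near index:", int(np.argmax(score)))
```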
Starting point is 00:26:50 Now you can put all your resources there and say, yep, there's a galaxy there. So you're looking for the answer to ask the question. Yes, yes, yes. Yeah. That's it. So Hannah, is this legit? I mean, these are great examples of grunting through the solution. And the question is, is this legit? I mean, are we allowed to do this? Yeah, I mean, these are great examples of, like, grunting through the solution. And the question is, is that always the easiest way?
Starting point is 00:27:09 Or is there actually a trick? Is there some clever trick that you could have used? You know, I don't know, like folding your data in half and looking for, like, this superimposed, like, light. You know, maybe there's some clever trick somewhere that just no one's spotted yet. You know, there's so often cases where people come up with these clever tricks. And maybe there was a clever trick that the whole thing could have been solved much quicker
Starting point is 00:27:29 without having to grunt it through. Okay, so if this problem gets solved, then it'll give us confidence for all future problems to say, don't even worry about figuring out the answer analytically. Let's just compute the answer and then check it. Yeah. I mean, if that's easier, then for me, that's less romantic. That's less elegant.
Starting point is 00:27:53 It is. If it works, it works. But then at the same time, you have to think of the potential repercussions of this. Like, you know, there are some problems. Like if you take protein folding as an example, right? So proteins are just the source of so much, you know, life. I mean, essentially they're like the fundamental building blocks of life.
Starting point is 00:28:09 Building blocks. Everything that could happen to the human body from, you know, Alzheimer's to the drugs that you, like the effect of drugs that you take. I mean, everything, it all comes down to proteins. And proteins, they're like these long ribbons of amino acids. And the way that they fold up determines their function. These things are incredibly, incredibly, incredibly complicated. And it's okay to go from the folded up bundle of ribbon to the long string, it's possible. But going from the long string to work out what folded up knotted shape it makes is really, really,
Starting point is 00:28:45 really, really super, super, super hard. We understand all of the physics of it. We have equations that could work it out, but you just cannot grunt through it all. And if you, let's say that you could solve this problem. Let's say that you could have a computer that could grunt through all the possibilities. What that would mean is you could say, I want a protein that serves this function. I want a protein that can combat this disease. I want a protein that acts on the body in this way. What shape is it? It's like this shape. Okay, now what is the string of amino acids I need to print to create that protein? I mean, I'm talking like long, long, long, long, long term here. These are not like things that are around the corner. But I mean, however, I think there are some applications of this stuff
Starting point is 00:29:27 that means that actually romance, you know, romance is dead. Like who cares about romance when you've got protein folding? So Chuck is listening attentively here because he wants to know the formula for a funny joke. I think it's going to be hard. I really don't. Why would I change anything now? How do you fold your words together to guarantee there's a funny joke on the other side?
Starting point is 00:29:53 I'm sure there's an algorithm for that, you know. And believe me, I would love that. It's funny because it sounds like what you're talking about is quantum computing. In part, yeah. In part. Well, the extreme level of it. Yeah, if you could get the computing power. And then all problems will just be solved depending on whoever spends time looking at it. That would, like, I have to reiterate, Hannah,
Starting point is 00:30:16 that would take away the romance of the quest for me a little bit, I think. A little bit. Well, you have to sigh. Did you hear that sigh? Yeah, that was amazing. Wait, so do you think the four-color problem was unromantic? Yes. Yes. Yeah. Yeah. I mean,
Starting point is 00:30:35 because, you know, we have E equals MC squared, that fits in, you know, children write down that equation. It's one of the most profound equations in the universe. How many forces are there in the universe? There's not thousands, there's four, all right? In the early universe, there was fewer.
Do you think that simplicity always exists, though? Or do you think, actually, sometimes, if you're a physicist, you can get, I don't know, a sort of notion of simplicity that... Maybe. Yeah, so here's my lesson that I have to tell myself to get out of this sort of state of romance. Johannes Kepler, when he first showed the planets going around the sun, and he was trying to figure out what kind of orbits they had and about their distances. And he had a system where, he's a mathematician, and you know there are the five Platonic solids, right? Do you know about this, Chuck? No. Five solids? No.
Starting point is 00:31:42 It's a singing group from the 60s. That's what I was looking for. That's exactly what I was looking for. So, Hannah, you want to tell them the five solids? Okay. So, Plato was like super into this idea of everything being perfect.
Starting point is 00:32:01 So, the platonic solids are the five shapes that can be created where every side is the same. So a cube is a platonic solid. Every side is a square. Tetrahedron, octahedron, dodecahedron. Dodecahedron, go ahead. And what's the last one? Icosahedron, I guess.
Starting point is 00:32:18 Well, yeah, icosahedron. And I think you left out the pyramid, which is what, the four-sided pyramid. Tetrahedron. No, she said it. Oh, you said it? Okay. Yeah, she said tetrahedron. Can we get five? So tetrahedron, octahedron, cube,
Starting point is 00:32:36 icosahedron, and dodecahedron. Right. So each of those have the same shape polygon on all sides, but there are only five of them. So Kepler knew this, and he also knew that there were six planets. And he said, well, everything is perfect and divine and math is perfect.
Starting point is 00:32:54 Maybe the planets are the separations, occupy orbits in the separations between nested platonic solids. Ooh, if only. So he took them and nested them and put planet orbits, and he actually got pretty close. It was like, but this was his ideal. This was his sense of perfection that he was imposing on nature.
Starting point is 00:33:17 And it was all bullshit. That happens quite often, though. I think it does happen where people fall in love with the simplicity of their theory and forget that actually often the world's really ugly. Yes. So I use the Kepler example. To his credit, it took him 15 years,
Starting point is 00:33:35 but to his credit, 10 years, but he discarded the entire system and out came elliptical orbits. Which are beautiful. In their own way. In their own way. Not as beautiful as perfect circles as Copernicus had presumed. So anyway, let's go to the next question. Check. All right. Let's go to John Baker from Patreon. He says, hi guys. I'm back to prove my ignorance yet again. What kind of empirical data is used? Well, first, let me ask a core question. What does it mean to use an algorithm? Not that I couldn't look it up on
wiki, or I could have just paid attention in school. I love John Baker. This guy's amazing. Anyway, and you know what? I should have led off with this question, because the truth is, what we never have touched upon in this show yet: what is an algorithm? Yeah, what is it?
Starting point is 00:34:36 Hannah, algorithm 101, give it to me. Okay, 101. Algorithm is this gigantic umbrella term that doesn't really mean very much, which I think is the reason why people hate the word so much. But essentially, all it is, is a series of logical steps that take you from some input to some kind of output, right? So a recipe, a cake recipe, that counts. That's an algorithm. Your inputs are your ingredients. The logical steps is the recipe itself that outputs
Starting point is 00:35:03 the cake that you get at the end. The difference though is that when people... Wait, wait, wait. A cake recipe is a flowchart, I would think. I don't think a flowchart is an algorithm. It could be. I mean, it's just like a giant algorithm. The word algorithm is like this giant all-encompassing term. But I think when people use it,
Starting point is 00:35:24 they tend to mean something within a computer. So something where you are inputting some data and then the machine has some kind of autonomy in terms of the decisions that it makes along the way and spits out an answer at the end. Of course, so computer programs, then everything you do in a computer program is an algorithm. Is that fair to think of it that way?
I think that's fair, yeah. Although, I mean, I think that when people use the word... you know, did you learn to code? So I had a ZX Spectrum, which I think is quite a British thing. It's like a Commodore 64. That was the thing that I learned to code on.
Starting point is 00:36:00 Well, that's the American one, right? Yes, we did. The Commodore 64 was the American version. And that's because Hannah is 80 years old. I was going to say, I want to unpin my video now so I can get a closer look at Hannah because right now
Starting point is 00:36:14 she's in a little window on my screen. In her early 30s, late 20s. And then she's talking about Commodore 64 and I'm like, is it her or is it my
Starting point is 00:36:26 eyes? Okay, grandma. Talk to us, grandma. I learned coding when I was two. She's just a child genius. Exactly. There you go. So all my ZX Spectrum is kind of the British equivalent. You would do like print,
hello, go to line 10, and then it would just go around and around and around and fill the whole screen with hello, right? That was the kind of programming that everyone did when they were kids. And technically that's an algorithm. It's just a really rubbish one. It's, you know, not really doing anything. So I think when people use the word, they tend to mean that it's some kind of automated decision-making. That's sort of really what they mean. But I think, you know, if we're being absolutely fair, the word algorithm encompasses all of this.
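In code terms, that definition covers both ends of the spectrum: the useful recipe kind and the screen-filling loop Hannah remembers. A hypothetical Python sketch of each; the function name and ingredients are invented.

```python
# The "recipe" kind of algorithm: ingredients in, a fixed series of steps, cake out.
def bake(ingredients):
    batter = " + ".join(sorted(ingredients))   # step 1: combine the inputs
    return f"cake made from {batter}"          # step 2: produce the output

print(bake({"flour", "eggs", "sugar"}))

# The ZX Spectrum / Commodore 64 classic (10 PRINT "HELLO" : 20 GOTO 10),
# technically also an algorithm, just "a really rubbish one". Commented out
# so this sketch actually terminates:
# while True:
#     print("hello")
```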
Starting point is 00:37:00 they tend to mean that it's like some kind of automated decision-making. That's sort of really what they mean. But I think, you know, if we're being absolutely fair, the word algorithm encompasses all of this. Now, Chuck, the person had more to that question, but we just ran out of time in this segment. So when we come back, we'll pick up more on the angst shared by this questioner who wondered whether he should have learned all this in school in the first place. This is StarTalk Cosmic Queries. We'll be right back. Hey, we'd like to give a Patreon shout out to the following Patreon patrons,
Starting point is 00:37:44 Dan McGowan and Sullivan S. Paulson. Thank you so much for your support. You know, without you, we just couldn't do this show. And if any of you out there listening would like your very own Patreon shout-out, please go to patreon.com slash startalkradio and support us. StarTalk. We're back. Cosmic Queries.
Starting point is 00:38:17 Algorithms and data edition. You never thought we'd go there, but we did. So there. Okay. I got Chuck, of course, and Professor Hannah Fry, Associate Professor of Mathematics at University College London. You shared with us earlier in the session that you live in Greenwich, and we've all heard of Greenwich, even if you've never been there, Greenwich time.
Starting point is 00:38:38 That's like the time, the base time of the world, right? You get kind of cocky about that? You know, we swagger around here. We swagger around. It actually took me to move to Greenwich. I've only lived here for three years or so. But it took me to move to Greenwich to realize that Greenwich Mean Time is,
the word mean in it actually means average, mean across an entire year. I didn't know that. Oh yeah, entirely. Of course 24 hours a day is 24 hours. No, it's not. It's 24 hours on average. Yeah, that's exactly right. Yeah, the time it takes the sun to return to its spot on the sky, on average, is 24 hours. Sometimes it takes longer, sometimes it takes less. People don't know that. Yeah, yeah. So, yeah, I was happily drinking my Greenwich Meantime lager and wandering around Greenwich Mean Time village, and I didn't realize that mean meant average. Mean, yeah. It has nothing to do with the emotional state of your time. Okay, so Chuck, we left off someone upset that he
Starting point is 00:39:44 didn't learn the meaning of algorithm in school, but I think there was a question based down there. He said, what does it mean to use an algorithm? Okay. And then Hannah said, well, a recipe is an algorithm. But I like the distinction Hannah is making as we go forward in the 21st century, that we think of algorithms as an automated procedure rather than something that... Yeah, you do. Something that makes a decision.
Starting point is 00:40:07 And then I think that there's a further distinction there as well between an algorithm and artificial intelligence. And I think that the way I like to think of this is, let's say you've got a smart light bulb that's kind of connected to the internet and you decide to program it so that it turns on at 6 o'clock and goes off at 11 o'clock. So that's an algorithm, right? You programmed it, you said, if it's six o'clock, turn off,
Starting point is 00:40:29 right? Or turn on, whatever. That's just a straightforward algorithm. If it was artificial intelligence, generally speaking, most people agree that artificial intelligence needs to include some aspect of learning. So instead, the light bulb would recognize that you came home at six o'clock and turned the light on. It would recognize that you like to dim the switch at 9 p.m. when you do some reading, and then that you go to bed at 11 o'clock. So if it's starting to learn from its environment
Starting point is 00:40:56 and then impose those rules itself, that counts as artificial intelligence. Right. But that's simply an updatable algorithm. Yeah, yeah, yeah, exactly. It's something that's continually revising itself. Okay. By the way, you... Go on, Chuck. I was going to say, in addition to that, though, it is also, more importantly, pattern recognition. So the update is based on the recognition of patterns.
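A sketch of that distinction with a hypothetical smart bulb in Python: a fixed rule is a plain algorithm, while something that revises its own rule from observed behaviour is edging toward what people call AI. The class names and the crude averaging "learning" are invented for illustration.

```python
from statistics import mean

class ScheduledBulb:
    """Plain algorithm: a hand-written rule that never changes."""
    def should_be_on(self, hour):
        return 18 <= hour < 23                 # on at 6 pm, off at 11 pm

class LearningBulb:
    """Toy 'learning' bulb: infers its on/off window from when you actually used it."""
    def __init__(self):
        self.on_times, self.off_times = [], []

    def observe(self, switched_on_at, switched_off_at):
        # Pattern recognition in its crudest possible form: remember and average.
        self.on_times.append(switched_on_at)
        self.off_times.append(switched_off_at)

    def should_be_on(self, hour):
        if not self.on_times:
            return False
        return mean(self.on_times) <= hour < mean(self.off_times)

print(ScheduledBulb().should_be_on(20))   # True: inside the fixed 18-23 window

bulb = LearningBulb()
bulb.observe(18, 23)   # came home at 6, lights off at 11
bulb.observe(19, 22)   # a later evening
print(bulb.should_be_on(20))   # True: it has learned a rough 18:30-22:30 window
```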
Starting point is 00:41:20 Totally agree. Yeah, completely. Which was really hard until very recently. Now, Hannah, you scared me a little when you began that comment because you said, imagine a smart light bulb. And I thought, aren't smart robots enough? What would a smart light bulb be? Light bulbs marching down the street. It just pops above your head every time you have an idea. I love the smart light bulb. It's like, humans must die. In order for us to shine, humans must die.
Starting point is 00:41:54 Exactly. And in fact, they would dig up the joke from, there's a comedian whose Twitter handle is The Science Comedian. And I quote him every now and then. One of my favorite jokes of his or comments was, the light bulb was such a good idea, it became a symbol for a good idea. That's funny.
Starting point is 00:42:16 That's true. So if light bulbs become our overlords, they will remind us that anytime we think something brilliant, it's one of them that gets popped up. Exactly. Right. Don't think we don't know what you're thinking. Well, we only know what you're thinking if it's a good thing.
So the Science Comedian is Brian Malow, if anyone wanted to dig him up. Nice. Okay. So you got another question there, Chuck. Sure thing. This is Ben Sellers, and Ben wants to know this. From an evolution standpoint, our relationships and mating behaviors probably follow patterns useful for hunter-gatherers. How do our behaviors on social media and dating websites resemble patterns from more primitive days? What would make interacting online more connected to our primitive programming? Now, I don't know if this is your purview, Hannah, but he's making a really pretty poignant association, which is, we now find people online. That's how we find love now, and the number is only going up every year. Do the hunter-gatherer brain sets actually apply to the way that we go after one another digitally? I love this.
It's like I'm just foraging, foraging for lovers. Oh, yeah. Hunter-gatherers. And by the way, it is kind of foraging. Swipe right, swipe right. Yes. You're hunting. Yeah. So actually, one of the very first things I did as soon as I finished my PhD was this really silly talk that was actually supposed to be this kind of private joke that just got really out of hand, which was called The Maths of Love, right? Which was in part looking at data from
online dating websites, and it was kind of this thing where I just wanted to demonstrate that you can take a mathematical view of everything. Anyway, it got terribly out of hand and ended up being a TED Talk. But in that, there was something that was really interesting that I think is relevant to this, which is... Wait, wait, just a quick thing, Hannah. Most people who have thoughts that get out of hand don't end up giving TED Talks. So it requires some level of brilliance. I just want to distinguish you from everybody else that would be encountered. So, your TED Talk. So go on. Well, for a number of years in Britain, people started calling me Dr. Love. And I was like, it was just a joke, guys. It's never been serious.
I'm really not Dr. Love. Anyway. Okay, so in it, one of the things that I talked about was who gets the most attention, right? Whose photos get the most attention on dating websites? And you would think, okay, surely it's going to be the people who everyone considers the best looking, right? I mean, surely the most attractive people get the most attention. Surely. But OkCupid is kind of an interesting dating website, because for a while there they were totally open about the fact that they were experimenting on their customers, and they released all their data. And also, on their website, what you're allowed to do is rate how attractive you thought other people were, on a scale between one and five, right? So five is very beautiful; one, I think, slightly more facially challenged is the technical term. And what they found was that it's not true that just the people who get fives get the most attention. It was the people who divided opinion the most. So the people who were getting the most attention were averaging out as kind of a
Starting point is 00:45:45 four, but they weren't people where everyone was giving them a four. They were people where some people would give them a five and lots of people would give them a one. Some people thought they were absolutely horrific and some people thought they were really beautiful. And the explanation for this, which I quite like, is kind of like an instinctive one. So I guess this sort of goes into the hunter-gatherer thing in a way, which is that it's like a game theory thing, right? If you come across someone and you think that they're very beautiful, then you imagine they're getting lots of attention and you think, well, why would I, there's no point in throwing my hat in that ring. I may as well stand back. Whereas if you come across someone who you think is very beautiful,
Starting point is 00:46:24 but you imagine other people will really dislike. So someone who you think is very beautiful, but you imagine other people will really dislike. So someone who's like a bit unusual in some way, then you're like, great, this gorgeous person isn't going to be getting that much attention. And you kind of like throw yourself in. But just because everyone's doing that, that means it's the really beautiful people who are not getting any attention and the kind of quirky ones are getting lots. So I think it's kind of interesting. Wow. So what you're saying,
Starting point is 00:46:54 apart from the sociocultural lessons from that, is algorithms that might apply to human behavior that data that we're now collecting on billions of people provide might have some strong evolutionary guidance for us going forward. Yeah, I mean, I think in terms of this, it's always tough to link it back to evolution, isn't it? But I do think that you can certainly come up with these game theoretic arguments for the patterns in our behaviour.
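A rough sketch, in Python, of the kind of calculation behind that observation: two profiles can share the same average rating while differing wildly in how much they divide opinion, and it was the high-spread profiles that tended to get more attention. The ratings below are invented.

```python
from statistics import mean, pstdev

# Hypothetical one-to-five ratings for two profiles with the same average score.
profiles = {
    "uniformly liked": [4, 4, 4, 4, 4, 4],   # everyone gives a 4
    "divides opinion": [5, 5, 1, 5, 5, 3],   # mix of 5s and a 1, same mean of 4
}

for name, ratings in profiles.items():
    print(f"{name:16s} mean = {mean(ratings):.1f}  spread = {pstdev(ratings):.2f}")

# The empirical finding described above: at a given mean, more spread tended
# to go with *more* messages, which the game-theory story tries to explain.
```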
Starting point is 00:47:19 I mean, you know, they're not exactly falsifiable, but I think they're fun to explore. But it means you have an algorithm that applies, that's established in one environment. Like Chuck was saying, you're on the plains of the Serengeti, and there's a certain algorithm for our behavior that we can't shake because it's genetically encoded within us, perhaps. Yeah, I mean, perhaps.
Starting point is 00:47:42 Yeah, I mean, I guess I'm sure that there's like anthropologists who will know this much better than me. But yeah, I don't doubt it. I'm sure that there are lots of occasions where we act really instinctively, where our ancient history causes us to act in a certain way. And I'm sure that we do it still,
Starting point is 00:47:59 even when we are interacting with people on a completely different platform to the one that we were designed for. So Chuck, we've got time for maybe one or two more questions. Wow. All right. God, we're going to have to do another show, man. I have 11 pages of questions.
Starting point is 00:48:15 Okay, Hannah, you've got to come back. I'll talk really fast. That is how many people are really interested in this subject. It's unbelievable. I have 11 pages. Beautiful. Data and algorithms, sir. So I know we're wrapping it up, so I'm trying to find one. Okay, here's unbelievable. I have 11. Beautiful. Thanks for an algorithm, sir. So I know we're wrapping it up.
Starting point is 00:48:26 So I'm trying to find one. Okay, here's one. Okay. This is Dean from Twitter. And Dean says, Wait, Chuck, you're just choosing people whose names you can pronounce.
Wait, Chuck, you're just choosing people whose names you can pronounce. This whole episode has been pronounceable names. All right, let me go with them instead. I'll go with Tielo Jungmanas. That ain't... that's better, I'm sure, Tielo. Now Dean thought he had his moment, and we rob it from him at the last minute. Well, sorry, Dean. We're going to do this show again, so we'll get back to you, buddy. All right. So this is Tielo Jangman. Jangmans?
Starting point is 00:49:07 Jangmans. Okay, from Twitter. Okay. He says, I'm sorry. He says, assume that the game on behavioral data is already over. The next level is biological data, and the one after that is thought police. Okay. What are your expectations and what do you think the outcomes will be?
Starting point is 00:49:33 So what he's really talking about there is predictive analytics. Will we ever get predictive analytics to a point where you have pre-crime? It's like, you didn't commit a crime, but you know what? We know you're going to commit this crime because these algorithms have actually profiled you
in such a way that tells us that you are a criminal. Hannah, you said that about fennel, okay? Oh my God, you did. If you know that much about who a person is if they buy fennel... I love what Chuck said there. Can you get... are the algorithms so good that they know your thoughts, and then they know your next behavior, and then you pre-arrest someone? Okay, so this is such a tough topic, right? Like, I spent a huge chunk of my book talking about this, you know, predictive policing and
predictive algorithms. And the name of your book again, just so we get that? It's called Hello World. Life in the Age of... I can't remember what they did with the subtitle in America. I think it's How to Be Human in the Age of the Algorithm, maybe. Yeah, that's right. I thought you were going to say, I can't remember what the
Starting point is 00:50:40 name of my book is. But the subtitle is different in America. They do that sometimes. Yeah. Okay. They have to translate between two English speaking countries. They have to translate. Okay. So, right. This is like a really tough topic because the thing is, is that some people have definitely tried to do this. There have definitely been some situations in which people have tried to predict. So there was one particular example in Chicago, I think it was, where the idea was quite straightforward, right?
Starting point is 00:51:09 Which was like, okay, well, when it comes to gun crime, often today's victims are tomorrow's perpetrators. So if you analyze the network of people who people are friends with, who people hang out with, that kind of stuff, if you analyze that network and feed in where events are happening, can you come up with like a risk score, if you like? This is kind of like the threshold thing that you were talking about earlier with Lightning. Can you come up with a risk score that says, we think that this group of people or this group of people are likely to be involved in something in the near future?
Starting point is 00:51:49 And when this whole system was set up, it was set up in kind of a nice way, or like, I think it was set up with good intentions, because the idea was that if your name appeared on this list, then police and social workers would come around to your house and they would, so it would be like- Intervention. An intervention, right? But it's like, you know, here are these programs that you can join. Here are these alternatives that you can, you know, we want to help you out of the life that you're in, right? That was kind of like the intention. Of course, of course, it didn't work out that way. Because if you give that list to people who have got a completely different set of priorities, as soon as there was, you know, a gun homicide,
Starting point is 00:52:21 it turned out that people took this list and started at the top of the list and then just started arresting all the way down. So by the end, Rand Corporation did this analysis of the whole project. And by the end, the people who were on the list were, I can't remember the numbers, but basically way more times, way more likely to have been arrested by the police, regardless of whether they were involved in the original crime.
Starting point is 00:52:42 So essentially it turned into a list, a harassment list, right? And I think this is the thing. I think that in like a lab setting, in kind of a cold environment of like an ivory tower, actually, I think there are certain things that you can say about like likelihood, you know, like there are some people that you can pick out
Starting point is 00:53:02 who you know in a million years, they're never going to commit a gun homicide. And then there are other people who, you know, perhaps aren't quite in the same boat. And I think there are some things that you can say about humans. But the problem is, is that the world isn't this ivory tower. You can't like create a system that gives you that information because it doesn't then tell you what you're supposed to do with it. It doesn't tell you how you're supposed to interact with people. It doesn't tell you what you're supposed to do with it. And I think that I haven't yet heard of a really positive story where people have tried to do something like that and it's gone really well
Starting point is 00:53:33 because I just think it's just, I just think like that kind of, yeah, the real world is messy. The real world is messy. Right. So, Hannah, I think just AI will figure out what to do with the data. It definitely won't. AI will know to become our overlords and subjugate us because we can't take care of ourselves. Definitely won't. I mean, also, right, that was a point, like, a couple of years ago where everyone,
This book, actually, this one here, this guy here, Matthew Salganik, he did this amazing project where he had, I can't remember exactly how many, but thousands of kids, right? Thousands of kids. And he had data on them from when they were born, when they were five years old, 10 years old, all the way up into their teens. And he had everything on them, right? He had, you know, what their parents did. He had interviews with them, an unimaginable amount, big, big, big data, right?
Starting point is 00:54:31 On these kids. And what he did very cleverly is he released the data, you know, to the public, anonymized and so on in stages. And he held back the last stage of when they were, I think, 18. And he asked people all around the world, he said, here's everything you know about these kids from 0 to 15. I want you to predict how many of them ended up in trouble, how many of them went on to further education, all of those different
Starting point is 00:54:55 kind of things. And everyone around the world with their very clever AI and their very clever, all of this, that and the other, tried to do it. Would you like to know what came out on top? Linear regression. Oh, the basic. The most basic, basic, basic. What's gone before will happen again. You fit a line to the trend in the data. Yeah.
Starting point is 00:55:19 And there you have it. Yeah. Chuck, in case you didn't know, that linear regression is the, that's fitting a line through the data. They have to put more syllables to that. Right, yeah. Linear regression.
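That baseline really is just drawing a line through past data. A minimal NumPy sketch of it, on made-up numbers; the actual challenge used many predictors at once, but the principle, fit a line and extrapolate, is the same.

```python
import numpy as np

# Made-up predictor (say, a score measured at age 9) and outcome (a score at age 15).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

slope, intercept = np.polyfit(x, y, deg=1)   # ordinary least squares, straight line

def predict(new_x):
    return slope * new_x + intercept

print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
print("prediction at x = 6:", round(predict(6.0), 2))
```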
Starting point is 00:55:30 Oh, yes, you draw a line. Straight line. I think it was. I think I got that right, guys. By the way, I would probably fact check that slightly because it may have been legit, whatever. I may have got a tiny couple of those facts wrong, but whatever.
Starting point is 00:55:40 Yeah, we don't care. The sense of it is fine. It's a great story. I'm sticking with that. So we got to bring this to a close, but we have to have Hannah back on. Oh, we don't care. The sense of it is fine. It's a great story. I'm sticking with that. So we got to bring this to a close, but we have to have Hannah back on. Oh my gosh, we've only just scratched the surface, especially with 11 pages. Hannah, clearly you've triggered interest in our fan base, and they're going to want more of you as we go forward. But I think, correct me if I'm wrong, Hannah, that one of the great lessons of this is really maybe everyone should have paid attention in their math class because math will be the foundational forces that define our social cultural existence in this world.
Starting point is 00:56:20 Did I overstate that, Hannah, or not? Yeah, no, I think that's definitely true. I always think like if, I think that it's really hard to realize how important this stuff is because it's invisible. And I kind of think, you know, with drones, like drones came along and then all of a sudden
Now you've got to have every license possible to fly a drone. I always think that, like, if you could see algorithms in the same way that you could see drones,
Starting point is 00:56:46 I think people would be a lot more, you know, willing to, well, on it really. I think that they'd want to educate themselves a lot more about it. So yeah. Or they'd just be really annoyed
by algorithms like they are drones. So, okay guys, we got to wrap this up. Hannah, thank you very much for sharing your wisdom, your insights, some of it pleasing, some of it scary, part of what we need going forward. Chuck, always good to have you. Always a pleasure. All right, this has been a Cosmic Queries edition on data and algorithms. We gotta call it quits there. I'm Neil deGrasse Tyson, your personal astrophysicist,
Starting point is 00:57:24 bidding you as always to keep looking up.
