StarTalk Radio - Cosmic Queries – Algorithms and Data, with Hannah Fry
Episode Date: June 22, 2020
What is an algorithm? How do you interpret large amounts of data? Neil deGrasse Tyson and comic co-host Chuck Nice answer fan-submitted Cosmic Queries exploring algorithms and big data alongside mathematician and author Hannah Fry, PhD.
NOTE: StarTalk+ Patrons and All-Access subscribers can watch or listen to this entire episode commercial-free here: https://www.startalkradio.net/show/cosmic-queries-algorithms-and-data-with-hannah-fry/
Thanks to our Patrons Dan McGowan, Sullivan S Paulson, Zerman Whitley, Solomon Nadaf, Eric Justin Morales, Matthew Iskander, and Cody Stanley for supporting us this week.
Photo Credit: Storyblocks.
Subscribe to SiriusXM Podcasts+ on Apple Podcasts to listen to new episodes ad-free and a whole week early.
Transcript
Welcome to StarTalk, your place in the universe where science and pop culture collide.
StarTalk begins right now.
This is StarTalk. I'm your host, Neil deGrasse Tyson, your personal astrophysicist.
And this is a Cosmic Queries edition. It's becoming a fan favorite, and when I do
Cosmic Queries, you know Chuck Nice can't be far behind. Chuck. Hey, what's happening, Neil? Dude,
welcome back in the house. Yeah, yeah, well, yeah, virtually in the house. Virtually in the
Coronaverse house. That's right. We're all in the same coronavirus house. Right. Today we're doing Cosmic Queries on algorithms and data.
Algorithms and data?
You've got to love me some algorithms and data, because nothing happens without them.
I do not. I'm not a big fan.
Not a fan of either.
I mean, I'm a fan of data.
Both the actual information kind and the android from Star Trek. I'm a fan of data. Algorithms?
Not so much.
Oh, yeah, I forgot.
We had a whole entity named Data, an android, basically.
So what we have here is, we've invited into the studio today: data.
We have an expert on algorithms and data,
a mathematician, Hannah Fry, who's dialing in from the UK.
Hannah, welcome.
Thank you very much.
You know, Chuck, you're not the only person who hates the word algorithms.
I was at a tech conference and I was just chatting to this guy
and I'm like, I think it's a word that makes about 85% of people want to
gouge out their own eyes.
He agreed with me.
He said, yeah, but it does make the remaining
15% of people mildly aroused.
So I know
what I'm in.
Mildly aroused, that's because Al
is only mildly sexy.
Mr. Go-Rhythm.
Oh, Al Go-Rhythm, Oh, Al Go-Rhythm.
Yeah, there you go.
Yeah, exactly.
So let me get your full bio here.
You're Associate Professor of Mathematics, University College London.
And you co-host a BBC radio show, Radio 4, because BBC has very stove-piped channels.
And it's The Curious Cases of Rutherford and Fry.
Wow.
Now, you have to do some splaining on that one.
Author of the book recently released, just last year,
Hello World, Being Human in the Age of Algorithms.
So you're the person for this Cosmic Queries.
I mean, I know a thing or two.
I'm not going to big myself up too much, but I've dabbled.
Hannah, if I may, can I just make you feel a little more at home digitally
as we cross the great pond?
Here we go.
Here we go.
Oh, BBC News time.
Five o'clock GMT.
There you go.
That was actually quite good.
Did you like that?
Yeah, that was very good.
He's good. He can get a job, I'm sure.
And the Brits show off that they have the prime meridian.
Yes, exactly. Universal time. Greenwich Mean.
You know, in fact, I live in Greenwich. The prime meridian is about 100 metres from my house.
Oh, do you feel it, though?
Do you feel it?
What's quite nice is you go for a little walk around.
There's like a peninsula.
And along the prime meridian, as it goes across the water,
there's just an arrow that says here.
Oh, no, I've forgotten the circumference of the earth now.
16,000 kilometers, something like that?
No, no, way bigger than that.
People once thought it was 16,000 in ancient Greece.
So we've got to update you on that.
I made a guess, and I made a guess with the wrong person.
She's a fan of history, that's all.
It says, back to here, which I quite like.
It's about 40,000 kilometers.
Well, I was within a factor of 10.
I know it in miles, and you're the guys who gave us miles, so it's 25,000 miles is what that is.
But I'm just delighted you live near the prime meridian
and you can get a little vibe from it when you take a stroll.
So tell me about data.
What's the state of data today relative to, like, when any of us were kids
or even before there were computers?
The word data, of course, predates computers.
So what's going on today?
Well, in some ways, I mean, in a lot of ways, not that much has changed.
I mean, it's still a case of, you know, people collecting
statistics about humans, about how we behave, about what we do, using kind of mathematical
techniques to analyse it and trying to infer scientifically everything that you can about
our behaviour from it. So, you know, this is like a subject that has a long history going back to
the 50s and the 60s. I think that what's really changed is just the volume of data. I mean,
you don't need me to tell you just the incredible amounts of data that is collected on us.
But I think for me, it's not just the amount of data that's directly collated about us.
It's the things that you can infer from that data.
The guesses that you can make about people that you wouldn't necessarily realize that you could do.
So a really nice example is this: I was talking to a chief data officer for a chain of supermarkets in the UK.
And this supermarket, they have access to everything that everyone buys.
So they know what's in your basket.
They know your weekly shop.
But they also sell home insurance, right? So they can tell who makes higher claims on their
home insurance and compare it to what people are buying. And they realized in their data
that you can tell that people who claim less on their home insurance are people who tend to cook
food at home, which is kind of like,
oh, okay. I mean, once you hear it, it's like, well, that sort of makes sense, I guess, right?
If you're a very house-proud person and you're spending ages creating a meal from scratch,
then you're not going to let your kids play football in the house, right? It's like kind of the groups connect. But the question is, how do you decide who's a home cook? Anyway, it turns
out that there is one item that's like the strongest indicator of all. There's this one item that's in your basket
that's the biggest giveaway
that you are a home cook more than any other.
And it is-
Frozen pizza.
I'm going with frozen pizza.
I mean, it half counts as cooking, I guess.
Want to guess?
Do you want to guess, Neil?
Let me see.
I like to cook at home.
And see, I cook a certain type of food,
but I can't cook without olive oil.
How about anchovy paste?
Oh, yeah.
I think it is.
Or tomato paste.
Something that's so base level kitchen preparation.
So it's actually a little bit more unexpected, I guess. It's fresh fennel.
Wow.
I think it's kind of nice. And I mean, I think that's right.
Like you just wouldn't, if you're buying fresh fennel, you must be a home cook. No, you're only
cooking at home. That is the only purpose for fresh fennel. I mean, seriously, nobody's saying
like, oh, I have to pick up some fresh fennel. I need to shine my shoes.
Exactly.
So I started buying fresh fennel to see if my home insurance prices would start going down.
But as yet, nothing.
Whoa, okay.
So that's one thing where the access people have to who and what you are
when the data is consumer-based data.
So I guess we get that.
And a lot of people fear that or are angered by it.
But before we land there more securely, let me just comment that as the volume of data has grown, because computers are obtaining data constantly,
hasn't the computing ability to analyze data risen with it
so that we're not really feeling the stress of being smothered in data
that we might have feared a few decades ago?
Yeah, well, as analysts, you may not be feeling the fear of being smothered.
Okay, so I first heard this from John Allen Paulos.
Now, this is now mid-90s, late 90s.
What he said was, the internet is the world's biggest library.
The problem is all the books are scattered on the floor.
And I said, wow, there's a brilliant analogy.
But at the time, there wasn't a Google search engine
or any kind of way to organize that information.
Within a few years, there were search engines.
So the books were no longer on the floor.
Who knows where the hell they are now,
but they're not on the floor.
Wow.
I really like that.
Yeah, it was clever, but it doesn't apply today, because we can get through
the data. And my people, we have very big telescopes getting a lot of data on the universe.
Half of the effort in prepping for that telescope is the data pipeline to handle the data. So are
you, can you say that you're awash in data? Are you on top of the situation?
Well, okay.
Are we on top of the situation?
Okay, so definitely.
The amount of grunt that you have now, I mean, you just didn't have access to that before.
You can handle vast data sets
with just incredibly intricate detail
about everything from the cosmos to human behavior
that you just weren't able to before.
But the thing is that I think the biggest challenge when it comes to data isn't so much
about volume, but more about quality. And the thing is that anyone who has worked with data knows that cleaning data is, like, so much of the battle. You get, like, this massive, great,
big wallop of data.
You're like, brilliant.
It's going to be so good.
I can't wait to like dive in.
And then you realize just how long it takes to make it,
to get it into a shape where it does anything that you want it to.
Let me think of an example of something that I can give you that will be...
I can tell you in my field, we just call it reducing the data. If I showed you raw data from the universe, you'd say,
what the hell is that?
Where's my pretty Hubble picture?
Well, if you knew what the hell happened between the photon hitting the telescope
and the photo that ended up in the press, you might be shocked.
No, you wouldn't be, Hannah, but others might.
So you have an example.
I'm trying to think.
Sorry, forgive me.
There must be one that's really funny and quite stupid.
Well, that might come out in the Q&A part.
That's the one we want, definitely.
Yeah, I know.
You definitely want one where something's just really stupid.
There's definitely got to be one.
Another quick thing about your bio.
Tell me about your BBC Radio 4 show.
Oh, okay.
So this is a show that I host with the geneticist Adam Rutherford.
And we've been going for about, I think we're recording our 16th series now.
Whoa.
So the idea is that people send us in questions and we go out and investigate them.
And initially we, because Radio 4 is like,
it's kind of the posh channel.
Okay.
It's very highbrow.
It's very, there's no music.
It's all, it's where the politicians go
and it's where they have like these deep intellectual debates.
I mean, they have like, you know,
programs on philosophy on it, right?
It's like the very highbrow stuff. So initially we wanted our program to be very highbrow too, and to be very serious, like, we're very serious scientists. But we discovered quite quickly that actually what works better is if you just basically muck around. And as a result of that, the questions that have been coming in have been from families with younger kids, and they end up being, like, the best questions. So we had a question that I really liked, which was,
what's the tiniest dinosaur? Which is a question that was asked of us by like an eight-year-old.
Seems like a really, you know, trivial, silly question that you can just dismiss with a quick
Google search. But it actually unfolds this whole thing of, like, how do you define size? How do you define dinosaur? You know, this whole kind of world underneath it. So yeah, that's really what it's
morphed into is just this, yeah, like wonderful playground where all the annoying questions that
kids want to ask their parents, they send them into us instead. So to the benefit, I think,
of the listeners.
So it kind of undoes some of their poshitude.
It does.
It's definitely not posh.
Let's see if we can lead off with a question here.
Chuck, we got a first question on algorithms and data
for one of the world's experts on those subjects.
Sure thing.
Let's start off.
We always start off with a Patreon patron because they give us money.
So let's go to TJ Monroe from Patreon, who says, Dr. Tyson and Dr. Fry, the two best radio voices in science.
Oh, thank you.
Well, wait, so how is Rutherford's voice in your show?
It's rubbish. Yours is much better.
Oh, okay, so we've got to do our own show then. All right, work that out.
He says, can you walk through the process of creating a predictive algorithm for something like the path of a lightning bolt or ocean currents?
Now, one is a lot easier than the other.
Yeah, sure.
One repeats itself.
I'm sorry, go ahead.
You say that, Chuck, because you're an expert on this.
Well, yes, you know, so Neil, in my spare time.
But what I like, that's a great question.
What I like about it is there are two things that are highly sort of, you know, oceans can be turbulent.
You have storms and things, but there is a prevailing thing.
Right. You don't know where lightning is going to strike,
but you know it's going to strike somewhere over there.
Yeah.
So I love that question.
So, Hannah, has that reduced itself to algorithms at this point or not?
So I haven't seen an algorithm for predicting lightning strikes,
but I'm just thinking through how you could do it.
So certainly there are going to be certain things that go into it, as you say, right?
Like there are certain days when you can look out in the sky and be pretty confident that no lightning strikes are going to happen.
And other days where you can be fairly confident that they will.
So there's certain things that you can measure in the atmosphere, the atmospheric pressure, the humidity, all of those kind of things that you could plug into a system that could help you predict the likelihood of a lightning strike.
But the exact path of it, I mean, Neil,
you probably know more about this than me,
but I would say that the exact path of it is going to be very difficult to deduce precisely where it will end up.
It's easy, yeah.
Would you call it an algorithm
if you are checking the atmospheric pressure and the humidity
and the size of the clouds and the moisture?
If those are just inputs to something that calculates,
does that come under your category of algorithm as well?
Yeah, I think so.
I think so.
I mean, obviously there are different kinds of algorithm
and, you know, the artificial intelligence is one
that gets a lot of attention
and the algorithms that deal with sort of data
on the internet is another.
But I think anything that is taking something from the real world, like a recipe, right? Taking ingredients from the real world,
doing something with it, and then spitting out some kind of answer.
I think, for me, that counts as an algorithm.
I mean, you know, technically, if you stop and ask someone for directions,
if you're in your car and you stop and ask someone for directions
and they say, go down there, go that way, that way, that way,
I mean, technically, they're giving you an algorithm.
Right, right.
So, okay.
So, algorithm is a very wide catch basin then
for accounting for things that you want to predict or understand.
So, with lightning, you'll only discharge a cloud
if the buildup in charge is very different,
either from one cloud to another cloud
or between the cloud and the ground.
And so, like you said, Hannah, if you measure the humidity,
you can check to see what is the propensity of electricity
to cross humid air versus dry air
and look at a threshold for that.
And you say, when it hits this threshold, it's going.
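To make that concrete, here is a minimal sketch of what such an algorithm could look like: a few measurable atmospheric inputs go in, some logical steps run, and a likelihood comes out. The feature names, weights, and cutoffs below are invented for illustration; this is not a real forecasting model.

```python
# A toy "lightning likelihood" algorithm: measurable inputs in, a score out.
# All numbers here are made up for illustration.

def lightning_likelihood(humidity, pressure_drop_hpa, cloud_height_km):
    """Return a rough 0-1 likelihood of a lightning strike."""
    score = 0.0
    if humidity > 0.7:            # moist air supports charge buildup
        score += 0.4
    if pressure_drop_hpa > 5.0:   # falling pressure hints at a storm front
        score += 0.3
    if cloud_height_km > 8.0:     # tall convective clouds separate charge well
        score += 0.3
    return score

print(round(lightning_likelihood(0.85, 7.2, 10.0), 2))  # 1.0: a stormy day
print(round(lightning_likelihood(0.30, 1.0, 2.0), 2))   # 0.0: no lightning today
```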
And how much of algorithms is also a thresholding phenomenon?
Oh, lots and lots. So that's the difference. Sometimes you are trying to predict exactly what's going to happen. You know, you're trying to predict, let's say, if a ball is rolling down a hill, exactly where it's going to end up, that kind of thing. And sometimes you're just saying, as you said, what is the probability that this might happen? And at what point do you set this threshold to say, okay? Like, for instance, the example that I gave you earlier about the fennel thing, right? It's not like you buy fennel, therefore you're not going to claim on your home insurance. It's you buy fennel, therefore you're likely to be a home cook, therefore you're likely to do this. And all the way through those, you're going to be setting thresholds where you say, if it tips over this, then we assume you're a home cook. If it tips over this, we assume you're whatever.
Interesting. Okay. So that's an important fact here, because it's not the one piece of data that tells you what everything is. It's the one piece of data that might put you over the edge of that conclusion. Is that a way to think about that?
Yeah, I think that's right. I think when you're dealing with uncertainty, I mean, there are very few things, especially when you're handling data to do with human behavior, that are cold, hard facts, right? You're rarely dealing in absolutes. So when you're handling uncertainty, the only way that you can possibly convert uncertainty into a yes-no answer is by saying, here's the line. If we cross it, we'll assume it's a yes.
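Here is a minimal sketch of that fennel-style chain of inference. The item weights, the baseline, and the 0.5 cutoff are all invented, standing in for whatever the supermarket's real model does.

```python
# Uncertain evidence becomes a yes/no answer only once you draw a line.

def p_home_cook(basket):
    """Crude, made-up likelihood that a shopper is a home cook."""
    signals = {"fresh fennel": 0.6, "anchovy paste": 0.3, "olive oil": 0.1}
    p = 0.1  # baseline: some people cook regardless of what we observe
    for item in basket:
        p += signals.get(item, 0.0)
    return min(p, 1.0)

THRESHOLD = 0.5  # "here's the line; if we cross it, we'll assume it's a yes"

p = p_home_cook(["fresh fennel", "olive oil"])
print(f"P(home cook) = {p:.2f} ->", "home cook" if p >= THRESHOLD else "not sure")
```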
Got it. And the ocean currents, like you said, Chuck, those have prevailing flows. They're not catastrophic the way a lightning bolt is. So presumably, that doesn't have this kind of thresholding.
No, true. But I mean, for ocean currents there are, you know, very sophisticated equations that can describe fluid flow, right?
So they're still not absolute,
especially when you're dealing with turbulence, there's still a lot of probability
and randomness and chaos, right?
That's involved in all of that.
But you can say with more,
it's not a thresholding problem.
You're, as you say, right?
You're like, you can say with more certainty
where things are going to be
and how they're going to be moving.
I was going to say, there's also connection
when you're dealing with ocean currents
as opposed to lightning bolts.
Ocean currents are all connected
because it's not one ocean.
It's an entire oceanic system that happens on the globe,
whereas lightning bolts are isolated incidents.
So there's that too.
That's very true. Although if you ask a mathematician who's studied the ocean, I mean, they assume it's two-dimensional.
So, I think it's a bit lost then.
Yeah, Chuck, you start subtracting dimensions
to make the problem easier to solve.
Whether or not your answer is correct at the end.
Elegant there, right?
Elegant. Yeah, we'll get back to that.
Let's take our first break.
And when we come back, more Cosmic Queries
with data mathematician Hannah Fry when we return.
We're back.
StarTalk Cosmic Queries.
Algorithms and data.
Yeah, I said it.
Algorithms and data.
I'm so sorry you said it.
Chuck Nice, co-host.
Tweeting at Chuck Nice Comic.
Thank you, sir.
And we have as our special guest,
Professor of Mathematics at University College London,
Hannah Fry.
I'm an associate professor.
You just gave me a promotion there.
Thank you.
I will do that.
You take that to the bank and to your department chair.
Hannah, how would you like to be a chancellor?
I want to be sir.
That's what I want.
Oh, there you go.
So Chuck, you got another question about data?
Sure, sure, sure.
Let's go back to Patreon
and let's go to Shawshank Submarinian
who says,
Neil, hello, and Hannah, hello.
I would like to know how impactful solving the P versus NP problem would be with respect to our capabilities of understanding the universe.
Excuse me, the P versus NP problem?
Are we all fluent in P?
Yes, exactly.
If you have consumed copious amounts of liquid,
the P versus non-P problem becomes quite the conundrum.
What is the proximity to your water closet?
So, Hannah, what is P versus NP?
And is that a real outstanding problem?
Yeah, yeah, totally.
So this is one of the millennium math problems.
So if you solve this, Shawshank, what was his name?
Shawshank?
Shawshank Submarinian.
Okay.
He's showing off, by the way,
with this question.
He's showing off.
If you solve this problem,
you win a million dollars.
So, I mean, maybe it's not change-the-universe big, but it's definitely a big change to your life.
Okay.
So let's say that I gave you
a gigantic Sudoku puzzle
and asked you to solve it.
I mean, like a really massive one,
not just like nine square,
a massive, massive one and asked you to solve it.
It would take you forever to solve it.
But if I said, here's a solved one,
I want you to check if it's right.
Actually, that's a much easier problem, right?
Even though they could be the same Sudoku puzzle,
filling it in in the first place
is much, much, much harder
than just checking that it's right.
So there are some...
Wait, wait, wait.
You say a solved puzzle.
Yeah.
That would imply it's right.
So you mean a filled-in puzzle.
Okay.
You're right.
My language is sloppy.
I take it back.
A filled-in puzzle.
Okay.
So sometimes where you've got like a blank Sudoku grid, effectively, you know, or the analogy in computers, if it's very easy to check that the answer is right, sometimes you can use that as a loophole to get you to the answer very quickly, right?
Because you can just generate answers and check if they're right, rather than kind of go through and grunt through the entire process. So the question is, is that always the case?
If you can check that an answer is right quickly,
much quicker than you can to solve the problem in the first place,
is that always the case?
Do really hard problems that you've got to grunt through always have quick solutions or not?
And the reason why this has repercussions, the reason why this has, you know, potential
impact on our understanding of the universe is that an awful lot of the algorithms that we use
to try and understand gigantic systems, you know, I'm sure that this is certainly true in a lot of
cosmology, a lot of them have to use very clever workarounds to account for the fact that some
problems are just really hard. Some problems you kind of have to grunt through to find the answer. So there are a whole host of different
algorithms that exist to try and make that grunting process easier. But if it were the case
that actually all these difficult problems do have an easy, quick solution, I mean, that would be,
you know, if you could suddenly reduce the amount of computational time
that you spend on a problem,
I mean, that would have a dramatic, dramatic effect
on the number of things that you could compute.
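As a miniature of that solve-versus-check asymmetry, here's a sketch with a 4x4 Sudoku: verifying a filled-in grid is one quick pass, while filling it in takes a backtracking grunt. The puzzle itself is made up.

```python
from itertools import chain

def valid(grid):
    """The quick part: verify a filled-in 4x4 grid in one polynomial-time pass."""
    cols = [list(col) for col in zip(*grid)]
    boxes = [[grid[r + i][c + j] for i in range(2) for j in range(2)]
             for r in (0, 2) for c in (0, 2)]
    return all(sorted(group) == [1, 2, 3, 4] for group in chain(grid, cols, boxes))

def conflicts(grid, r, c, v):
    """Would placing value v at (r, c) clash with its row, column, or 2x2 box?"""
    if v in grid[r] or v in [grid[i][c] for i in range(4)]:
        return True
    br, bc = r - r % 2, c - c % 2
    return any(grid[br + i][bc + j] == v for i in range(2) for j in range(2))

def solve(grid):
    """The grunt work: backtracking search over every blank cell (marked 0)."""
    for r in range(4):
        for c in range(4):
            if grid[r][c] == 0:
                for v in (1, 2, 3, 4):
                    if not conflicts(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0
                return False
    return True

puzzle = [[1, 0, 0, 0], [0, 0, 3, 0], [0, 4, 0, 0], [0, 0, 0, 2]]
solve(puzzle)         # search: slow in general
print(valid(puzzle))  # verification: always fast -> True
```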
So I want to, Chuck, I'm going to show off in front of Hannah,
so give me a moment here.
Go do your thing.
So, the four-color map problem. I was around and in college when that was solved, and it was considered inelegant, because someone put an algorithm on it and just grunted through it. But they solved it, and no one else had solved it before. So in principle, that implies that it was easier to solve it that way than by any analytic way. Is that a fair analogy here or not?
Yeah, I mean, that one, people got very upset about that, didn't they?
Yes, I remember that.
Yeah, because normally when you do a proof,
you write it down and it's these elegant statements of logic.
It fits on the back of an envelope.
Yeah, right, exactly, yeah.
So at the risk of not being a part of the four-color map,
the four-color map parade. Good point.
The four-color map love fest.
What the hell is the four-color map problem?
Okay.
You know when you get like a map of the states,
you know, the United States,
and you've got like all of the different states, whatever,
and you want to color it in?
Yeah.
The question is,
can you color it in with four colors so that no two states next to each other
share the same color, right?
Turns out that you can.
The question is, so the four-color problem is, let me get this right, actually. You might remember more than me, but does any map exist that you cannot color in with four colors?
Yeah, that's the way I'm thinking.
What is the minimum number of colors
that you need to color any map?
I think we knew that the four colors was the right answer.
We just didn't have a proof of it.
And so it was intractable until somebody,
again, I went to college so long ago,
computers were new to the world of math.
Punch cards.
Almost certainly.
I mean, it was the 70s, right?
It was the 70s.
Yes, it was.
Back when I was in college.
And like a steam-powered handle.
Yeah.
So, and it was proven,
but only through this,
by checking the answer,
not by proving the answer.
That feels like what you just described. Right.
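Here's a small sketch of the grunt-through approach: a backtracking search for a four-coloring of a tiny, hypothetical adjacency map. The actual 1976 Appel and Haken proof machine-checked an enormous set of configurations; this just shows the flavor.

```python
# Hypothetical mini-map: which states border which.
neighbors = {
    "NY": ["NJ", "PA", "CT"],
    "NJ": ["NY", "PA"],
    "PA": ["NY", "NJ"],
    "CT": ["NY"],
}
COLORS = ["red", "green", "blue", "yellow"]

def four_color(states, coloring=None):
    """Backtracking: assign colors so no two neighbors ever match."""
    coloring = coloring or {}
    if len(coloring) == len(states):
        return coloring
    state = next(s for s in states if s not in coloring)
    for color in COLORS:
        if all(coloring.get(n) != color for n in neighbors[state]):
            result = four_color(states, {**coloring, state: color})
            if result:
                return result
    return None  # no valid four-coloring found

print(four_color(list(neighbors)))
```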
And so I'll give you an example from my field.
Hannah, I think this is an example.
We have people looking for galaxies that are very, very low in their surface brightness.
Like you would scan by it and you wouldn't even know it was there.
Well, how do you look for them
if they don't reveal themselves?
Well, the ones we have found,
we know what their light profile looks like.
And so what we can do is set up a filter
that goes out and tries to match
the light in the sky to that filter.
And when you get a slight increase in a match, there's a galaxy.
Now you can put all your resources there and say, yep, there's a galaxy there.
So you're looking for the answer to ask the question.
Yes, yes, yes.
Yeah.
That's it.
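A rough sketch of that matched-filter idea on fake one-dimensional data: slide a known light profile along a noisy strip of "sky" and flag where the correlation spikes. The Gaussian template, noise level, and galaxy position are all invented.

```python
import numpy as np

rng = np.random.default_rng(42)

# Fake "sky": noise plus one faint Gaussian-profile galaxy hidden at pixel 300.
template = np.exp(-0.5 * (np.arange(-25, 26) / 8.0) ** 2)  # known light profile
sky = rng.normal(0.0, 1.0, size=500)
sky[300 - 25:300 + 26] += 1.0 * template  # per-pixel signal sits at the noise level

# Matched filter: correlate the sky with the template and look for a peak.
# The filter stacks up the faint signal across all 51 pixels of the profile.
match = np.correlate(sky, template, mode="same")
print("best match at pixel:", int(np.argmax(match)))  # usually lands near 300
```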
So Hannah, is this legit?
I mean, are we allowed to do this?
Yeah, I mean, these are great examples of, like, grunting through the solution.
And the question is, is that always the easiest way?
Or is there actually a trick?
Is there some clever trick that you could have used?
You know, I don't know, like folding your data in half
and looking for, like, this superimposed, like, light.
You know, maybe there's some clever trick somewhere
that just no one's spotted yet.
You know, there's so often cases where people come up with these clever tricks.
And maybe there was a clever trick that the whole thing could have been solved much quicker
without having to grunt it through.
Okay, so if this problem gets solved,
then it'll give us confidence for all future problems to say,
don't even worry about figuring out the answer analytically.
Let's just compute the answer and then check it.
Yeah.
I mean, if that's easier, then for me, that's less romantic.
That's less elegant.
It is.
If it works, it works.
But then at the same time,
you have to think of the potential repercussions of this.
Like, you know, there are some problems.
Like if you take protein folding as an example, right?
So proteins are just the source of so much, you know, life.
I mean, essentially they're like the fundamental building blocks of life.
Building blocks.
Everything that could happen to the human body, from, you know, Alzheimer's to the effect of the drugs that you take. I mean, everything, it all comes down to proteins. And
proteins, they're like these long ribbons of amino acids. And the way that they
fold up determines their function. These things are incredibly, incredibly, incredibly complicated.
And going from the folded-up bundle of ribbon to the long string is okay; it's possible.
But going from the long string to work out what folded up knotted shape it makes
is really, really,
really, really super, super, super hard. We understand all of the physics of it. We have
equations that could work it out, but you just cannot grunt through it all. And if you, let's
say that you could solve this problem. Let's say that you could have a computer that could grunt
through all the possibilities. What that would mean is you could say, I want a protein that serves this function. I want a protein that can combat this disease. I want a protein that
acts on the body in this way. What shape is it? It's like this shape. Okay, now what is the string
of amino acids I need to print to create that protein? I mean, I'm talking like long, long,
long, long, long term here. These are not like things that are around the corner. But I mean,
however, I think there are some applications of this stuff
that means that actually romance, you know, romance is dead.
Like who cares about romance when you've got protein folding?
So Chuck is listening attentively here
because he wants to know the formula for a funny joke.
I think it's going to be hard.
I really don't.
Why would I change anything now?
How do you fold your words together to guarantee there's a funny joke on the other side?
I'm sure there's an algorithm for that, you know.
And believe me, I would love that.
It's funny because it sounds like what you're talking about is quantum computing.
In part, yeah. In part.
Well, the extreme level of it.
Yeah, if you could get the computing power.
And then all problems will just be solved depending on whoever spends time looking at it.
That would, like, I have to reiterate, Hannah,
that would take away the romance of the quest for me a little bit, I think.
A little bit.
Well, you have to sigh.
Did you hear that sigh? Yeah, that was amazing.
Wait, so do you think the four-color problem was
unromantic? Yes.
Yes. Yeah.
Yeah. I mean,
because, you know,
we have E equals MC squared, that fits
in, you know, children write
down that equation. It's one of the most profound equations
in the universe.
How many forces are there in the universe?
There's not thousands, there's four, all right?
In the early universe, there were fewer.
You want, I have a bias, a philosophical bias,
that when I part the curtains,
I want to find simplicity rather than complexity.
Do you think that simplicity always exists, though? Or do you think, actually, sometimes, if you're a physicist, you can get, like, sort of, I don't know, a notion of simplicity that maybe isn't there?
Yeah. So here's my lesson that I have to tell myself, to get out of this sort of state of romance. Johannes Kepler, when he first showed the planets going around the sun,
and he was trying to figure out what kind of orbits did they have
and about their distances.
And he had a system. He was a mathematician, and you know the five Platonic solids, right? Do you know about this, Chuck?
No.
Five solids?
No.
It's a singing group from the 60s.
That's exactly what I was looking for.
So, Hannah, you want to tell them
the five solids? Okay. So,
Plato was like super into this idea
of everything being perfect.
So, the platonic solids are
the five shapes that can be created where every side is the same.
So a cube is a platonic solid.
Every side is a square.
Tetrahedron, octahedron, dodecahedron.
Dodecahedron, go ahead.
And what's the last one?
Icosahedron, I guess.
Well, yeah, icosahedron.
And I think you left out the pyramid,
which is what, the four-sided pyramid.
Tetrahedron. No, she said it.
Oh, you said it? Okay. Yeah, she said tetrahedron.
Can we get five? So tetrahedron,
octahedron,
cube,
icosahedron, and dodecahedron.
Right. So each of those has the same shape polygon on all sides, but there are only
five of them. So Kepler knew this,
and he also knew that there were six planets.
And he said, well, everything is perfect and divine
and math is perfect.
Maybe the planets occupy orbits in the separations between nested Platonic solids.
Ooh, if only.
So he took them and nested them and put planet orbits,
and he actually got pretty close.
It was like, but this was his ideal.
This was his sense of perfection that he was imposing on nature.
And it was all bullshit.
That happens quite often, though.
I think it does happen where people fall in love
with the simplicity of their theory
and forget that actually often the world's really ugly.
Yes.
So I use the Kepler example.
To his credit, it took him 10, 15 years, but he discarded the entire system, and out came elliptical orbits.
Which are beautiful.
In their own way. Not as beautiful as perfect circles, as Copernicus had presumed. So anyway, let's go to the next question, Chuck.
All right. Let's go to John Baker from Patreon. He says, hi guys. I'm back to prove my ignorance
yet again. What kind of empirical data is used? Well, first, let me ask
a core question. What does it mean to use an algorithm? Not that I couldn't look it up on
wiki, or I could have just paid attention in school. I love John Baker. This guy's amazing.
Anyway, and you know what?
I should have led off with this question,
but because the truth is what we never have touched upon in this show yet,
what is an algorithm?
Yeah, what is it?
Hannah, algorithm 101, give it to me.
Okay, 101.
Algorithm is this gigantic umbrella term
that doesn't really mean very much,
which I think is the reason why people
hate the word so much. But essentially, all it is, is a series of logical steps that take you from
some input to some kind of output, right? So a recipe, a cake recipe, that counts. That's an
algorithm. Your inputs are your ingredients. The logical steps is the recipe itself that outputs
the cake that you get at the end. The difference though is that when people...
Wait, wait, wait.
A cake recipe is a flowchart, I would think.
I don't think a flowchart is an algorithm.
It could be.
I mean, it's just like a giant algorithm.
The word algorithm is like this giant all-encompassing term.
But I think when people use it,
they tend to mean something within a computer.
So something where you are inputting some data
and then the machine has some kind of autonomy
in terms of the decisions that it makes along the way
and spits out an answer at the end.
Of course, so computer programs,
then everything you do in a computer program is an algorithm.
Is that fair to think of it that way?
I think that's fair, yeah.
Although, I mean, I think that when people use the word,
you know, if you go like, did you learn?
So I had a ZX Spectrum, which I think is quite a British thing.
It's like a Commodore 64.
That was the thing that I learned to code on.
Well, that's the American one, right?
Yes, we did.
The Commodore 64 was the American version. And that's because Hannah is 80 years old.
I was going to say,
I want to unpin my video now
so I can get a closer look
at Hannah
because right now
she's in a little window
on my screen.
In her early 30s,
late 20s.
And then she's talking
about Commodore 64
and I'm like,
is it her or is it my
eyes? Okay, grandma.
Talk to us, grandma. I learned coding
when I was two.
She's just
a child genius. Exactly.
There you go.
So the ZX Spectrum is kind of the British equivalent.
You would do, like, print hello, go to line 10, and then it would just go around and around and around and fill the whole screen with hello, right?
That was the kind of programming that everyone did when they were, like, kids.
And technically that's an algorithm.
It's just a really rubbish one.
It's like, you know, not particularly, not really doing anything.
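For anyone who never met a ZX Spectrum, the two-line BASIC program Hannah describes, with a one-line modern equivalent, looks roughly like this:

```python
# The ZX Spectrum program, roughly:
#   10 PRINT "hello"
#   20 GO TO 10
# and its Python equivalent. Technically an algorithm; still a rubbish one.
while True:
    print("hello")
```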
So I think when people use the word,
they tend to mean that it's like some kind of automated decision-making.
That's sort of really what they mean. But I think, you know, if we're being absolutely fair, the word algorithm
encompasses all of this. Now, Chuck, the person had more to that question, but we just ran out
of time in this segment. So when we come back, we'll pick up more on the angst shared by this
questioner who wondered whether he should have learned all this in school in the first place.
This is StarTalk Cosmic Queries.
We'll be right back.
Hey, we'd like to give a Patreon shout out to the following Patreon patrons,
Dan McGowan and Sullivan S. Paulson.
Thank you so much for your support.
You know, without you, we just couldn't do this show.
And if any of you out there listening would like your very own Patreon shout-out,
please go to patreon.com slash startalkradio and support us.
StarTalk.
We're back.
Cosmic Queries.
Algorithms and data edition. You never thought we'd go there, but we did.
So there.
Okay.
I got Chuck, of course, and Professor Hannah Fry,
Associate Professor of Mathematics at University College London.
You shared with us earlier in the session that you live in Greenwich,
and we've all heard of Greenwich, even if you've never been there,
Greenwich time.
That's like the time, the base time of the world, right?
You get kind of cocky about that?
You know, we swagger around here.
We swagger around.
It actually took me to move to Greenwich.
I've only lived here for three years or so.
But it took me to move to Greenwich
to realize that Greenwich Mean Time is,
the word mean in it actually means average mean
across an entire year.
I didn't know that.
Oh yeah, entirely. Of course 24 hours a day is 24 hours. No, it's not. It's 24 hours on average.
Yeah, it's exactly right. Yeah, the time it takes the sun to return to its spot on the sky
on average is 24 hours. Sometimes it takes longer, sometimes it takes less. People don't know that.
Yeah. So I was happily drinking my Greenwich Mean Time lager and wandering around Greenwich Mean Time village, and I didn't realize that mean meant average mean. Yeah, it has nothing to do with the emotional state of your time.
Okay, so Chuck, we left off with someone upset that he didn't learn the meaning of algorithm in school, but I think there was a question buried down there.
He said, what does it mean to use an algorithm?
Okay.
And then Hannah said, well, a recipe is an algorithm.
But I like the distinction Hannah is making as we go forward in the 21st century,
that we think of algorithms as an automated procedure rather than something that...
Yeah, you do. Something that makes a decision.
And then I think that there's a further distinction there as well
between an algorithm and artificial intelligence.
And I think that the way I like to think of this is,
let's say you've got a smart light bulb
that's kind of connected to the internet
and you decide to program it so that it turns on at 6 o'clock
and goes off at 11 o'clock.
So that's an algorithm, right? You programmed it, you said, if it's six o'clock, turn off,
right? Or turn on, whatever. That's just a straightforward algorithm. If it was artificial
intelligence, generally speaking, most people agree that artificial intelligence needs to include
some aspect of learning. So instead, the light bulb would recognize
that you came home at six o'clock and turned the light on.
It would recognize that you like to dim the switch
at 9 p.m. when you do some reading,
and then that you go to bed at 11 o'clock.
So if it's starting to learn from its environment
and then impose those rules itself,
that counts as artificial intelligence.
Right.
But that's simply an updatable algorithm.
Yeah, yeah, yeah, exactly. It's something that's continually revising itself.
Okay. By the way, you... Go on, Chuck.
I was going to say, in addition to that, though, it is also, more importantly, pattern recognition.
So the update is based on the recognition of patterns.
Totally agree. Yeah, completely. Which was really hard until very recently.
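A toy sketch of that distinction, with invented hours and a simple averaging rule standing in for real pattern recognition: the plain algorithm follows a hard-coded schedule, while the learning bulb revises its rule from what it observes.

```python
def plain_algorithm(hour):
    """Fixed rule: on at 18:00, off at 23:00, no matter what you actually do."""
    return 18 <= hour < 23

class LearningBulb:
    """Learns when you actually switch the light on and adapts its rule."""
    def __init__(self):
        self.on_hours = []

    def observe(self, hour_switched_on):
        self.on_hours.append(hour_switched_on)

    def should_be_on(self, hour):
        if not self.on_hours:
            return plain_algorithm(hour)  # no data yet: fall back to the fixed rule
        learned_start = sum(self.on_hours) / len(self.on_hours)
        return learned_start <= hour < 23

bulb = LearningBulb()
for h in [19, 19, 20, 18]:    # you keep coming home later than the rule assumes
    bulb.observe(h)
print(bulb.should_be_on(18))  # False: it learned your real schedule (~19:00)
```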
Now, Hannah, you scared me a little when you began that comment because you said,
imagine a smart light bulb. And I thought, aren't smart robots enough?
What would a smart light bulb be? Light bulbs marching down the street.
It just pops above your head every time you have an idea.
I love the smart light bulb.
It's like, humans must die.
In order for us to shine, humans must die.
Exactly.
And in fact, they would dig up the joke from,
there's a comedian whose Twitter handle is The Science Comedian.
And I quote him every now and then.
One of my favorite jokes of his or comments was,
the light bulb was such a good idea,
it became a symbol for a good idea.
That's funny.
That's true.
So if light bulbs become our overlords,
they will remind us that anytime we think something brilliant,
it's one of them that gets popped up.
Exactly.
Right.
Don't think we don't know what you're thinking.
Well, we only know what you're thinking if it's a good thing.
That's a good thing.
If it's a good thought, we know it.
Okay?
All right.
That's funny.
So the Science Comedian is Brian Malow, if anyone wanted to dig him up.
Nice.
Okay.
So you got another question there, Chuck.
Short thing.
This is Ben Sellers, and Ben wants to know this.
From an evolution standpoint, our relationships and mating behaviors
probably follow patterns useful for hunter-gatherers.
How do our behaviors on social media and dating websites
resemble patterns from more primitive days?
What would make interacting online more
connected to our primitive programming? Now, I don't know if this is your purview, Hannah, but
he's making a really, you know, pretty poignant association, which is we now find people online.
That's how we find love now. I mean, and the number is only going up every year.
Do the hunter-gatherer, you know,
brain sets actually apply to the way that we go after one another digitally?
I love this.
It's like I'm just foraging, foraging for lovers.
Oh, yeah.
Hunter-gatherers.
By the way, it is kind of foraging.
Swipe right, swipe right.
Yes. You're hunting.
So actually, one of the very first things I did as soon as I finished my PhD was this really silly talk that was, like, actually supposed to be this kind of private joke that just got really out of hand, which was called The Maths of Love, right? Which was in part looking at data from online dating websites. And, like, it was kind of this thing where I just wanted to demonstrate that you can take a mathematical view to everything.
Anyway, it got terribly out of hand and ended up being a TED Talk.
But in that, there was something that was really interesting
that I think is relevant to this, which is...
Wait, wait, just a quick thing, Hannah.
Most people who have thoughts that get out of hand
don't end up giving TED Talks.
So it requires some level of brilliance.
I just want to distinguish you from everybody else that would be encountered.
So your TED Talks.
So go on.
Well, for a number of years in Britain, people started calling me Dr. Love.
And I was like, it was just a joke, guys.
It's never been serious.
I'm really not Dr. Love.
Anyway.
Okay, so in it,
one of the things that I talked about was about who gets most attention, right?
Whose photos get most attention on dating websites?
And you would think, okay,
surely it's going to be the people
who everyone considers the best looking, right? I mean, surely the most attractive people get the most attention, surely.
But OkCupid is kind of an interesting dating website, because for a while there they were totally open about the fact that they were experimenting on their customers and, like, released all their data. And also, on their website, what you're allowed to do is rate how attractive you thought other people were on a scale between one and five, right? So five is very beautiful; one, I think, slightly more facially challenged is the technical term. And what they found was that it's not true that just the people who get fives get the most attention. It was the people who divided opinion the most. So the people who were getting the most attention were, like, averaging out as kind of a
four, but they weren't people where everyone was giving them a four. They were people where some
people would give them a five and lots of people would give them a one. Some people thought they
were absolutely horrific and some people thought they were really beautiful. And the explanation
for this, which I quite like, is kind of like an instinctive one. So I guess this sort of goes into the
hunter-gatherer thing in a way, which is that it's like a game theory thing, right? If you
come across someone and you think that they're very beautiful, then you imagine they're getting
lots of attention and you think, well, why would I, there's no point in throwing my hat in that
ring. I may as well stand back. Whereas if you come across someone who you think is very beautiful, but you imagine other people will really dislike, so someone who's, like, a bit unusual in some way,
then you're like, great, this gorgeous person isn't going to be getting that much attention.
And you kind of like throw yourself in. But just because everyone's doing that, that means
it's the really beautiful people who are not getting any attention and the kind of quirky
ones are getting lots. So I think it's kind of interesting.
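A toy version of that finding, with invented ratings: two profiles with similar averages but very different spreads. Per the data Hannah cites, it's the high-spread profile that ends up with the messages.

```python
from statistics import mean, pstdev

profiles = {
    "universally_a_four": [4, 4, 4, 4, 4, 4],
    "divides_opinion":    [5, 5, 1, 5, 1, 5],  # also averages out near four
}

for name, ratings in profiles.items():
    print(f"{name}: mean={mean(ratings):.1f}, spread={pstdev(ratings):.1f}")
# Similar means, very different spreads. Attention tracked the spread.
```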
Wow.
So what you're saying, apart from the sociocultural lessons from that, is that algorithms that might apply to human behavior, which the data we're now collecting on billions of people provide, might have some strong evolutionary guidance for us going forward.
Yeah, I mean, I think in terms of this,
it's always tough to link it back to evolution, isn't it?
But I do think that you can certainly come up
with these game theoretic arguments for the patterns in our behaviour.
I mean, you know, they're not exactly falsifiable,
but I think they're fun to explore.
But it means you have an algorithm that applies,
that's established in one environment.
Like Chuck was saying, you're on the plains of the Serengeti,
and there's a certain algorithm for our behavior that we can't shake
because it's genetically encoded within us, perhaps.
Yeah, I mean, perhaps.
Yeah, I mean, I guess I'm sure that there's like anthropologists
who will know this much better than me.
But yeah, I don't doubt it.
I'm sure that there are lots of occasions
where we act really instinctively,
where our ancient history causes us
to act in a certain way.
And I'm sure that we do it still,
even when we are interacting with people
on a completely different platform
to the one that we were designed for.
So Chuck, we've got time for maybe one or two more questions.
Wow.
All right.
God, we're going to have to do another show, man.
I have 11 pages of questions.
Okay, Hannah, you've got to come back.
I'll talk really fast.
That is how many people are really interested in this subject.
It's unbelievable.
I have 11 pages.
Beautiful.
Data and algorithms, sir.
So I know we're wrapping it up, so I'm trying to find one.
Okay, here's one.
Okay.
This is Dean from Twitter.
And Dean says,
Wait, Chuck,
you're just choosing people
whose names you can pronounce.
This whole episode
have been pronounceable names.
All right.
Let me go with them instead.
I'll go with Tielo Jangmans.
That's better, I'm sure, Tielo.
Now Dean thought he had his moment, and it gets robbed from him at the last minute.
Well, sorry, Dean. We're going to do this show again, so we'll get back to you, buddy. All right. So this is Tielo Jangman.
Jangmans?
Jangmans.
Okay, from Twitter.
Okay.
He says, I'm sorry. He says, assume that the game on behavioral data is already over.
The next level is biological data, and the one after that is thought police.
Okay.
What are your expectations
and what do you think the outcomes will be?
So what he's really talking about there
is predictive analytics.
Will we ever get predictive analytics
to a point where you have pre-crime?
It's like, you didn't commit a crime,
but you know what?
We know you're going to commit this crime
because these algorithms have actually profiled you
in such a way that tells us that you are a criminal.
Hannah, you said that about fennel, okay?
Oh my God, you did.
If you know that much about who a person is if they buy fennel... I love what Chuck said there. Are the algorithms so good that they know your thoughts, and then they know your next behavior, and then you pre-arrest someone?
Okay, so this is such a tough topic, right? Like, I spent a huge chunk of my book talking about this, you know, predictive policing and predictive algorithms.
And the name of your book again, just so we get that?
It's called Hello World.
Life in the Age of...
I can't remember what they did with the subtitle in America.
I think it's How to Be Human in the Age of the Algorithm,
maybe. Yeah, that's right.
I thought you were going to say, I can't remember what the
name of my book is.
But the subtitle
is different in America. They do that sometimes. Yeah. Okay.
They have to translate between two English speaking countries. They have to translate.
Okay. So, right. This is like a really tough topic, because the thing is that some people
have definitely tried to do this. There have definitely been some situations in which people
have tried to predict. So there was one particular example in Chicago, I think it was,
where the idea was quite straightforward, right?
Which was like, okay, well, when it comes to gun crime,
often today's victims are tomorrow's perpetrators.
So if you analyze the network of people who people are friends with,
who people hang out with, that kind of stuff,
if you analyze that network and feed in where events are happening, can you come up with
like a risk score, if you like? This is kind of like the threshold thing that you were talking
about earlier with Lightning. Can you come up with a risk score that says, we think that this group
of people or this group of people are likely to be involved in something in the near future?
And when this whole system was set up, it was set up in kind of a nice way, or like,
I think it was set up with good intentions, because the idea was that if your name appeared on this list, then police and social workers would come around to your house and they would,
so it would be like- Intervention.
An intervention, right? But it's like, you know, here are these programs that you can join.
Here are these alternatives that you can, you know, we want to help you out of the life that you're in, right?
That was kind of like the intention.
Of course, of course, it didn't work out that way.
Because if you give that list to people who have got a completely different set of priorities, as soon as there was, you know, a gun homicide,
it turned out that people took this list and started at the top of the list and then just started arresting all the way down.
So by the end,
Rand Corporation did this analysis of the whole project.
And by the end, the people who were on the list were,
I can't remember the numbers,
but basically way more likely to have been arrested by the police,
regardless of whether they were involved in the original crime.
So essentially it turned into a list,
a harassment list, right?
And I think this is the thing.
I think that in like a lab setting,
in kind of a cold environment of like an ivory tower,
actually, I think there are certain things that you can say
about like likelihood, you know,
like there are some people that you can pick out
who you know in a million years,
they're never going to commit a gun homicide. And then there are other people who, you know,
perhaps aren't quite in the same boat. And I think there are some things that you can say about
humans. But the problem is that the world isn't this ivory tower. You can't, like, create a system that gives you that information, because it doesn't then tell you what you're supposed to do with it. It doesn't tell you how you're supposed to interact with people. And I think that I haven't yet heard of a really positive story where people have tried to do something like that and it's gone really well, because, yeah, the real world is messy.
The real world is messy. Right. So, Hannah, I think just AI will figure out what to do with the data.
It definitely won't.
AI will know to become our overlords
and subjugate us because we can't take care of ourselves.
Definitely won't.
I mean, also, right, that was a point,
like, a couple of years ago where everyone,
I mean, genuinely, newspaper articles had that attitude of, like, just feed it into the AI and it will be able to predict everything.
This, so this book, actually, this one here, this guy here, Matthew Salganik, he did this amazing
project where he had, I can't remember exactly how many, but thousands of kids, right? Thousands
of kids. And he had data on them from when they were, when they were born, when they were five
years old, 10 years old, all the way up into their teens. And he had everything on them, right?
He had, you know, what their parents did.
He had interviews with them,
like unimaginable amount, big, big, big data, right?
On these kids.
And what he did very cleverly is he released the data,
you know, to the public, anonymized and so on in stages.
And he held back the last stage
of when they were, I think, 18.
And he asked people all around the world,
he said, here's everything you know about these kids from 0 to 15. I want you to predict how many
of them ended up in trouble, how many of them went on to further education, all of those different
kind of things. And everyone around the world with their very clever AI and their very clever,
all of this, that and the other, tried to do it. Would you like to know what came out on top?
Linear regression.
Oh, the basic.
The most basic, basic, basic.
What's gone before will happen again.
You fit a line to the trend in the data.
Yeah.
And there you have it.
Yeah.
Chuck, in case you didn't know, linear regression is fitting a line through the data.
They have to put more syllables to that.
Right, yeah.
Linear regression.
Oh, yes, you draw a line.
Straight line.
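For the record, the humble winner looks roughly like this: fit a least-squares straight line through past data and extrapolate. The data points here are made up.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # e.g., age at each measurement
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])   # e.g., some outcome score

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
print(f"y ~ {slope:.2f} * x + {intercept:.2f}")
print("prediction at x = 5:", slope * 5 + intercept)
```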
I think it was.
I think I got that right, guys.
By the way, I would probably fact-check that slightly, because I may have got a tiny couple of those facts wrong, but whatever.
Yeah, we don't care.
The sense of it is fine.
It's a great story.
I'm sticking with that.
So we got to bring this to a close, but we have to have Hannah back on. Oh my gosh, we've only
just scratched the surface, especially with 11 pages. Hannah, clearly you've triggered interest
in our fan base, and they're going to want more of you as we go forward. But I think,
correct me if I'm wrong, Hannah, that one of the great lessons of this is really that maybe everyone should have paid attention in their math class, because math will be the foundational force that defines our social and cultural existence in this world.
Did I overstate that, Hannah, or not?
Yeah, no, I think that's definitely true. I think that it's really hard to realize
how important this stuff is
because it's invisible.
And I kind of think, you know, with drones,
like drones came along
and then all of a sudden
there were loads of drones everywhere
and everyone got really upset.
Now you've got to have every license possible
to fly a drone.
I always think that, like, if you could see algorithms in the same way that you could see drones, I think people would be a lot more, you know, on it, really.
I think that they'd want
to educate themselves
a lot more about it.
So yeah.
Or they'd just be really annoyed
by algorithms
like they are drones.
So, okay guys,
we got to wrap this up.
Hannah, thank you
very much
for sharing your wisdom
your insights
and
some of it
pleasing
some of it scary
part of what we need
going forward
Chuck
always good to have you
always a pleasure
alright
this has been a
Cosmic Queries edition
on data and algorithms
we gotta call it quits there
I'm Neil deGrasse Tyson
your personal astrophysicist
bidding you as always to keep looking up.