Radiolab - Breaking Benford
Episode Date: November 14, 2020
In the days after the US Presidential election was called for Joe Biden, many supporters of Donald Trump are crying foul. Voter fraud. And a key piece of evidence? A century-old quirk of math called Benford’s Law. We at Radiolab know Benford’s Law well, and have covered it before. In this political dispatch, Latif and Soren Sherlock their way through the precinct numbers to see if these claims hold up. Spoiler: they don’t. But the reason why is more interesting than you’d expect. This episode was reported by Latif Nasser. Support Radiolab by becoming a member today at Radiolab.org/donate.
Links: Walter Mebane, “Inappropriate Applications of Benford’s Law Regularities to Some Data from the 2020 Presidential Election in the United States”
Transcript
Wait, you're listening to Radiolab from WNYC.
Well, first, am I imagining that Jad already knows about it, because we just talked about it
when I was promoting my show, right?
About Benford's Law?
Just tell him that, remind him.
Okay, he knows, but he forgets.
Yeah, just, okay.
Well, I think, I think my,
Hey, I'm Latif Nasser, this is Radiolab.
And today, in the week after the election
was called for Joe Biden,
but in this sort of weird middle space
when Donald Trump has refused to concede,
our editor, Soren Wheeler, and I sat
Jad down to talk to him,
because we wanted to tell him
about something from our past,
that has come back to haunt us.
Okay.
So a few months ago, you may remember,
I was telling you about the Netflix show I made, Connected,
right?
And I was telling you about one of the episodes, which is about something that before my time,
Radiolab had also separately done an episode about, which was this thing called Benford's
Law.
Yes.
Yes.
This kind of funny, nerdy, a little bit mystical, mathematical law that's kind of stupendously,
broadly applicable.
Stupendously, broadly applicable.
Yeah, it's one of those.
But so, in my head, I feel like I parked it
in the fun, silly, nerdy category of stuff
that's more sort of interesting than it is important.
But then all of a sudden, the last week it became very important
Oh boy
Benford's Law.
What is Benford's Law and how is it relevant to the 2020 election?
It was all over Twitter, Facebook
It's a mathematics ad clarity and transparency
I think some of the first things I saw about it were YouTube videos
In terms it's the mathematically and statistically provable point
Then it shows up on all these fringe news sites, Newsmax, Boston Herald, Gateway Pundit,
and then conservative podcasts, people like Dan Bongino.
Start to talk about it.
What if I told you when you look at some of the voter data, it was constant, that
Benford's Law is saying, it's not possible.
And Benford's Law is what we're talking about.
Scott Adams, the cartoonist,
Scott Adams, who's a big Trump supporter,
he was talking about it.
So it's don't conform to the natural arc.
There was so much stuff going around online
that Twitter actually started flagging mentions
of Benford's law.
Wait, why?
Because suddenly there were just, you know,
a lot of people online trying to use Benford's law
to parse the 2020 election returns
to say that the votes for Joe Biden
were fraudulent votes and Benford's law proved that.
Huh, it's funny because we're all waiting,
there's still just talk about fraud
and everyone is like, okay, show us the fraud,
where is the fraud?
You're saying Benford's law of all things
is what people are using to try and say that there's fraud?
Yeah.
And then the big hit is that like on Twitter people would be like,
by the way, check out Radiolab, it's totally true.
Right, so like,
So go see Connected, like this is the real deal.
Yeah, it almost felt like we got pulled into this,
into this fight.
You never know where your stuff's gonna end up.
But I assume from what you said about Twitter
that there is no there there,
in terms of Benford-
Yeah, proving fraud.
No, definitely not.
I mean, if you talk to the people who know the math,
do the math, they say, you know, like,
all these guys are wrong.
But interestingly, the way that they're wrong,
and the way that all these people are sort of fighting
about it on Twitter is actually kind of interesting.
Can you just remind me what Benford's law is? I mean, I have vague memory of this from one of our shows,
but like, what is it again?
So tell you what, let's play a part of the old piece and we'll come back to the election battle in just a second.
But this was like 11 years ago, we did a whole show about numbers.
It was actually during your paternity leave with them.
That's right.
This is my paternity leave, yeah.
So what we did was sort of brought you back into the studio
to work you into the piece.
I'm Jad Abumrad.
And I'm Robert Krulwich.
This is Radiolab.
We're still talking about numbers,
and now we're going to switch.
Love it me.
It was definitely a while ago, and it felt
a little bit like a simpler time.
But we're going to play it real quick.
And the story that we told basically came from an interview
that Robert did with a guy named Mark Nigrini,
who's a business professor at the College of New Jersey.
And his favorite story, the numbers tell,
actually starts back in 1938.
So imagine an office in Schenectady, New York,
at the GE Research Laboratories.
And in that office is a man. He's sitting at his desk.
Mr. Frank Benford. And Mr. Frank Benford is a physicist, so he's doing some difficult calculations
and he's hunched over a book. Probably actually one of the most boring books you could imagine.
Yes, this is a book of logarithmic tables. What are logarithmic tables? So, logarithmic tables were a very convenient way
of doing multiplication in the early part of the last century. So remember, this is before there
were calculators. So if you wanted to multiply something like 145 times 3564, you could just go to this
book and look it up. So it starts with numbers you might want to multiply by 1 to 100 on the first pages,
then 101, 2, up to 200, and 300. And the back of the book is like 900. The further you go, the higher
and higher the numbers you use to multiply. That's right. So our Benford fellow, he's sitting there,
doing his calculations, and he's looking at the numbers, flipping through the book. He's staring
at the pages, and he notices something kind of weird.
He noticed that the first few pages were more worn than the last few pages.
Meaning more smudgy and darker and oilier, as if he was using the front of the book
more than the last few pages.
And he wondered why is this happening?
Strange. We're not aware of favoring one part of the book over the other.
Am I doing something a little odd?
Or maybe it's something bigger. And that's when it hit him. He thought maybe in this world,
there are more numbers with low first digits than with high first digits.
What? More numbers that start with one or two than numbers that start with seven, eight or nine?
Just because his book is worn?
No, that's what started him thinking.
So here's what he did.
He compiled some tens of thousands of statistics.
That's Steve Strogatz, mathematician at Cornell University.
Just anything he could think of that was numerical,
molecular weights of different chemicals,
baseball statistics, census data,
the revenues of all the companies listed
on the main stock exchanges in America. And everywhere he looked, in all these different
categories, it seemed, yes, there were more numbers beginning with ones and twos than
eights and nines. Wait, really? Oh yeah, this has been checked out again and again
and again, and it's true. Size of rivers, earthquakes and things like that.
Populations, or number of deaths in a war, areas of counties.
Stream flow data.
What if you were to say get all the people in New York together and look at their bank accounts?
Bank account balances follow Benford's law nearly perfectly.
Meaning that if you just look in at the amount of money that people have,
matter of fact, in all the bank accounts, you'll find they begin with one more often than they begin with two.
Perfect.
Yes.
So actually they begin with one 30.1% of the time, they'll begin with a two 17.6% of the time,
they'll begin with a three 12.5% of the time.
That's a big difference.
Why would three be two?
I'm sorry I keep going.
And the poor nine would only occur as a first digit, 4.6% of the time, which actually would make the one
approximately six times as likely as the nine.
And it is quite amazing.
That is more than quite amazing.
That's deeply suspicious.
I mean, this is crazy what I'm telling you.
And I can't give you good intuition why it's true.
But Steve and Mark and many, many, many mathematicians
will tell you, despite what you may think,
there is a preference, a deep preference
in the world it seems for number sequences
that start with ones and then twos and then threes.
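The percentages quoted in the old piece (30.1% for one, 17.6% for two, 12.5% for three, 4.6% for nine) match the standard statement of Benford's law, P(d) = log10(1 + 1/d). The formula itself is not read out in the episode; the short Python sketch below is added only to show where those numbers come from, and the function name is made up for illustration.

```python
import math

def benford_first_digit_probability(d: int) -> float:
    """Probability that the first digit is d (1 through 9) under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Print the expected share for each first digit as a percentage.
for digit in range(1, 10):
    share = benford_first_digit_probability(digit) * 100
    print(f"{digit}: {share:.1f}%")

# Ratio of ones to nines, the "approximately six times" from the piece.
ratio = benford_first_digit_probability(1) / benford_first_digit_probability(9)
print(f"one is about {ratio:.1f} times as likely as nine")
```

The nine probabilities add up to exactly one, because the logarithms telescope to log10(10).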
So that was how we did it.
There's more but.
I just feel like I'm re-experiencing
the weirdness of this law, and
at the risk of tracking us off:
Why is that that you would get more ones than twos and all that?
Like this is, yeah, the question I had to say.
Do you want the Radiolab explanation?
We don't want this to go in the upper right.
What did we say?
Like we did this originally.
This was a classic move.
It's like we get to the end of the piece.
You ask this very question.
That's it.
Huh?
You still haven't addressed the central mystery here.
Why in the world would there be more ones than nines? Shouldn't they be equi...
Equi-coincident? Yes. Well, the answer is
actually very complicated and deeply mathematical. The simple answer is... Is there an answer?
Yes, there is an answer, and it has to do... Do you understand the answer? No, just two
New Mary could be doing
Explain it to you. Okay, all right. But I will now
and then we laugh our way out of the room. The best explanation I heard was about, if you imagine
a lot of things grow, right? They, like, grow and get bigger, right? And so let's say
you're a one, right? And you want to grow to be a two. You have to grow a hundred percent
to get to a two. So it takes you a long time to get from being a one to being a two. If
you're an eight and you want to grow to become a nine, right, that's just an eighth of you
that needs to grow again, right? So it's like, you're going to be in that eight zone way
shorter than you were in the one to two zone.
Yeah.
So it's like log, it's kind of like some kind of a logarithmic situation.
Exactly.
Jumping from one to two is huge, two to four is huge, four to eight is huge.
And that's why the hugeness of those jumps means... and I lost the end of that sentence.
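The growth argument above can be checked with a small simulation. This is a sketch under assumptions that are not in the episode: a quantity that grows by a fixed 0.1% per step, tracked from 1 up to 1,000, with made-up function names. It counts how many steps the quantity spends with each first digit.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    """Return the first significant digit of a positive number."""
    exponent = math.floor(math.log10(x))
    # Clamp to 1..9 to guard against floating-point edge cases at powers of ten.
    return min(9, max(1, int(x / 10 ** exponent)))

def time_share_by_leading_digit(growth_per_step: float = 0.001, decades: int = 3) -> dict:
    """Grow a value from 1 to 10**decades and report the share of steps spent on each first digit."""
    value = 1.0
    limit = 10.0 ** decades
    steps = Counter()
    while value < limit:
        steps[leading_digit(value)] += 1
        value *= 1 + growth_per_step  # constant percentage growth (an assumption)
    total = sum(steps.values())
    return {d: steps[d] / total for d in range(1, 10)}

shares = time_share_by_leading_digit()
for digit, share in shares.items():
    print(f"{digit}: simulated {share:.3f}, Benford {math.log10(1 + 1 / digit):.3f}")
```

Getting from 1 to 2 requires doubling, while getting from 8 to 9 requires only a 12.5% increase, so a steadily growing quantity lingers in the "one zone" far longer than in the "eight zone". Under constant percentage growth the time shares come out close to the Benford percentages.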
I mean, sort of why we couldn't in the first, but you know, the point of the piece, like
even though we didn't explain that, what the piece did do was sort of, and this is
the whole reason we did the piece, was go on to talk about how you can actually use
Benford's law in these really pretty consequential ways.
Right.
When Mark Nigrini first ran across Benford's law, he thought, maybe I can use this law to bust people.
For payroll fraud, tax return fraud, you thought, hey, we can use this to catch a thief?
That's right.
Huh?
How?
Well, Nigrini...
And so, you know, basically, we explain that if books are cooked, like financial books
are cooked, they will not be following Benford's law.
So a real book would follow the law,
cooked books don't so you can spot a thief.
And we gave a bunch of examples of that, you know,
the little story.
Benford and boom, oh busted.
She eventually pled.
Or the guy with a $40 million Ponzi scheme.
Run Benford and boom.
Then at the end of the piece, boom, I mean,
Benford was an element in all these cases.
It wasn't the clincher, but still,
it is a very compelling argument.
And then 10 years from now,
it'll be the equivalent of a fingerprint.
Which is, you know, a strong statement.
But I mean, at the same time, we didn't say anything
about elections and we weren't even thinking about elections.
Right.
But when we come back from break, we're going to go right into Benford's Law and election
data specifically, which is where the trouble begins.
Howdy, this is Blake Krozer from Nashville, Tennessee.
Radiolab is supported in part by the Alfred P. Sloan Foundation, enhancing public
understanding of science and technology in the modern world.
More information about Sloan at www.Sloan.org
Science reporting on Radiolab is supported in part by Science Sandbox, a
Simons Foundation initiative dedicated to engaging everyone with the process of science.
I'm Latif Nasser. This is Radiolab. We're talking about Benford's law and the 2020 presidential election.
Well, let me ask you have any smart statistics knowledgeable people actually tried to use
Benford's law to see whether there's election fraud
in any election anywhere?
Yeah, yes, yeah.
And for me in particular,
like I mean, you guys were mostly talking about like,
taxes and forensic accounting
and it was about money and numbers,
but in the Connected episode I did,
I totally talked about elections.
Well, there is one moment when our millions of personal decisions all get channeled in the same direction.
When free will is in full flower, or at least it's supposed to be.
There's a whole segment about here's how Benford's law can potentially relate to elections.
And in fact...
Walter?
Oh, hey, how are you, Latif?
Hi, how you doing?
The guy I talked to in that show is the guy that a lot of these people online...
It's his work that many of these folks are citing. So I was like, (now recording) I
better call him back up. Okay, sure.
Introduce yourself for me. Just tell me who you are. Walter Mebane, at the University of Michigan in Ann Arbor.
So what has happened between now and the last time we talked?
Well, the last time we talked...
And he had seen all of this stuff online.
He knew all about it because he had been getting all these emails in the last week.
So the trickle turned into a flood, and my students started to write me and say, hey, this
is going on.
Saying, hey, look, Benford's Law and your work,
people are using it basically to claim fraud.
I was originally pointed to a Twitter post,
and then there was another Twitter post,
and then I found a YouTube video,
and I don't know which one came first.
So basically what a lot of people are doing
is they're taking these like precinct tallies
and saying, okay, look, we're entering all the precinct tallies,
we're taking the first digits from all the vote tallies
for each of the candidates in each of these precincts
and then we're checking how often it starts with one,
how often it starts with two and so on.
Take Chicago, right?
They did Chicago, didn't they?
Let's say they did, yeah.
It's not a swingy one, it's a one that Biden won by a lot.
But let's just do this real quick, Chicago,
how many voting precincts?
I think there's something like hundreds
if not thousands of them.
Yeah, let's just say a thousand.
Okay, sure.
You just take the votes for Donald Trump
in each of those precincts.
Here he got 238, here he got 462, here he got 521.
And then in another column, you take all the totals
that Joe Biden got in those same precincts,
266 here, 892 here, 326 here.
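The procedure being described (take each precinct's vote count, keep only its first digit, tally how often each digit appears) can be sketched in a few lines of Python. The six counts below are just the illustrative numbers said aloud in the conversation, not real returns, and the function names are made up for this sketch.

```python
from collections import Counter

def first_digit(count: int) -> int:
    """Return the first digit of a positive whole number."""
    return int(str(count)[0])

def first_digit_shares(counts: list) -> dict:
    """Tally first digits and return the share of counts starting with each digit 1-9."""
    positive = [c for c in counts if c > 0]  # a count of zero has no first digit
    if not positive:
        raise ValueError("need at least one positive count")
    tally = Counter(first_digit(c) for c in positive)
    return {d: tally[d] / len(positive) for d in range(1, 10)}

# Illustrative numbers from the conversation, one column per candidate.
trump_counts = [238, 462, 521]
biden_counts = [266, 892, 326]

trump_shares = first_digit_shares(trump_counts)
biden_shares = first_digit_shares(biden_counts)
print("Trump:", {d: round(s, 2) for d, s in trump_shares.items() if s > 0})
print("Biden:", {d: round(s, 2) for d, s in biden_shares.items() if s > 0})
```

With a real precinct file the same tally would run over hundreds or thousands of rows per candidate, and the shares would be plotted as the two curves people were posting.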
And what it is that's popping up is oftentimes basically just a little graphic, two little
curves with the vote counts.
It's like, here's Trump's curve.
Look at all the ones and a little less two, a little less three, nice sweeping little curve.
Oh, here's Biden.
So it's that weird.
There's a bunch of sixes in that bumpy curve.
Right.
And they're just like, look at these two.
Nice.
And the way that they see it.
So this one is for some data from Michigan.
It's like, okay, there's less ones than there should be.
There's way less twos than there should be.
There's more threes than there should be.
There's more fours, way more fives, more sixes, more sevens, more eights, and less nine.
So it's like, it's all over the curve, right?
Is the claim?
They're saying that the Biden numbers,
they don't fall according to this pretty Benford curve,
therefore something, something.
Probably somebody was just like keying them in on some.
Some fake adding, some kind of false, fraudulent adding.
Yeah, because the idea is that random numbers
follow this pattern, but human-created numbers don't. This is a way of seeing human
meddling.
That's the claim. But all this is completely ridiculous.
Walter was just like, you just can't do this with precincts.
Says it's very well known.
The first digit of precinct vote counts are not
useful for diagnosing frauds or anything else.
So I started jumping into those threads,
correcting people and explaining,
which didn't win me a lot of friends on Twitter,
at least among that audience.
So once this thing blew up online,
another person I talked to on my show,
Jennifer Golbeck, professor at the University
of Maryland's College of Information Studies,
is a researcher who studies social media.
She's, like, on Twitter, like, battling with people.
Yeah, so.
Are you getting nasty DMs, is that?
Yeah, I mean, not the worst that I've gotten, right?
More, you know, people cross a threshold where they don't understand what I'm saying anymore.
So then they just start calling me names, which is okay
I mean, it's for sure not a dumb idea to wonder if we can do this and
Part of the attraction for Benford and why people really like it here is that it's super surprising
when you first learn about Benford, right?
My mind was blown the first time I learned about it.
And so on Radiolab, right?
Yeah, on Radiolab, that's right.
I mean, it is what got me started doing research
in this space, was like listening to that Radiolab
and not being able to stop thinking about it, you know,
for weeks.
There are papers out there that say, look,
Benford could be really powerful for doing this,
but it's not the way anybody on Twitter was doing it.
The problem is when you're looking at precinct data,
these precincts are basically purposefully made
just for counting election data,
and the reason that those are helpful
is because they're all the same size.
Normally when Benford works, you've got lots of orders of magnitude.
So say we're looking at the length of all the rivers in the world.
We know that that follows Benford.
So there's some really short rivers and there's some super, you know, continent-long rivers.
So there's a lot of changes in the orders of magnitude.
But the precincts do not have multiple orders of magnitude.
Precincts tend to be kind of the same size.
And the first digit is primarily determined by the size of the precinct.
It has nothing to do with anything about the behavior of the voters or the officials or
the vote counters.
It's just a matter of the design of the precincts.
Wait, how so?
How would the first number have to do with the size of the precinct?
Well, Walter put it this way.
Imagine that all the precincts are about a thousand in size, which is roughly a convenient number,
but pick whatever you like. And imagine the two candidates get about 50% of the vote.
Which was true in a lot of the, you know, swinging precincts that this data was based on.
And if you look at the Trump Biden totals in those places...
then the first digit everywhere is gonna be a four or five.
You're gonna get 500 to this guy,
500 to this guy, 400 to this guy, 600 to this guy.
You're probably never gonna get a one in front of anything
unless it's a blowout.
That's interesting.
So they're skewing,
because of the way that the precincts are drawn up,
you're skewing towards the middle, I would imagine.
You're gonna get a whole bunch more fours and fives and sixes.
And that's exactly what that chart,
I was telling you about before,
that's sort of gone a little bit viral,
which is like, that's exactly what it shows.
So it's that Michigan data, where it's like,
oh look, the highest numbers are for 4s, 5s, and 6s.
Oh, that's funny.
So you're like, oh, so that makes sense
that Biden would have gotten 4s, 5s, and 6s, right?
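Walter's thought experiment is easy to simulate. The sketch below assumes, purely for illustration, precincts of 900 to 1,100 voters and a candidate share between 45% and 55%; those ranges, the seed and the function name are not from the episode. It shows that first digits pile up on four, five and six simply because of how big the precincts are.

```python
import random
from collections import Counter

def simulate_precinct_first_digits(n_precincts: int = 1000, seed: int = 0) -> dict:
    """Simulate one candidate's vote counts in similar-sized, evenly split precincts."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_precincts):
        size = rng.randint(900, 1100)    # assumed: every precinct is "about a thousand"
        share = rng.uniform(0.45, 0.55)  # assumed: the candidate gets "about 50%"
        votes = round(size * share)
        tally[int(str(votes)[0])] += 1
    return {d: tally[d] / n_precincts for d in range(1, 10)}

simulated_shares = simulate_precinct_first_digits()
for digit, share in simulated_shares.items():
    print(f"first digit {digit}: {share:.1%}")
```

Under these assumptions no count can fall below 405 or above 605, so there is never a leading one, two or three. No fraud is involved; the shape comes entirely from the design of the precincts.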
And then Jen pointed out, okay,
now take another 1,000 person precinct
and say this is one where Biden has a really big lead.
So if Trump gets 150 votes,
then Biden's gonna get 850 votes.
He's gonna get the rest, right?
So if Trump follows a Benford's law distribution, Biden necessarily will not
follow that distribution because he's getting the other part, right? But then shouldn't
Biden's votes be the exact opposite, like the mirror of the curve or something close to
it? So if it were this situation where we had like a thousand people in every precinct
that would totally be true, in reality, there's like variation.
And so if you look at the plotting of this, what you actually see is that Biden's curve
pretty much matches the curve of how many people are in the precinct.
So that's not Benford's law.
That's just because the precinct sizes are the way they are.
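Jen's point about the two candidates splitting a fixed precinct can also be sketched. Here, as an assumption for illustration only, every precinct has exactly 1,000 voters and Trump's counts are drawn so that their first digits follow Benford's law (log-uniform between 100 and 999). Biden then gets whatever is left. The names and parameters are hypothetical.

```python
import random
from collections import Counter

def first_digit(count: int) -> int:
    """Return the first digit of a positive whole number."""
    return int(str(count)[0])

def mirrored_first_digits(precinct_size: int = 1000, n_precincts: int = 10000, seed: int = 0):
    """Draw Benford-like counts for one candidate and give the remainder to the other."""
    rng = random.Random(seed)
    trump_tally, biden_tally = Counter(), Counter()
    for _ in range(n_precincts):
        # Log-uniform draw: first digits of these counts follow Benford's law.
        trump = min(999, int(10 ** rng.uniform(2, 3)))
        biden = precinct_size - trump  # the other candidate gets the rest
        trump_tally[first_digit(trump)] += 1
        biden_tally[first_digit(biden)] += 1
    trump_shares = {d: trump_tally[d] / n_precincts for d in range(1, 10)}
    biden_shares = {d: biden_tally[d] / n_precincts for d in range(1, 10)}
    return trump_shares, biden_shares

trump_sim, biden_sim = mirrored_first_digits()
for d in range(1, 10):
    print(f"{d}: Trump {trump_sim[d]:.1%}, Biden {biden_sim[d]:.1%}")
```

In this toy setup Trump's counts start with a one about 30% of the time, which forces Biden's counts to start with an eight about 30% of the time. As Jen says, real precincts vary in size, so the real picture is messier than an exact mirror.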
Yeah, I mean, I don't think there's any reason to expect anybody's gonna look like Benford or not,
because this first significant digit analysis
of Benford and elections doesn't work, period.
Now, there are definitely statisticians out there
who say that you cannot benfordize elections at all.
But Walter, the thing that's interesting about him,
is that he thinks, and this is, you know,
a lot of the work that he does, that you can look at elections if you look at not the first digit,
but at the second digit. This has become his thing. So he's like looked at elections in Kenya,
in Russia, and Germany, and Turkey, and Mexico, and Iran, and often he is using Benford's law on the second
digit. The first digit is going to be a function of how big the precinct is, which is a human
construct. But the second digit, that's going to be more subject to the laws of randomness and
whatever, yada yada yada. That is not a human construct. Not probably, hopefully, hopefully,
right. So yeah, super stupid caveat.
Most mathematicians, total freak-out geeks, will say
the thing happening with the second digit
is not strictly Benford's law.
It's just a very Benford-like thing.
That's a matter.
You can spot it.
He uses phrases like Benford inflected,
or Benford-like, or Benford.
Like it's like, that guy's a...
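For reference, Benford's law also implies an expected distribution for the second digit, obtained by summing the law over every possible first digit. The episode does not spell out the exact test Walter runs, so the sketch below only shows that textbook second-digit distribution, with a made-up function name, and should not be read as his method.

```python
import math

def benford_second_digit_probability(d: int) -> float:
    """Probability that the second digit is d (0 through 9) under Benford's law."""
    if not 0 <= d <= 9:
        raise ValueError("second digit must be between 0 and 9")
    # Sum over all first digits k of P(number starts with the two digits "kd").
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

for digit in range(10):
    print(f"second digit {digit}: {benford_second_digit_probability(digit):.1%}")
```

The second-digit curve is much flatter than the first-digit one, running from roughly 12% for zero down to roughly 8.5% for nine, which is part of why interpreting deviations from it takes more statistical care.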
And Walter also makes clear that even if you do look
at the second digit, even then it's not a perfect signal for fraud. He'll do the Benford
thing, right, but he also has this whole, like, sort of a toolkit, right, that he's made
of all these other different statistical things that you bring to bear on it because it's super
complicated in backgrounds and expectations and then you do this with a regression to the blah, blah, blah.
So he's going to bring the whole suite of many complicated, I'm a super smart person tools to bear on this.
Right.
And the tools that he's, he's sort of picked up in literally 20 years of studying
elections statistically for fraud.
But anyway, okay, so those are kind of, in general, the reasons why he looks at all the stuff
that sort of these kind of amateur Benford's law analysts
online are throwing up.
He looks at that and he's like, that's ridiculous.
And then so then he was like, okay, I'm over the weekend,
he was like, I'm just gonna run this myself.
I produced a working paper, which I think you have the link for, and it has the diagnostics
not only for the second digit, but the other diagnostics that I've developed over the
past decade and a half.
And what did you find in running that stuff, the second digit stuff?
Well, there's nothing.
I mean, there's no sign of frauds of any kind. It was like this is a totally
ordinary, in fact, it seems pretty exemplary. As far as I've seen, there's nothing bad that happened
anywhere. This has been miraculous. This election has been so smooth.
You can't just plug some numbers into your Excel spreadsheet after you read a Wikipedia page and
do it. But people who feel like, look, I can see the difference and a bunch of these look the
right way and one looks the wrong way.
So forget what you're saying about it being complicated.
It must be true.
Those are hard people to convince.
It's just common sense enough, but it's just obscure
and complicated enough that it feels like
it's like a piece of evidence that could take on,
yeah, sort of a life of its own.
It could be this thing that then,
there's a certain, admittedly small,
but segment of the population that's like,
now whenever they talk about the 2020 election,
they'd be like, oh, and Benford's law shows that it's all,
it was all bogus, you know?
Yeah, it strikes me as like,
is it gonna go to court and win the day?
No.
Is it the kind of thing that maybe next Tuesday,
Donald Trump would say in a press conference?
Press?
Yes.
Yeah.
It's also like, it's the kind of thing
that steps into the breach of a very uncomfortable phenomenon.
And we all experienced this, right?
Which was election night, you had one result.
And then you had this weird, creeping, slow tilt over the next three or four days.
And even though we ourselves did a story about how that tilt was coming, the tilt still
felt surprising and non-intuitive in some emotional way. And so then that is fertile ground
for these Benford-esque ways of explaining the irregularity.
Yeah, but even though the people who actually know what they're talking about say it doesn't
explain the reality.
No, but see, this is the problem with the world we live in, is that there's never been
more distrust of leadership and expertise, but you're asking us at a moment when we're
very unlikely to trust experts to explain this thing.
Just all to say, like, it does feel like it's hard to explain.
It's going to be hard for a lot of people to explain,
like why, especially if they're not resorting to experts
to help explain it, like how did it go this way
when everybody I talk to seems to have wanted
the opposite to happen?
Yeah, totally.
If I'm myself wanting to do sort of a cost benefit on Benford's applications in the world, well, the upside would be financials.
The upside would be financials.
Well, not just financials.
So there's, yeah, financials is a big one.
I mean, tax fraud.
OK.
OK.
But also another thing is, and this is something that Jen uses in her other research, which
is her main research, is how you use Benford's
on social media.
So you can use it to detect bots online.
You can see and she has done that very thing.
How?
You can also use it.
No, keep going.
I'm going to double back.
No, so you literally, you count, and this is cool because people can do it at home too. You count up not, so all the follower numbers
of your followers on Twitter, Facebook, whatever,
and you can see based on the followers of your followers,
if you just take those first digits,
it should follow Benford's law.
And if it doesn't, yeah.
That's why.
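The at-home check described here (collect the follower counts of an account's followers, take the first digits, compare with Benford) could look something like the sketch below. It is not Jen's actual method; the deviation score, the function name and both example data sets are assumptions chosen for illustration, and no real account data is used.

```python
import math
from collections import Counter

def benford_deviation(counts: list) -> float:
    """Mean absolute gap between observed first-digit shares and Benford's law."""
    positive = [c for c in counts if c > 0]  # zero followers has no first digit
    if not positive:
        raise ValueError("need at least one positive count")
    tally = Counter(int(str(c)[0]) for c in positive)
    gaps = [abs(tally[d] / len(positive) - math.log10(1 + 1 / d)) for d in range(1, 10)]
    return sum(gaps) / 9

# Hypothetical follower counts spanning many orders of magnitude
# (powers of two are a classic Benford-following sequence).
organic_like = [2 ** k for k in range(1, 200)]
# Hypothetical follower counts that all sit in one narrow band.
suspicious_like = list(range(500, 600))

organic_score = benford_deviation(organic_like)
suspicious_score = benford_deviation(suspicious_like)
print(f"wide-ranging counts: {organic_score:.3f}")
print(f"narrow-band counts:  {suspicious_score:.3f}")
```

A score near zero means the first digits look Benford-like. A large score is only a prompt to look closer, since, as the precinct discussion shows, numbers confined to a narrow range fail the check for entirely innocent reasons.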
And then, is that... so does Twitter use that
to prune bots,
as they do from time to time.
I don't know if Twitter officially uses it.
Jen definitely uses it.
And then not only that, but also you can use,
and this is another thing in the TV show
that we did, I did an interview with former Radiolab guest,
also one of my former college professors,
Hany Farid. You can use it to basically sniff out deepfakes
and you can use it to sniff out manipulated media
in an era when it's getting way harder to tell the truth.
You basically look at these embedded number values
in a single picture and you can see whether that picture
has been edited or how many times it's been saved
or kind of things like that.
So it's actually, I mean, it definitely is a force
for chaos and doubt and it's also sometimes
a kind of a superhero for truth.
If only we could understand it.
Right.
Alright, thank you Latif and Soren for taking me back into the past. And forward into now.
Of course.
No problem.
Definitely go check out
Connected, Latif's show on Netflix,
you can find it there.
It's amazing.
The episode with the Benford stuff is called Digits.
And also, in addition to that, you can go back into the deep past and listen to the Radiolab show called Numbers,
which has not only all that
Benford goodness,
but also a love story,
some babies, and
even a little Johnny Cash.
It's a great episode.
I'm Jad Abumrad. And I'm Latif Nasser.
Thanks for listening. This is Mark Schoendorf from Highland Park, Illinois.
Radiolab was created by Jad Abumrad and is edited by Soren Wheeler.
Lulu Miller and Latif Nasser are our co-hosts.
Dylan Keefe is our director of sound design.
Suzie Lechtenberg is our executive producer.
Our staff includes Simon Adler, Jeremy Bloom, Becca Bressler, Rachael Cusick, David Gebel,
Tracie Hunte, Matt Kielty, Tobin Low, Annie McEwen, Sarah Qari, Arianne Wack, Pat Walters,
and Molly Webster.
With help from Shima Oliaee, Sarah Sandbach and Jonny Moens.
Our fact-checker is Michelle Harris.