Everything Everywhere Daily: History, Science, Geography & More - Correlation vs Causation

Episode Date: May 23, 2022

You have probably heard the old saying that there are lies, damned lies, and statistics. There are several reasons why statistics are often misinterpreted. One of the biggest is the confusion between the two concepts of correlation and causation. This confusion is not only made by laypeople but also by members of the media and scientists. Learn more about correlation and causation and why one doesn’t necessarily imply the other on this episode of Everything Everywhere Daily.

Subscribe to the podcast! https://podfollow.com/everythingeverywhere/

Executive Producer: Darcy Adams
Associate Producers: Peter Bennett & Thor Thomsen

Become a supporter on Patreon: https://www.patreon.com/everythingeverywhere
Update your podcast app at newpodcastapps.com
Discord Server: https://discord.gg/UkRUJFh
Instagram: https://www.instagram.com/everythingeverywhere/
Twitter: https://twitter.com/everywheretrip
Website: https://everything-everywhere.com/everything-everywhere-daily-podcast/

Everything Everywhere is an Airwave Media podcast.
Please contact sales@advertisecast.com to advertise on Everything Everywhere.

Transcript
Starting point is 00:00:00 You've probably heard the old saying that there are lies, damned lies, and statistics. There are several reasons why statistics are often misinterpreted. One of the biggest is the confusion between the two concepts of correlation and causation. This confusion is not only made by laypeople, but also by members of the media and scientists. Learn more about correlation and causation and why one doesn't necessarily imply the other on this episode of Everything Everywhere Daily. What if your perceptions about the past were wrong? Throughline is a podcast that takes you back in time to uncover the parts of the story that may have gone unnoticed.
Starting point is 00:00:49 It effectively turned day into night and how it shaped the world now. Time travel with us every week on the Throughline podcast from NPR. I've done several episodes where I dealt with logical and statistical fallacies. Today, I want to focus on what is perhaps the most frequent logical and statistical error that many people succumb to: the confusion of correlation and causation, or as it's known in Latin, cum hoc ergo propter hoc, which translates to "with the fact, therefore because of the fact." Let's say you're doing a study, and you're looking at two different things. These things can quite literally be anything.
Starting point is 00:01:32 Once you've collected the data, you then plot the data on a chart to see how they correlate. Roughly speaking, there are three ways two different variables can correlate. There can be a positive correlation. When one variable goes up, the other goes up, and when one goes down, the other goes down. There can be a negative correlation. When one variable goes up, the other goes down, and vice versa. And the third possibility is to not correlate whatsoever.
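As a rough illustration of these three cases, here is a minimal Python sketch that computes the Pearson correlation coefficient, one common way to put a number on how two variables move together. The episode does not name a specific measure, so the choice of Pearson's r, the helper name `pearson`, and all of the data values are assumptions made up for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented data, purely for illustration.
x = [1, 2, 3, 4, 5, 6]
rising = [2, 4, 5, 8, 9, 12]      # tends to go up when x goes up
falling = [12, 9, 8, 5, 4, 2]     # tends to go down when x goes up
unrelated = [5, 9, 4, 4, 9, 5]    # no consistent direction

r_pos = pearson(x, rising)        # close to +1: positive correlation
r_neg = pearson(x, falling)       # close to -1: negative correlation
r_none = pearson(x, unrelated)    # zero: no linear correlation

print(f"positive: {r_pos:.2f}, negative: {r_neg:.2f}, none: {r_none:.2f}")
```

Values near +1 or -1 correspond to what the episode calls strong correlations, and values near 0 to weak or absent ones.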
Starting point is 00:01:56 When one variable goes up, the other one might go up or it might go down. I am vastly oversimplifying this because there are gradations within all of these. Two variables could be strongly correlated or weakly correlated. But let me give you some examples. What do you think the relationship is between the number of doctorates awarded in civil engineering and the per capita consumption of mozzarella cheese? Or how about the divorce rate in the state of Maine and the per capita consumption of margarine? Or what about the annual number of people who drowned by falling into a swimming pool
Starting point is 00:02:28 and the number of new movies released starring Nicolas Cage? All three of these things you might be thinking have absolutely nothing to do with each other. And you'd be right. They don't really have anything to do with each other. Yet, shockingly, all three of these examples have really, really strong correlations. Nicolas Cage movies and swimming pool drownings were positively correlated by 67% between 1999 and 2009. Mozzarella consumption and civil engineering doctorates positively correlated by 95% between 2000 and 2009. And finally, divorce rates in Maine and
Starting point is 00:03:05 margarine consumption were positively correlated by 99.2% between 2000 and 2009. When you hear or see something with such a strong correlation, you might start to wonder why. What is the cause between these things? In the examples I just gave, there isn't anything. It was all just chance. They were discovered by data mining. Just keep looking at data sets of everything, and eventually you'll find two sets of variables that correlate.
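The data mining effect described above can be sketched with a small simulation. This is only an illustration under stated assumptions: the 200 series of 10 "yearly" values are pure random noise generated here, not the real mozzarella, margarine, or drowning data, and the names `pearson`, `NUM_SERIES`, and `NUM_YEARS` are hypothetical.

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)   # fixed seed so the run is repeatable
NUM_SERIES = 200         # assumed number of unrelated data sets
NUM_YEARS = 10           # ten yearly values each, like 2000 to 2009

# Every series is independent random noise, so no pair is causally linked.
series = [[rng.gauss(0, 1) for _ in range(NUM_YEARS)]
          for _ in range(NUM_SERIES)]

# Compare every series against every other one and keep the strongest match.
best_pair, best_r = None, 0.0
for i, j in itertools.combinations(range(NUM_SERIES), 2):
    r = pearson(series[i], series[j])
    if abs(r) > abs(best_r):
        best_pair, best_r = (i, j), r

num_pairs = math.comb(NUM_SERIES, 2)
print(f"Checked {num_pairs} pairs; strongest correlation r = {best_r:.3f} "
      f"between series {best_pair[0]} and {best_pair[1]}")
```

With only ten data points per series and thousands of pairs to choose from, the strongest pair will typically look very tightly correlated even though every series is noise.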
Starting point is 00:03:31 These are spurious examples. I don't think anybody listening really believes that new Nicolas Cage movies are causing people to jump into swimming pools and drown. But then again, he has made a lot of really bad movies. I bring up these examples to prove the point that correlation does not necessarily imply causation. I have to mention this because the confusion comes into play when there actually is causation. Because when there is causation, there will be correlation. Let's look at another trivial example that illustrates the point.
Starting point is 00:04:04 There is a positive correlation between the number of points a sports team scores, and it really doesn't matter the sport, and the number of wins. This is because there is a direct causation between points and wins. The team that scores the most points will win the game. Over the course of a season, the team with the more points will probably win more games. This isn't a perfect correlation, of course. A team could lose a bunch of very close games and then win one game in a big blowout. However, this relationship is very strong the more games you look at. The English Premier League recently ended, and looking at the end of year standings,
Starting point is 00:04:38 the teams with the four highest goal differentials were the top four teams, and the teams with the six worst were the bottom six. Things, however, can be, and usually are, a lot messier than these trivial examples. In the 19th century, lung cancer was a relatively rare form of cancer. However, in the 20th century, cases of lung cancer exploded. Researchers soon found a very strong correlation between people who smoke cigarettes and people with lung cancer. Very early on, many people rightly pointed out that correlation doesn't imply causation. Just because people smoke doesn't mean that it caused cancer. In fact, as late as 1960, two-thirds of all doctors in the United States didn't think that the case linking smoking and lung cancer had been firmly established.
Starting point is 00:05:22 While correlation doesn't imply causation, if there was causation, there would be correlation. Eventually, researchers determined that smoking causes cancer by conducting experiments at the cellular level, and no one really doubts this relationship anymore. The case between smoking and cancer is pretty straightforward. But let's add another variable into the mix. To the best of my knowledge, no study on this has ever been done, but for the sake of argument, let's assume that there is a positive correlation between people who carry lighters in their pocket and lung cancer. Even though this study has never been done, I'm willing to bet that there probably is some positive correlation between these two things. Why would there be a correlation between carrying a lighter and lung cancer?
Starting point is 00:06:07 Do lighters cause lung cancer? Are people with cancer compelled to carry a lighter? In this case, it would be an example of a confounding variable. There's something that is behind both of the variables, and in this example, the confounding variable would be smoking. Smokers are more likely to get cancer, and smokers are more likely to carry a lighter. Hence, people with lighters would probably be more likely to have lung cancer.
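The lighter example can be sketched as a small simulation. Every probability below is invented for illustration (the episode itself notes that no such study exists), and the variable names are hypothetical. In this toy model, smoking raises both the chance of carrying a lighter and the chance of lung cancer, while the lighter has no effect on cancer at all.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
N = 100_000              # size of the simulated population

people = []
for _ in range(N):
    smoker = rng.random() < 0.25                          # assumed smoking rate
    # Smoking influences lighter carrying (made-up probabilities).
    lighter = rng.random() < (0.80 if smoker else 0.05)
    # Smoking influences cancer risk; the lighter plays no role (made-up).
    cancer = rng.random() < (0.15 if smoker else 0.01)
    people.append((smoker, lighter, cancer))

def cancer_rate(group):
    """Fraction of a group of (smoker, lighter, cancer) tuples with cancer."""
    group = list(group)
    return sum(1 for _, _, cancer in group if cancer) / len(group)

# Ignoring smoking, lighter carriers appear to be at much higher risk.
rate_lighter = cancer_rate(p for p in people if p[1])
rate_no_lighter = cancer_rate(p for p in people if not p[1])

# Looking only at smokers, the lighter makes essentially no difference.
rate_smoker_lighter = cancer_rate(p for p in people if p[0] and p[1])
rate_smoker_no_lighter = cancer_rate(p for p in people if p[0] and not p[1])

print(f"cancer rate with lighter:     {rate_lighter:.3f}")
print(f"cancer rate without lighter:  {rate_no_lighter:.3f}")
print(f"smokers with lighter:         {rate_smoker_lighter:.3f}")
print(f"smokers without lighter:      {rate_smoker_no_lighter:.3f}")
```

Comparing groups that share the same value of the suspected confounder, as the last two lines do, is one simple way to check whether an apparent link survives once the confounder is held fixed.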
Starting point is 00:06:31 The problem of confounding variables is a huge one in most research. This is especially a big problem in nutritional science. The vast majority of papers published on nutrition are what are known as epidemiological studies. These are observational studies, really just surveys, which only produce correlations. Yet, more often than not, these studies are reported as if there is some sort of causation, or it's at least heavily implied. And I'm sure you've seen headlines that report eating X causes cancer or heart disease. Well, maybe it does, but you can't determine that via a correlation.
Starting point is 00:07:06 The problem with nutritional surveys is that there are so many confounding variables. How many different foods have you consumed over the last six months? Could you possibly list them all including how much of each food you ate? It's extremely difficult to near impossible to isolate a single variable that would be causal with some health result. And on top of that, eating certain foods might correlate with certain lifestyle choices like working out and exercising. Another interesting confounding variable study was showing the effects of alcohol consumption on heart disease. There have been several studies showing that people who drink moderate amounts of alcohol showed lower risks of heart disease than those who drank heavily or those who didn't drink at all.
Starting point is 00:07:45 Why would some alcohol be better than no alcohol or a lot of alcohol? Why is there a Goldilocks amount of alcohol? Well, it could be that moderate alcohol consumption wasn't the cause of anything. It was just something that people who were already healthy did. Another problem with studies is what is known as P-hacking. As I mentioned before, there can be strong correlations and weak correlations. When you make a hypothesis for a published scientific paper, the probability value of a hypothesis being false, which goes by the variable P, usually has to be under 0.05, or 5%. There's nothing special about a P value of 0.05. It's the value everyone uses because it's the value everyone uses.
Starting point is 00:08:30 The problem with this number is that you don't have to test that many variables to get something statistically significant just by chance. Let's assume you gather data on seven different variables and compare each of the seven variables against each other. That's 21 possible combinations, which means that the odds are that at least one of those combinations will yield a statistically significant result only by chance. P-hacking is a very fancy word to describe throwing stuff against a wall to see what sticks. You can then write your research paper on the two variables that correlate, totally ignoring and not telling anyone else about the 20 other combinations you tried that yielded no results. One final thing to know about correlation and causation is that oftentimes the causation can be reversed.
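Returning briefly to the seven-variable example above, the arithmetic can be worked through in a few lines of Python. This is a sketch under one simplifying assumption that the episode does not state: the 21 comparisons are treated as independent tests, each with a 5% chance of a false positive when nothing real is going on. The names `NUM_VARIABLES` and `ALPHA` are hypothetical.

```python
import math

NUM_VARIABLES = 7   # variables measured in the hypothetical study
ALPHA = 0.05        # conventional significance threshold

# Number of distinct pairs among 7 variables: 7 * 6 / 2 = 21.
num_pairs = math.comb(NUM_VARIABLES, 2)

# Chance that no pair comes out "significant" purely by luck,
# assuming the 21 tests are independent (a simplification).
p_none = (1 - ALPHA) ** num_pairs

# Chance that at least one pair comes out "significant" by luck.
p_at_least_one = 1 - p_none

print(f"{num_pairs} pairs, chance of at least one false positive: "
      f"{p_at_least_one:.1%}")
```

Under that assumption the chance of at least one spurious "significant" result is roughly 66%, which is why reporting only the one pair that worked is misleading.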
Starting point is 00:09:16 Even if there is a causation, it's often hard to figure out in what direction the causation flows. One popular statistic, which is often floated, is that people who attend college will earn more money in their lifetime than people who do not. This correlation is true. However, everyone just assumes that attending college is what's responsible for the increase in income. However, believe it or not, there is shockingly little to actually support this. In fact, there's a very good argument that it might be the other way around. People who are destined to make more money just happen to be more likely to go to college. This comes as a shock to most people, but there are ways to test it and it's been done.
Starting point is 00:09:57 First, you can look at the subset of people who had similar grades and test scores in high school and compare those who went to college to those who didn't. Those who didn't go to college might have done so for personal or financial reasons, even though they could have gotten accepted. What studies have found is that this group that could have gone to college but didn't had similar lifetime incomes to those that did attend college. They were only slightly lower and that doesn't even factor in the debt that may have been accrued by going to college.
Starting point is 00:10:25 Likewise, there was a study done on people who were accepted and graduated from elite universities and those who applied but were not accepted to elite universities and went to college elsewhere. Again, the results showed that the lifetime earnings of the two groups were pretty much exactly the same. Winning the lottery of getting accepted to an elite university was secondary to just being part of a group that was smart and ambitious enough to even bother to apply in the first place. Understanding the differences between correlation and causation is really important, and it's very easy to confuse them. The reason it's so confusing is that if a relationship is causal, it will show a correlation.
Starting point is 00:11:04 The next time you hear a news report or read a headline with some scientific finding that shows a link between two things, you should always have a big dose of skepticism, or at least think of possible confounding variables, which might better explain that which is being presented. Everything Everywhere Daily is an Airwave Media podcast. The executive producer is Darcy Adams. The associate producers are Thor Thomsen and Peter Bennett. I just wanted to extend a big thank you to everyone who is supporting the show over at patreon.com. I have show merchandise available there, including hoodies, t-shirts, and stickers.
Starting point is 00:11:40 Plus, it really just helps me get this show out every single day, including, of course, weekends and holidays. Remember, if you leave a review or send me a Boostagram, you too can have it read on the show. Ah, not a bear in sight. The bear patrol must be working like a charm. Specious reasoning, Dad. Thank you, honey. By your logic, I could claim that this rock keeps tigers away. Oh, how does it work?
Starting point is 00:12:09 It doesn't work. Uh-huh. It's just a stupid rock. Uh-huh. But I don't see any tigers around here, do you? Lisa, I want to buy your rock.
