The Peter Attia Drive - #269 - Good vs. bad science: how to read and understand scientific studies

Episode Date: September 4, 2023

View the Show Notes Page for This Episode Become a Member to Receive Exclusive Content Sign Up to Receive Peter’s Weekly Newsletter This special episode is a rebroadcast of AMA #30, now made available to everyone, in which Peter and Bob Kaplan dive deep into all things related to studying studies to help one sift through the noise to find the signal. They define various types of studies, how a study progresses from idea to execution, and how to identify study strengths and limitations. They explain how clinical trials work, as well as biases and common pitfalls to watch out for. They dig into key factors that contribute to the rigor (or lack thereof) of an experiment, and they discuss how to measure effect size, differentiate relative risk from absolute risk, and what it really means when a study is statistically significant. Finally, Peter lays out his personal process when reading through scientific papers. We discuss: The ever-changing landscape of scientific literature [2:30]; The process for a study to progress from idea to design to execution [5:00]; Various types of studies and how they differ [8:00]; The different phases of clinical trials [19:45]; Observational studies and the potential for bias [27:00]; Experimental studies: randomization, blinding, and other factors that make or break a study [44:30]; Power, p-values, and statistical significance [56:45]; Measuring effect size: relative risk vs. absolute risk, hazard ratios, and “number needed to treat” [1:08:15]; How to interpret confidence intervals [1:18:00]; Why a study might be stopped before its completion [1:24:00]; Why only a fraction of studies are ever published and how to combat publication bias [1:32:00]; Why certain journals are more respected than others [1:41:00]; Peter’s process when reading a scientific paper [1:44:15]; and More. Connect With Peter on Twitter, Instagram, Facebook and YouTube

Transcript
Starting point is 00:00:00 Hey everyone, welcome to the Drive Podcast. I'm your host Peter Attia. This podcast, my website, and my weekly newsletter all focus on the goal of translating the science of longevity into something accessible for everyone. Our goal is to provide the best content in health and wellness, and we've established a great team of analysts to make this happen. It is extremely important to me to provide all of this content without relying on paid ads. To do this, our work is made entirely possible by our members, and in return, we offer exclusive member-only content and benefits above and beyond what is available for free.
Starting point is 00:00:46 If you want to take your knowledge of this space to the next level, it's our goal to ensure members get back much more than the price of the subscription. If you want to learn more about the benefits of our premium membership, head over to peterattiamd.com forward slash subscribe. Welcome to a special episode of The Drive. For this week's episode, we're going to re-broadcast AMA number 30 on how to read and understand scientific studies, which was originally released in December of 2021. While this was originally released as an AMA for subscribers only, due to how important of a topic this is, we've decided to re-release
Starting point is 00:01:24 it and make it available for everyone today. If you're a consumer of this podcast or any of our weekly emails, you know that I place a large emphasis on scientific literacy, and how the media often gets this wrong, and even well-intentioned scientists sometimes misrepresent or misunderstand their own results. And so this episode is our effort to try to help you with them. In this episode we discuss what is the process for a study to go from an idea to a design to execution. What are the different types of studies out there and what do they mean? What are the strengths and limitations of each of them?
Starting point is 00:02:00 How do clinical trials work specifically for drugs, for example? What are the common pitfalls of observational studies that you should be looking for? What questions should you be asking about a study to figure out how rigorous it was? What does it mean when a study is statistically significant? And is this the same as it being clinically significant? Why do some studies never get published? And what is my process for reading scientific papers? So without further delay, I hope you enjoy or re-enjoy this special episode on how to read and interpret scientific studies.
Starting point is 00:02:32 Hey Bob, how are you man? Looking pretty studious there in the library today. Hey Peter, thanks very much. We're just getting some reading in before the podcast. This is going to be a pretty good one, because as you may recall about, I don't know, four or five months ago, maybe longer, I was on a podcast with Tim Ferriss. And I don't know how it came up, but I do remember somehow it came up that we had spent a lot of time writing this series, Studying Studies, and God, that's been four years ago,
Starting point is 00:03:04 I think. But we didn't really have something more digestible for folks on how to make sense of the ever-changing landscape of scientific literature and how to kind of distinguish between the signal and the noise of the research news cycle. And I remember after that, Tim and I went out for dinner and he kept pressing me on, well, what can I do to get better at this process? Are there newsletters I see be subscribing to and things like that?
Starting point is 00:03:29 And while I'm sure that there are, I didn't know what they were off the top of my head. And so I think what we've done here, when I say we, I mean you, what you have done here is aggregate all the questions that have come in over the past year, basically, that pertain to understanding the structure of science. I looked through the questions last week and I was pretty excited.
Starting point is 00:03:50 I think it's going to be a sweet discussion and I hope this serves as an amazing primer for people to really understand the process of scientific experiments and everything from how studies are published and obviously what some of the limitations are. So anything else you want to add to that, Bob, before we jump in? I agree. I think it's a fun topic. We get so many of these questions that we end up pointing people to the website, where we'll point readers to one of the parts of the Studying Studies series, but I think sometimes
Starting point is 00:04:17 just talking about it and explaining it can help a lot. So I think this will be really useful as far as like a question and answer session rather than just treating a blog. I don't think this displaces that other stuff. I think we go into probably more detail on some things there, but I also think we're going to cover things here that aren't covered there. So depending on how you like to get your info, this could be fun. So where do you want to start? We have again a lot of questions, but I think this question gets to the core of I think what we're trying to do here, which is, how can a user or a person who has no scientific background better understands studies that they read in the news or in the publications to know if the findings are
Starting point is 00:04:55 solid or not, especially in today's age where you can easily see two studies that contradict each other. Coffee's good, coffee's bad. Eggs are good, eggs are bad. So I thought we could run through a bunch of questions with the first one that we got here is, what is the process for a study to go from an idea to design an execution? This is a great question. In theory, it should start with a hypothesis. Good science is generally hypothesis driven. I think the
Starting point is 00:05:26 cleanest way to think about that is to take the position that there is no relationship between two phenomena. We would call this sort of a null hypothesis. So my hypothesis might be that drinking coffee makes your eyes turn darker. So I would have to state that hypothesis, and then I would have to frame it in a way that says, my null hypothesis is that when you drink coffee, your eyes do not change in color in any way, shape or form. And that would imply that the alternative hypothesis is that when you drink coffee, your eyes do change color.
Starting point is 00:06:10 You can already see, by the way, that there's nuance to this. Because am I specifying what color it changes to? Does it get darker? Does it get lighter? Does it change to blue? Green? Does it just get the darker shade of whatever it is?
Starting point is 00:06:23 But let's put that aside for a moment and just say that you will have this null hypothesis and you will have this alternative hypothesis. And to be able to formulate that, cleanly, is sort of the first step here. The second thing, of course, is to conduct an experimental design. How are you going to test that hypothesis? As we're going to talk about, a really, really elegant way to test this is using a randomized controlled experiment. If it's possible to blind it, we'll talk about what that means.
Starting point is 00:06:51 You'll have to decide, well, how long should we make people drink coffee, how frequently should they drink coffee, how are we going to measure eye color. These are the questions that come down to experimental design. You then have to determine a very important variable, which is how many subjects will you have, and of course, that will depend on a number of things, including how many arms you will have in this study. But it comes down to doing something that's called a power analysis, and this is so important that we're going to spend some time talking about it today, although I won't talk about it right now. If this study involves human subjects or animal subjects,
Starting point is 00:07:25 you will have to get something called an institutional review board to approve the ethics of the study. So you'll have to get that IRB approval. You'll have to determine what your primary and secondary outcomes are, get the protocol approved, develop a plan for statistics, and then pre-register the study. All of these things happen before you do the study, and of course, in parallel to this,
Starting point is 00:07:47 you have to have funding. So those are kind of the steps that go into doing an experimental study. And what we're gonna talk about, I think in a minute, is that there are some studies that are not experimental, where some of these steps are obviously skipped. Yeah, one of the questions we got was, what are the different types of studies out there,
Starting point is 00:08:05 and what do they mean? For example, observational study versus a randomized controlled study. What are the different types of studies? I think broadly speaking, you can break studies into three categories. One would be observational studies. We'll bifurcate those or try for cake those in a minute. Then you can have experimental studies. And then you can have basically some nations of and or reviews of and or analyses of studies of any type.
Starting point is 00:08:40 Let's kind of start at the bottom of that pyramid. I think you actually have a figure that I don't like very much, but I was gonna say yeah That was one of your favorites. Yeah, I can't stand it. I'll tell you what I like about the figure I like the color schema because my boys are so obsessed with rainbows that if I show them this figure They're gonna be really happy. So let's pull up said rainbow figure. Okay, got it. Okay, so you can see these buckets here. And again, at the level of talking about them, I think this makes sense. What I don't agree with the pyramid for Bob is that it puts a hierarchy in place that suggests
Starting point is 00:09:17 a meta-analysis is better than a randomized control trial, which is not necessarily true. But let's just kind of go through what each of these things mean. So looking at the observational studies, an individual case report is first or second paper I ever wrote in my life when I was in medical school was an individual case report. It was a patient who had come into clinic when I was at the NIH. This was a patient with metastatic melanoma, and their calcium was sky-high, dangerously high, in fact. And obviously our first assumption was that this patient had metastatic disease to their bone, and that they were lysing bone, and calcium was leaching into their bloodstream. It turned out that wasn't the case at all. It turned out they had something that had not been previously reported in patients with melanoma, which was they had developed this parathyroid hormone-related-like hormone in response to their melanoma.
Starting point is 00:10:11 This is a hormone that exists normally, but it doesn't exist in this format. And so their cancer was causing them to have more of this hormone that was causing them to raise their calcium level. It was interesting because it had never been reported before in the literature. And so I wrote this up. This was an individual case report. Is there any value in that? Sure. There's some value in that. The next time a patient with melanoma shows up to clinic and their calcium is sky high and someone goes to the literature to search for it. They'll see that, and it will hopefully save them time in getting to the diagnosis.
Starting point is 00:10:46 You're mentor and friend, Steve Rosenberg. I think of him when I think of individual case reports. I think if you listen to the podcast, he talks about this, but a lot of what motivated him early on, I think we're just a couple of cases. I think it gets back to that first question, too, about the process for a study to go to an idea to design execution, which is to have a hypothesis you need to make an observation. And so you make an observation, you say, hmm, that's strange. And I think that that's what individual case reports can represent sometimes.
Starting point is 00:11:14 This is an interesting observation. It's hypothesis generating for the most part, but it really might kick start a larger trial or it might kick start a career. You never know. Exactly. Now, of course, it's not going to be generalizable. I can't make any statement about the frequency of this in the broader subset of patients. And obviously, I can't make any comment about any intervention that may or may not change the outcome of this. So that
Starting point is 00:11:39 gets us to kind of our next thing, which is like a case series or set of studies. So here you're basically doing the same thing, but in plural, effectively. You wouldn't just look at one patient, you would say, well, I've now been looking back at my clinical practice, and I've had 27 patients over the last 40 years that have demonstrated this very unusual finding. Another example of this going back to the Steve Rosenberg case would be one could write a paper that looks at all spontaneous regressions of cancer. Obviously spontaneous regressions of cancer are incredibly rare, but there are certainly enough of them that one could write a case series. So now let's consider cohort studies.
Starting point is 00:12:27 So cohort studies are larger studies, and they can be retrospective or they can be prospective. So I'll give you an example of both. So a retrospective observational cohort study would be, let's go back and look at all the people who have used saunas for the last 10 years and look at how they're doing today relative to people who didn't use saunas over the last 10 years. So it's retro-spective. We're looking backwards.
Starting point is 00:12:59 It's observational. We're not doing anything, right? We're not telling these people to do this or telling those people to do that. And the hope when you do this is that you're going to see some sort of pattern. Undoubtedly, you will see a pattern. Of course, the question is, will you be able to establish causality in that pattern? Cohort studies can just as easily, although more time-consumingly, be prospective. So you could say, I want to follow people over the next five years, 10 years who use sonnas and
Starting point is 00:13:29 Compare them to a similar number of people who don't and Now in a forward-looking fashion we're going to be Examining the other behaviors of these people and ultimately what their outcomes are do they have different rates of death Heart disease cancer Alzheimer's disease other metrics of health that we might be interested in? Again, we're not intervening. There's not an experiment per se. We're just observing, but now we're doing it as we march forward through time. So this brings us to the kind of the next layer of this pyramid, which are the experimental studies. Divide these into randomized versus non-randomized.
Starting point is 00:14:05 And of course, this idea of randomization is going to be a very important one as we go through this. So a non-randomized trial sometimes gets referred to as an open label trial, where you take two groups of people and you give one of them a treatment and you give the other one either a placebo or a different treatment, but you don't randomize them. There's a reason that they're in that group. So you might say, we want to study the effect of a certain antibiotic on a person that comes in the ER, and we're going to take all the people that come in who look a certain way. Maybe they have a fever of a certain level or a white blood cell count of a certain level. We're going to give them the antibiotic and the people who come in, but they don't have
Starting point is 00:14:56 those exact signs or symptoms. We're going to not give an antibiotic to and we're going to follow them. That's kind of a lame example. You could do the same sort of thing with surgical interventions. We're going to try to ask the question is surgery better than antibiotics for appendicitis or suspected appendicitis, but we don't randomize the people to the choice.
Starting point is 00:15:17 There's some other factor that is going to determine whether or not we do that. As you can see, that's going to have a lot of limitations because presumably there's a reason you're making that decision and that reason will undoubtedly introduce bias. So of course, the gold standard that we always talk about is a randomized control trial
Starting point is 00:15:36 where whatever question you want to study, you study it, but you attempt to take all bias out of it by randomly assigning people into the treatment groups, the two or more treatment groups. We'll talk about things like blinding later, because you can obviously get into more and more rigor when you do this, but before we leave the kind of experimental site,
Starting point is 00:15:59 anything you wanna add to that, Bob? I would add, so non-ranomized controlled trials, maybe another example, illustrative example, I think, with non-ranomized controlled trials, maybe another example, a lestrative example, I think, with non-ranomized controlled trials might be you have patients maybe making a decision beforehand, which will get into selection bias, but they might want to go on a stat and let's say, and then you give them a choice. The other ones might want to go on some other drug like Azetamib.
Starting point is 00:16:19 They're basically selecting themselves into two groups, but you could compare those two groups and see how they do, but it hasn't been randomized. There's a lot of bias that can go into that. There could be a lot of reasons why one group is selecting a particular treatment over the other. That's why I think when we get to randomized trials that shows the power of randomization. Yeah, exactly. We don't need to go back to the figure, but people might recall that the top of that pyramid
Starting point is 00:16:43 was systemic reviews and meta-analyses. Let's just talk about meta-analyses since they are probably the most powerful. So this is a statistical technique where you can combine data from multiple studies that are attempting to look at the same question, basically. So each study gets a relative weighting, and the weighting of a study is sort of a function of its precision. It depends a little bit on sample size, other events in the study, larger studies, which have smaller standard errors are given more weight than smaller studies with larger standard
Starting point is 00:17:12 errors, for example. You'll know you're looking at a meta-analysis. We should have had a figure for this, but I'll describe it the best I can. They usually have a figure somewhere in there that will show across rows all of the studies. So let's say there's 10 studies included in the meta-analysis. And then they'll have the hazard ratios for each of the studies. So they'll represent them usually as little triangles. The triangle will represent the 95% confidence interval of what the hazard ratio is, which
Starting point is 00:17:43 we'll talk about a hazard ratio, but it's basically a marker of the risk. And you'll see all 10 studies, and then they'll show you the final summation of them at the bottom, which of course, you wouldn't be able to deduce looking at the figure, but it takes into account that mathematical weighting. So on the surface meta-analyses seem really, really great, because if one trial,
Starting point is 00:18:04 one randomized trial is good, 10 must be better. I know I've said this before probably three or four times over the past few years on the podcast, but as James Yang, one of the smartest people I ever met when I was both a student in fellow at NCI once said during a journal club about a meta analysis that was being presented, he said something to the effect of a thousand South ears makes not a pearl necklace. And that's just an eloquent way to say that garbage and garbage out. So if you do a meta analysis of a bunch of garbage studies, you get a garbage meta analysis. It can't clean garbage. It simply can aggregate it. So a meta analysis of great randomized control trials will produce a great meta-analysis. They try to control for garbage the researchers and the investigators,
Starting point is 00:18:51 but I think to your point with the Pearl necklace, imagine if you had, say, 10 trials and nine of them are garbage, one of them is really good, really rigorous randomized control trial. And you're looking at the top of the pyramid and you're saying, well, meta-analysis is the best. We should be looking at this meta-analysis. Meanwhile, you've got that one randomized controlled trial that actually is worth its salt, its rigorous, et cetera, that I would say, if you had the option, I think you probably would rely more on that one randomized controlled trial, which is lower on the pyramid. So I think that's probably, I think, you've told me one of your hangups with the pyramid because it's not necessarily top of the pyramid. It's going to be some meta-analysis of randomized control trials. That's right. Yeah. I don't want to suggest meta-analyses
Starting point is 00:19:32 are not great. What I want to suggest is you can't just take a meta-analysis as gospel without actually looking at each study. You don't get a pass at examining each of the constitutive studies within a meta-analysis. It's really the point I think we want to make here. There's one thing in here that isn't represented, but we had a few questions about it. I think a couple. People are asking about what's the difference between a phase three and a phase two or a phase one clinical trial? You know what's going on there? Yes. So here we're talking about human clinical trials.
Starting point is 00:20:05 This phraseology is used by the FDA here in the United States. And typically, the world does tend to follow and lock step, but not always with kind of the FDA's process. So if you go way, way, way back, you have an interesting idea. You have a drug that you think is or a molecule that you think will have some benefit. Think of it as a cancer therapeutic. You've done some interesting experiments in animals, maybe started with some mice, and you went up to some rats, and maybe even you've done something in primates. And now you're really committed to this as the success of this,
Starting point is 00:20:43 and the safety of this in animals looks good. So it's both safe and efficacious in animals and you doubt decide you want to foray into the human space. Well, the first thing you have to do is file for something called an IND, an investigational new drug application. So after you do all of this preclinical work, you have to file this IND with the FDA. pre-clinical work, you have to file this IND with the FDA, and that basically sets your intention of testing this as a drug in humans. And the first phase of that, which is called phase one, is geared specifically to dose escalate this drug from a very, very low level to determine what the toxicity is across a range of doses that will hopefully have efficacy.
Starting point is 00:21:28 These are typically very small studies, usually less than 100 people. They're typically done in cohort. So you might say, well, the first 12 people are going to be at 0.1 milligrams per kilogram and assuming we see no adverse effects there, we'll go up to 0.15 milligrams per kilogram and assuming we see no adverse effects there, we'll go up to 0.15 milligrams per kilogram for the next 12 people. And if we have no issues there, we'll escalate it to 0.25 to do do do do do do do do do do do. You'll notice Bob, I said nothing in there about does the drug work. These are going to be patients with cancer. If this is a drug that's being sought as a treatment for colon cancer, these are going
Starting point is 00:22:04 to be patients that all have colon cancer. If this is a drug that's being sought as a treatment for colon cancer, these are going to be patients that all have colon cancer. They're often going to be patients who have metastatic colon cancer. So these are going to be patients who have progressed through all other standard treatments and who are basically saying, look, sign me up for this clinical trial. I realize that this first phase is not going to be necessarily giving me a high enough dose that I could experience a benefit, and that you're really only looking to make sure that this drug doesn't hurt me. But nevertheless, I want to participate in this trial. If the drug gets through phase one, safely, then it goes to phase two.
Starting point is 00:22:42 And the goal of phase two is to continue to evaluate for safety, but also to start to look for efficacy. But this is done in an open label fashion. What that means is they're not randomizing patients to one drug versus the other typically. They can, but usually it's now we think we know one or two doses that are going to produce efficacy. They were deemed safe in the phase one. We're now going to take patients and give them this drug and look for an effect. And a lot of times, if there's no control arm in the study, you're going to compare to the natural history.
Starting point is 00:23:21 So let's assume that we know that patients with metastatic colon cancer have, on standard of care, have a median survival of X months. Well, we're going to give these patients this drug and see if that extends it anymore. And of course, you could do this with a control arm, but now it adds the number of patients to the study. So again, typically very small studies can be, you know, in the 2030, 40, 50 range, maybe up to a few hundred people. And that one, Peter, I think is a probably a good example of if you have the non-randomization, this might be a case where say it's an immunotherapy, and people know about the immunotherapy, and it's been really effective. It gets approved for a particular cancer, let's say.
Starting point is 00:23:59 And there are a lot of people that know about it, and there are cancer patients that know about it, and they want to get that treatment, but it's not approved. They're talking to their doctor, they may be there online, they might enroll in one of these trials because they really want to try the drug, and maybe they might believe in it more than some other treatment.
Starting point is 00:24:15 Yep, there are lots of things that can introduce bias to a phase two if it does not have randomization. Again, the goal would be to still randomize in phase two because you really do want to tease out efficacy. So if a compound succeeds in phase two, which means it continues to show no significant adverse safety effects, which by the way, it doesn't mean it doesn't have side effects. Every treatment has side effects. It's just that it doesn't have side effects that are deemed unacceptable for the risk profile of the patient.
Starting point is 00:24:47 And it shows efficacy. So really, you have to have these two things. You then proceed to phase three. Here, a phase three is a really rigorous trial. This is a huge step up. It's typically a log step up in the number of patients, you're talking potentially thousands of patients here, and this is absolutely a placebo-controlled trial, or not necessarily placebo, but it can be standard of care versus standard of care plus this new agent, but it is randomized. Whenever possible, it is blinded and with drugs, that's always possible. And these are typically longer studies, because you have so much more sample size, you're going to potentially pick up side effects that weren't there in the first place, and of course now you really have that gold standard for measuring efficacy. And it's on the basis of the phase one, phase two, and mostly phase three data that a drug
Starting point is 00:25:41 will get approved or not approved for broad use, which leads to a fourth phase, which is a post-marketing study. So phase four studies take place after the drug has been approved. And they're used to basically get additional information because once a drug is approved, you now have more people taking it, and they may also be using this to look at other indications for the drug. We talked about this recently, right? A phase four trial with semi-glutide being used to look at obesity versus its original phase three trials,
Starting point is 00:26:17 which we're looking at diabetes. The drug's already been approved. This study isn't being done to ask the question should semi-glutide be on the market? No, it's on the market. It's basically expanding the indication for semi-glutide. In this case, so that insurance companies would actually pay for it for a new indication.
Starting point is 00:26:33 But given the size and the number of these studies, you're also looking for, hey, is there another side effect here that we missed in the phase three? Right. And it might be the particular population, it might have a different risk profile. You might have a different threshold. That's right, because you're not doing this
Starting point is 00:26:49 in patients with type two diabetes, you're doing this in patients who explicitly don't have diabetes, but have obesity, different patients. Can we see something different here? So yeah, so anyway, that's the long and short of phases one, two, three, and four. Okay, so going back to observational studies,
Starting point is 00:27:06 are there any things that you look for in particular that will increase or decrease your confidence in it, whether it's a pearl necklace or a garbage? I think that selection bias is a big one. When I think about observational studies, whether they be prospective or retrospective, the healthy user bias, I think, is one of the more common ones we see in the epidemiology
Starting point is 00:27:28 as it pertains to health. So I wouldn't even know where to begin talking about these studies because the examples are so myriad, but is bacon bad for you? Well, if you look at observational epidemiology, bacon is almost always bad for you. I don't know what the hazard ratio is,, Bob, but it's probably in the neighborhood of 1.3 or something like that, meaning it has about a 30% increase in the risk of basically anything you look at, right? Whether it be cancer, heart disease, death, is that directionally right? I think that's right. I mean, there's probably more nuance. The WHO is looking at, I think
Starting point is 00:28:02 they said over 700 epidemiological studies for red meat consumption, and I think they also had processed meat consumption. When you look at those, we can get into it, but how are they measuring bacon consumption? They're using these food frequency questionnaires, probably get into this recall bias. But yeah, generally, I think with the WHO stuff, I think it was about 20 to 30 percent associated increase. And so you look at that at the surface, of course, you'd be concerned, you'd be like, oh my God, like I shouldn't be eating fill in the blank. I shouldn't be drinking coffee. I shouldn't be eating bacon.
Starting point is 00:28:30 I shouldn't be eating meat at all. The problem with these studies is that you can't ever, no matter how much you try to statistically reconcile it, you can't strip out the fact that people make choices not in isolation. So is there any difference between a person who makes a lifelong decision to not eat meat and a person who doesn't? Of course there is. And it's going to come down to many things that go beyond their diet, including things that can't be controlled for now.
Starting point is 00:29:02 Obviously you can control for some things, smoking. A person who doesn't eat meat is far less likely to smoke than a person who does. Person who doesn't eat meat is probably far more likely to exercise or pay attention to their sleep habits or be more compliant with their medications or things like that. Again, people who don't eat meat, basically that is a proxy. That is a really good marker for someone who is very, very health conscious. So this healthy user bias permeates everywhere. And by the way, it permeates in both directions. So if you look at the epidemiology that started to become very popular about 10 years ago that was suggesting that diet soda was more fattening than soda. So drinking a diet coke is worse than drinking a coke. Well, in the surface, that doesn't seem to make a lot of sense, right?
Starting point is 00:29:56 I mean, diet coke has no calories in it. Coke is full of just liquid sugar. And of course, it gets you thinking, oh, is it the aspartame or whatever else? Well, a far simpler explanation is look at people who are drinking diet soda versus people who are drinking soda. You could make an argument, I think this is the argument, that as a person is becoming more metabolically ill and they're being informed that they really need to stop drinking soda. They're going to be drinking diet soda. And so it's very difficult to look at just people drink this, people drink that. They're otherwise identical and simply the only difference between them is what they drink. It just doesn't really hold up. So anyway, you're always going to look for that healthy user bias. You talked about another bias a second ago,
Starting point is 00:30:45 which is information or recall bias. And I think many people are just shocked to learn how clunky and cluelgy nutritional epidemiology is. Like when you think about all of the amazing technology we have in the world, and we just recently did a podcast and talking about some of the most cutting edge tools of neuroscience that allow you to examine the behavior of a single neuron using channel opposins and all these
Starting point is 00:31:11 things. That's at one end of science and at the other end of science we have this thing called a food frequency questionnaire where you get a call from Billy and he asks you, hey, do you remember how many times a week you ate oatmeal for the past year? I pay quite a bit of attention to what I eat. I don't know how I'd answer that question. You just go to your spreadsheet of your oatmeal consumption, right? Yeah, now you dug into this a bit, Bob. I'm being a bit tongue-in-cheek and facetious. Can you try to make the case that recall biases isn't really that bad and I'm just exaggerating?
Starting point is 00:31:43 I can't make that case. I guess it might depend on what are you recalling. Yeah, what's the best case scenario? I would probably get out of the food category. It might have to do with something, say, like smoking history. You might even have receipts of the last time that you paid for cigarettes or something like that. If you ask people how much did you smoke in the last year,
Starting point is 00:32:02 I think you can get a more accurate answer. But with the food frequency questionnaires, and there's so many analyses, even just the number of foods that are out there compared to the number of foods that are encapsulated in the food frequency questionnaire, vastly different. It only covers a very small portion of it. And it's actually the foods that, I think, the epidemiologists often look at, like the red meat consumption and things like that, the people will underestimate when they do like these validity studies and actually follow them or they do like a food log compared to the food frequency questionnaires, the correlation is so low that it's so underestimated that you're not really getting an accurate picture.
Starting point is 00:32:36 So don't know about like a best case scenario with food frequency questionnaires for food, it would be on frequency. Imagine if you got a food frequency questionnaire that was maybe more technologically advanced, it's an app where you literally recall at the end of each day what did I eat today. But the problem is I think is the frequency of this that oftentimes it's, you're doing one questionnaire for what did you eat over the course of, say, one year or two years, or even will just do one food frequency
Starting point is 00:33:04 questionnaire at the beginning of the study at baseline, they'll follow up with these people for say 10 years, 20 years. And the assumption is, don't change their eating habits. They never asked them again, what happened here? But if, you know, they might compare two groups. One group has higher bacon consumption than the other. The assumption is that they're going to continue with those dietary habits in perpetuity. Again, that's not a best case scenario.
Starting point is 00:33:25 But I guess the best case scenario is you could have more rigor, I think, if you did it more frequently because obviously, if I asked you what you had for breakfast this morning, I think you probably have more confidence in the answer than what did you have for breakfast on January 3rd of last year. I think with nutrition, I just, because I spend so much time doing this type of stuff with patients, it's metaphysically impossible. I mean, I really feel strongly that we should abandon food frequency questionnaires, and no study should ever be published that includes them. I mean, anger, a number of epidemiologists
Starting point is 00:33:55 listening to this. I really think we need to put a stop to that. I think where recall is reasonable, is, as you said, on things that are more profound. I mean, if we wanted to do a study on think of something really that you would never forget, like, oh, childbirth, asking women to recall how many times have you been pregnant, how many times did you either have an abortion or miscarry, and how many times did you deliver it term? Like something that profound, yeah, I would feel confident that if you asked a woman that question over the past 10 years of her life, you would get very accurate answers. But by the way, it still doesn't tell me that I would be able to infer causality. If I was trying to look
Starting point is 00:34:35 at women who have never had a miscarriage versus women who have had miscarriages, just because I look back and ask them to tell me those things doesn't mean that embedded within those differences are other biological or social or economic factors. You kind of get where we're going here, which is, I think epidemiology has a place, but I think the pendulum has swung a little too far and its place has been asserted as being more valuable than I think it probably is. You kind of talked about something that's, I think, a very important bias that exists in any study, but I think this is actually a big problem in prospective studies if they're done incorrectly, which is something called performance bias.
Starting point is 00:35:19 So the Hawthorne effect is basically an effect that says, if a person is watching you, you will change your behavior. So anybody who has tried to fisteriously log what they eat every day, which I've done, many people have done this, there's no question you change your behavior. Just by logging what you eat, you will change what you eat.
Starting point is 00:35:41 How much more will you do it when you know someone is going to look at it? Unbelievably so. In fact, you could make a case that one of the most efficacious dietary interventions known to man is having somebody watch what you eat every meal, not just every meal every moment. And whether you have somebody virtually or literally watching you at every moment eat, especially someone who you're not entirely comfortable with, that's going to have an enormous impact. Isn't there a name for this that's like a car? Is it the Hertz effect? It's the Avis effect because actually, Hertz might in the 70s and 80s,
Starting point is 00:36:19 Avis was always behind. Oh, they were behind Hertz. Yeah, yeah, yeah. It was a Hertz or a budget, budget, but I remember, so they're slogan. I thought it was great. Basically, they were behind Hertz. Yeah, yeah, yeah. It was a the Hertz or budget budget. Renek, I said, they're slogan. I thought it was a great basically, we're number two. Then they would say, we try harder. We've got this inferiority complex. We're number two. And that trying to think of an example, the Hawthorne effect is it's almost like a experimenter bias that
Starting point is 00:36:38 the experimenter is watching the observing the people under the lamp that was where this came from, looked at work productivity with different lighting. They got the clipboard and it could be your boss that's out there and looking and watching you. And so that experimenter's having the effect with the Avis effect. So say that you were, say, you Peter, competitive Peter, were enrolled in a cycling trial. Say it's open label and you get a placebo or you get nothing.
Starting point is 00:37:03 And you know that there's another group out there that say, we looked at this a little bit, say it's like this lotion that you put on, and supposedly it's supposed to improve your performance. There could be a part of you that says, like, I'm gonna beat those guys, the control group. They're gonna say, like, we're number two, like, we're not getting the special treatment. So we're gonna win this thing.
Starting point is 00:37:19 Sounds like that wouldn't happen, but I think that people, if they enroll in a trial, sometimes there's that competitive nature, and it's also to your point about, if you have somebody watching you, that could also adjust your performance. Now you have this, whether it's not physically somebody watching you, but you know that there's a trial that's following you and that you know that they're going to be looking a year down the road, two years down the road, or even three weeks down the road is probably more common. And they're going to test you again and see how you do and see if you improve or see if you don't.
Starting point is 00:37:45 Those things can play a role. And just to the point of somebody that might be not pleasant that's watching you as far as food, there's a great Saturday Night Live clip with, it's the rock in the clip, and it's a commercial for Nick Cattrell, and it's a smoking cessation treatment. And it's actually, the rock is named Nick Cattrell. There's a guy in a couch and he's about to smoke a cigarette. The rock comes out jacked. It smacks the cigarette out of his hand and it's one of the most effective smoking cessation
Starting point is 00:38:12 programs I've seen. I bet. I think there's a more sinister form of performance bias that creeps up in clinical trials, especially in randomized control trials where you think at the surface, wow, this is a really well done study. So you'll take two groups and let's say it's a weight loss trial. We're going to test calorie restriction versus pick your diet, the alpotato diet. So the calorie restricted group is given some leaflets and it tells them how to measure calories that they need to cut their calories
Starting point is 00:38:47 by 25% from baseline, and we'll see you in 12 weeks. The potato diet group is given twice weekly counseling sessions on all the different ways you can cook potatoes so that you don't get fed up and bored of eating potatoes all day on the potato diet group. And at the end of the study, the potato diet group lost more weight than the calorie-restricted group. It'd be tempting to say, well, come on, this is a randomized control trial. I mean, but the problem is, there's an enormous performance bias in the potato group, in
Starting point is 00:39:18 that they were given far more attention. They were observed more. They were given more coaching, they had much more of a positive behavioral influence. I would say that's the number one bias that I see in RCTs that are lifestyle based, is that very subtle performance bias. If you're really designing a trial well, you have to flatten the curve on those differences. So each person in each group should be getting the exact same amount of attention, the exact same amount of touch with the investigators, the exact same type of advice so that you can eliminate that difference, which unfortunately, it shows up a lot. That almost gets back to this idea, the null hypothesis that, what do you say, coffee might darken your eyes. That's your guess.
Starting point is 00:40:10 You've observed it. You've got a couple of case studies of some people in your family or whatever. And so that's your hypothesis. And then the way that you design the trial is interesting because it's really like, this coffee is going to be innocent until proven guilty, the default position. And then it's really your role. It seems almost counterintuitive to a lot of people. And it's hard actually from a human perspective,
Starting point is 00:40:28 is that your role is really to be as rigorous as possible, to essentially falsify your hypothesis. You need to do that as rigorously as you can. And sometimes I think to your point, sometimes it's like, you get really excited about a treatment, the people that are involved in the study, the investigators, they're really excited about it,
Starting point is 00:40:44 the control group, or the placebo, it's almost an afterthought, and so there might be a lot of things that they're doing in the treatment group, not just the treatment itself that could bias the study. Yeah, continuing on that thread of other things that you want to look at in a study is, and we talked about this very briefly in passing, was the idea differentiating primary from secondary outcomes. And there's some debate about whether you can only have one primary outcome or whether you can have co-primary outcomes. But the primary outcomes are basically the outcomes for which the study is designed around and powered against.
Starting point is 00:41:15 Again, we will come to this idea of power in a moment. But there are lots of secondary outcomes, and they're often exploratory. It's really important that when people are pre-registering studies, they state what the primary outcome is and what the secondary outcomes are, and typically, a study that fails to meet its primary outcome will be deemed a null study, even if it meets secondary outcomes. So it's just very important to pay attention to the subtlety of that, and again, a good journal with a pre-registered study is going to make that abundantly clear. But I can promise you that someone writing about it
Starting point is 00:41:48 in the newspaper is virtually never going to make that distinction. And it's important to understand that because it gets to this next issue, which is kind of the multiple hypothesis testing problem. Research should be hypothesis seeking or hypothesis testing, but it can also be hypothesis generating. And so you can use statistical tools to slice and dice data in multiple ways, and you can
Starting point is 00:42:10 take many looks at data to see if you actually find something significant there. You have to be careful because the more you look, the more times you look at something, the more likely you are to find something that is indeed positive. So this isn't a great analogy, but just to give you a sense of it, if you flip a coin, the fair coin, you've got like a 50% chance of getting heads. If you get two chances to flip the coin, the probability that you're going to get heads is now 75%. If you get three chances to flip a coin, you're up to 87.5% chance that you're going to
Starting point is 00:42:44 get at least one head. Ten times, you're basically at 100% likely that you're going to get head. So if you're allowed ten looks, you have to correct for that. And there's something in statistics called a Bonferroni correction factor that does force you to do that. It forces you to divide your p-value by n, where n is the number of times you've taken a look at the data, so to speak. And therefore, it raises the bar for what is significant.
Starting point is 00:43:11 And again, we'll talk about p-values for folks who maybe aren't as familiar with that in a second. Is there anything else that you'd add to that? I'm sure I'm missing some things. Maybe a more technical term that we didn't bring up, which is confounding. When we talked about the healthy user bias, I think that's a great example of something that can confound your results. It's not in the causal pathway, what's called it, that might be affecting the results,
Starting point is 00:43:31 whether it's age, sex, smoking. The list is almost endless, and this is what those observational studies will try to control for in order to almost mimic what randomization would look like. Right. This is the sort of the vein of the existence of the epidemiologist. If you're trying to determine a relationship between hot chocolate consumption and skiing accidents, it's very likely that people who drink more hot chocolate are more likely to have ski accidents.
Starting point is 00:43:57 I mean, does skiing cause hot chocolate consumption or ski accidents cause hot chocolate consumption? Does consuming hot chocolate make you a worse skier? Or is it that people who live in colder climates consume more hot chocolate and usually skiing occurs in colder climates outside of Dubai? So climate, therefore, is obviously a confounder and the goal is to be able to identify every possible confounder when you're doing epidemiology and I think as John Ionidis argued when we had him on our podcast, that would be a good podcast for people to go back and listen to
Starting point is 00:44:29 alongside this. It's really not possible to identify, let alone eliminate, all confounders. Absolutely. So if we look at experiments or experimental studies, compared to observational studies, are there things you look for specifically or in particular for experimental studies to increase or decrease your confidence in them? Yeah, well, first and foremost, randomization. So, if an experiment isn't randomized, again, it doesn't mean that it's useless, but it just means it's going to be a lot harder to really make sense of this. And randomization needs to be a rigorous randomization. You can randomize incorrectly, believe it or not. I think there's a very famous example with predimed, which was a study that when it was published,
Starting point is 00:45:11 was kind of a remarkable finding, a very large study, something like 7,500 people randomized into three groups, 2,500 per group, given basically two different dietary patterns, a Mediterranean diet, in two versions, and a low fat diet. This was a primary prevention study, so it was looking at people who are high risk, but who haven't had heart attacks or anything yet, and it was looking at mortality.
Starting point is 00:45:36 And the study was actually stopped early, again, something we're gonna talk about in a second, because it had such a positive effect. So the Mediterranean diet had such a favorable effect relative to the low fat diet that people were dying at a rate far less such that it would have been unethical to continue the study for the,
Starting point is 00:45:53 I think the seven and a half years it was planned to run. And I think they stopped it in the four-year mark and sort of declared victory. But then something happened, Bob, what happened? They went back and re-analyzed this predimate group. The first paper was published in New England Journal of Medicine 2013, and they had almost like a brand new article addressing some issues.
Starting point is 00:46:13 They did a re-analysis that was published in 2018. I think it came from this fellow named John Carlisle, who had this way, this way of looking, and I think this was, we've got an email on this with David Allison, a great statistician. He talks about this in his article too, as well, where he looked at this. But this fellow named John Carlyle did this analysis where he looked at thousands of studies, and he could flag the studies and see, does this truly look like randomization based on some particular statistics? And the Predamit study was flagged, looking like this doesn't look like proper randomization. There might be something going on here.
Starting point is 00:46:49 And I think according to the media outlets, I think I read in the New York Times, they talked to the lead or the senior investigator. And he said that it turns out that some of the villages are the clinics. I forget how many clinics in total there were in the study. But at 11 of the clinics, one of the investigators were randomizing the entire clinics to one group.
Starting point is 00:47:09 If you really want to dig into a study, sometimes you really have to get the story, which is, oftentimes you look at randomization. Oh, that's really simple. You just randomize people to different groups and blind it or unblind it. It's very hard to blind the fact. You see your neighbor get a delivery every week, a jug of olive oil, or a sack of mixed nuts, which were the two Mediterranean groups. And I think what happened was people started complaining in the villages, they're like, what do I get? And they're like, you
Starting point is 00:47:34 get your low fat diet pamphlet. Remember, we give it to you every year. You can do that in the study, and that's typically referred to as a cluster, a cluster randomization, where you might randomize one classroom to another classroom, which might be convenient, but it requires different statistical methods. Let's use that example, because that's actually a really good one, right? If you want to study the effects of meditation
Starting point is 00:47:56 on attention span of kids, it's very different to say, we're gonna just take 100 kids and randomize 50 into one group, 50 into another and separate them versus saying we've got a class, two classes over here, two classes over here, we're going to split those two and two into the effect. That's a totally different type of randomization. One is a true randomization, one is a cluster randomization, and while you can do the latter, it requires a different statistical adjustment. So, Predimid basically
Starting point is 00:48:31 had to re-analyze all of their data in light of that. It turned out in the case of Predimid, the results still held, but it will always kind of be a cloud that hangs over it. I think he needs to make this point. He was a huge fan of the Predamine study and something that he said, which I think might be intuitive, is if they're a randomizing entire villages to a group and they're not accounting for it, he thinks like, I'm not sure that's gonna be the only problem in that study.
Starting point is 00:48:56 You know, everything was uncovered, but on the flip side, it's really, really hard to do everything right in the study. You're going to make mistakes. And now imagine randomizing a household. Dad, you're on a mediterranean diet for the next seven years. Mom, you're on a low-fat diet for the next seven years. I mean, it starts to get very difficult. That's one important thing. You also want to make sure, is there a control group? Not all prospective trials have control groups. Sometimes it's a single group where a person
Starting point is 00:49:22 serves as their own control, and there's typically a crossover. So you'll take a group, you'll randomize them into two. It's not that one group is getting treatment A and the other group is getting placebo or treatment B. Both groups get both treatments, plus or minus a placebo, in different orders. And this is a great statistical tool, provided the treatment washes out and doesn't interfere with the control session. The reason this is powerful is you need far fewer subjects
Starting point is 00:49:57 when everybody gets to serve as their own control. So it greatly reduces basically the cost and logistics of a study, but you run into challenges, right? So say you take 20 people who are going to take this drug that is supposed to help them exercise better for eight weeks, and another group is going to take a placebo for eight weeks and exercise. And then everybody switches, because that's the right way you would do it. You have some people start first on the treatment, some people start first on the placebo. Do you
Starting point is 00:50:29 need a gap between the treatments? Because will the effects of that drug linger into the placebo period for one group, which is not what's happening to the other group. And even if it is, maybe if you're only doing it with one group, are you confounding the effect of that treatment? I hope that makes sense, Bob. I don't know if I'm making sense. I know you know what I'm saying, but is there a better way to explain that? I think that makes sense.
Starting point is 00:50:50 One other point I was gonna make about that too, with the crossover design, and I was gonna ask you about this, because I've seen the statistical power, I guess you would call it, with the crossover design. So you can see relatively small studies, not a lot of people, pretty short, and you look at the P values, and we'll get into that,
Starting point is 00:51:04 but they're, you know, 0.00-something. The thought I had when you were talking about it, and they serve as their own controls, is it's almost as if you could get identical twins and randomize those identical twins to one group or the other. You would think that's great because you're controlling for so many things about the physiology or the genotype, et cetera, of those people. And it's almost like that's how they treat these crossover groups. You're almost cloning these people. You're comparing them to themselves, but it's a good point.
Starting point is 00:51:31 And there might be something about the order of the treatments that they receive. If they get treatment A and then treatment B, maybe one might have an effect on the other. The really good ones go A, B, and then B, A. They divide them into two groups and go A, B, and B, A. And yeah, it really comes down to the fact that you can use what's called a paired t-test. The simplicity of the statistic of the paired t-test is part of its elegance here, in that it basically eliminates a lot of variance. Okay, so then we talked earlier about this, blinding.
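Before picking up the blinding question below, here is a minimal sketch of the paired t-test point just made. The data are hypothetical, just eight made-up subjects measured on both treatments, and the only point is that pairing removes between-subject variance.

    from scipy import stats

    # Hypothetical crossover data: each subject is measured on treatment A and on treatment B.
    treatment_a = [212, 198, 240, 205, 222, 210, 199, 230]
    treatment_b = [205, 190, 233, 200, 216, 204, 195, 224]

    # Paired t-test: each subject serves as their own control, so the large
    # between-subject differences (198 vs 240, etc.) drop out of the comparison.
    t_paired, p_paired = stats.ttest_rel(treatment_a, treatment_b)

    # An unpaired test on the same numbers ignores the pairing and is far less sensitive.
    t_unpaired, p_unpaired = stats.ttest_ind(treatment_a, treatment_b)

    print(p_paired, p_unpaired)  # the paired p-value is much smaller for the same data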
Starting point is 00:51:58 What does that mean? So, in an ideal world, both the subjects and the investigators should not know who is getting the treatment and who is getting the placebo. At a minimum, the subjects should not know. That would be single-blinding. But again, double-blinding is always preferred if possible, because the investigators can be biased. They can have hidden biases if they know who is getting what.
Starting point is 00:52:27 So for example, if patients are being given a drug for weight loss, you could say, well, it's pretty easy to blind the patients from that. But if the investigators know that, they might behave differently towards the patients for whom they expect greater weight loss if they believe that this drug is effective. So again, very important, and sometimes very challenging.
Starting point is 00:52:47 I think we talked about this in the podcast with Rick Doblin. One of the huge challenges of studying psychedelics is it's very difficult to blind anybody, most of all the user, the subject. One group is getting psilocybin, and even if the other group is getting niacin, which causes flushing, it's not hard to know which group you're in, and that may affect the results.
Starting point is 00:53:10 Size matters, duration matters, and basically the generalizability of the study. So is it in a population that replicates or looks like who I'm interested in, whether it's me or my patient or whomever I care about? And there are strengths and weaknesses to heterogeneity in studies. The more heterogeneous a study in terms of its patient population, the more generalizable the results are, but the higher the bar for finding an effect. I think this has gotten a lot of attention lately, but for a while it was a relatively unknown, kind of dirty little secret of medicine, which was how many clinical trials involved men only.
Starting point is 00:53:49 How many drugs were approved for both men and women, but on the basis of only being studied in men? And the rationale for this was that it was more complicated to study women, especially premenopausal women, because they have a menstrual cycle that really changes things hormonally, and therefore it's more complicated to do studies and look at drug kinetics and all sorts of things in women. So the easier way to do that was to just study it in a homogeneous population of men. Well, of course, that poses an enormous problem if you're now trying to extrapolate the utility
Starting point is 00:54:23 of that drug in women. It's an extreme example, but a very important one. For large studies, you tend to want to know, is this a multisite or a single-site study? Again, PREDIMED is a great example. So you had a multisite study, and there were probably significant differences between how the sites were run. So there's an advantage to multisite studies, because in theory it brings more heterogeneity, and it should cancel out the effect of any one site over another, but it's harder to control. And therefore you can have, whether deliberately or not, rogue
Starting point is 00:54:55 sites introducing error or bias. I think another thing I really look at here is how big the association or the effect is. We'll talk about this with power, but you can have something that is statistically significant, in the sense that the study is, quote-unquote, a success, but it's clinically irrelevant. The effect is not that big. So say we've tested this new drug for blood pressure, and it lowers systolic blood pressure by one millimeter of mercury after a year of use. And it's like, okay, that might be statistically significant if the study was large enough.
Starting point is 00:55:28 Is it clinically significant? Almost assuredly not. You want to pay attention to what the adverse events were, in frequency, severity, and distribution. You want to pay very close attention to who funded the trial. Trials don't fund themselves, and a lot of trials are funded by drug companies. Now, again, they're
Starting point is 00:55:45 usually done with very clear data monitoring and data analytics. And despite all of the fear-mongering out there, it's not like pharma really gets to put their hand on the scale of these pharma studies, but where I think things can get a little dicey is in terms of things getting buried in supplementary materials and things like that. So you do wanna pay a bit of attention to who's funding a trial. And I think even more important than that is kind of understanding what the conflicts
Starting point is 00:56:14 of interest are of the authors. And nowadays those have to be declared, but there's been a huge amount of hoopla over that, and there have been some very famous examples of people who are on editorial boards of journals or publishing like crazy and not declaring that, hey, I'm a paid consultant of these 10 pharma companies and I'm writing or doing experiments on drugs by these people, or I'm an editor on journals that are commenting on this. And then finally, you really want to understand if the study was adequately powered. And this becomes very important if the study has a null outcome.
Starting point is 00:56:47 Do you want to just spend a minute, and we'll talk about power? Yeah, I think that makes sense. Power is defined as one minus beta, where beta is defined as the probability of a false negative. Let's contrast that for a moment by talking about what a false positive is. A false positive is defined as alpha, and that's also known as the p-value. I think this is actually complicated, and I want to just spend a minute on this. So everybody's heard of a p-value, but I don't think people think of it as a false positive rate. I don't think most people have heard of the false negative rate being beta, and then one minus beta being the power. So I think people probably always know we talk about p-values being 0.05 or less. It's very difficult to make a case that we're going to look at a study that has a p-value
Starting point is 00:57:35 of 0.1 and say it's significant. So what does that mean? The P value is, as I said, the probability that what you've seen is a false positive. You see an effect, but it's actually by chance; it's not a true effect. Say you do a study and you're trying to determine if, and this is a silly example, coffee changes eye color, makes your eyes darker. And if you do that study and, lo and behold, it appears that coffee did make the eyes of the subjects darker, and the P value is 0.17,
Starting point is 00:58:18 it means there's a 17% chance that this was a false positive. So let me kind of restate this. The p-value is basically trying to answer the question, what's the probability of rejecting the null hypothesis when it is in fact true? If the p-value is zero, it means it's impossible. And if it's one, it means you are absolutely going to do it.
Starting point is 00:58:41 So obviously, we want p-values that are as small as possible. It can never be zero, but you want them to be as close to zero as possible. And basically, we say 5% is our threshold, really our maximum threshold. That's the ceiling that we'll put on this idea. Go back to what we talked about at the outset. The default position is that the null hypothesis is correct, that there is no difference between the groups. So this term, statistical significance, basically means that the null hypothesis is rejected if the p-value is less than that pre-stated level. I don't know if I'm explaining this really, really well, Bob. Is there anything you would add to this? Because I think this is an important idea, even though p-values are so ubiquitous, I think it's maybe worth spending one more minute on it before we go back to power.
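A concrete way to see the false-positive idea is a small simulation; this is a minimal sketch assuming nothing beyond standard Python libraries. Run many experiments where the null hypothesis is true by construction and count how often p falls below 0.05.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments = 10_000
    false_positives = 0

    for _ in range(n_experiments):
        # Both groups are drawn from the same distribution, so any "effect" is pure chance.
        group_a = rng.normal(0, 1, 30)
        group_b = rng.normal(0, 1, 30)
        _, p = stats.ttest_ind(group_a, group_b)
        if p < 0.05:
            false_positives += 1

    # Roughly 5% of these null experiments cross the 0.05 threshold - the false positive rate.
    print(false_positives / n_experiments)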
Starting point is 00:59:32 Sounds like it makes sense. I'm trying to think of somebody who might not understand it as well. Those examples that you gave are good. So you'll see on most papers, I think, this p-value of .05. We can get into confidence intervals, but you'll see a 95% confidence interval and a p-value of 0.05, and that's your false positive rate. It's an arbitrary threshold, so you could try to submit a paper with something different. I usually catch it by the confidence interval. I'll see a 90% confidence interval on some figure or table, and I'll look at it, and they'll use a p-value of less than 0.1.
Starting point is 01:00:07 And maybe they have some justification for it or not, but it really is this arbitrary threshold. Like, imagine if your rule was, if the p-value is less than 0.95, we're going to reject the null hypothesis. It's not exact, but that would mean that even if the chance of this being a false positive is about 90% based on your analysis, you would still reject the null hypothesis that there's no difference between these groups, which sounds sort of insane. I think it was this guy Fisher who established this .05, but this has been
Starting point is 01:00:35 the threshold. More or less, they're willing to accept, at least for the purposes of a single trial, that with a p-value of .05 they're willing to accept some level of false positives in their results and still make the claim that they rejected the null hypothesis. Right, because if you make the p-value threshold too low, if you say, no, my threshold is 0.000001, then you really run the risk of discarding a lot of information that turns out to be kind of relevant. It is a fine balance between those two. And that would be a false negative. Exactly. The lower the threshold, yeah, there might be an effect,
Starting point is 01:01:09 but you're not going to see it. So this false negative rate we typically allow to be a larger number; it's typically between 10 and 20%. The flip side of that is we have 80 to 90% power, because one minus the accepted false negative rate is called your power. I think this is one of the most important concepts to understand in designing any sort of clinical trial, whether it's humans, animals, any sort of intervention. So there's a table, and they're all over the place, but this is the one I've always liked. It's old, probably 10 years old, probably longer than that actually, but it's out of a great cancer textbook on clinical trials. So pull up this table, Bob,
Starting point is 01:01:49 and we'll kind of walk through it. Got it, power table. Okay, these look a little intimidating at the outset. So let's kind of walk through how to interpret this. What this table is saying is, you have to presuppose you know what the difference is between the treatment groups. You have to say, I believe that the difference in the success rate between group A and group B is going to be X percent, and the smaller of the two is Y percent. Let's come up with a real number. So I think that we are going to look at how this drug impacts your rate
Starting point is 01:02:40 of surviving a urinary tract infection, or cure of this infection. And I think that the placebo group is going to have a success rate of 25% and the treatment group is going to have a success rate of 35%. So I think there's a 10% gap, and the lower of those two is 25%. So you go to .25 on the horizontal axis and you go to .1 over on the column. And you'll see there's two numbers there, 459 and 358. And the upper of those two is if you want 90% power, i.e. 10% false negative,
Starting point is 01:03:26 and the lower of those two is for 80% power, or a 20% false negative rate. And those numbers basically tell you how many people you need in each of the two treatment groups if you want to be significant at a level of .05. So what do you notice when you look at this? You notice that the bigger the gap, the bigger the effect size between the two groups,
Starting point is 01:03:56 the fewer subjects you need. So if you march left to right in this table, holding that lower success rate at .25, if you say, well, the difference is 15%, you only need 216 or 165. If the difference is 30%, so one group is gonna have a 25% success rate and one group's gonna have a 55% success rate,
Starting point is 01:04:19 you're down to 60 and 47. And if you go out to a 50% difference, so one group is gonna have a 25% response rate and the other group a 75% response rate, you're now down to needing somewhere between 18 and 23 people per arm. And by the way, if you go down to 5%, one group responds at 25%, the other at 30%, you're at 1700 or nearly 1300 depending on your level of power.
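If you don't have the table handy, the same kind of numbers can be approximated in a few lines. This is a minimal sketch using statsmodels, assuming the 25% versus 35% success rates from the example; the answers land in the same ballpark as the table's 358 and 459 per arm, with small differences coming down to which approximation the table itself used.

    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    # Effect size (Cohen's h) for comparing a 35% success rate against a 25% success rate.
    effect = proportion_effectsize(0.35, 0.25)

    analysis = NormalIndPower()
    n_per_arm_80 = analysis.solve_power(effect_size=effect, alpha=0.05, power=0.80,
                                        ratio=1.0, alternative='two-sided')
    n_per_arm_90 = analysis.solve_power(effect_size=effect, alpha=0.05, power=0.90,
                                        ratio=1.0, alternative='two-sided')

    print(round(n_per_arm_80), round(n_per_arm_90))  # roughly 328 and 438 per arm by this approximation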
Starting point is 01:04:52 I appreciate everybody kind of bearing with me as I went through this power table. It seems like one of the driest things in the world. But as my mentor once told me, it's the single most important table you should ever familiarize yourself with if you want to be in the business of designing clinical trials, or basically any sort of experiment, because it is just so easy to get this wrong and over- or under-power an experiment. What does that mean? To underpower an experiment, I think, is the more common mistake here. You simply don't have enough people in the study to appreciate a difference if it is there. The study ends up being null.
Starting point is 01:05:27 The p-value does not fall below the threshold of 0.05, and you say, look, there is no difference between treatment A and treatment B, when in reality there may well have been, but you didn't have the power to determine it, and therefore you don't actually know if you should have rejected the null hypothesis or accepted it. Yeah. I think the other problem, equally sinister, perhaps not as common, is when a study is overpowered. Now you have more people in the study than you
Starting point is 01:06:00 should have had for the effect size. And you start to find things that are statistically significant but are probably irrelevant clinically. That's when you start to pick up an effect size of 1% when you're dealing with something that, clinically, should never be thought of as relevant below a 10% detection threshold. So notwithstanding the fact that you also probably
Starting point is 01:06:24 had more people in the study than you needed to, which could have cost more. And you typically don't see this as much with clinical trials, but you'll see this more with kind of data-dump trials, data mining studies, where they're grossly overpowered. Okay, I kind of got way off on a tangent there. I don't know why I went down that path of power, but I know it's important. So I think we got on the subject because we were looking at things you look for in an experimental study that increase or decrease your confidence in it, and it's something that, if people have a list like this, is often left off. I think it's
Starting point is 01:06:54 important. Yep, okay, good. So yeah, power matters. And when you look at a study and it's not significant, you should ask a question: was this study powered correctly? I can't tell just by looking; I actually have to pull out that table we just went over, go through the matrix, and go, okay, well, this is how many people were in it, therefore, at 80% power, they were powered to detect a difference between the two groups of this much, with an effect size here. And then a lot of times I go, wow, the study wasn't powered appropriately, so we've learned nothing new here, unfortunately.
Starting point is 01:07:28 Is it true that you have a laminate of this in your wallet, this power table? I don't anymore, but I used to have a laminated copy on my desk. Yes. I made placemats out of it for the kids. They love it. Very nice. Yeah. Hours of enjoyment. Related to this, and maybe this is a tip: often you won't see the power analysis in the paper itself, but you might see
Starting point is 01:07:51 it in the protocol if they include that. They'll talk about how they powered the study, what their justification was, what effect size they were looking for, and how many participants they needed, and then you can look at how many they actually enrolled in the trial and how many completed it. And to your point of overpowering a study, sometimes you might be able to discern it if you're looking at, say, the example you gave of a drug that lowers systolic blood pressure by one millimeter of mercury, and the results are statistically significant. I think that might put up your feelers and make you ask how many thousands of patients were in this study.
Starting point is 01:08:25 And I think that that gets to another question, which is looking at how these differences are actually determined when you're looking at the effect in one group versus the effect in another group. So what are some of the ways in which researchers measure the association or the, quote unquote, effect size in these studies? A lot of times it's only reported as a relative risk.
Starting point is 01:08:45 You and I have harped on this in the past, which is you can't really talk about relative risk without knowing absolute risk. And sometimes they don't give you enough data in the paper to do that, and it's infuriating actually. But absolute risk is, let's use an example, right? It's sort of like, group one had, at the end of this study,
Starting point is 01:09:02 a 5% risk of dying and the other group had a 3% risk of dying. So what's the absolute risk? It's 5% in one group, 3% in another. So therefore we have what's called the ARR, or the absolute risk reduction, which is the delta between those two. So the ARR is 5% minus 3%, which is 2%. There is a 2% absolute risk reduction. And that's important to know because often what's only reported is the relative risk reduction, which is the absolute risk reduction over the non-exposure absolute risk. So in this case, the relative risk reduction would be the absolute one, 2%, divided by the non-exposure risk, which is 5%, so that's 40%. So they had a 40% relative risk reduction going from 5% to 3%.
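As a worked check of the arithmetic above, here is a minimal sketch with the same hypothetical 5% versus 3% risks:

    # Hypothetical absolute risks at the end of the study.
    risk_control = 0.05    # 5% risk of dying in the control group
    risk_treatment = 0.03  # 3% risk of dying in the treatment group

    arr = risk_control - risk_treatment   # absolute risk reduction: 0.02, i.e. 2%
    rrr = arr / risk_control              # relative risk reduction: 0.40, i.e. 40%

    print(arr, rrr)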
Starting point is 01:10:07 Both of those things are important, but again, it's really critical that you know both. One of my favorite examples of this, of course, is the famous Women's Health Initiative, which was looking at the increase in the risk of breast cancer for the women who were receiving the estrogen and synthetic progesterone treatment. Now, notwithstanding the fact that, and we've talked about this a hundred times, I don't think that that study was a good study in any way, shape or form, and I don't think that the study demonstrated there was any difference in risk statistically. Here's what got reported: it got reported that the women receiving the hormone replacement therapy had a 25% increase in breast cancer, and that was true at a
Starting point is 01:10:55 relative risk level. But the absolute risk difference was a difference of five women per thousand versus four women per thousand. So if you went from four cases of breast cancer per thousand women to five cases of breast cancer per thousand women, that is indeed an increase of 25 percent: five minus four is one, divided by four is point two five. But what's the absolute risk reduction, or in this case risk increase? It's one per thousand, or 0.1%. So what I usually say to women when we're talking about hormone replacement therapy is you can kind of use that as your ceiling for the true risk increase of this therapy, even if you discount the 12 mistakes in that study that make it hard to believe that that effect size would hold. So another way that we tend to measure effect size or association is using something called a hazard ratio.
Starting point is 01:11:53 A hazard ratio actually involves some really complicated math that we're not going to get into, something called the Cox proportional hazards model, which I'm embarrassed to say I don't actually know the math for anymore. There was a day when I did, and I remember it was not easy for me to learn. I had to go out and buy a bunch of books on statistics, because even though my background's in math, I did not have a huge background in stats. It wasn't like rocket science, but I remember really having to work to understand the mathematics behind the Cox proportional hazards model. The magic of the hazard ratio is that it is temporal. It captures the risk of something, i.e. the hazard, over time.
Starting point is 01:12:31 And that differentiates it from something called an odds ratio, which can't do that, which can only measure what the risk is over the entire period of time. So at the risk of oversimplifying this a little bit, let's talk about the hazard ratio over a given period of time, but acknowledging that its real magic is its ability to tell you what's happening at any point in time. Let's just pretend we're talking about a cancer drug trial and the hazard rates, i.e. the rates of disease progression, were 20% in one group and 30% in another group. So the people getting the drug progressed 20% of the time, and the people not getting the drug, the ones getting the placebo, progressed 30% of the time.
Starting point is 01:13:15 So the hazard ratio is the ratio of 0.2 to 0.3, which is 0.667. So in other words, the treatment group was 67% as likely to experience disease progression as the control group. You could flip the math and say, well, what if you saw the exact same rates, but in something that was desirable? Then it would be the point three over the point two, which is 1.5. So your hazard ratio would be 1.5, which means there's a 50% increase in the benefit, or in the harm if it's something that's harmful. So again, hazard ratios are, I think, ubiquitous in clinical trials. You'll see them everywhere.
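The back-of-the-envelope conversion used in the quick quiz that follows is simple enough to write down. This is a minimal sketch; hr_to_percent_change is just an illustrative helper name, not anything from a statistics library.

    def hr_to_percent_change(hazard_ratio):
        """Convert a hazard ratio into a percent increase (+) or decrease (-) in risk."""
        return (hazard_ratio - 1.0) * 100.0

    print(hr_to_percent_change(0.667))  # about -33%: the drug group is ~67% as likely to progress
    print(hr_to_percent_change(1.5))    # +50%: a 50% increase in the event rate
    print(hr_to_percent_change(0.82))   # -18%
    print(hr_to_percent_change(2.2))    # +120%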
Starting point is 01:13:56 And the thing you just have to know is how to do the math on it. So Bob, I'll quiz you, and you tell me and the listener how you're figuring this out. The hazard ratio is .82. 0.82, you're comparing the experimental group to the control. The experimental group has, I would probably flip it, I would say about an 18% reduced risk of whatever the event is you're talking about, of progression. All right.
Starting point is 01:14:25 So give me one, Bob. Nice. How about we'll go the other way? The other side of one. Hazard ratio of 2.2. If we said 1.8, it would be an 80% increase. 2.2 would be a 120% increase. How are you doing that?
Starting point is 01:14:42 You're taking 2.2, you're subtracting one, and you get 1.2, and you multiply by 100%. And what you did earlier, when I gave you 0.82, was you took 0.82 minus one and got negative 0.18, which is a reduction of 18%. So again, you can just play with these for like five minutes. It's actually not that complicated, but you just have to do a bunch of them and become familiar with what those numbers mean. Now, let's bring it back to the ARR thing. There's another common theme you'll hear about in trials called the number needed to treat, or the NNT analysis. And this gets back to the importance of absolute risk reduction. Let's say there's an example of, let's use the same numbers we used earlier. They're
Starting point is 01:15:33 familiar to me, but you've got a drug where the people who take it have four heart attacks per thousand people over a five-year period. And then on placebo, they have five events over that same period of time per thousand people. The drug reduces the events from five out of a thousand to four out of a thousand. So what's the relative risk reduction there? The relative risk reduction is 20 percent: five minus four, divided by five. In this case, it's a 20% relative risk reduction. So you might say, this is something we should be putting in the drinking water, this is such an important thing. But you want to calculate how many people you need to treat to prevent one event, and to do that you have to take one and divide it by the absolute risk reduction, not the relative risk reduction.
Starting point is 01:16:28 And the absolute risk reduction here is 0.1%. And one divided by 0.1% is 1000. So now you have to treat a thousand people to prevent one event, which means you better figure out what the side effects of that thing are, what the cost of that thing is, and what the complexity of it is to justify it. There may be certain things for which an NNT of 1000 is valuable, but you wouldn't say that across the board. Conversely, if you have a drug that reduces the risk of death
Starting point is 01:16:57 from, say, 4% to 3%, then you would say 4 minus 3 is 1%, and 1 divided by 1% is 100. If it took it from 4% to 2%, it would be 1 divided by 2%, which is 50. If it went from 4% to 1%, a reduction of death from 4% to 1%, your difference is 3%, and you're now talking about an NNT of 33. As a general rule, we love to see drugs in that sub-100 range of NNT. We tend to not get that impressed when the NNT of something is like a thousand. So again, it's another way to think about the effect size.
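The NNT arithmetic in this stretch is easy to reproduce; here is a minimal sketch using the same numbers, with nnt as an illustrative helper name.

    def nnt(risk_control, risk_treatment):
        """Number needed to treat = 1 / absolute risk reduction."""
        absolute_risk_reduction = risk_control - risk_treatment
        return 1.0 / absolute_risk_reduction

    print(nnt(5 / 1000, 4 / 1000))  # 1000: five vs four events per thousand
    print(nnt(0.04, 0.03))          # 100
    print(nnt(0.04, 0.02))          # 50
    print(nnt(0.04, 0.01))          # ~33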
Starting point is 01:17:48 Okay, I like the number needed to treat from a clinician's perspective, or from a practical perspective. It's really telling; embedded in there, obviously, is the absolute risk and not just the relative risk. So we went over p-values and confidence intervals a little bit. I don't think we went over confidence intervals as much as p-values. Do you want to stop and talk about confidence intervals? Sure. Why don't you take this one? I need a drink.
Starting point is 01:18:04 Okay. By the way, non-alcoholic, just for those listening. Drink it right. What do you drink? Drinking my guia with Topo Chico. So confidence intervals are technically intervals in which the population statistic could lie. Typically, I think what you see on a paper is this 95% CI. It's usually abbreviated, but it's a 95% confidence interval. And it's usually reported next to the hazard ratio that we just talked about.
Starting point is 01:18:18 Say the hazard ratio is 0.5, which means a halving of the risk, say, in the experimental group versus the control group, and then you'll see this 95% confidence interval.
Starting point is 01:18:39 And it might say, they'll give you these two numbers. For example, let's just say 0.4 to 1.2 is your confidence interval. And what that is is the flip side of the significance level, which is 1 minus alpha. So we've talked about alpha being the p-value, but also being the false positive rate. So it's the flip side. So when you see 0.05 for your p-value,
Starting point is 01:19:03 that's a tip-off that your confidence interval is 95%. And I think a lot of people think about the word confidence in this definition, and they take it to mean the probability that a specific confidence interval, so in my example, 0.4 to 1.2, that interval between those two numbers or between those two ratios, contains the population parameter. They think, okay, we can be 95% confident that the true effect of, say, meat consumption on cancer is between these two numbers, but that's not really what the confidence interval means. It's more of a suggestion. I don't think this often happens in practice, but if you were to take 100 different samples and compute this
Starting point is 01:19:42 confidence interval, then approximately 95 out of those 100 intervals will contain the true value. It's been described by some as an uncertainty interval rather than a confidence interval. So there's another way to do this, just a quick-and-dirty way, which is to look at the confidence interval and ask if the interval contains one or not. You gave an example a second ago, Bob. You said your hazard ratio was what? Hazard ratio was 0.5 with a confidence interval of 0.4 to 1.2. Okay, so that would not be significant. So even though you might look
Starting point is 01:20:15 at that hazard ratio and say, oh look, that's a big reduction, a 0.5 hazard ratio means a 50% reduction, your confidence interval was very wide. It was all the way from 0.4 up to one point something, so it crosses over unity. Conversely, if you had a hazard ratio of 0.5, but your confidence interval was 0.4 to 0.6 or 0.7, or even up to 0.9, you would say, indeed, that is significant at the 95% confidence level.
Starting point is 01:20:49 So the other thing you'll notice, by the way, is the closer one edge of the confidence interval comes to one, the closer the p-value is to 0.05. When you have a confidence interval that runs from 1.01 up to 2, your p-value is probably about 0.049 or something like that, whereas when you have confidence intervals that are miles away from 1, the p-values tend to be very small.
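The quick-and-dirty check described here is mechanical; a minimal sketch, with ci_excludes_one as an illustrative helper name:

    def ci_excludes_one(lower, upper):
        """A hazard ratio is significant at the 95% level only if its 95% CI excludes 1."""
        return not (lower <= 1.0 <= upper)

    print(ci_excludes_one(0.4, 1.2))    # False: crosses 1, so an HR of 0.5 is not significant here
    print(ci_excludes_one(0.4, 0.9))    # True: significant
    print(ci_excludes_one(1.01, 2.0))   # True, but barely - the p-value will sit just under 0.05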
Starting point is 01:21:23 Yeah, that statistician, Andrew Gelman, he talked about uncertainty intervals. And the reason why he says that is, imagine you've got a huge confidence interval, as we call it. So a big confidence interval, meaning instead of 0.4 to 1.2, it went from a 40% reduction at 0.4 all the way out to, say, a thousand. He would say that's a huge uncertainty interval, but the way that we talk about it, that's a huge confidence interval. And it's maybe intuitively backwards for some people to think about it that way.
Starting point is 01:21:53 The less uncertainty. That's right, the less uncertainty there is. When you get these monster ones, and this is why I like those sort of tornado graphs that you see in meta-analyses where you visually get to see how much uncertainty existed in a given study. The confidence interval, here's a great example. Hazard ratio was 1.4. Oh wow, 40% increase.
Starting point is 01:22:16 The 95% confidence interval went from 1.1 to 17. Do I really have a lot of confidence in that? No. That's an enormous uncertainty interval. Yeah. And of course, you would want to know, okay, what are we talking about here in absolute terms, and not just relative terms. Yeah. I mean, look, I think the takeaway of this entire section is if you make the decision that you want to pay attention to science, you just have to roll up your sleeves and accept the fact you're not going to be able to read these things in the bathtub on a lazy Sunday morning.
Starting point is 01:22:50 You kind of have to roll up your sleeves and pay attention to all of this little stuff. Now it gets easier the more you do it. When I read a paper today, it's so much easier than it was 25 years ago, but you still have to kind of have your guard up for all of these things. You might learn something new from virtually every paper you read. So it's not like you read one paper 25 years ago and you're reading your second paper today. You've read a multitude of papers, and in each one there's probably something educational that you might pick up,
Starting point is 01:23:16 and that probably goes back to the beginning of the episode, where you're talking about, I think, Tim asking, how do I get better at this? And it's probably consistent repetition. Read a paper, you know, your favorite paper, every week. And you can see that some of the stuff we've talked about here, I mean, we just went deep into some statistics, and not even that deep, right? I mean, we didn't really explain what the Cox
Starting point is 01:23:33 proportional hazards model is and things like that. And we didn't differentiate odds ratios from hazard ratios, which requires getting into more math. Like, you still have to be able to kind of crunch some numbers sometimes. And it's unfortunate that I think a lot of people in the media don't know how to do this, and yet they're the ones that are reporting on these things.
Starting point is 01:23:48 So if you're getting your science info from Twitter and from the news, there's a little bit of buyer beware. You have to understand that it's very likely the people reporting these things, not that they're necessarily ill-intentioned, might not themselves be doing the type of analysis that's necessary. So another question we got is, do studies ever stop midway through? If so, what are the reasons?
Starting point is 01:24:12 Yes, they do. There are generally three reasons that studies are stopped. And again, we're really talking about prospective clinical trials here. So the first and most important of these is safety. So remember, we talked about phase one, phase two, phase three. Well, phase one is all about safety. Phase two is about efficacy and safety. Phase three is really about effectiveness and safety. But notice safety is in all of those. So absolutely, anytime there's a safety breach, which means there is a statistically significant difference in an important
Starting point is 01:24:46 safety metric between the groups, that'll just stop the study. The second thing that will stop a study is benefit. Again, the PREDIMED example: when it was first done, it stopped two-thirds of the way through because it was deemed that there was such a benefit to the group on the Mediterranean diet, relative to the low-fat diet, that it would have been unethical to let those people on the low-fat diet continue for another two and a half years on a diet that was so clearly increasing their risk of mortality. And then the final thing that will stop a study prematurely is futility.
Starting point is 01:25:19 It's a little bit harder to understand, but it actually comes down to that hazard ratio concept, which is able to measure risk temporally in an aggregate fashion. So if, two-thirds of the way through a study, there's no benefit, and statistically you know that nothing that's going to happen in the remainder of the study is going to change that, you stop the study.
Starting point is 01:25:41 It's futile to continue the study. Those are basically your big three reasons why a study is going to be stopped. So Peter, I think a good example of stopping a trial for safety, and actually there have been several, was one of the CETP inhibitors, torcetrapib, I don't know if I pronounced that correctly. I remember this really well. This is one of the few moments in science where I remember where I was standing when the result was announced. It was Q4 of 2006. I was at McKinsey at the time, and I was walking up Kearny towards California Street.
Starting point is 01:26:16 I heard the news of this and I couldn't believe it. I was so sure this was going to be a home run study. Yeah, so in this case, the trial was set up with about 7,500 patients in each group. They're on a CETP inhibitor and they're all on statins, on Lipitor in particular. They compared the CETP inhibitor plus Lipitor to just Lipitor alone, which served as the control group. This was a Pfizer study, which everybody thought was very cheeky of Pfizer, because Lipitor was about to come off patent. Their way
Starting point is 01:26:50 of sort of extending the life of it was saying, hey, when you pair the CETP inhibitor with Lipitor, it's going to have a benefit, because the background on this is that CETP inhibitors raise HDL cholesterol. So it's like, we're going to take a drug, Lipitor, that lowers LDL cholesterol, and we're going to pair it with a drug that raises HDL cholesterol. How could this possibly go wrong? Famous last words. They intended to follow the patients for almost a five-year trial, four and a half years. And along the way, they have a review board that's looking at the results of the study. And in this case, they had a monitoring board that was
Starting point is 01:27:27 looking at death, all-cause mortality. And they found that 82 patients receiving the drug combination had died compared with only 51 on Lipitor alone. And so they advised Pfizer to halt the trial at that point, which it did immediately; it was just a little over a year into the trial when they did it. And in a way it gets back to what you were talking about with power analysis. The way that they do this is they have pre-specified p-values where they're kind of sneaking looks at the data. Like we talked about with multiple hypothesis testing, they're actually taking a few shots on
Starting point is 01:27:58 goal, in a way, because after 12 months they're going to actually compare the two groups and see if this thing hits a p-value of less than 0.01, depending on what they're looking at. And in this case, they had this pre-specified p-value of less than 0.01 based on a test for death from any cause, and they found that. And actually, there was still a published paper in the New England Journal of Medicine, even though the trial only went for 12 months, and they report those endpoints where the study was stopped. And that's a whole other discussion about why that happened, because other CETP inhibitors would go on to face the same fate.
Starting point is 01:28:33 So the first thought was, well, it was this particular drug, but it turned out that CETP inhibitors in general are not a good thing. At best, they do nothing, and at worst, they kill people, and it probably had to do with the fact that they're altering HDL function. But anyway, that's another discussion. In fact, I feel like Tom Dayspring and I, or Ron Krauss and I, talked about this at length on one of our podcasts. I think it was Tom and I who talked about this. It's super interesting. So, I remember you telling that story. Just a quick follow-up question. I guess, A, was this the first CETP inhibitor that was tested... Okay, and then
Starting point is 01:29:06 you had a follow-up one. I don't think we've really addressed this, but why would you do an observational study over a randomized controlled trial? In some cases, a good example is, try getting a randomized controlled trial past the IRB where you give people a carton of Marlboros that they're going to smoke, you know, each day, compared to a placebo group. It's unethical. And so with this, I don't know how many times they saw these adverse events with each CETP inhibitor. And it's, I guess, the same drug class; they might have different mechanisms.
Starting point is 01:29:32 But does there become a point, and maybe that's why they have the phase one, phase two, phase three, and they get past those barriers, but in a way, do you almost assume it might be unethical to run another CETP inhibitor trial if you're seeing differences in death the last however many times? Well, I mean, I don't think it's unethical, because I think they are basically saying, look, it's a different drug. You can change one molecule on a drug and it completely changes the way it works. Look at COX-2 inhibitors; look at Celebrex versus Vioxx.
Starting point is 01:30:00 I mean, notwithstanding my views on that, which I talk about with Eric Topol on our podcast, but basically two drugs, nearly identical, and one was far more efficacious but also had side effects in a subset of people with hypertension. So I think the real question is at what point do pharma companies say enough is enough? And I lost track. I feel like there were three CETP inhibitors that were brought to phase three. And ultimately, there was a Mendelian randomization that looked at CETP mutations and really found that this was not going to be a good strategy. That's a good example that you brought up with respect to safety. And then we talked about benefit, which was PREDIMED.
Starting point is 01:30:40 And then what happened in the Look AHEAD trial? Because I think Look AHEAD was one that got stopped for futility, right? That was one where they randomly assigned about 5,000 overweight or obese patients with type 2 diabetes. It was an intensive lifestyle intervention; that was the intervention group. And then you had diabetes support and education in the control group. Their primary outcome, what they were looking at, is what's called MACE: major adverse cardiovascular events, death from cardiovascular causes.
Starting point is 01:31:06 And I think they were going to go for a 13-and-a-half-year-long trial, almost 14 years. And in this case, the trial was stopped at just under 10 years, and it was based off of what's called a futility analysis, as you explained. Yeah, which basically means no matter what happens from this point on, this study will not be significant. So at the time that it was stopped, the hazard ratio was 0.95. So there was a suggestion of a 5% reduction in the risk of death from cardiovascular events. So in the right direction, but the 95% confidence interval, or uncertainty interval if we're going
Starting point is 01:31:43 to adopt that terminology, was 0.83 to 1.09. So it crossed one, and so you know the p-value is going to be greater than 0.05. In fact, the p-value was 0.51 or something like that. I mean, it was basically complete chance. There was absolutely no effect. And again, no point in continuing. Okay, so moving on to the review process. What is the review process, once a study is done, to get a paper published in a journal? Once a study is done and they've done their analysis and they write up a manuscript,
Starting point is 01:32:14 they'll submit it to a journal for publication. And then that journal will have an editor who will look to see if the paper meets their criteria and if they think it's original and interesting: is this paper adding something to the body of knowledge? At that point, the editor might just say, hey, this is not really a good fit for our journal, or for whatever reason this is something we're not interested in taking any further; you're free to go and submit this elsewhere. But otherwise, the editor is going to invite individuals that are typically part of an editorial board to peer review
Starting point is 01:32:46 the manuscript. So you hear this term all the time, right? Which is, is this a peer-reviewed publication? And that's important, because not all things that get published have been peer-reviewed, and that's obviously the highest standard. So the reviewers are basically invited not randomly, but because they have some expertise in this area. But other things are important, right? You have to consider the conflicts of interest. They might have to decline if they're conflicted. That's kind of a sticky topic, because there are some really obvious conflicts, like financial conflicts of interest, but I think there's a whole deeper discussion
Starting point is 01:33:18 about when you have sort of philosophical conflicts of interest with the person. And that gets into another area, which is peer review can be blinded or not blinded. Review can be single-blinded, where the reviewer knows who the author is, but the author doesn't know who the feedback is from. That tends to be very common; I think that's probably the most common one I've seen. It can be double-blinded, where the reviewer doesn't know who it's written by and vice versa, or it can be completely open.
Starting point is 01:33:45 But again, the most common one that I've seen is single-blinded. You'll typically have three reviewers review something, and they can either accept it outright, reject it outright, or make recommendations for revisions. I think you'll see that as probably the most common thing where they say, we're still interested in this paper, but did you actually consider this hypothesis? So sometimes the revisions are just repeat your analysis. Sometimes it's do another experiment. That won't be the case in a clinical trial. I've had papers where that happened where I've done a series of experiments and I'd written it all up and I'd submitted
Starting point is 01:34:18 and the reviewer came back and said, well, you really should have done this experiment as well, because this would have served as another control. So you go and repeat that experiment. Of course, when you're working in cell culture or something like that, it's not that onerous. And this process can go on several times, but ultimately the editor makes a decision to accept that paper and publish it or reject it. And that's basically the process.
Starting point is 01:34:37 And you're typically going to start at the top of the food chain. So typically, as an author, you're going to try to get your paper published in the most prestigious journal. I guess that's something we can talk about, what determines the prestige of a journal. But you'll sort of keep going down the pecking order until you can get it into the right journal. And sometimes right out of the gate, you just sort of know, like, this is a publication that is really mechanistic and it's really going to be geared towards Proceedings of the National Academy of Sciences, versus something
Starting point is 01:35:06 that has really got enormous clinical implications and should go to JAMA or the New England Journal of Medicine. There's a little bit of that going on as well. Every study that's out there, do they all end up getting published? No, many don't. I think this is a really big problem, which is you have this thing called publication bias.
Starting point is 01:35:23 So there's a very, very famous example of this that you and I have spoken about, which is the Minnesota Heart Study. This is an example where a study was done. It ran from what 1967 to 1973, if my memory serves me correctly. And it was looking at people who were in a residential care facility. They had complete control over what these patients ate, and they were randomized to a diet of either normal saturated fat consumption or very low saturated fat consumption, where the saturated fat was substituted with polyunsaturated fats. And at the end
Starting point is 01:35:56 of this seven-year study, of course, the hypothesis being that the group that substituted polyunsaturated fat for saturated fat would have lower cholesterol levels and a lower cardiovascular death rate. And in 1973, when the study concluded, they found that indeed the subjects who were given high amounts of polyunsaturated fats and low saturated fats did in fact have lower cholesterol levels, but the rates of cardiovascular death were significantly greater. And they didn't publish the study. That study would remain unpublished until 1989, some 16 years later. When asked why the 16-year delay in publishing that study, the lead author, who, I don't even remember who it was, Ivan Frantz.
Starting point is 01:36:36 Yeah, Frantz, that's right. There's a senior and a junior, yeah. Yeah, yeah. He said the study didn't turn out the way we wanted it to. That's kind of an egregious example of publication bias. In this case, a negative study, but I think there are a lot of studies that don't get published when they're negative. And that's a shame, because when something doesn't work, it is just as important as when
Starting point is 01:36:58 it does work. It is unfortunate that not all studies get published because, again, just think about it this way: if you want to go out and do an experiment and 10 people have done that experiment before you and it's always failed, wouldn't it be great to know that? Would that impact your decision on whether or not you want to do the experiment a certain way, or would you want to try something a little bit different? So you can see very quickly this becomes problematic when papers don't get published. Okay. You've got this massive problem of publication bias. Do you know of any ways to combat this? I think there are a lot of people working on this problem. And I think one of the important steps
Starting point is 01:37:36 is pre-registration, which we talked about at the outset, right? Which is you force investigators to pre-register their experiments on ClinicalTrials.gov. That's not just, here's my experiment. It's, here are my statistical methods, here is my number of subjects, here's my primary outcome, here are my secondary outcomes, etc. And that basically makes it a lot harder to say, I'm not going to publish this when it
Starting point is 01:38:02 comes out if it doesn't turn out the way I wanted it to. I don't know if there are particular journals that participate in this, but I imagine that they could. They could make it a prerequisite: your trial must be preregistered in order to be published in our journal. And if it's a journal, you know, worth publishing in,
Starting point is 01:38:15 it's probably not a bad idea. Correct. There are both requirements of journals, and there are also requirements of funding entities, which say we won't fund you unless the study is preregistered. Registered Reports is a publishing format from an organization called the Center for Open Science. I think that's the one founded by Brian Nosek, is that his name?
Starting point is 01:38:33 I think that's right, yeah. Yeah, Brian would be a great guy to have on the podcast, actually, at some point. So with Registered Reports, basically, you submit your protocol, almost like the pre-registration. You submit that, and at that point, instead of after all the data is collected, it's peer reviewed. And if it's peer reviewed and accepted, based on the fact that you've got a high-quality protocol and everything looks good, then it's provisionally accepted for publication. Like you said, if it's a negative result, maybe the journal is not going to publish
Starting point is 01:39:02 it, but here they're basically making a decision: whether this is a positive or a negative trial, however it turns out, your protocol, your plan, looks really good, and so we're going to accept it. We're going to basically accept it provisionally, provided that you don't start cutting corners and go away from this plan that we accepted. So you follow your plan, and however the cards fall, it's already been accepted for publication.
Starting point is 01:39:22 That's a pretty novel concept, actually. But again, I think it's all in the spirit of how do we make sure that we get rid of publication bias, positive-result bias. Again, going back to what we said a second ago, you're far more likely to see something get published if it is a positive finding than if it's a negative finding, although negative findings are just as important for the establishment of knowledge. Right, let's use the CETP inhibitors. Imagine no one had ever published the studies demonstrating that CETP inhibitors were at best
Starting point is 01:39:53 neutral, at worst harmful. Studies of that magnitude can't escape publication, but think of all the bench research that can be going on, or the small early phase one trials, or the preclinical stuff. It's very easy to kind of under-report things that are negative. Yeah. One other thing about Registered Reports, just thinking about it, gets to your power analysis. As an example, let's say your study is under-powered; it would be great to have a group of your peers tear your study apart and see if there's anything wrong with it. And they say, your study is powered to detect like a 70% difference in all-cause mortality. They might be pointing out something that's basically saying, your
Starting point is 01:40:32 study's dead on arrival if you actually run it this way. Right, because there's no way you're going to see an effect size greater than 30%, and yet you're only powered to detect 70%, which is crazy. So either change the experimental design or figure out a way to raise more money to do this study correctly. That's a valuable tool. We touched on this, I think, when you talked about more reputable journals. That's one of the questions: I think people know that certain journals are more respected than others, but is there a reason why, in particular, one journal is respected over another?
Starting point is 01:41:08 Yeah, so there's something called an impact factor. And it's usually something that changes each year, meaning it's usually evaluated on a per-year basis. So it's a ratio based on the number of citations; citing something means referencing that paper. So if you're writing a paper, you would say, well, Kaplan wrote such and such, and you cite that paper; it's referenced. Great paper by Kaplan. It's the ratio between the total number of citations that come in to all articles published by that journal and the total number of articles published by that journal over a previous period of time. So it's typically done over a year. So you would say 27,000 citations came in to the articles published by that journal, out of 10,000 articles; 27,000 divided by 10,000 would be 2.7, so the impact factor would be 2.7.
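The arithmetic is as simple as it sounds; a minimal sketch mirroring the simplified description above (real impact factors are computed over a specific citation window, but the ratio is the idea):

    def impact_factor(total_citations, total_articles):
        """Citations received by a journal's articles divided by the number of articles."""
        return total_citations / total_articles

    print(impact_factor(27_000, 10_000))  # 2.7, the example in the discussion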
Starting point is 01:41:57 To put this in context, there are 13,000 journals out there. 98% of them have an impact factor less than 10, 95% of them have an impact factor less than 5, and about half of them have an impact factor less than 2, just to give you a sense of what impact factor looks like. By the way, the tail on that is very asymmetric. The number of journals that have an impact factor of 0.4, 0.7, 0.8, I mean, is incredibly high. If you look at the distribution of this, there's obviously a very long tail on the small end of this.
Starting point is 01:42:31 I've got a table here that I can pull up. Yeah, yeah. Let's take a look at that, because I think it's pretty cool to look at this, actually. So if you look at this table, you've highlighted the journals that have more than 100,000 citations. What year is this? 2019. So you've got the New England Journal of Medicine, which is kind of staggering, right? Nearly 350,000 citations, and we could do the math; you can tell how many articles were published, because if you divide
Starting point is 01:42:58 347,000 by that number of articles, you get 74.699. So that's the impact factor for the New England Journal of Medicine. The Lancet, 250,000 citations, impact factor 60. So you can sort of see these are the top 28 journals by impact factor. There's kind of an outlier here, right, which is CA: A Cancer Journal for Clinicians, which has a staggering impact factor of 292 despite only having 40,000 citations. That's a little bit of a skew. I don't really consider that to be in the same league, because it's basically the global cancer society statistics article
Starting point is 01:43:39 and therefore it reports on tons of cancer statistics, and it doesn't really publish that much, but it gets referenced so much because anytime someone is referencing a cancer statistic, they're going to reference that. So I kind of put that in its own little category. By the same token, notice the WHO Technical Report Series has an impact factor of 59, but it's only cited about 3,500 times. So it's cited a lot for a very small number of publications. But again, I think the ones that really matter here clinically, the New England Journal of Medicine, obviously Lancet, JAMA, are sort of your huge clinical ones.
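Taking the episode's simplified ratio at face value and plugging in the rounded numbers quoted above (so these are back-of-the-envelope figures, not official journal counts), the implied number of articles behind each impact factor shows why the statistical-report outliers sit in their own category:

```latex
\frac{347{,}000 \text{ citations}}{74.7} \approx 4{,}600 \text{ articles (NEJM)}
\qquad
\frac{3{,}500 \text{ citations}}{59} \approx 59 \text{ articles (WHO Technical Report Series)}
```

A journal cited heavily across thousands of articles and one cited heavily for a few dozen reports can end up with impact factors in the same range, which is why the hosts treat the latter as a separate case.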
Starting point is 01:44:17 One more question about reading a scientific paper. Do you have a particular process when you read a paper? Do you print it out and read it start to finish, starting from the abstract and working your way through? Or do you have a particular process in general? Yeah, kind of. I generally do read the abstract first, and that gives me a sense of whether I'm interested in the paper. The title of the paper is usually not sufficient for me to know if I'm going to be interested, but the abstract is usually my go/no-go on that. So I could read 10 abstracts in a matter of minutes and decide, do I want to read three of these papers? The next decision I make is how familiar am I with this subject matter?
Starting point is 01:44:57 And if I'm not really familiar with it, I will read the introduction section. A lot of times I am relatively familiar with the subject matter, so I'll just skip the introduction section altogether, and I usually go straight to the methods section. And that gets into the details. Say this was an exercise study and they did muscle biopsies; I just want to really get right down to it. How many subjects were there?
Starting point is 01:45:19 How were they randomized? What were the interventions? When were the biopsies taken? Was there a crossover? I just want to get into all that detail. The next thing I do is look at the results section, but I start with the figures. So I kind of go right in to look at the figure
Starting point is 01:45:35 and read the legend. And if the authors have done a good job, it's almost standalone at that point. So figures and tables should, in my opinion, be standalone; the legend should explain everything you need to know. And of course, reading the prose of the results section then adds a little more color to that. The last thing I do is read the discussion section, because by this point I'll have formulated my own thoughts on what
Starting point is 01:46:04 the strengths and weaknesses of the study are, what questions remain, etc. Oftentimes the authors will have thought of things that I haven't thought of, or they'll have thought of things that I disagree with, and I'll want to go through and do that. So that's my general framework for it, and you'll notice it's quasi-linear, but not entirely linear. Yeah, I like your example of figures. I can't remember, but did Steve Rosenberg (and this probably speaks to the importance of mentorship as well) have advice, when you're writing a paper, about
Starting point is 01:46:33 your figures and what they should represent? Yeah. That was our process. So when you finished an experiment, the very first thing you did was make the figures and tables; you made those and the legends first. And that's what you would go in and present to him, and you'd present that, not at Journal Club, but at Lab Meeting. And you wouldn't really take pen to paper
Starting point is 01:46:50 to write anything until you had that down. You had to sort of know: what are the relevant figures, what are the relevant tables, can I explain them very concisely in a legend? And once you got that down, the paper kind of wrote itself. The methods are really easy to write, the results are easy to write, and the last things you would write would be the intro and the abstract. That was just the way that I was taught to do it,
Starting point is 01:47:13 and I found that to be very productive. Okay, I think we've run through the list of questions. We got through them all, man. We did, we got through it. Thank you for listening to this week's episode of The Drive. It's extremely important to me to provide all of this content without relying on paid ads. To do this, our work is made entirely possible by our members, and in return, we offer exclusive member-only content and benefits above and beyond what is available for free. So if you want to take your knowledge of this space to the next level, it's our goal to ensure members get back much more than the price of the subscription. Premium membership includes several benefits.
Starting point is 01:47:49 First, comprehensive podcast show notes that detail every topic, paper, person, and thing that we discuss in each episode. And the word on the street is, nobody's show notes rival ours. Second, monthly ask-me-anything, or AMA, episodes. These episodes are comprised of detailed responses to subscriber questions, typically focused on a single topic, and are designed to offer a great deal of clarity and detail on topics of special interest to our members. You'll also get access to the show notes for these episodes, of course. Third, delivery of our premium newsletter, which is put together by our dedicated team
Starting point is 01:48:25 of research analysts. This newsletter covers a wide range of topics related to longevity and provides much more detail than our free weekly newsletter. Fourth, access to our private podcast feed that provides you with access to every episode, including AMAs, sans the spiel you're listening to now, and in your regular podcast feed. Fifth, the Qualies, an additional member-only podcast we put together that serves as a highlight reel featuring the best excerpts from previous episodes of The Drive. This is a great way to catch up on previous episodes without having to go back and listen to each one of them. And finally,
Starting point is 01:49:03 other benefits that are added along the way. If you want to learn more and access these member-only benefits, you can head over to peteratia-md.com forward slash subscribe. You can also find me on YouTube, Instagram, and Twitter, all with the handle peteratia-md. You can also leave us a review on Apple Podcasts or whatever podcast player you use. This podcast is for general informational purposes only and does not constitute the practice of medicine, nursing, or other professional healthcare services, including the giving of medical advice.
Starting point is 01:49:35 No doctor-patient relationship is formed. The use of this information and the materials linked to this podcast is at the user's own risk. The content on this podcast is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Users should not disregard or delay in obtaining medical advice for any medical condition they have, and they should seek the assistance of their healthcare professionals for any such conditions. Finally, I take all conflicts of interest very seriously.
Starting point is 01:50:04 For all of my disclosures and the companies I invest in or advise, please visit peteratimd.com. Thanks for watching!
