Orchestrate all the Things - Garbage in, garbage out: data science, meet evidence-based medicine. Featuring Critica CMO David Scales

Episode Date: July 18, 2020

Did you ever wonder how data is used in the medical industry? The picture that emerges leaves a lot to be desired. In the early 90s, the evidence-based medicine movement tried to make medicine more data-driven. Three decades later, we have more data, but not enough context, or transparency. Today's episode features David Scales, Chief Medical Officer at Critica, and an Assistant Professor of Medicine at Weill Cornell Medical College. Scales specialized in internal medicine, and also has a PhD in sociology, with a particular interest in the sociology of science. The conversation with David covers the fundamentals of evidence-based medicine, how data is generated through randomized controlled trials and accessed via Cochrane Reviews. We draw parallels with best practices in data science and data governance, touching upon provenance, context, and metadata. We also explore the dark side of data in the medical industry: pharmaceutical company involvement, bias, the controversy around Cochrane, and the tyranny of the randomized controlled trial. We also explore predictive models and data parasites in COVID-19 times, and the role of the World Health Organization. Articles published on ZDNet: https://linkeddataorchestration.com/2020/07/20/garbage-in-garbage-out-data-science-meet-evidence-based-medicine/ https://linkeddataorchestration.com/2020/07/27/data-governance-and-context-for-evidence-based-medicine-transparency-and-bias-in-covid-19-times/

Transcript
Starting point is 00:00:00 Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. Today's episode features David Scales, Chief Medical Officer at Critica and an Assistant Professor of Medicine at Weill Cornell Medical College. Scales specialized in internal medicine and also has a PhD in sociology, with a particular interest in the sociology of science. Did you ever wonder how data is used in the medical industry?
Starting point is 00:00:29 The picture that emerges leaves a lot to be desired. In the early 90s, the evidence-based medicine movement tried to make medicine more data-driven. Three decades later, we have more data, but not enough context or transparency. The conversation with David covers the fundamentals of evidence-based medicine, how data is generated through randomized controlled trials and accessed via Cochrane Reviews. We draw parallels with best practices in data science and data governance, touching upon provenance, context, and metadata.
Starting point is 00:01:02 We also explore the dark side of data in the medical industry: pharmaceutical company involvement, bias, the controversy around Cochrane, and the tyranny of the randomized controlled trial. We also explore predictive models and data parasites in COVID-19 times, and the role of the World Health Organization. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook. So thank you again, David, for making the time for this conversation. And I think, you know, something I typically do with people I have these kinds of conversations with is I start by asking
Starting point is 00:01:46 them to say a few things about themselves, their backgrounds, and in your case, that would involve, since the touching point that sort of connected us was your work in Kritika. I was wondering if you'd like to also say a few things about that. So what is critica and you know its mission and what you do there basically sure um so i'm uh currently an assistant professor of medicine at uh weill cornell medical college um i'm uh specialized in internal medicine, um, but also, um, have, uh, training. Uh, I also have a PhD in sociology, um, and have a particular interest in, uh, essentially kind of, uh, sociology of science, uh, is part of what I've studied. Um, and so that affects a lot of my current work
Starting point is 00:02:43 because a lot of my current work is on, uh, what I would essentially call in the broad bucket of doctor patient communication. Um, and that's everything from kind of the placebo effect, um, which I have one project on kind of very much, obviously very kind of clinical in the hospital, kind of the doctor and the patient to other things such as combating misinformation, which is a lot of the work that I do with Critica. Critica, it's a, it's a small NGO started by Jack and Sarah Gorman after they wrote their book denying to the grave.
Starting point is 00:03:22 There's psychological reasons behind science denial. Denying to the Grave, the Psychological Reasons Behind Science Denial. And we linked up because we were both very interested in a lot of the different factors that lead to science denial, denial and make it hard for scientific consensus to kind of guide decision-making. And so essentially our mission is essentially trying to help ensure that scientific consensus is what guides kind of most decision-making, both kind of public decision- making, policy decision-making. That's essentially our goal. I mean, we see obviously scientific consensus is something that can be contested, but it's the type of contesting that should go on among experts. Um, and, uh, and, you know, consensus should be revised, but that consensus is what should guide, um, our behavior, uh, as, uh, as, you know, non-experts and policymakers. I, my role in Critica is basically I'm the chief medical officer, which means I just have an kind of extra time to weigh in on a lot of the medical aspects that we focus on.
Starting point is 00:04:58 So we have a number of different kind of projects and protocols that we're working on. And so I kind of weigh in on a lot of that and bring in my kind of sociological expertise as we try to put that mission into action. Okay. Okay. Thank you. Thank you for the introduction. And so I'll try to give a little background from my side because for people who may be familiar with my work what you do you know at first blush may not sound you know that that's relevant or that familiar
Starting point is 00:05:34 because mostly what i deal with in my column has to do with data databases data science data-driven decision making this kind of thing there is a line that connects these two, but it's not entirely obvious at first. Plus, I'll give a little bit of background to make it more clear, maybe. So the way I came to know about Kritika and what you do is through a letter, an open letter that was signed by yourself and Critica's co-founders, Sarah and Jack Gorman a letter sent to Nature in which he called for
Starting point is 00:06:11 counteracting pseudoscience with respect instead of ridicule as I also mentioned as a reply let's say to another letter sent by another scientist Jane Caulfield from I don't remember his exact affiliation, to be honest, but he's Canadian.
Starting point is 00:06:31 And, well, in his letter, he kind of scoffed, let's say, against pseudoscience, and he called for a kind of different approach. And that letter resonated with me for a number of reasons. First, from a purely communication, let's say, point of view, I think what you called for is just playing more and more efficient, basically. And it also touched me on another level. So basically, one of the pillars, let's say, of science and research, which is exactly what this is about, is keeping one's options open and being open to revising beliefs
Starting point is 00:07:12 and what you basically also touched upon in your introduction. So I think that, especially in this climate, where science is kind of being conducted out in the open and for everyone to see, there's a danger that lurks here. So having blind faith in data and data-driven processes. And this is something I've also kind of touched upon in previous columns. So in a way, more and more organizations are turning towards this evidenced way
Starting point is 00:07:48 of decision-making and operation. And this is also true for healthcare and for organizations operating there. And this is why data scientists are in such high demand. However, there's a famous saying in the data science world, which is garbage in, garbage out. So, you know, your findings can only be a function of the data used to drive them. And so this brings us to another very interesting viewpoint that you put forward.
Starting point is 00:08:17 The analysis you did on what you call the coherent controversy. And, you know, I've already spoken too long for my part, let's say, here. So without any further ado, I'll just let you explain in your own words what Cochrane is, what the controversy around it is, and, you know, where data comes into play into this picture. Sure. I mean, this is interesting. And, George, I'll be very interested to kind of talk more because the kind of the work that you do on data and your audience, I think, I think is exactly some of the folks that Critica would like to reach as we think about exactly kind of what constitutes evidence.
Starting point is 00:09:00 And how do we understand how data and evidence becomes a fact, which I think is a bit of a crux of the issue. So you asked about kind of the, what is Cochrane and kind of what do they do? So for those that haven't heard of Cochrane, Cochrane is a little bit of the gold standard of evidence in medicine in particular. So a lot of physicians that are working in hospitals or in primary care clinics, you know, it's a difficult thing to make every decision based on evidence. And if I have to make every decision and kind of like go to Medline and do a search to kind of weed through all of the clinical data to kind of help me help guide my decisions. And there's just so many questions and so much evidence, so many papers out there that that's impossible. So Cochrane is an organization, it's a consortium. And it's changed over time, but you can essentially think of it as a loose network of scientists and physicians who carry high standards for what constitutes quality evidence, who then do systematic reviews of various different questions in medicine and publish those reviews as Cochrane Reviews.
Starting point is 00:10:29 And these are very high quality in the sense that they're often, um, often the, I don't want to say the last word because they're often kind of done and redone, but they are very much focused on trying to pull in what is considered to be the highest quality evidence. And that's usually randomized controlled trials. That's essentially the things that make it into the Cochrane reviews. And they obviously, even those randomized controlled trials get assessed for their quality as well. But at the end of the day, when somebody picks up a Cochrane review, they're often using that as a shortcut to, okay, what is the best evidence that can answer my question?
Starting point is 00:11:18 And some examples of Cochrane reviews that I've read recently, for example, are, you know, is it a good idea to use something we call BiPAP, which is essentially, if you've ever heard of a CPAP machine, it's something very similar to that. But whether or not that's a good idea to use in people with asthma exacerbations, whether in people with asthma exacerbations, we should give them antibiotics when they come into the hospital. What is the association between different cultures and antibiotic prescribing among primary care physicians? So a lot of these questions that affect clinical management of patients is the kind of thing that we've wanted answers to for a long time. And Cochrane provides a fantastic service by summarizing a lot of that evidence for physicians.
Starting point is 00:12:17 Okay. Just to take a step back, because you mentioned in your answer a couple of things that maybe not everyone is familiar with. So let's start with randomized controlled trial, if you'd like to briefly explain what that is, and then move on to evidence-based medicine, because this is the core of what Cochrane does. Sure. So a randomized controlled trial is, let's just break down the terms, right? So a randomized controlled trial is randomized. It's essentially kind of trying to compare a group that receives the treatment to a control group that either does not receive the treatment or receives a placebo. And it's
Starting point is 00:13:07 controlled in the sense that it has that control group. And people are randomized to one of those conditions or the other. And the idea behind that is that that randomization helps to essentially make the study more generalizable by helping to distribute equally and randomly any potential confounders. And a very powerful way to test if it works, because there's a lot of things that go with giving a medication. If you do a trial that does not have a control group with a placebo, for example, it's really easy to say that, oh, well, this medication worked when really, you know, like that question I just posed of giving antibiotics for an asthma exacerbation, maybe people with asthma would have gotten better anyway. And so you have to ask yourself, does giving the antibiotics actually help reduce the amount of time, reduce the severity? So these are the types of questions that you can
Starting point is 00:14:26 answer with a randomized controlled trial. People also often talk about blinding. You'll hear people talk about kind of, you know, randomized double-blind placebo-controlled trials. And these are other things that increase the quality because, you know, even the people in the study can be biased. And so if you blind them, essentially, if you shield the people in the study to the conditions so that even the doctors giving the medications don't know if the patient is getting a treatment or a placebo, then that also helps reduce the confounding and makes it so that we can trust the results more. Okay. Just to make the connection here with how data scientists are used to think, I would say that this is a very good example of how data scientists also conduct their experiments, their lab trials.
Starting point is 00:15:29 So you're basically trying to break down the parameters that influence your outcome. And then you try to take them in isolation and try to see, okay, so if I change this parameter, what will happen? What's the outcome? And you basically do that for all your parameters, and then you try to come up with an outcome in the end. So would you say that this methodology that you just outlined is what's the core of evidence-based medicine? Yes. I mean, so evidence-based medicine is, you could argue it's a social movement. It kind of congealed in 1992 with the publication of an article kind of recommending it. And it's
Starting point is 00:16:20 important to know what evidence-based medicine was arguing for and arguing against, because it seems intuitive, right, to use the best evidence we have to make clinical decisions. That seems obvious. But before the evidence-based medicine movement, a lot of decisions were essentially made by deferring to kind of the most eminent, you know, professor in the medical college who was probably a very wise physician, but was making decisions based on their experience, based on kind of their clinical judgment, which, you know, I'm not saying was bad. It might've been fantastic. But I think the thing was there was no evidence to support that their decisions were good,
Starting point is 00:17:10 or who knew if they were even biased against particular subpopulations. It was kind of internalized knowledge, let's say, without necessarily being transparent. Exactly. Okay. Yeah, please go on. Yeah, so that started the movement. And that movement has gained a lot of momentum.
Starting point is 00:17:38 And Cochrane, which started about 25 years ago, has just grown in size, in influence, and in importance because it provides such an essential service for physicians who don't have much time to try to weed through a lot of the data and put, you know, that evidence-based medicine ideal into practice. Okay. So, you know, that all sounds, you know, very, very good in theory, like a very, you know, solid and objective and scientific approach of conducting medicine. And, you know, since Cochrane is built around these principles, you know, people may wonder, okay, so what exactly is the controversy around that? And to cut to the chase, I will just quote from your piece what to me was the, you know, what is the focus of that.
Starting point is 00:18:42 So you wrote that Cochrane relies on what is widely agreed is the highest quality evidence, the randomized controlled trial, which you just explained, published in peer-reviewed journals. But the thing is that some people argue that data are often biased, both in individual instances of randomized controlled trials and in the fact that most of those trials and most of that trial data comes from industry-funded sources. So would you like to say a few words about, you know, who raised the controversy and what it's all about?
Starting point is 00:19:18 Yeah, I mean, this has been an internal debate at Cochrane for a long time. I think the name that's most associated with this debate is, and I apologize, I'm going to butcher the pronunciation of his name, but Peter Gottschi. He's been a member of the Cochrane Collaboration for quite a long time. And so he's, you could very much argue he's one of the pioneers of the evidence-based medicine movement. And he's argued in a number of his books that there can be a lot of bias in randomized controlled trials. As you can imagine, if you're a pharmaceutical company and you're trying to show that your drug works, there's a number of slate tricks that you can do that are, you know, within the rules of randomized controlled trials, but that are handicapping your drug to try to make it look better. And that's things sometimes whether or not and how you do the blinding. other things such as what control that you pick, um, because sometimes it's
Starting point is 00:20:29 important to choose a control that is kind of the standard of care. But if you're choosing a control that is, for example, no treatment in a situation where no one would actually not be receiving treatment, um, then that's sometimes something that can affect the outcome in ways that seem to handicap people that are doing drug trials, trying to show evidence that their drug works. So since randomized controlled trials have become kind of the top of the pyramid of evidence, a lot of, you know, interests that have a lot of money to gain by not, I wouldn't say, you know, falsifying because they're doing these studies honestly, but just, you know, just putting a little bit of a finger on the scale to try to make sure that whatever it is that they're working on comes up with positive results. This is the kind of thing that
Starting point is 00:21:27 Peter Gauthier has railed against and has noticed increasingly that the proportion of randomized controlled trials that are out there in the evidence that are supported by pharmaceutical or other kind of private interests has been growing more and more and more to the point where he argues that much of the evidence that Cochran ends up using is essentially coming from what could very easily be biased sources, which raises a lot of questions, right? Because Cochran ostensibly just takes the evidence that is out there and does these reviews and draws conclusions from it. But if the evidence that is out there is biased, then, you know, as George, as you said earlier, you know, garbage in, garbage out.
Starting point is 00:22:20 You know, Peter Gottschalk is worried that if the best evidence we have is tainted by bias, then it's possibly garbage. And then Cochran is possibly only taking that garbage and putting it through their machine, putting this imprimatur of a non-biased systematic review and therefore kind of cleaning the garbage and making it look like it's a it is perhaps better than it is. And so this has created a large controversy where he has in very, you know, you could argue in polite ways, criticized Cochran for this approach and argued for a much more stringent way of determining bias and inclusion of the trials that Cochrane might use. Okay. Well, to me, this kind of argumentation does sound plausible. So if all or most, at at least of your data comes from from one place and that place has you know a vested interest in making that data look you
Starting point is 00:23:31 know one or the other way then yes you know any i would say that any not just any data science but any kind of you know reasonable person would kind of question uh the validity of the conclusions you can draw based on that data but then to me the question would really become okay so is there anything that can be done about it because to a large extent i would say that you know it also kind of makes sense that most of Cochrane's data would come from pharmaceutical companies because well you know they're they're the ones that are largely producing medications and doing trials. So is there anything that can be done about it? Is there anything in his argumentation or anything
Starting point is 00:24:13 that you have in mind that could be done to alleviate that? I think this is what gets to kind of the crux of the issue because I think people at Cochrane, I don't think there's a single person at Cochrane that doesn't recognize that the potential for industry bias is a problem that can sway Cochrane reviews, but it's more a question of what to do about it. And I think that's where there's a lot of very passionate people on various different sides of the debate. And there's not really a consensus because, and this is something that I think your data scientists that read your work, George, would appreciate is that trying to weed out, weed the bias out of your data is extremely difficult.
Starting point is 00:25:01 You essentially have to pick which biases you want. Or at least, if you can't weed out the biases, then at least try to be as transparent as possible about what biases might be there or make the data as transparent and kind of your metadata as transparent as possible so other people can look through it to decide what the biases are. And so there's a number of solutions that are kind of along these lines. Like some people are suggesting that, well, more public money should be put into randomized clinical trials because they're essentially a public good. And therefore, one way to reduce the bias is to make sure that non-biased studies are set up so that there's not one particular interest
Starting point is 00:25:49 that's being represented and using public money to do those studies. There's others that just suggest, you know, any randomized controlled trial is often an enormous multi-year undertaking that gets summarized in what's often like an eight-page journal article. So there's no way to highlight all of the important details and the potential biases in such a small journal article. And so people have been calling for essentially trial registries to put all of the information from these trials in one location so that those that are really concerned about some of these issues can, you know, dig into the weeds and decide whether or not there's any additional biases from some of the original raw data.
Starting point is 00:26:38 So there's a number of solutions out there that some people are suggesting. And some others are even suggesting that we need to revise what we consider as evidence, because the answer might not necessarily be to double down on randomized controlled trials as much as to recognize when randomized controlled trials make a lot of sense to do, like with a therapy or treatment. And when some other type of evidence gathering needs to be done, such as with a complex intervention where it's really impossible to control for all of the potential confounders. Okay, yeah. Again, what you outlined makes sense to me,
Starting point is 00:27:24 even though I'm an outsider in your domain, just coming at it from the data science or critical thinking, if you will, point of view. That all sounds reasonable. The problem, I would argue, with those approaches, in a way, is that they sound radical, basically. And, you know, to me, that's not necessarily a problem that, you know, as we said, oftentimes you need to rethink how you do things. But for many people, I acknowledge it may be an issue that, you know, this is quite disruptive in a way. So I'm wondering if, you know, if you have an idea of how this could be done and to bring another angle into this. And like we said, this is a very timely, I would say,
Starting point is 00:28:15 opportunity to be having this discussion because of the whole context and how science and medical research has been accelerated and so on. So to bring another angle into this discussion, I would say that a few people have in the past kind of put forward similar questions, not just for Cochrane, but for the World Health Organization, arguing, for example, which is kind of true, you know, at least to the best of my knowledge as far as I've been able to research, that a lot of the funding that goes into the World Health Organization comes actually from the same pharmaceutical companies and it's actually tied to specific outcomes, research into specific diseases and so on and
Starting point is 00:28:57 so forth. So the liberty that the World Health Organization has to pursue its own goals in its own ways, perhaps limited. So, and I'm bringing that into this discussion because, you know, one kind of natural thought would be like, okay, you know, this sounds like something which is really big and that needs a kind of, you know, really global discussion and an umbrella organization to host that discussion. But, you know, if we have the whole, which, you know, most people would think of as the most appropriate organization to do that, and we have also similar kinds of questions raised there, then one doesn't, you know, can't help but wonder, okay, so what is there to do and what's the right, you know, what's the right framework to raise these questions and
Starting point is 00:29:45 have this discussion so george i i'm not sure if i understood the question exactly uh right so my i'll try to to rephrase it so i was just saying that you know these um uh these ways of dealing with with bias basically that you uh that you mentioned uh they they do make sense but they also sound in a way disruptive because they you know they would they would mean that we would have to rethink a big part of uh you know how how medicine how evidence-based medicine is conducted, basically. So that's one part of the comments last question, let's say. So in order to do that, the second part would be like,
Starting point is 00:30:35 okay, what's the best way to have a discussion about something which is potentially disruptive? What's the right organization to do that? And therefore, that brings us to the whole issue of the World Health Organization, for which the same kind of questions have been raised. The same kinds of questions that have been raised for Cochrane Electric is,
Starting point is 00:30:57 does the business model affect the way that Cochrane works? And therefore, the same kinds of questions have also been raised for the World Health Organization so my question is and I realize it's actually it's a complex one and there's a number of questions lurking let's say in there but I'm trying to make it a little bit simpler so my question okay, so which is the right place and the right way to be having this discussion about rethinking the way evidence-based medicine is conducted? So George, one of the things I would say is the idea that some of the recommendations are radical might be true, but I think this is why it's important to kind of go back to the beginning of the evidence-based medicine movement, because thinking about
Starting point is 00:31:55 where things stood in 1992, where there wasn't necessarily a good framework for thinking through how to make some of these decisions and what evidence to use. We now have evidence hierarchies. I think the problem is that there's been a lot of unintended consequences from basically setting up those evidence hierarchies and putting so much weight on randomized controlled trials. Because this might not be something that data scientists run into frequently,
Starting point is 00:32:28 but this is something that I see a lot, which is the application of randomized controlled trials to situations that don't really lend themselves to a randomized controlled trial. For example, I can give you, so there's a concept that, you know, some data scientists might be familiar with called hotspotting. This has been used in crime statistics. This is the idea of kind of somewhat using big data to pinpoint exact locations where crime has been the highest. And it's been applied to medicine in situations of like trying to find the places where, in the United States at least,
Starting point is 00:33:06 where kind of the healthcare spending is the highest. And there's a famous group that works in Camden, New Jersey, that's done a lot of hotspotting and trying to put extra resources towards the group of people that are considered to be the highest utilizers of healthcare. With the idea of, oh, if we put more resources into helping these people, then we can hopefully keep them healthier and keep them from costing the health system so much money. So all of this makes a lot of sense. And so they had created a very kind of labor and labor intensive and expensive program of kind of targeting these hotspotters. And people had the idea of doing a randomized controlled trial of randomizing people in this area to getting this, you know, extra intervention that was trying to help coordinate people to extra social services,
Starting point is 00:34:06 or they were not put in this program. And basically, using the rubric of a randomized controlled trial to try to answer this question. And this study came out and it showed that there was actually no difference between the group that got the extra intervention and the group that did not. They essentially had equal hospitalizations over the time period that they were studied. And so that's one of those things where it's easy to think that, oh, well, I guess this intervention didn't work. We shouldn't put our money into it. But context is often key. And the question that we often need to be asking is whether or not a randomized controlled trial really can control for all of the variables.
Starting point is 00:34:48 Because one of the important factors in this study is if you're going to take patients with a lot of complex social needs and you're going to provide them a lot of coordination to other services, you can imagine how well that program works is dependent upon the other services that those people are directed to. And one of the people that was, you know, very intimately involved in the creation of the group kind of recognized that a big problem with the study was that they were, quote, coordinating to nowhere. And in the United States where they were working, there really actually wasn't much in terms of extra services that they were coordinating them to. So it was a good idea, but using a randomized controlled trial was essentially testing no intervention against this intervention that didn't actually have the firepower to help anybody. And so what we've seen is we've seen a kind of this reliance on randomized controlled trials as the epitome of the highest quality of evidence. But the unintended consequence has been that randomized controlled trials are probably being overused. And so you could argue that we've
Starting point is 00:36:04 actually become radical about using randomized controlled trials and probably being overused. And so you could argue that we've actually become radical about using randomized controlled trials and applying them in situations where they don't actually make sense. And so, you know, that kind of points to your question of whether or not these things that people are suggesting are too radical. My argument would be our perspective on randomized controlled trials. You know, and some people in economics that advocate for randomized controlled trials in complex social situations have actually been called randomistas, kind of suggesting that this is a political ideology that they're clinging to, despite the fact that there's a lot of confounding variables that can't be controlled for.
Starting point is 00:36:44 And a number of people are starting to talk about the tyranny of the randomized controlled trial because it's being used in situations that don't make sense and also in situations where the research question, the research methodology should fit the question. And so it's being used inappropriately in times where the research question begs a different methodology. And in terms of, you know, places like the World Health Organization, I mean, it's interesting that you brought that up because there's a lot going on at the World Health Organization that is beyond simply kind of what evidence the World Health Organization uses. Because the best way that I could describe a lot of the biases
Starting point is 00:37:35 that the WHO runs into is, you know, my PhD dissertation was actually looking pretty closely at the World Health Organization. And one of the, one of the key things that I found was in interviewing one of my informants there was he, he used the term, he said, our clients are our member States. The WHO functions, not necessarily to improve the health of individual people around the world, but to serve its clients, which are its member states. And so a lot of what the WHO does and a
Starting point is 00:38:12 lot of how it reacts is not necessarily based on the best evidence, but it is a highly rational organization. But that rationality is often based on the clout of different member states and what those different member states want. And so industry has made its way into the WHO through, you know, governments such as the United States that promote a lot of collaborations with industry. But this has also created a lot of consternation, which you might have seen. There's been a number of mechanisms where this has become kind of a dividing line. One of the best examples was in 2005 and for a few years after, Indonesia refused to share influenza viruses because the influenza viruses that influenza
Starting point is 00:39:08 shared with a global network became patented by Australian pharmaceutical companies. So needless to say, Indonesia was reticent to share things that they might then not be able to afford because they were produced by Australian pharmaceutical companies and so stopped sharing. And it's created a number of dividing lines that essentially comes down to what is the role of industry in general in a lot of the work that the WHO does. And so it's not just the trials, but it's everything from how your influenza vaccine gets made to, you know, how much the WHO recommends sugar be in an average person's diet. Okay. Okay. realize you know this is a huge topic and we could probably be talking for hours just about that but just to move on a little bit because not not because it's not interesting it's super interesting but there's a number of other topics
Starting point is 00:40:21 that i'd like to touch upon but quickly before we get to those um you mentioned that you know basically the um uh you know the gist of what you uh said earlier about the randomized control trial is that you know it's not maybe it's not always the best uh method to use for uh for everything um are you aware of uh or other methods, other methodologies that people are using? They may be at early stage, but are you aware of any of those that could be more suitable for different situations? Yeah, I mean, I think a randomized controlled trial at every opportunity or what I often hear when I'm working in the hospital is like, oh, well, that wasn't a randomized controlled trial with the assumption that because it was not a randomized controlled trial, it doesn't tell us anything, I think is a reflection of bias itself. Because there are certain times that it's impossible to do randomized controlled trials in any reasonable framework of time.
Starting point is 00:41:35 And one example, so Tricia Greenhough at Oxford is someone who is a little bit on the other side of the Cochrane controversy and has written some articles discussing a lot of this. But she's someone who I think is on the right track because she talks about other instances where different types of empirical studies are warranted. And one example that she gave recently in an article on Boston Review is all of the public health measures that we're talking about in a pandemic of, you know, asking the question of do masks work, hand washing, social distancing, wearing eyewear, a number of these things that, you know, you could argue would actually make a lot of sense to do in a randomized controlled trial. But we don't actually have the
Starting point is 00:42:33 time to do a randomized controlled trial for these different public health interventions while we're trying to control the spread of a pandemic. So one of the things that she talks about is how in this situation, because time is of the essence, we might be limited in what we can do. And so sometimes we need to draw in other types of evidence. She describes at times we need to bring in some narrative evidence. And also in a case like this, I think modeling is extremely important because that is sometimes the closest approximation we could get to some sort of trial within the timeframe that we would need to be able to implement a lot of
Starting point is 00:43:27 these public health measures. I do a lot of qualitative work. And so I also find qualitative data extremely important, not on its own. I often talk about how quantitative data is extremely important and can provide a lot of insights and raise a lot of questions. That quantitative data can be used to help kind of really extract the context. And I think the combination of quantitative and qualitative data is extremely important. And yet kind of our evidence-based structure prizes the quantitative randomized controlled trial data without enough of the qualitative data to put those trials in context. Just like the trial of hotspotters that I mentioned that was done in a randomized controlled trial, you know, I think that's fantastic. I think we should do a randomized controlled trial.
Starting point is 00:44:31 But I also think what's equally important is some of the qualitative data that puts that trial in context to help explain why it might have failed so that the next time we can actually do a trial and improve upon that rather than just make the overstated conclusion that like, oh, these hotspotting programs don't work. Okay. So just to give a little bit of context on qualitative data in data science or data governance parlor, I probably call those metadata or or context adding metadata and context to your data sets basically yeah and i would say the two need to work hand in hand
Starting point is 00:45:13 and it's unfortunate because right now there's so much uh emphasis on the quantitative that what we're getting is we're getting an overabundance of quantitative data without sufficient context that it's making it hard to see the biases and the challenges and the problems that a lot of these randomized controlled trials might have. Mm-hmm. Okay. Let's move to, since you mentioned the pandemic earlier and you know it's obviously the backdrop for this discussion, let's quickly assess a couple of cases that are related to that that recently came to light and let's start with modeling. You refer to modeling as one potential way of doing
Starting point is 00:46:04 research in cases where you know it's ongoing or you maybe want to have some predictive results in advance of whatever is taking place. And one kind of famous or infamous, depending on which way you look at it, model is the one put forward by the Imperial College, by this professor named Neil Ferguson, on the basis of which a number of decisions were made earlier in the course of the pandemic. And as you may know, this specific model was later scrutinized by a number of initiatives and researchers.
Starting point is 00:46:46 And what they found was that in terms of the quality of the software or how it was maintained and whether it was transparent or not, it scored pretty low on pretty much all of these dimensions. So in a way, I would say that that kind of brings forward the question of how controlled trials and Cochrane, since these are essentially results that have to do with public health, perhaps they should be publicly funded and therefore belonging to the public domain. Do you think that the same could be said for models, for example,
Starting point is 00:47:40 that these things should be open source and transparent and open to review? I absolutely agree. I mean, George, one of the subtexts of a lot of what we're talking about here is the incentives that different researchers have for kind of the work that they do. And, you know, one of the challenges is that the incentive for most researchers is to keep things private, because if they keep things private and they advance a model or even a randomized trial, that advances their career.
Starting point is 00:48:23 Where if they make it public, people aren't necessarily going to cite their work is their fear. And so that then might hurt their career. They might be able to kind of, you know, put something out there, but then it, it doesn't, it doesn't get them any more research funding. It doesn't get them more papers. And so if they, if they can protect it, then they're protecting their career. And I think this is rational, right? This is one of the things that is a very, very common thing, at least in medicine and
Starting point is 00:49:00 a lot of biomedical research, is that just assumption that things need to be private in order to make money off of them or advance a career. We see this with patents, with pharmaceutical industry work, all the way to some of these models. But I think, George, what you're pointing out is that a lot of people that come from software and are familiar with open source and the global public license and a lot of the efforts to make things in the public domain see that that has actually made things better for everybody um and i think this is this is an ongoing debate and definitely not my field of expertise but the thing that i see is is overall the the more transparency there is the more robust the science becomes not just with modeling but there's uh i I would actually argue with almost all of the statistics that we do. I think there was a fantastic paper that was published in 2018 that I can send you,
Starting point is 00:50:14 George, that it's called Many Analysts, One Dataset. And they say, making transparent how variations in analytic choices affect results. This doesn't sound like a thrilling paper by its title, but what they did is they essentially took one data set, which was an interesting data set for Europeans listening to this. It was a data set of whether or not soccer or football referees seem to have any bias towards dark-skinned players when they were calling fouls. And so they took this data set of fouls and in race and asked 61 different teams to analyze the data and they got 61 different ways of analyzing. There were multiple different types of analyses.
Starting point is 00:51:09 And they got a wide range of results. And any one of these probably could have been published in a journal. But I think what was interesting was by kind of crowdsourcing this, they probably got a little bit more towards a better sense of what might have been going on, rather than just publishing one single paper and keeping the data set private. So I think, you know, when it comes to models, George, like exactly the one that you're describing by Neil Ferguson, and arguably much of the statistical research that we are doing in medical and economics and social science journals needs to be in the public domain and needs to be available for that secondary analysis. And people that created these data sets would need to get essentially kind of more credit for the data sets than for the papers that come from them. There was a debate about this even in the New England Journal of Medicine,
Starting point is 00:52:18 where the editor-in-chief called a group of people, quote-unquote, data parasites on the idea that, you know, if someone did a clinical trial and made the data available publicly immediately, that there would be a lot of data parasites that would steal the data and do put all of the blood, sweat, and tears into creating the data and doing the clinical trial kind of got their just reward of publishing the paper that they wanted to publish. And, you know, I think the argument behind that, you know, rightfully so, the editor-in-chief got pilloried by calling people data parasites because, you know, this is the kind of thing where the more this stuff is analyzed and tempered in kind of the spheres, the scientific spheres, where we can have these debates, I think the more robust our conclusions would be, both with modeling and the statistical work. But there's a cultural shift that has to happen. And academia is lagging behind in providing the incentives
Starting point is 00:53:30 that would make this a lot more feasible. Yeah, I think many people in the data science and machine learning community would actually cringe by hearing this story because for them it's pretty standard that you may get a lot of value out of data sets that other people produce and you may reuse them in very imaginative ways and this is the norm basically I would say. I mean in parallel you do have the fact that many organizations are being gone to their data sets because they get lots of value out of that. So there is that as well.
Starting point is 00:54:11 So these cultures kind of coexist in parallel. they totally see the value in what you describe and just getting additional value out of re-analyzing and re-interpreting somebody else's data. So to call this work, to call someone that does this work a data parasite would be unheard of, I would say. Yeah. I mean, I see what the editor-in-chief is describing
Starting point is 00:54:48 in the sense of like, the amount of work that goes into doing a clinical trial is enormous. And that should be, if you are part of a team that does that work, there should be a reward. But I don't necessarily agree that that reward should be that you simply, he was arguing for data protection, which, you know, is one type of
Starting point is 00:55:14 reward to allow people to kind of write the papers that they want. But my argument would be, rather than protect the data so that they can write, you know, one take on how to analyze the research. My thought would be, let's figure out some way to reward the people who put the effort into making that data robust to begin with. And there's not enough effort put into doing that. Yeah, I don't know. A very simplistic maybe way of thinking about it would be, you know, datasets should be citable just like publications. So if I use a dataset created by some research team, then I should cite them in my publication.
Starting point is 00:55:55 And, you know, for people who do research professionally, that certainly counts for something. Yeah, I would agree with that. And so to kind of wrap up, let's visit another topic, which actually has to do a lot with this, what I would call data provenance, because again, to
Starting point is 00:56:17 kind of mix the two worlds, in the data science parlor, what we're talking about, so having, knowing the origin talking about. So, knowing the origin of your data basically and giving proper attribution where attribution is due, it's called data
Starting point is 00:56:33 product. So, a recent incident that kind of touches upon this aspect is a study conducted on the effectiveness of chloroquine, which involves an entity called Surgisphere. So in a nutshell, what happened there is
Starting point is 00:56:51 what was initially referred to as the most influential COVID-related research up to date was called into question as the result of lack of transparency regarding the origin and trustworthiness of the data, it was based on. So the researchers who conducted the research based their findings on data acquired from Surgisphere, which is a startup claiming to operate as a data broker and providing access
Starting point is 00:57:18 to data collected by a number of hospitals around the world. However, whether that data is veracious or whether the data has been acquired in a transparent fashion is not clear. And as a result of that, the results of the research were put into question and the decisions made as a consequence by the World Health Organizations have been reverted.
Starting point is 00:57:41 So the question there is that, do you think that people who did this research, and not just them in specific, but people who do any kind of research and acquire data sets as part of that, have a sort of responsibility to verify the sources of that data? The short answer is yes.
Starting point is 00:58:03 I think there's obvious challenges to that. It's often, you know, I think for your data scientists, they would know that data can be very deep. And so knowing the data and all the metadata to the degree that is sometimes required to know it as well as the person who constructed the databases is often, you're never going to know it as well as they did. But I think there needs to be a due diligence. And I don't think people can abdicate that responsibility by just saying that there's an aggregator of data. I think it's important to recognize that the quality of data is extremely important. Knowing the data provenance is extremely important and your responsibility as a researcher.
Starting point is 00:58:53 And researchers have to sign a document saying that they have examined kind of the quality of the work that they are submitting to a journal. So it's hard for me to understand how people can sign that document when they're submitting to a journal without doing the due diligence necessary for a lot of the controversies that we're seeing out there about data provenance. So, yeah, we've covered lots of ground and we could easily go on for a long time too. But since we have to wrap up quickly, then I would just do that by asking you whether you think there is, well, obviously, you know, people who conduct medical research are very well trained or have to be very well trained in their specific domain.
Starting point is 00:59:51 Do you think, however, that getting them exposed to some additional training in data science techniques, like how to manage data sets or keeping track of provenance and perhaps even going as far as trying different data analysis techniques. Do you think that would be beneficial for them? Yeah, George, what I would actually say is we should have enough training to know what we don't know. But I'm a big fan of a division of labor. And I recognize that it doesn't make sense for me to get so much training to be able to kind of manage my own relational database and put in the metadata myself, as much as to know what would make a high quality relational database
Starting point is 01:00:48 and be able to hire someone who can actually do that. So I do think more training is necessary because I don't even think we're at the point, on the medical side at least, that we know enough about data science to be able to even always kind of hire the right people and collaborate with the right people who are doing this well. But I think there needs to be a lot of education on that front. And because what I see is that, excuse me, that a lot of physicians are getting education. A lot of physicians are getting education, a lot of scientists are getting education, but there's an assumption that, oh, if we just take a couple of courses in statistics, then we can do our own statistics for these papers. And a lot of times what I see is we don't
Starting point is 01:01:41 know what we don't know. And we're often applying statistical tests that we shouldn't be applying in certain situations, not digging deeply enough into the data to be able to describe kind of what biases are there. And it's sometimes beyond our expertise, but we need to be collaborating with people who can help us do that. Because otherwise, garbage in, garbage out. under our expertise but we need to be collaborating with people who can help us do that um because
Starting point is 01:02:05 otherwise uh otherwise garbage in uh garbage out and a lot of what ends up in medical journals uh can sometimes be um of poor quality okay okay thanks yeah we you've you've summed it up pretty pretty nicely i think thank you it's been a super interesting discussion, one of the most interesting and the most enjoyable I've had. So thanks a lot for your time. I hope you enjoyed the podcast. If you like my work, you can follow Link Data Orchestration on Twitter, LinkedIn, and Facebook.
