Orchestrate all the Things - Garbage in, garbage out: data science, meet evidence-based medicine. Featuring Critica CMO David Scales
Episode Date: July 18, 2020. Did you ever wonder how data is used in the medical industry? The picture that emerges leaves a lot to be desired. In the early 90s, the evidence-based medicine movement tried to make medicine more data-driven. Three decades later, we have more data, but not enough context, or transparency. Today's episode features David Scales, Chief Medical Officer at Critica, and an Assistant Professor of Medicine at Weill Cornell Medical College. Scales specialized in internal medicine, and also has a PhD in sociology, with a particular interest in the sociology of science. The conversation with David covers the fundamentals of evidence-based medicine, how data is generated through randomized controlled trials and accessed via Cochrane Reviews. We draw parallels with best practices in data science and data governance, touching upon provenance, context, and metadata. We also explore the dark side of data in the medical industry: pharmaceutical company involvement, bias, the controversy around Cochrane, and the tyranny of the randomized controlled trial. We also explore predictive models and data parasites in COVID-19 times, and the role of the World Health Organization. Articles published on ZDNet: https://linkeddataorchestration.com/2020/07/20/garbage-in-garbage-out-data-science-meet-evidence-based-medicine/ https://linkeddataorchestration.com/2020/07/27/data-governance-and-context-for-evidence-based-medicine-transparency-and-bias-in-covid-19-times/
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Today's episode features David Scales,
Chief Medical Officer at Critica and an Assistant Professor of Medicine
at Weill Cornell Medical College. Scales specialized in
internal medicine and also has a PhD in sociology with a particular
interest in the sociology of science. Did you ever wonder how data is used in the medical industry?
The picture that emerges leaves a lot to be desired.
In the early 90s, the evidence-based medicine movement tried to make medicine more data-driven.
Three decades later, we have more data, but not enough context or transparency.
The conversation with David covers the fundamentals of evidence-based medicine,
how data is generated through randomized controlled trials and accessed
via Cochrane Reviews.
We draw parallels with best practices in data science and data governance,
touching upon provenance, context and metadata. We also explore the dark side of data in the medical industry,
pharmaceutical company involvement, bias, the controversy around Cochrane and the tyranny of
the randomized control trial. We also explore predictive models and data parasites in COVID-19
times and the role of the World Health Organization. I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data
Orchestration on Twitter, LinkedIn, and Facebook. So thank you again, David, for making the time
for this conversation. And I think, you know, something I typically do with people I have
these kind of conversations with is I start by asking
them to say a few things about themselves, their backgrounds, and in your case, that
would involve, since the touching point that sort of connected us was your work in Critica.
I was wondering if you'd like to also say a few things about that.
So what is Critica,
and, you know, its mission, and what you do there, basically?
Sure. So I'm currently an Assistant Professor of Medicine at Weill Cornell Medical College. I'm specialized in internal medicine, but I also have additional training. I also have a PhD
in sociology, and have a particular interest in essentially kind of the sociology of
science, which is part of what I've studied. And so that affects a lot of my current work
because a lot of my current work is on,
what I would essentially put in the broad bucket of doctor-patient communication.
And that's everything from kind of the placebo effect, which I have one project
on that is very much clinical, in the hospital, between the doctor and the patient, to other things such as combating
misinformation, which is a lot of the work that I do with Critica.
Critica is a small NGO started by Jack and Sarah Gorman after they wrote their book
Denying to the Grave, on the psychological reasons behind science denial.
And we linked up because we were both very interested in a lot of the different factors that lead to science denial and make it hard for scientific consensus to kind of guide decision-making.
And so essentially our mission is trying to help ensure that scientific consensus is what guides most decision-making, both public decision-making
and policy decision-making. That's essentially our goal. I mean, we see obviously that
scientific consensus is something that can be contested, but it's the type of contesting that should go on among experts. And, you know,
consensus should be revised, but that consensus is what should guide our behavior as,
you know, non-experts and policymakers. My role in Critica is basically chief medical officer, which means I just have some
extra time to weigh in on a lot of the medical aspects that we focus on.
So we have a number of different kind of projects and protocols that we're working on. And so I kind of weigh in on a lot of that and bring in my kind of sociological expertise
as we try to put that mission into action.
Okay.
Thank you for the introduction.
And so I'll try to give a little background from my side, because for people who may be familiar with my work,
what you do, you know, at first blush may not sound that relevant or that familiar,
because mostly what I deal with in my column has to do with data, databases, data science, data-driven
decision making, this kind of thing. There is a line that connects these two, but it's not entirely obvious at first.
Plus, I'll give a little bit of background to make it more clear, maybe.
So the way I came to know about Critica and what you do is through a letter, an open letter
that was signed by yourself and Critica's
co-founders, Sarah and Jack Gorman
a letter sent to Nature
in which you called for
counteracting pseudoscience with
respect instead of ridicule
as I also mentioned
as a reply let's say
to another letter sent
by another scientist
Timothy Caulfield, from, I don't remember his exact affiliation,
to be honest, but he's Canadian.
And, well, in his letter, he kind of scoffed, let's say,
at pseudoscience, and he called for a kind of different approach.
And that letter resonated with me for a number of reasons. First, from a purely
communication, let's say, point of view, I think what you called for is just plain more
efficient, basically. And it also touched me on another level. So basically, one of the pillars,
let's say, of science and research,
which is exactly what this is about,
is keeping one's options open and being open to revising beliefs
and what you basically also touched upon in your introduction.
So I think that, especially in this climate,
where science is kind of being conducted out in the open
and for everyone to see, there's a danger that lurks here.
So having blind faith in data and data-driven processes.
And this is something I've also kind of touched upon in previous columns.
So in a way, more and more organizations
are turning towards this evidence-based way
of decision-making and operation.
And this is also true for healthcare
and for organizations operating there.
And this is why data scientists are in such high demand.
However, there's a famous saying in the data science world,
which is garbage in, garbage out.
So, you know, your findings can only be a function of the data used to drive them.
And so this brings us to another very interesting viewpoint that you put forward.
The analysis you did on what you call the Cochrane controversy.
And, you know, I've already spoken too long for my part, let's say, here.
So without any further ado, I'll just let you explain in your own words what Cochrane
is, what the controversy around it is, and, you know, where data comes into play into
this picture.
Sure.
I mean, this is interesting.
And, George, I'll be very interested to kind of talk more, because the kind of work that you do on data, and your audience, I think, is exactly some of the folks that Critica would like to reach as we think about exactly what constitutes evidence.
And how do we understand how data and evidence becomes a fact, which I think is a bit of a crux of the issue.
So you asked about kind of the, what is Cochrane and kind of what do they do?
So for those that haven't heard of Cochrane, Cochrane is a little bit of the gold standard of evidence in medicine in particular. So
a lot of physicians that are working in hospitals or in primary care clinics,
you know, it's a difficult thing to make every decision based on evidence. If I had to,
for every decision, go to Medline and do a search and weed through all of the clinical data to help guide my decisions,
well, there are just so many questions and so much evidence, so many papers out there, that that's impossible.
So Cochrane is an organization, it's a consortium. And it's changed over time, but you can essentially think of it as a loose network of scientists and physicians who carry high standards for what constitutes quality evidence, who then do systematic reviews of various different questions in medicine and publish those reviews as Cochrane Reviews.
And these are very high quality in the sense that they're often,
I don't want to say the last word, because they're often kind of done and redone,
but they are very much focused on trying to pull in
what is considered to be the highest quality evidence. And that's usually randomized controlled
trials. That's essentially the things that make it into the Cochrane reviews. And they obviously,
even those randomized controlled trials get assessed for their quality as well.
But at the end of the day, when somebody picks up a Cochrane review, they're often using that
as a shortcut to, okay, what is the best evidence that can answer my question?
And some examples of Cochrane reviews that I've read recently, for example, are, you know, is it a good idea to use
something we call BiPAP, which is essentially, if you've ever heard of a CPAP machine,
it's something very similar to that. But whether or not that's a good idea to use in people with
asthma exacerbations; whether, in people with asthma exacerbations,
we should give them antibiotics when they come into the hospital. What is the association between
different cultures and antibiotic prescribing among primary care physicians? So a lot of these
questions that affect clinical management of patients are the kind of thing that we've wanted answers to for a long time.
And Cochrane provides a fantastic service by summarizing a lot of that evidence for physicians.
Okay.
Just to take a step back, because you mentioned in your answer a couple of things that maybe not everyone is familiar with.
So let's start with randomized controlled trial, if you'd like to briefly explain what that is,
and then move on to evidence-based medicine, because this is the core of what Cochrane does.
Sure. So a randomized controlled trial is,
let's just break down the terms, right? So a randomized controlled trial is randomized.
It's essentially kind of trying to compare a group that receives the treatment to a control group
that either does not receive the treatment or receives a placebo. And it's
controlled in the sense that it has that control group. And people are randomized to one of those
conditions or the other. And the idea behind that is that that randomization helps to essentially make the study more generalizable by helping to
distribute equally and randomly any potential confounders. And it's a very powerful way to test if a treatment works, because there's a lot of things
that go with giving a medication. If you do a trial that does not have a control group
with a placebo, for example, it's really easy to say that, oh, well, this medication worked when really, you know, like that question I just
posed of giving antibiotics for an asthma exacerbation, maybe people with asthma would
have gotten better anyway. And so you have to ask yourself, does giving the antibiotics actually
help reduce the amount of time, reduce the severity? So these are the types of questions that you can
answer with a randomized controlled trial. People also often talk about blinding. You'll hear
people talk about kind of, you know, randomized double-blind placebo-controlled trials. And these
are other things that increase the quality because, you know,
even the people in the study can be biased. And so if you blind them, essentially, if you shield
the people in the study to the conditions so that even the doctors giving the medications don't know
if the patient is getting a treatment or a placebo, then that also helps reduce the confounding and makes it so that we
can trust the results more. Okay. Just to make the connection here with
how data scientists are used to thinking, I would say that this is a very good example of how data scientists also conduct their experiments, their lab trials.
So you're basically trying to break down the parameters that influence your outcome.
And then you try to take them in isolation and try to see, okay, so if I change this parameter, what will happen?
What's the outcome?
And you basically do that for all your parameters, and then you try to come up with an outcome
in the end.
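To make that parallel concrete, here is a minimal, hypothetical sketch in Python of the kind of randomized comparison described above: subjects are randomly assigned to a treatment or control arm and the outcome difference is estimated. The outcome measure, sample size and effect size are made up for illustration only.

```python
import random
import statistics

random.seed(42)

def run_trial(n_subjects: int = 200, true_effect: float = -1.5) -> float:
    """Simulate a two-arm trial and return the estimated treatment effect."""
    subjects = list(range(n_subjects))
    random.shuffle(subjects)                          # the randomization step
    treatment_arm = set(subjects[: n_subjects // 2])  # half are assigned to treatment

    treated_outcomes, control_outcomes = [], []
    for s in subjects:
        outcome = random.gauss(10.0, 2.0)             # hypothetical outcome, e.g. days of symptoms
        if s in treatment_arm:
            treated_outcomes.append(outcome + true_effect)
        else:
            control_outcomes.append(outcome)

    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)

print(f"Estimated treatment effect: {run_trial():.2f} days")
```

Because assignment is random, any confounders should be spread roughly evenly across the two arms, which is what lets the difference in means be read as a treatment effect.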
So would you say that this methodology that you just outlined is what's the core of evidence-based medicine?
Yes. I mean, so evidence-based medicine is, you could argue it's a social movement.
It kind of congealed in 1992 with the publication of an article kind of recommending it. And it's
important to know what evidence-based medicine was arguing for and arguing against,
because it seems intuitive, right, to use the best evidence we have to make clinical decisions.
That seems obvious. But before the evidence-based medicine movement, a lot of decisions were
essentially made by deferring to kind of the most eminent, you know, professor in the
medical college who was probably a very wise physician, but was making decisions based on
their experience, based on kind of their clinical judgment, which, you know, I'm not saying was bad.
It might've been fantastic. But I think the thing was there was no evidence to support
that their decisions were good,
or who knew if they were even biased against particular subpopulations.
It was kind of internalized knowledge, let's say,
without necessarily being transparent.
Exactly.
Okay.
Yeah, please go on.
Yeah, so that started the movement.
And that movement has gained a lot of momentum.
And Cochrane, which started about 25 years ago, has just grown in size, in influence, and in importance because it provides such an essential service for physicians who don't have much time to try to weed through a lot of the data and put, you know, that evidence-based medicine ideal into practice.
Okay. So, you know, that all sounds, you know, very, very good in theory, like a very,
you know, solid and objective and scientific approach of conducting medicine. And, you know,
since Cochrane is built around these principles,
you know, people may wonder, okay,
so what exactly is the controversy around that?
And to cut to the chase, I will just quote from your piece
what to me is, you know, the focus of that.
So you wrote that Cochrane relies on what is widely agreed is the highest quality evidence,
the randomized controlled trial, which you just explained, published in peer-reviewed
journals.
But the thing is that some people argue that data are often biased, both in individual
instances of randomized controlled trials and in the fact that most of those trials
and most of that trial data comes from industry-funded sources.
So would you like to say a few words about, you know,
who raised the controversy and what it's all about?
Yeah, I mean, this has been an internal debate at Cochrane for a long time.
I think the name that's most associated with this
debate is, and I apologize, I'm going to butcher the pronunciation of his name, but Peter Gøtzsche.
He's been a member of the Cochrane Collaboration for quite a long time. And so he's, you could
very much argue he's one of the pioneers of the evidence-based medicine movement. And he's argued in a number of his books that there can be a lot of bias in randomized controlled trials.
As you can imagine, if you're a pharmaceutical company and you're trying to show that your drug works, there's a number of sleight-of-hand tricks that you can do that are, you know, within the rules of
randomized controlled trials, but that stack the deck to try to make your drug look
better. And that's sometimes whether or not and how you do the blinding, other things such as what control you pick, because sometimes it's
important to choose a control that is kind of the standard of care.
But if you're choosing a control that is, for example, no treatment in a situation where
no one would actually not be receiving treatment, then that's sometimes something that can affect the outcome in ways that
favor the people that are doing drug trials, trying to show evidence that their drug works.
So since randomized controlled trials have become kind of the top of the pyramid of evidence, a lot of, you know, interests
have a lot of money to gain by, I wouldn't say, you know, falsifying, because they're doing
these studies honestly, but just putting a little bit of a finger on the scale
to try to make sure that whatever it is that they're working on comes up with positive results. This is the kind of thing that
Peter Gøtzsche has railed against, and he has noticed increasingly that the proportion of randomized
controlled trials that are out there in the evidence that are supported by pharmaceutical
or other kind of private interests has been growing more and more
and more, to the point where he argues that much of the evidence that Cochrane ends up using is
essentially coming from what could very easily be biased sources, which raises a lot of questions,
right? Because Cochrane ostensibly just takes the evidence that is out
there and does these reviews and draws conclusions from it. But if the evidence that is out there
is biased, then, you know, as George, as you said earlier, you know, garbage in, garbage out.
You know, Peter Gøtzsche is worried that if the best evidence we have is tainted by bias, then it's possibly garbage. And then Cochrane is possibly only taking that garbage and putting it through their machine, putting this imprimatur of a non-biased systematic review on it, and therefore kind of cleaning the garbage and making it look
like it is perhaps better than it is. And so this has created a large controversy, where he
has, in, you know, you could argue, polite ways, criticized Cochrane for this approach and argued for a much more stringent way of determining bias
and inclusion of the trials that Cochrane might use.
Okay.
Well, to me, this kind of argumentation does sound plausible.
So if all or most, at least, of your data comes from
one place, and that place has, you know, a vested interest in making that data look
one way or the other, then yes, I would say that not just any data scientist but
any kind of, you know, reasonable person would kind of question the validity of the conclusions you
can draw based on that data. But
then, to me, the question would really become, okay, so is there anything that can be done about it?
Because to a large extent, I would say that, you know, it also kind of makes sense that most of
Cochrane's data would come from pharmaceutical companies, because, well, you know,
they're the ones that are largely producing medications and doing trials.
So is there anything that can be done about it? Is there anything in his argumentation or anything
that you have in mind that could be done to alleviate that? I think this is what gets to
kind of the crux of the issue because I think people at Cochrane, I don't
think there's a single person at Cochrane that doesn't recognize that the potential for industry
bias is a problem that can sway Cochrane reviews, but it's more a question of what to do about it.
And I think that's where there's a lot of very passionate people on various different sides of
the debate. And there's not really a consensus
because, and this is something that I think your data scientists that read your work, George,
would appreciate, is that trying to weed the bias out of your data is extremely difficult.
You essentially have to pick which biases you want. Or at least,
if you can't weed out the biases, then at least try to be as transparent as possible
about what biases might be there or make the data as transparent and kind of your metadata
as transparent as possible so other people can look through it to decide what the biases are.
And so there's a number of solutions
that are kind of along these lines. Like some people are suggesting that, well, more public
money should be put into randomized clinical trials because they're essentially a public good.
And therefore, one way to reduce the bias is to make sure that non-biased studies are set up so that there's not one particular interest
that's being represented and using public money to do those studies. There's others that just
suggest, you know, any randomized controlled trial is often an enormous multi-year undertaking
that gets summarized in what's often like an eight-page
journal article. So there's no way to highlight all of the important details and the potential
biases in such a small journal article. And so people have been calling for essentially trial
registries to put all of the information from these trials in one location so
that those that are really concerned about some of these issues can, you know, dig into the weeds
and decide whether or not there's any additional biases from some of the original raw data.
So there's a number of solutions out there that some people are suggesting. And some others are even suggesting that we need to
revise what we consider as evidence, because the answer might not necessarily be to double down on
randomized controlled trials as much as to recognize when randomized controlled trials
make a lot of sense to do, like with a therapy or treatment.
And when some other type of evidence gathering needs to be done,
such as with a complex intervention where it's really impossible to control for all of the potential confounders.
Okay, yeah.
Again, what you outlined makes sense to me,
even though I'm an outsider in your domain, just coming at it from the data science or critical thinking, if you will, point of view.
That all sounds reasonable.
The problem, I would argue, with those approaches, in a way, is that they sound radical, basically. And, you know, to me, that's not necessarily a problem that, you know,
as we said, oftentimes you need to rethink how you do things.
But for many people, I acknowledge it may be an issue that, you know,
this is quite disruptive in a way.
So I'm wondering if, you know, if you have an idea of how this could be done and to bring another
angle into this. And like we said, this is a very timely, I would say,
opportunity to be having this discussion because of the whole context and how science and medical
research has been accelerated and so on. So to bring another angle into this discussion,
I would say that a few people have in the past kind of put forward similar questions,
not just for Cochrane, but for the World Health Organization,
arguing, for example, which is kind of true, you know,
at least to the best of my knowledge as far as I've been able to research,
that a lot of the funding that goes into the World Health Organization comes actually from the same pharmaceutical companies
and it's actually tied to specific outcomes, research into specific diseases and so on and
so forth. So the liberty that the World Health Organization has to pursue its own goals in its own ways is perhaps limited.
So, and I'm bringing that into this discussion because, you know, one kind of natural thought
would be like, okay, you know, this sounds like something which is really big and that
needs a kind of, you know, really global discussion and an umbrella organization to host that
discussion. But, you know, if we have the WHO, which, you know, most people would think of as the
most appropriate organization to do that, and we have also similar kinds of questions
raised there, then one, you know, can't help but wonder, okay, so what is there
to do, and what's the right, you know, what's the right framework to raise these questions and
have this discussion?
So, George, I'm not sure if I understood the question exactly.
Right, so I'll try to rephrase it. I was just saying that, you know, these ways of dealing with bias, basically, that you mentioned,
they do make sense, but they also sound in a way disruptive, because they would mean that we would have to rethink a big part of, you know, how
evidence-based medicine is conducted, basically.
So that's one part of the comment slash question, let's say.
So in order to do that, the second part would be like,
okay, what's the best way to have a discussion
about something which is potentially disruptive?
What's the right organization to do that?
And therefore, that brings us to the whole issue
of the World Health Organization,
for which the same kind of questions have been raised.
The same kind of question that has been raised
for Cochrane, which is:
does the business model affect the way that Cochrane works?
And therefore, the same kinds of questions have also been raised for the World Health Organization. So my question is, and I realize
it's a complex one and there's a number of questions lurking, let's say, in there,
but I'm trying to make it a little bit simpler: which is the right place and the right way to be having
this discussion about rethinking the way evidence-based medicine is conducted?
So George, one of the things I would say is the idea that some of the recommendations are
radical might be true, but I think this is why it's important to kind of go
back to the beginning of the evidence-based medicine movement, because thinking about
where things stood in 1992, where there wasn't necessarily a good framework for thinking through
how to make some of these decisions
and what evidence to use.
We now have evidence hierarchies.
I think the problem is that there's been a lot of unintended consequences from basically
setting up those evidence hierarchies and putting so much weight on randomized controlled
trials.
Because this might not be something that data scientists run into frequently,
but this is something that I see a lot,
which is the application of randomized controlled trials
to situations that don't really lend themselves to a randomized controlled trial.
For example, I can give you, so there's a concept that, you know,
some data scientists might be familiar with called hotspotting. This has been used in crime
statistics. This is the idea of kind of somewhat using big data to pinpoint exact locations where
crime has been the highest. And it's been applied to medicine in situations of like trying to find
the places where, in the United States at least,
where kind of the healthcare spending is the highest. And there's a famous group that works
in Camden, New Jersey, that's done a lot of hotspotting and trying to put extra resources
towards the group of people that are considered to be the highest utilizers of healthcare. With the idea of, oh, if we put more resources into helping these people, then we can hopefully
keep them healthier and keep them from costing the health system so much money.
So all of this makes a lot of sense.
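As a rough illustration of the hotspotting idea just described, a minimal sketch might simply rank patients by utilization and flag the heaviest users for extra outreach. The patient IDs, admission events and threshold below are hypothetical, not from the Camden program.

```python
from collections import Counter

# Hypothetical stream of hospital admission events: one patient ID per admission.
admissions = ["p01", "p02", "p01", "p03", "p01", "p02", "p04", "p01"]

counts = Counter(admissions)
threshold = 3  # hypothetical cutoff: 3+ admissions flags a "hotspotter"

hotspotters = sorted(pid for pid, n in counts.items() if n >= threshold)
print("High utilizers to target with extra resources:", hotspotters)  # -> ['p01']
```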
And so they had created a very kind of labor-intensive and expensive program of targeting these
hotspotters. And people had the idea of doing a randomized controlled trial of randomizing people
in this area to getting this, you know, extra intervention that was trying to help coordinate people to extra social services,
or they were not put in this program. And basically, using the rubric of a randomized
controlled trial to try to answer this question. And this study came out and it showed that there
was actually no difference between the group that got the extra intervention and the group that did
not. They essentially had
equal hospitalizations over the time period that they were studied. And so that's one of those
things where it's easy to think that, oh, well, I guess this intervention didn't work. We shouldn't
put our money into it. But context is often key. And the question that we often need to be asking
is whether or not a randomized controlled trial really can control for all of the variables.
Because one of the important factors in this study is if you're going to take patients with a lot of complex social needs and you're going to provide them a lot of coordination to other services, you can imagine how well that program works is dependent upon the other services that those people are directed to.
And one of the people that was, you know, very intimately involved in the creation of the group kind of recognized that a big problem with the study was that they were, quote, coordinating to nowhere.
And in the United States where they were working, there really actually wasn't much
in terms of extra services that they were coordinating them to. So it was a good idea,
but using a randomized controlled trial was essentially testing no intervention
against this intervention that didn't actually have the firepower to help anybody.
And so what we've seen is kind of this reliance on randomized controlled trials as the epitome of the highest quality of evidence. But the unintended consequence has been that
randomized controlled trials are probably being overused. And so you could argue that we've actually become
radical about using randomized controlled trials and applying them in situations
where they don't actually make sense. And so, you know, that kind of points to your question
of whether or not these things that people are suggesting are too radical. My argument would be
that what's really radical is our current perspective on randomized controlled trials.
You know, and some people in economics that advocate for randomized controlled trials in complex social situations have actually been called randomistas, kind of suggesting
that this is a political ideology that they're clinging to, despite the fact that there's
a lot of confounding variables that can't be controlled for.
And a number of people are starting to talk about the tyranny of the randomized controlled
trial, because it's being used in situations that don't make sense. Also, the research methodology should fit the research question.
And so it's being used inappropriately
in times where the research question begs a different methodology. And in terms of, you know,
places like the World Health Organization, I mean, it's interesting that you brought that up because
there's a lot going on at the World Health Organization that is beyond simply kind of what evidence the
World Health Organization uses. Because the best way that I could describe a lot of the biases
that the WHO runs into is, you know, my PhD dissertation was actually looking pretty
closely at the World Health Organization. And one of the key things that I found, in interviewing one of my informants
there, was that he used the term, he said,
our clients are our member states.
The WHO functions,
not necessarily to improve the health of individual people around the world,
but to serve its clients, which are its member states. And so a lot of what the WHO does and a
lot of how it reacts is not necessarily based on the best evidence, but it is a highly rational
organization. But that rationality is often based on the clout of different member states and what those different
member states want. And so industry has made its way into the WHO through, you know, governments
such as the United States that promote a lot of collaborations with industry. But this has also
created a lot of consternation, which you might have seen.
There's been a number of mechanisms where this has become kind of a dividing line.
One of the best examples was in 2005 and for a few years after, Indonesia
refused to share influenza viruses, because the influenza viruses that Indonesia
shared with a global network became patented by Australian pharmaceutical companies. So needless
to say, Indonesia was reticent to share things that they might then not be able to afford because they were produced by Australian
pharmaceutical companies and so stopped sharing. And it's created a number of dividing lines
that essentially comes down to what is the role of industry in general in a lot of the work that the WHO does. And so it's not just the trials, but it's
everything from how your influenza vaccine gets made to, you know, how much the WHO recommends
sugar be in an average person's diet. Okay, okay. I realize, you know, this is a huge topic and we
could probably be talking for hours just about that. But just to move on a little bit, because,
not because it's not interesting, it's super interesting, but there's a number of other topics
that I'd like to touch upon. But quickly, before we get to those: you
mentioned that, you know, basically the gist of what you said earlier about
the randomized controlled trial is that, you know, maybe it's not always the best method
to use for everything. Are you aware of other methods, other methodologies that people are using?
They may be at an early stage, but are you aware of any of those that could be more suitable for different situations?
Yeah, I mean, I think expecting a randomized controlled trial at every opportunity, or what I often hear when I'm working in the hospital, which is like, oh, well, that wasn't a randomized controlled trial, with the assumption that because it was not a randomized controlled trial, it doesn't tell us anything, I think is a reflection of bias itself.
Because there are certain times that it's impossible to do randomized controlled trials
in any reasonable framework of time.
And one example, so Trisha Greenhalgh at Oxford is someone who is a little bit on the other side of the Cochrane controversy
and has written some articles discussing a lot of this. But she's someone who I think
is on the right track because she talks about other instances where different types of
empirical studies are warranted. And one example that she gave recently
in an article on Boston Review is all of the public health measures that we're talking about
in a pandemic of, you know, asking the question of do masks work, hand washing, social distancing,
wearing eyewear, a number of these things that, you know, you could argue would
actually make a lot of sense to do in a randomized controlled trial. But we don't actually have the
time to do a randomized controlled trial for these different public health interventions
while we're trying to control the spread of a pandemic.
So one of the things that she talks about is how in this situation, because time is of the essence, we might be limited in what we can do.
And so sometimes we need to draw in other types of evidence.
She describes how at times we need to bring in some
narrative evidence. And also in a case like this, I think modeling is extremely important because
that is sometimes the closest approximation we could get to some sort of trial within the
timeframe that we would need to be able to implement a lot of
these public health measures. I do a lot of qualitative work. And so I also find qualitative
data extremely important, not on its own. I often talk about how quantitative data is extremely important and can provide a lot of insights and raise a lot of questions,
and that qualitative data can be used to help kind of really extract the context.
And I think the combination of quantitative and qualitative data is extremely important. And yet kind of our evidence-based
structure prizes the quantitative randomized controlled trial data without enough of the
qualitative data to put those trials in context. Just like the trial of hotspotters that I
mentioned that was done in a randomized controlled trial,
you know, I think that's fantastic. I think we should do a randomized controlled trial.
But I also think what's equally important is some of the qualitative data that puts that
trial in context to help explain why it might have failed so that the next time we can actually do a
trial and improve upon that rather than
just make the overstated conclusion that like, oh, these hotspotting programs don't work.
Okay.
So just to give a little bit of context on qualitative data: in data science or data governance
parlance, I would probably call those metadata, or context, adding metadata and
context to your data sets, basically.
Yeah, and I would say the two need to work hand in hand.
And it's unfortunate, because right now there's so much emphasis on the quantitative that
what we're getting is an overabundance of quantitative data without
sufficient context, and that's making it hard to see the biases and the challenges and the
problems that a lot of these randomized controlled trials might have.
Mm-hmm. Okay. Let's move to, since you mentioned the pandemic earlier and you know it's
obviously the backdrop for this discussion, let's quickly assess a
couple of cases that are related to that that recently came to light and let's
start with modeling. You referred to modeling as one potential way of doing
research in cases where, you know, things are ongoing, or you maybe want
to have some predictive results in advance of whatever is taking place.
And one kind of famous or infamous, depending on which way you look at it, model is the
one put forward by the Imperial College, by this professor named Neil Ferguson,
on the basis of which a number of decisions were made earlier
in the course of the pandemic.
And as you may know, this specific model was later scrutinized
by a number of initiatives and researchers.
And what they found was that in terms of the quality of the software
or how it was maintained and whether it was transparent or not,
it scored pretty low on pretty much all of these dimensions.
So in a way, I would say that that kind of brings forward the question raised for controlled trials and Cochrane:
since these are essentially results that have to do with public health,
perhaps they should be publicly funded
and therefore belong to the public domain.
Do you think that the same could be said for models, for example,
that these things should be open source and transparent
and open to review?
I absolutely agree. I mean, George, one of the subtexts of a lot of what we're talking about
here is the incentives that different researchers have for kind of the work that they do.
And, you know, one of the challenges is that the incentive for most researchers
is to keep things private, because if they keep things private
and they advance a model or even a randomized trial,
that advances their career.
Whereas if they make it public,
their fear is that people aren't necessarily going to cite their work.
And so that then might hurt their career.
They might be able to kind of, you know, put something out there, but then it
doesn't get them any more research funding.
It doesn't get them more papers. And so if they can protect it, then they're protecting their career.
And I think this is rational, right?
This is one of the things that is a very, very common thing, at least in medicine and
a lot of biomedical research: the assumption that things need to be private
in order to make money off of them or advance a career. We see this with patents, with
pharmaceutical industry work, all the way to some of these models. But I think, George,
what you're pointing out is that a lot of people that come from software and are familiar with open source and the General Public License and a lot of the efforts to make things in the public domain see that that has actually made things better for everybody. And I think
this is an ongoing debate, and definitely not my field of expertise, but the thing that I
see is, overall, the more transparency there is, the more robust the science becomes,
not just with modeling, but, I would actually argue, with almost all of the statistics
that we do. I think there was a fantastic paper that was published in 2018 that I can send you,
George, that's called Many Analysts, One Dataset. And the subtitle is, Making Transparent How
Variations in Analytic Choices Affect Results. This doesn't sound like a thrilling
paper by its title, but what they did is they essentially took one data set, which was an
interesting data set for Europeans listening to this. It was a data set of whether or not
soccer or football referees seem to have any bias towards dark-skinned players when they were
calling fouls. And so they took this data set of fouls and race, and asked 61
different teams to analyze the data, and they got 61 different ways of analyzing it.
There were multiple different types of analyses.
And they got a wide range of results.
And any one of these probably could have been published in a journal.
But I think what was interesting was by kind of crowdsourcing this, they probably got a little bit more towards a better sense of what might
have been going on, rather than just publishing one single paper and keeping the data set private.
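The gist of that crowdsourcing exercise can be sketched in a few lines: run the same data through more than one defensible analytic choice and look at the spread of estimates. The toy records and the two analytic choices below are hypothetical and are not the models used in the actual study.

```python
import statistics

# Hypothetical per-player records: (skin_tone: 0=light, 1=dark, total fouls, games played).
players = [(0, 12, 30), (1, 18, 32), (0, 9, 25), (1, 14, 28), (0, 15, 40), (1, 20, 35)]

def raw_foul_gap(data):
    """Analytic choice 1: compare mean total fouls between groups."""
    dark = [f for tone, f, g in data if tone == 1]
    light = [f for tone, f, g in data if tone == 0]
    return statistics.mean(dark) - statistics.mean(light)

def per_game_foul_gap(data):
    """Analytic choice 2: compare mean fouls per game between groups."""
    dark = [f / g for tone, f, g in data if tone == 1]
    light = [f / g for tone, f, g in data if tone == 0]
    return statistics.mean(dark) - statistics.mean(light)

estimates = {"raw fouls": raw_foul_gap(players), "fouls per game": per_game_foul_gap(players)}
for choice, estimate in estimates.items():
    print(f"{choice}: {estimate:+.3f}")
# Two defensible choices, two different effect sizes, from the exact same data.
```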
So I think, you know, when it comes to models, George, like exactly the one that you're describing
by Neil Ferguson, and arguably much of the statistical research that we are doing in medical and economics and social science journals needs to be in the public domain and needs to be available for that secondary analysis. And people that created these
data sets would need to get essentially kind of more credit for the data sets than for the papers
that come from them. There was a debate about this even in the New England Journal of Medicine,
where the editor-in-chief called a group of people, quote-unquote, data parasites, on the idea that, you know, if someone did a clinical trial and made the data available publicly immediately, there would be a lot of data parasites that would steal the data and do their own analyses before the people who put all of the blood, sweat, and tears into creating the data and doing the
clinical trial got their just reward of publishing the paper that they wanted to publish.
And, you know, rightfully so, the editor-in-chief got
pilloried for calling people data parasites, because, you know, this is the kind of thing where
the more this stuff is analyzed and tempered in kind of the spheres, the scientific spheres,
where we can have these debates, I think the more robust our conclusions would be,
both with modeling and the statistical work. But there's a cultural shift that has to happen.
And academia is lagging behind in providing the incentives
that would make this a lot more feasible.
Yeah, I think many people in the data science
and machine learning community would actually cringe
at hearing this story, because for them it's pretty standard that
you may get a lot of value out of data sets that other people produce, and you may reuse them in
very imaginative ways, and this is the norm, basically, I would say. I mean, in parallel, you do
have the fact that many organizations are holding on to their data sets, because they get lots of value out of that.
So there is that as well.
So these cultures kind of coexist in parallel. But data science people totally see the value in what you describe,
in getting additional value out of re-analyzing
and re-interpreting somebody else's data.
So to call someone that does this work a data parasite
would be unheard of, I would say.
Yeah.
I mean, I see what the editor-in-chief is describing
in the sense of like,
the amount of work that goes into doing a clinical trial
is enormous.
And that should be,
if you are part of a team that does that work,
there should be a reward.
But I don't necessarily agree with what that reward should be.
He was arguing for data protection, which, you know, is one type of
reward, to allow people to kind of write the papers that they want. But my argument would be,
rather than protect the data so that they can write, you know, one take on how to analyze the research,
let's figure out some way to reward the people who put the effort
into making that data robust to begin with.
And there's not enough effort put into doing that.
Yeah, I don't know.
A very simplistic maybe way of thinking about it would be, you know, datasets should be citable just like publications.
So if I use a dataset created by some research team, then I should cite them in my publication.
And, you know, for people who do research professionally, that certainly counts for something.
Yeah, I would agree with that.
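As a sketch of what a citable dataset could look like in practice, here is a small, hypothetical metadata record (loosely modeled on DataCite-style fields) formatted into a citation string. The dataset, authors, repository and DOI below are made up for illustration.

```python
# Hypothetical dataset metadata record, loosely inspired by DataCite-style fields.
dataset = {
    "authors": ["Example Research Team"],
    "title": "Hypothetical Clinical Trial Dataset",
    "publisher": "Example Data Repository",
    "year": 2020,
    "doi": "10.0000/example.dataset.1",
}

citation = (
    f"{', '.join(dataset['authors'])} ({dataset['year']}). "
    f"{dataset['title']} [Data set]. {dataset['publisher']}. "
    f"https://doi.org/{dataset['doi']}"
)
print(citation)
```

Publishing a record like this alongside the data gives reusers something concrete to cite, which is the reward mechanism the conversation is pointing at.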
And so to kind of
wrap up, let's
visit
another topic, which actually has to do a lot
with this, what I would call
data provenance, because again, to
kind of mix the two worlds,
in the data
science parlance, what we're talking about,
so, knowing
the origin of your data,
basically, and giving
proper attribution where attribution
is due, is called data
provenance. So,
a recent incident that
kind of touches upon this aspect
is a study
conducted on the effectiveness
of chloroquine,
which involves an entity called Surgisphere.
So in a nutshell, what happened there is
what was initially referred to
as the most influential COVID-related research to date
was called into question
as a result of the lack of transparency
regarding the origin and trustworthiness of the data
it was based on.
So the researchers who conducted the research based their findings on data acquired from
Surgisphere, which is a startup claiming to operate as a data broker and providing access
to data collected by a number of hospitals around the world.
However, whether that data is veracious
or whether the data has been acquired
in a transparent fashion is not clear.
And as a result of that,
the results of the research were put into question
and the decisions made as a consequence
by the World Health Organization were reversed.
So the question there is,
do you think that people who did this research,
and not just them specifically,
but people who do any kind of research
and acquire data sets as part of that,
have a sort of responsibility
to verify the sources of that data?
The short answer is yes.
I think there's obvious challenges to that.
You know, I think your data scientists would know that data can be very deep.
And so knowing the data and all the metadata to the degree that is sometimes required,
knowing it as well as the person who constructed the databases, is often impossible; you're never going to
know it as well as they did. But I think there needs to be a due diligence. And I don't think
people can abdicate that responsibility by just saying that there's an aggregator of data.
I think it's important to recognize that the quality of data is extremely important.
Knowing the data provenance is extremely important and your responsibility as a researcher.
And researchers have to sign a document saying that they have examined kind of the quality of the work that they are submitting to a journal.
So it's hard for me to understand how people can sign that document
when they're submitting to a journal without doing the necessary due diligence,
given a lot of the controversies that we're seeing out there about data provenance.
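A minimal sketch of that kind of due diligence could be as simple as checking that a dataset record carries basic provenance metadata before it is used. The field names and the record below are hypothetical and do not correspond to any particular standard or to Surgisphere's actual data.

```python
# Hypothetical minimum set of provenance fields to verify before relying on a dataset.
REQUIRED_FIELDS = ["source", "collection_period", "license", "contact", "checksum"]

dataset_record = {
    "source": "Hypothetical Hospital Network",
    "collection_period": "2020-01 to 2020-05",
    "license": None,  # missing: should trigger follow-up with the data provider
    "contact": "data-office@example.org",
    "checksum": "sha256:abc123",
}

missing = [field for field in REQUIRED_FIELDS if not dataset_record.get(field)]
if missing:
    print("Provenance gaps to resolve before relying on this data:", missing)
else:
    print("Basic provenance metadata present; deeper review still needed.")
```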
So, yeah, we've covered lots of ground and we could easily go on for a long time too.
But since we have to wrap up quickly, then I would just do that by asking you whether you think there is,
well, obviously, you know, people who conduct medical research are very
well trained or have to be very well trained in their specific domain.
Do you think, however, that getting them exposed to some additional training in data science
techniques, like how to manage data sets or keeping track of provenance and perhaps even
going as far as trying different data analysis techniques.
Do you think that would be beneficial for them?
Yeah, George, what I would actually say is we should have enough training to know what
we don't know. But I'm a big fan of a division of
labor. And I recognize that it doesn't make sense for me to get so much training to be able to kind
of manage my own relational database and put in the metadata myself, as much as to know what would make a high quality relational database
and be able to hire someone who can actually do that. So I do think more training is necessary
because I don't even think we're at the point, on the medical side at least, that we know
enough about data science to be able to even always kind of hire the right people
and collaborate with the right people who are doing this well. But I think there needs to be
a lot of education on that front. Because what I see is that a lot of
physicians are getting education, a lot of scientists are getting
education, but there's an assumption that, oh, if we just take a couple of courses in statistics,
then we can do our own statistics for these papers. And a lot of times what I see is we don't
know what we don't know. And we're often applying statistical tests
that we shouldn't be applying in certain situations,
not digging deeply enough into the data
to be able to describe kind of what biases are there.
And it's sometimes beyond our expertise,
but we need to be collaborating with people
who can help us do that.
Because otherwise, garbage in, garbage out, and a lot of what ends up in medical journals
can sometimes be of poor quality.
Okay, okay, thanks. Yeah, you've summed it up pretty
nicely, I think. Thank you. It's been a super interesting discussion,
one of the most interesting and the most enjoyable I've had.
So thanks a lot for your time.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration
on Twitter, LinkedIn, and Facebook.