99% Invisible - 489- Pandemic Tracking and the Future of Data
Episode Date: May 4, 2022
Data is the lifeblood of public health, and has been since the beginning of the field. But essential data gathering for the COVID pandemic was hindered by a couple of underlying weaknesses in the US public health apparatus. We have a fractured system where the power lies in US states that don't always coordinate effectively. Also, there has been inconsistent funding. When there was an immediate crisis, there would be an infusion of cash. But then, when the crisis passed, the resources would evaporate. We take a look at data gathering for public health from the 1600s to today and how it might change in the future.
Support for this episode was provided by the Robert Wood Johnson Foundation (RWJF). The views expressed here do not necessarily reflect the views of the Foundation. RWJF is working to build a culture of health that ensures everyone in the United States has a fair and just opportunity for health and well-being. For more information, visit www.rwjf.org. If you have a hunch about how changes to the way we live, learn, work and play today are shaping our future, share it here: www.shareyourhunch.org
Transcript
This episode is one in a four-part series that we're calling The Future Of.
We'll be exploring how changes to the way we live, learn, work, and play may shape our health and well-being in years to come.
Thanks to the Robert Wood Johnson Foundation for supporting this episode.
The Robert Wood Johnson Foundation is committed to improving health and health equity in the United States.
Learn more about them at rwjf.org.
This is 99% Invisible.
I'm Roman Mars.
Hey Roman.
Hey, producer Delaney Hall.
So you're here with the next story in our series
called The Future Of... da da da.
And you're here to talk about The Future of Data.
Yes, that's right.
And I feel like I should just level with you and our listeners.
This is a story about spreadsheets, okay? It's a story about spreadsheets and data entry and data systems.
A story about spreadsheets? It's not even my birthday, Delaney. How did I get this marvelous gift?
I should have known I wouldn't have to convince you.
You do not have to convince me. I'm already riveted, so let's go.
Okay, so we'll start in the early days of the pandemic.
Today, the entire city of Wuhan is on lockdown.
And as COVID spread in Asia,
and then in other parts of the world,
weighing the most active hotspot right now,
is Italy nearly two months.
You were seeing officials take these wildly unprecedented steps
to control the disease: lockdowns,
quarantines, massive amounts of testing.
And I think a lot of people thought we'd see the same kind of response in the U.S. when
the virus arrived here.
You know, as someone who grew up reading The Hot Zone, I expected that we had the world's
foremost infectious disease fighting agency in the world.
This is Alexis Madrigal. At the time the pandemic started, he was a journalist at The Atlantic,
and he's still a contributing writer there.
So I was expecting to see the United States of America
assume global leadership and come up with ways for us to deal with it.
And like Alexis, I was basically imagining scenes from Outbreak and Contagion in my head.
You know, like brave epidemiologists in hazmat suits working with the best technology available
to get a scary situation under control.
And so of course we all wanted that to be the reality.
We all wanted that to be what our pandemic response looked like.
Here's Robinson Meyer, Alexis's colleague at The Atlantic.
Right, the CDC seemed like the last version
of like absolutely competent American technocracy.
Right, like they were on top of it.
But as we started to see cases of COVID here in the US,
Alexis and Rob felt like the response from our public health agencies
was surprisingly muted.
We were not seeing mass testing, like was happening in Asian countries.
And in fact, when they started looking, it was hard to find any concrete numbers
at all about how much testing was happening here.
Which was not good because testing was the most crucial data point that we had for understanding the pandemic at this point.
So I guess the CDC was not tracking that at the time.
Well, at first it was, but then in early March 2020, the agency stopped reporting the total number of nationwide tests.
And basically, they said, most testing is happening at the state level.
So if you want to know, go ask the states.
The CDC just wasn't saying.
So I was like, fine, we're going to go to the states.
So they reached out to all the states and asked some really basic questions, you know, how
many people have you tested?
How many positive cases have you had?
And how many tests can you do each day?
And they were shocked by what they found.
Oh, man, we tested almost nobody.
So what's that mean, almost nobody?
Like, what are the numbers for almost nobody?
Well, on March 6th, which was just a few days after
the CDC stopped reporting testing numbers,
Rob and Alexis could only verify that 1,895 people nationwide had been tested for COVID.
Even though the White House was saying that tens or hundreds of thousands of people could
be tested per day by that point.
So I do remember some of this.
There was this huge shortage of tests for hospitals to use at the beginning of the pandemic.
And there were pretty strict rules by the CDC about who could get tested at all, because,
you know, these tests were in short supply.
Yeah.
And that ended up having enormous implications, because without much testing, it was hard
to know what was happening on the ground.
If you looked at the state of the data, there was no data.
In many states, they could test a dozen people a day.
If the virus was circulating, we had no way to know.
And Alexis and Rob felt like they had wandered
into this weird twilight zone,
where people at the top of our public health agencies
were talking about carrying out a data-driven pandemic response,
but without actually having much data.
And it'd be like, well, it seems like the virus could be everywhere then,
and they'd be like, well, the data doesn't say it is. But the data was sh-
And it wasn't just the testing data that was limited. It was also hard to find numbers on how
many people were hospitalized and how many people had died.
And so Rob and Alexis decided to team up with a guy named Jeff Hammerbacher.
He's a scientist and software developer.
And they started compiling their own nationwide data sets based on what states were publishing.
And they thought of this as a stopgap.
These are not fancy things, right?
Cases, hospitalizations, deaths,
number of tests performed.
Like this is basic stuff.
Like you look around the world and like,
all these other places just like have this stuff, you know?
So we were just like, well, obviously,
we'll also soon have this stuff.
Because how could you run a public health response
without good data?
It's like the lifeblood of public health,
and it has been since the beginning of the field.
Okay, before we continue with the saga of Rob and Alexis,
I wanna tell you about some of that early public health history
because some of it's fun.
Honestly, that's a big reason,
but also because it shows why data is so important.
And why our system makes it so hard to collect.
Okay, let's do it.
Okay, so there were many innovators in the realm of health data, you know, people like John
Snow, William Farr, W.E.B. Du Bois. But I'm going to zoom in on one of the earliest examples. We'll start with
John Graunt and the Bills of Mortality.
The Bills of Mortality. I like this one, this is very ominous. I like it a little bit.
So the Bills of Mortality were these mortality reports. They were published in London in
the 1600s. And every week a group of parish clerks would gather and share information about who had died in the neighborhood and how.
And people would read them and they would kind of pick up the local parish thing and be like, oh, you know, John died.
So this is Steven Johnson. He wrote a book called Extra Life: A Short History of Living Longer.
And he says the bills were a source of fascination for many people, people liked to gossip about them,
but they weren't what you would call structured data.
Like there wasn't a way to detect useful patterns
in the information there.
But then John Graunt comes along
and Graunt was a haberdasher slash amateur demographer.
That's a very 1600s combination of interests. Yeah, you don't really get that kind of mix anymore.
So he was a classic kind of engaged amateur and he looked at the bills of mortality and
thought there is more to be learned from these documents.
And so it occurred to him at some point in the early 1660s that if you try to
assemble that data and look at it systematically, that you might be able to tell a lot more about
what was really happening to the health of Londoners. And so he spent all this time going from
kind of parish to parish and reading through all these reports and kind of tabulating basically the data and trying to organize it in a more structured way.
And Graunt ended up publishing a pamphlet with a very catchy title. It was called Natural and Political Observations
Mentioned in a Following Index, and Made Upon the Bills of Mortality.
And what it enables both Graunt to do, but also health officials around the city is to suddenly
be able to answer the question, what is really killing people? You know, where are the real threats?
And how are those threats changing over time?
Graunt's work represented a huge conceptual breakthrough.
His pamphlet was basically the founding document of medical statistics and public health data.
But it ultimately wasn't that useful because there was still a limited understanding of what was actually killing people.
For example, let me share with you the causes of death tabulated in this report.
The list is really pretty funny. According to Graunt in 1662,
about 1,300 people in London died of Apoplexy,
38 died of Cut of the Stone, 74 died of Falling Sickness,
243 died of Dead in the Streets.
Only six died of Leprosy, actually,
which is like you would have expected more in 1662.
158 died of Lunatick,
and then my favorite category,
454 died of suddenly.
You can totally imagine the scenario
where you get the answer suddenly,
you ask, you know, some official comes up
and says, how did he die?
And someone says, suddenly.
Which is, I'll just write down suddenly.
Totally, yeah, like I wonder if gradually shows up in the tabulations.
Well, then I can see, if bad data comes in,
there's not much you can do about that,
but I can still see how it's groundbreaking.
Yeah, it was groundbreaking in that it took these anecdotes
and turned them into data about a population.
But it wasn't actionable, partly because of answers like suddenly,
but also because to act effectively on that data, they needed institutions that could coordinate
a public health response. And those did not really come into being at least in the U.S.
until the middle of the 19th century.
In the 1800s, a number of cholera epidemics swept across the United States.
And in response, cities and states across the country started to establish local public
health departments.
But the system here grew in a mostly bottom-up way.
It was cities and states first.
It took decades before any real national public health institutions came along.
So when did the federal government begin to get more involved with public health across
the whole nation?
Pretty late in the game.
There was one national agency called the Marine Hospital Service, and it cared for sick and injured sailors. Eventually,
in 1912, that agency became the Public Health Service. It started taking on greater powers, but it
was still very hesitant to interfere in anything that was local. This is David Rosner. He's a historian
of Public Health at Columbia University.
The federal government is basically the weakest part of the national structure.
This was, you know, not a country that saw the federal government as much more than a bunch of buildings
in a swamp in the DC area.
Over the next few decades, the national system grew.
The CDC eventually emerged from the public health service
and became, in many ways, the best disease-fighting agency in the world. They pioneered the whole idea
of a data-driven response, you know, using stats to figure out who was being hit by disease
and how to intervene. It's this moment when a lot of American culture begins to turn to technology and science in general
as a means of addressing all these very sticky social problems that plagued us.
And for that, you needed data. You needed some sense of what society looked like.
So if you saw statistics that showed high death rates
in one community or another,
you could begin to rationalize your resources
and identify resources.
And the CDC had some huge successes.
The agency helped eradicate smallpox.
It started the fight against HIV.
It stopped Ebola more than once.
And over time, it developed a really heroic reputation.
But there was always this underlying weakness in our system, which is that it's very fractured.
It wasn't a coordinated system like some countries have. Instead, our system is this patchwork
of thousands of state and local health departments who all operate pretty independently.
So the CDC can issue guidance,
but ultimately state and local health departments
answer more to their local elected officials
than they do to the CDC.
Yeah, I mean, it sounds like federalism
as a concept is a real challenge to public health.
Like a lot of power resides in the states
and it's made that way, you know, on purpose. But I mean, when it comes to public health,
these things don't have state boundaries. I mean, a flu or a COVID can pass through state boundaries
and does not care about federalism at all as a concept. Right. Viruses do not care about states,
right? They do not care about state jurisdictions.
Over time, there was another damaging pattern that began to develop with the CDC and the
public health system more widely, which was that it struggled for consistent funding. So when
there was an immediate crisis, there would be an infusion of cash, but then when the crisis had passed, the resources would evaporate.
And that only accelerated from the 1980s onward during the Reagan era.
Yeah. Reagan is at the start of many of these things when it comes to like the social safety
net being eroded. Yeah. That's a familiar story for a lot of agencies.
Yeah. And you know, this is a complicated part of the story that we're not going to wade super
deep into.
But to simplify, public health agencies saw their budgets get cut decade after decade
all the way into the 2000s.
The CDC's budget dropped overall from 2010 to 2019.
Over the same time period, local public health departments lost more than 50,000 jobs due to funding cuts.
And we also saw a ton of privatization during this time.
So the hiring of private contractors to do what the government used to do.
And so this was the underfunded and very complicated system
that Robinson Meyer and Alexis Madrigal encountered
when they set out to gather their own COVID-related data
at the start of the pandemic.
Here's Alexis.
The federal government itself
like doesn't actually have the people to do the things
that other governments do
because we just decided from Reagan onwards that we were essentially
going to take out state capacity and instead pay consulting fees and contracts to companies
because I guess somehow that's like less government-y or something.
And I imagine all of this makes data gathering and sharing really tough.
Yeah, it's kind of hard to imagine a system worse than this one when it comes to data
sharing.
All these entities report data on what's happening locally, often a little bit differently, using
different systems and data conventions.
You just have so many reporting entities and officials and jurisdictions, and it's not just
because we're a large country.
It's also because we have built these data systems
over time in this kind of sedimentary pile.
And it's very difficult to change them
when it's not a crisis.
You know, like it's just hard.
And so did Alexis and Rob like fully understand
the complexities of the health system
when they started to gather these nationwide statistics about COVID?
Not really. Like they just knew that the government wasn't releasing much information about what was going on.
And so they started their own DIY efforts and recruited volunteers to help with it.
And it turned out I was literally the first person
to fill out the volunteer form.
Erin Kissane has worked on a bunch of web-focused projects over the years,
most recently as an editor and director of content with Knight-Mozilla OpenNews.
And she had been following COVID really closely and just jumped right into the data gathering.
And the work immediately was just things like, hey, can someone set us up a website?
And at the same time, like, how do we open up this spreadsheet and get it to hold up
with 20 people going and collecting data and what kind of quality process should
be set up?
Because we obviously need to have people double checking these numbers.
So it was sort of everything all at once.
This group came to be known as the COVID Tracking Project.
And full disclosure, the COVID Tracking Project received funding from the Robert Wood Johnson
Foundation, which also funded this story.
Within a few weeks, the COVID Tracking Project had several hundred volunteers helping out.
There were students, there were people from tech, people from medicine, journalists.
And so what exactly did the work they were doing look like? Like how do you begin to gather it?
Well, this was very simple and very complicated at the same time.
Every day they would reach out to every state and territory in the US to find out how many tests they'd done and numbers of positive cases, hospitalizations, and deaths.
And they used a range of methods to get those numbers.
They watched a ton of press conferences, for example.
And they would also scrape numbers from state websites.
Then they'd put everything into a spreadsheet.
And then like once a day, you sort of commit those numbers
and you're like, okay, those are the day's numbers.
And it is easy to do one time.
And it's easy to even do, you know, 20 times.
But as time went on, it started to get much harder
because they began to understand
the idiosyncrasies of the system
and also of the data itself.
For instance, take the most basic unit of this data,
a case of COVID.
Like I would think that would be pretty easy
to define and count, right?
Yeah, yeah, totally.
Well, it is not.
We had to sort of figure out over time
that a case is not a case.
In some places, a case is a confirmed case.
In some places, it's confirmed and what's called probable cases.
Well, what are those?
Okay, here's the definitions.
Is this state combining confirmed and probable cases?
We can't tell.
Let's call the state.
Oh my God.
This is making my head hurt just thinking about it.
I know, and this was true for most of the metrics they were trying to track.
There were all of these different ways that what the states were sending to the federal
government was just slightly different.
You had different data definitions, you used different systems, and no one really had
any idea of how to standardize those things, particularly in the midst of a crisis like
this.
And there are just so many examples of how our COVID data was unstandardized or incomplete.
Like another example, did you know that a lot of our local health
departments are still at the mercy of fax machines?
Oh my god.
How can I still be surprised?
I'm still appalled.
Yeah.
It caused all kinds of problems.
Like some states only reported electronically submitted lab results, others combined electronic submissions
and faxed submissions.
But because people had to enter the faxed information
to their tallies manually, there would be these big
distortions in the data where all of a sudden,
a backlog of faxed results would just be dumped
into the numbers all at once.
Right. You hold all the faxed ones to the end and then you enter them and then all of a sudden
the numbers leap. And it makes people mistrust data is what it does.
Or here's another example. When there would be a bad surge somewhere and lots of deaths,
the people who fill out death certificates would get really behind. And so death numbers would lag.
Or, yet another example, really critical demographic data would just be missing.
There really were whole areas, especially race and ethnicity, where most states and territories
never developed a really robust way to collect that kind of demographic data.
And when they did, it wasn't standardized.
Which of course meant it was hard to tell who was being most impacted by the disease.
And this stuff can feel kind of dry, like ultimately we're talking about spreadsheets and data standardization.
But there was stuff going on at this time,
early in the pandemic,
where really good data could have helped.
But these N95 respirator masks in particular
are in high demand and short supply.
I mean, for example,
there was a massive shortage of protective gear
for healthcare workers.
But still, I have spoken to healthcare workers
in San Francisco, Oakland and San Jose today,
all who say the shortage of supplies in their hospitals is a problem.
If we had had the hospitalization data, knowing who had PPE, who had access to medications,
who had staffing shortages, the federal government can, in fact, step in and help with that
stuff.
Instead, that became this weird, crazy scramble that benefited nobody, as far as I can tell.
So was the U.S. government doing this type of data collection from the states and just
not reporting it, or were they just not doing it at all?
Well, in those early days, the people at the COVID Tracking Project certainly thought that
the government had its own comprehensive numbers.
They thought that someone somewhere within the vast CDC bureaucracy had stats,
and you know, better stats than they did.
I think all of us really thought that the data did exist,
and we just couldn't see it.
So the work that we were all doing was a stopgap. And presumably, we would
need to do it for a little while until the federal government released the numbers they had.
But then they started noticing some weird coincidences. So this would have been in March and
April 2020. And back then, there were these press conferences that Vice President Mike Pence was doing.
Thank you, Mr. President.
And let me echo your words about all the dedicated men and women on the White House coronavirus
task force.
And the COVID tracking people would watch those press conferences to see what kind of numbers
the government shared and how they compared to the numbers that they were gathering.
It was reported to us that at this moment more than 746,000 Americans have tested positive
for the coronavirus.
Fortunately, more than 68,000 Americans have fully recovered, but sadly, more than 41,000
Americans have lost their lives
to the coronavirus
And we were like, hey, you know, his numbers are really close to ours. We must be
doing a good job, because he's getting those federal numbers, clearly.
And, you know, here we are just scraping from public data.
Despite the fact that there have been more than 843,000 Americans
to contract the coronavirus, and we grieve the loss of more than 47,000 of our
countrymen.
And then we realized they were tracking really closely, and I think a
few of us started to suspect at that point that he was actually reading our
numbers, just rounded.
Wow. Wow. Okay.
And then their suspicions were later confirmed when the Trump administration published a report
that clearly used the project's data and charts and cited them in the footnotes.
And it was like, oh, they're just looking at our site. Like, we are the process.
We are the process.
We are the ones who are making this data.
We were waiting for the cavalry, and then it turned out,
like, we were the cavalry, and we were like, no, no, no.
We don't even have horses.
We can't be the cavalry.
And it's kind of darkly funny, but it is also scary.
You know, you kind of think sometimes, like,
well, you know, if disaster X were to happen,
well, you know, somebody's thinking about that.
You know what I mean?
Like, well, somebody's gonna do that.
And the truth is, no, nobody's gonna
f**king do it sometimes.
We were not ready.
We did not have a system in place.
And so I can imagine them feeling panic
about like, oh my God, okay.
So we thought we were just messing around.
We're doing our best.
And we thought that the government was going to save us, and it turned out they're not.
But was part of them kind of proud that the government was using their numbers?
I don't think so.
No.
Okay.
Like, I'm not sure how everyone on the project felt, but at least some of them were pretty shaken
by this realization, like Erin.
That's an immensely stressful position
to be in for a bunch of volunteers
because on one hand, yeah, it's great
that our numbers are actually really useful.
And on the other hand, are you kidding me?
That's the best you can do with the entire resources
of the federal government is get the data
that we make every day by looking at websites.
I mean, it's really hard for me to understand
at this point, like what was going on at the CDC?
Like what is going on within the agency at this time?
Well, I'll start by saying that we came into the pandemic
with data systems that just were not designed to gather
and process the kind of fast, high-resolution data
that people wanted.
Like the demand for data just went way beyond
what public health officials had ever encountered before
and they were caught flat-footed.
There was also the general
organizational chaos of the Trump administration, which certainly didn't help.
But there were a number of ways the CDC tried to compensate. So as an example, in
April 2020, the agency started working on a new electronic reporting system that would collect detailed
testing data from every state.
It took a long time to get all the states onboarded to that system, like more than a year.
As that was happening, the agency was also using some of its existing surveillance systems
and methods to track this new disease. Finally, CDC quietly launched a new website.
It's, you go to...
So, for example, this is another press conference held by the coronavirus task force in early April,
where they talk about data that the CDC has started to release.
This surveillance data is bringing together our influenza-like illnesses with their
syndromic management databases so that you can track-
And just to parse that for you.
Yeah, that would be helpful.
So what they were doing is that the agency had adapted a couple of their existing systems
for this new task of tracking COVID.
Their system that tracks unusual levels of disease
in places like emergency rooms and urgent care centers.
That's the syndromic surveillance data.
And their flu reporting systems.
And the way that the CDC tracks flu
is that they sample the population
and then model a broader picture.
So the data isn't comprehensive.
And there are definitely some reasons
for using the older system.
For one thing, the states were already used to it.
The United States, the states are used to using this system.
It's in emergency rooms, it's in hospitals, it's in doctors' offices,
and it gives you insight and you can see very clearly.
But the fact was these existing systems
just weren't working that well.
The agency was struggling to keep track of testing
and case rates across the country.
It was struggling to update hospital data,
which includes really critical stuff
like bed availability and ventilator supply.
And with the hospital data, this is like a whole other story.
But the CDC was moving so slowly that eventually the agency that oversees them,
HHS, just took over gathering those stats and built a much better and faster system.
But the CDC still seems to be at the center of all this today.
So does that mean that they eventually started gathering their stats themselves in a different
way or updated their systems?
Yeah, they eventually pivoted, but it took months before they started aggregating and sharing
more of their own data.
So they released their own data tracker in early May,
which was 15 weeks after the first reported case
of COVID in the US, and more than eight weeks after
the launch of the COVID Tracking Project.
And even when they did that,
there continued to be problems with the data
and big discrepancies between their testing numbers and state numbers.
And is there a sense of why it would take so long? I mean, like, a few
enterprising journalists and a bunch of volunteers had something very quickly.
Why do you think it took the CDC so long? Well, I have reached out to the CDC a number of times
to try and get their take on all this, and they haven't responded.
But I think a lot of critics of the CDC
think there is something in the structure and culture of the agency
that keeps them from moving fast and breaking protocol in an emergency.
You know, I think there was an attitude among a lot of people in the CDC about not overreacting
to COVID, basically like, oh, well, you know, if we design all these systems around the
disease du jour, they even have, like, a comment like this on the CDC's data modernization
page for one of, like, the conferences they had, you know, basically, like, quoting someone
at the conference saying, like, we can't, like, over-respond to the disease du jour.
And I'm like, oh, I'm sorry.
Did your other diseases du jour kill 600,000 people?
Like, it's closer to a million now.
Maybe we should be overreacting to this one.
Seems reasonable to design things around this, you know?
And I think that was really a big piece of it.
It was like they didn't want to custom design systems
just for COVID.
That was not what CDC wanted to do.
But in an unprecedented situation with a new disease we had never encountered before,
moving through the population in ways that we were only beginning to understand, we needed
new systems and we needed the public health establishment to be as deep in the data as
the COVID Tracking Project was.
If you're not in the data every day or every few days, if you don't know how it's constructed,
you don't understand what's actually happening and like where the future hotspots are
and the future places to be concerned, and you don't understand what it looks like
when a state explodes with cases.
So I know the COVID Tracking Project no longer exists.
So how did they make that decision to end the project?
Well, you know, about a year into the pandemic as Biden was coming into office,
you know, vaccinations were happening, and the pandemic seemed to be
on the wane. And so that was when they stopped it. And part of it was that it was taking a toll
on the people involved. Like they had never intended it to be a long-term project. And even
something as mundane as data entry had a high cost.
While this was happening, we had family members dying.
We had people we knew who were in those statistics.
But there was another reason as well, which is that the COVID Tracking Project founders
really thought the government should be responsible for this work.
You know, we did not think that the public health data in widest distribution for the United
States and the COVID pandemic should come from volunteer labor.
And the Biden administration had promised to create a pandemic dashboard, which the COVID
tracking project people were excited about.
They even helped advise on a framework for how to do it.
But now, more than a year into Biden's term, that still has not materialized.
Even though COVID-related data remains both very critical and quite confusing to understand.
And you know, what made the COVID tracking project unique
as an organization was that they were dedicated to researching
and explaining all these various flaws
and inconsistencies in the data.
There were other data trackers and other volunteer data
groups, but it was the COVID tracking project
that was really dedicated to that kind of analysis,
which we still badly need. Data can't talk. Data can't explain itself, particularly when you're
speaking either to, you know, this idea of a general public, but also to reporters, to anybody
in media, to people, even in government agencies, the data has to be contextualized and
explained, and that's still largely not happening.
I should say, all the folks I spoke with at the COVID tracking project recognize that there are
some things the CDC does really, really well.
There are incredible scientists at CDC and NIH doing this remarkable world-changing work
on vaccines and all these other things.
And the CDC did have some successes with data as well.
You know, the electronic lab reporting system the agency helped build has apparently really increased the speed
and accuracy of state data coming into the agency.
But we're still nowhere near
having the kind of surveillance systems we'll need
the next time a pandemic happens.
If you talk to pandemic people, you know,
like this was like a starter pandemic.
I mean, it just could have been so much worse and it will be so much worse. You know, we know we
are going to face worse threats. And the thing that I have never seen is any real reckoning
in the federal government with what we didn't do, with the failures to build a real surveillance system.
While I was working on this story,
I ended up thinking a lot about this thing
that Steven Johnson told me.
He was one of the historians of public health.
I asked him if he had followed the work
of the COVID tracking project and what he made of it.
I had two reactions in a way to the COVID data project,
which was on the one hand,
it seemed scandalous that they had to do the work
that they had to do that that should have already been
underway.
Yeah, that makes sense.
That's my reaction, too.
Yeah.
But two was, there was part of me that was like,
okay, these are the heirs to John Graunt.
The amateur data collector who does it because they perceive something is missing in the system and
there's not a lot of time to lose and they need to get in there and fill in this missing piece.
That's a beautiful tradition in the history of health, and so part of me was kind of moved to see it kick into gear.
I mean, I get that. I love that story, and I'm struck by the fact that John Graunt
is a romantic figure of a person like jumping in and filling in a need,
and inventing whole new fields of science at the same time.
But, you know, the story of the COVID tracking project, I'm really impressed by the people who did it,
but it doesn't seem like a romantic story at all.
Like, it really feels like a tragedy to me.
Mm-hmm.
Yeah, I think it is a tragedy.
Like, we shouldn't have to John Graunt the pandemic.
Not after hundreds of years of public health developments.
Yeah, I mean, what we really want is a boring story
where a bureaucracy just does its job.
Yeah, competent bureaucrats.
Yeah, here's to the competent bureaucrats.
Coming up next, we'll talk about some potential fixes
for our public health data systems.
We'll hear from a former CDC director
and someone who has thought a lot about who gets erased from our current data and how to make it better.
Stay with us.
Support for this four-part series exploring the future of health and wellbeing
comes from the Robert Wood Johnson Foundation,
which is committed to improving health and health equity in the United States.
Knowing that the healthy, equitable future we all deserve won't simply arrive,
RWJF is exploring how new technologies, scientific discoveries, cultural shifts,
and unforeseen events like those in today's story may shape our lives in years
to come.
Through these explorations, they're learning what it will take to build a future that
provides every individual with a fair and just opportunity to thrive no matter who they
are, where they live, or how much money they make.
Learn more about their efforts at rwjf.org.
And if you like thinking about the future of things and have a hunch about the future,
share it at shareyourhunch.org.
I'm going there now.
Okay, I'm selecting the prompt, I have a hunch.
I have a hunch that the increasing misery of air travel
will cause people to reconsider train travel in the US
and it will be more popular than it has been for decades.
Check out other hunches and share your own hunch
at shareyourhunch.org.
All right, I'm back with Delaney Hall
and we're gonna be talking about how to fix
our public health surveillance systems,
which that sounds really ominous when I say it that way
but that's really what we're trying to fix.
Yeah, this is the good kind of surveillance systems.
This is the surveillance that we want.
So how do we fix them?
I guess, spoiler alert, I don't think there is one clear answer
to that massive problem.
Yeah, I can't say I'm surprised to hear that.
But I did speak with people during my reporting
who had a range of interesting ideas
about how to make things work better, both within the system
and with data
in particular, and especially with race and ethnicity data, which represents one of the biggest
failures in our current system. Oh, that's really interesting. Okay. So tell me more about that.
So I will get to the race and ethnicity data a little bit later. I wanted to start with one immediate fix that came up in
conversation with Alexis Madrigal of the COVID tracking project. And he said it really would have
helped if the federal government had just been extremely clear with states about what information
they wanted reported and how. Because then the data coming from the states
would have been more standardized
and easier to compare and analyze.
Probably the thing that would have made
the biggest single difference on a data level
would be if the federal government had said basically
on an ultra, ultra, ultra precise level,
this is what we need.
Like we need it to come like from this system and answer all those small questions.
And this was actually something that the COVID tracking project ended up doing in the absence
of really clear guidance from the government.
They pulled together their own guidelines and distributed them to the states.
Oh, that's interesting, but I'm guessing that if that guidance was coming straight from
the government, it would probably be more successful.
Yeah, totally.
I mean, the government, believe it or not, has more authority than a volunteer effort,
however impressive that effort was.
The other thing is, even if the government had been really clear about how it wanted the
data reported,
that would have made things better for sure.
But it wouldn't have solved the underlying issues with the surveillance system.
Those are much bigger and more complex.
You can't fix the data system without fixing the broader public health system.
This is Dr. Thomas Frieden.
He was the director of the CDC from 2009 to 2017.
He also worked as the health commissioner for New York City for seven years.
And he testified before Congress in March 2021.
So this was around the time the COVID tracking project shut down.
And he said in that testimony that, quote, lack of accurate real time information
was one of the greatest failures of the US response to the COVID-19 pandemic.
Wow. Okay. That's definitive. Yeah. I mean, he said that our data systems are broken
basically from the top to the bottom. And he said that it is not just the CDC's fault here.
So I think saying, well, CDC couldn't get the data together,
CDC was dealing with local and state health departments
that were overwhelmed and couldn't collect the data,
hospitals that didn't have standardized data,
and laboratory testing that was insufficient
and the contact tracing system that never really
worked effectively in most places. Wow, I mean, that is a wide range of failures. I mean, I imagine
this all goes back to both the fractured nature of our public health system and the way the states
and the federal government really don't work hand in hand when it comes to these things. And then
we also talked about the hollowing out of these institutions and systems
that has been going on for decades, really.
Right. Our system got to this very bad point,
thanks to underfunding it for decades.
And we talked about this a little bit in the piece,
but we're already seeing the cycle of panic and neglect as it's known,
kick in yet again. So a crisis happens, money and
resources pour in, the crisis fades and the money goes away. And this is just not a good way to fund
a system that needs to rebuild some of its critical infrastructure from the bottom up.
You can't build a sustainable system with one-time dollars.
And he really wants to see us build that sustainable system.
He knows everyone in public health knows
that this kind of real-time data collection is important.
But Dr. Frieden says it's going to take years of investment
to fix it.
There needs to be an agreement
about a national architecture for data gathering and sharing.
The government needs to be able to hire really talented programmers.
We need to find workarounds for some very tricky problems,
including the fact that we don't have national health
identifiers in this country, which
means that tracking people across different systems is a big challenge.
There's just, there's a lot to sort out.
We need a multi-year investment to modernize it.
It's not just a matter of replacing fax machines with an electronic secure interchange.
But as we also heard in the main story, this is not just about money or technology. Those
things are definitely important. But there are elements of the CDC's current culture and how it
interacts with local health departments that also need to change. Right. So my impression of the
CDC is that it's a very scientific and academic
organization in terms of its outlook. They do very careful analysis before they release anything
or recommend anything, and that's not always what the public demands.
Yeah, I think it's safe to say that the agency tends to move slowly. It's also, Frieden pointed out,
sometimes subject to political oversight
and vetting that can contribute to that slowness.
But for whatever reasons,
I think it moves at a different pace
than people in local health departments
who are frontline responders.
They need to move fast,
sometimes with just the best data available at that moment.
And so, Frieden thinks there should be more movement back and forth between the CDC and local
and state level health departments,
so they can understand each other's needs.
There are too few people working at CDC headquarters
in Atlanta who have worked for two or five or 10 years
at a county or city or state or global health
department embedded to understand that if you need an answer sometimes it's in the next four or
five minutes not in the next four or five hours and certainly not next four or five days.
And so what Dr. Frieden has proposed is having thousands of staff on the CDC payroll who are actually
embedded in county and city health departments for a few years and who then rotate back
to CDC headquarters.
That's an interesting suggestion.
I mean, are there any indications that the CDC is seriously looking at this or changing
its culture in any way?
Well, I mean, I have not heard of anything like what Dr. Frieden is proposing, like a cultural
exchange between the CDC and local health departments.
I also think that the COVID tracking project founders would say there is nothing close
to the level of soul searching that they would like to see happening at the CDC right now. Like a real reckoning with what went wrong during the pandemic.
But you know, there are some new efforts at the agency.
Like for example, a center within the CDC called the Center for Forecasting and Outbreak Analytics,
the CFA.
Okay.
So what's the CFA supposed to do?
It's being built as a weather service for disease, a group that can forecast outbreaks,
which is interesting and challenging work.
And how it relates to data is that the quality of any given model and its resulting forecast
depends very heavily on the quality of data that goes into it.
So in our current system, where even simple metrics like test positivity rates or hospitalizations
are ambiguous, that is going to be a problem for the pandemic modelers.
Totally. I mean, if the CFA is going to be successful, they've got to sort out that
data stuff from the get go because you can have the greatest model in the world and all these people
willing to do it and they could predict amazing things. But if the data is not there, then it does
not matter. Yeah, the data stuff has to come first. It's foundational. Yeah. And then finally,
there's one other aspect of the data that we should talk about because
it represents a huge gap in our current knowledge.
And that is how we collect or rather do not collect data related to race and ethnicity.
Like if we're going to be rebuilding our data systems in the way that Frieden is describing,
it's worth thinking through this question in particular.
Yeah, I remember when I was following the work
at the COVID tracking project,
like this was an issue that they really focused on.
Yeah, it definitely was.
The COVID tracking project ended up developing
a whole wing of their project devoted specifically
to race and ethnicity data.
They did that work in collaboration with Dr. Ibram
X. Kendi, who runs the Boston University Center for Antiracist Research. And early in the pandemic,
Dr. Kendi wrote a series of essays in the Atlantic where he argued that we really do not know
who's being most impacted by COVID-19 because the data around race is just so limited.
And so why is that? Is it that race and ethnicity information is just not gathered? Is it just not
shared? Like, what is the breakdown here? The data is insufficient in a number of ways. And to help
explain how, I'd like to introduce you to Abigail Echo Hawk. We all know somebody. We all know
somebody who is impacted by COVID-19. We all know somebody who is impacted by a death even if
we weren't ourselves. Echo Hawk is a citizen of the Pawnee Nation of Oklahoma and she's the
director of the Urban Indian Health Institute in Seattle, Washington. It's one of 12 tribal epidemiology centers in the country.
And Echo Hawk says that it's clear native people
were disproportionately affected by COVID,
even just anecdotally, like she said.
Everybody knows somebody,
but that it is impossible to know just how much.
Even when data was collected, they weren't collecting the race and ethnicity of American
Indians and Alaska Natives and many other people of color.
So while we know the impact in our people was great with the scarce data we had, we know
it's a gross underreporting.
And what's interesting is that the COVID pandemic has recently focused people's attention on this issue.
Like, this is something people are now talking about, the fact that the pandemic disproportionately affected people of color.
But Echo Hawk has been interested in the issue for much longer.
Because ever since she started her career in public health, she has seen the ways that native people
and other people of color are made invisible in the data.
And it happens through a couple mechanisms.
One is by virtue of being a small population that can be difficult to gather, quote, statistically
significant data about.
And I would be in meeting after meeting after meeting where we would be a little
asterisk down at the bottom that would say not statistically significant or not
reported on. And so what it was is we were invisible and we were invisible in
conversations that policymakers were having. We were invisible when they were
allocating resources and what I saw was incredible health disparities and the deaths of my community members,
of my family members as a direct result.
So that's one way the data around ethnicity is lacking.
Another thing that happens is that people will sometimes
be given options on a form, like maybe black, white, and other.
It's like a limited range.
And other might be the only option that applies,
so they check that box.
That could include Japanese people. It could include American Indians and Alaska natives. It could
include other races or ethnicities. And even when you fill that in, they never disaggregate it. That
means that they kind of put it all together and they just count that as other. What that does is it effectively hides what's happening
to a particular population of people.
And this issue of aggregation and disaggregation is important.
So, disaggregating the data just means breaking the data
down into smaller units or segments,
instead of bundling a bunch of it up together,
which is what happens in the
other category.
It also sometimes happens with people who are multiple races.
Say somebody like my children who are Mexican American and also American Indian, and they mark on a
form Hispanic, they mark American Indian, and they mark Filipino, because they're also
Filipino.
And when the data is calculated instead, they put them into a category that says multi-race.
But they don't disaggregate it in a way that says they are both Hispanic, they are both
Filipino, and they are also American Indian and Alaska Native.
I mean, anyone could look at this and know that multi-race is a meaningless category that
probably doesn't yield very much information at all.
That's right.
It is not a very useful category.
And what's interesting is that these other broad categories that we use a lot, like Asian
American as an example, when that's used in public health data, it hides the fact that Asian Americans are an incredibly
diverse group, with very diverse experiences related to health and illness.
Finally, Echo Hawk described yet another way that people of color get erased from the data,
which is racial misclassification.
Racial misclassification is when you go in and instead of asking you what
race or ethnicity you are, somebody might look at you and instead check white
based on visual appearance. Check black based on visual appearance. Oh wow, so she
means that whoever is filling out the form doesn't even ask. Like they just
make an assumption based on appearance and they just fill it in.
Yeah, that's right.
And Echo Hawk says it happens all the time.
And it disproportionately hurts people of color by making the data around their existence
and their health issues just fuzzy and incomplete.
So does Echo Hawk have ideas about how to change the way we collect data so this type of stuff doesn't happen?
Yeah, she does.
She talks about decolonizing data.
And in addition to basic stuff, like disaggregating data and allowing for greater nuance in race and ethnicity categories, she wants to see communities be more involved in deciding
what gets gathered and shared about them.
So she talks about how there's a deficit-based framework
in public health, where in the case of her community,
the data always shows high rates of obesity,
high rates of diabetes, you know, health challenges.
But she also sees a lot of strengths in her community, strengths that can actually
measurably improve health.
And so she'd like to see data gathered around that too.
Yes, we need to know the gaps, but we also need to know how do our youth see themselves
in the future.
If you can see yourself in the future,
that's a protective factor against suicidality. We also want to know where their cultural ties are.
Are they culturally engaged? Do they have the access to the resources for their cultural engagement?
We want to use the strengths of our community, our cultural protective factors. All of those things are things that
can be measured and they can be weighted against the gaps. And her bigger point is just that data
should serve the community and the needs of the community. It shouldn't just be to study the
community and write academic papers about it. It should be actionable and lead to better health
for the communities it comes from.
Well, this is really fascinating and interesting, Delaney.
Thank you so much.
And full disclosure, like a lot of people
who work at the intersection of health and justice,
Abigail Echo Hawk has received funding
from the Robert Wood Johnson Foundation,
the group that also funded this episode.
99% Invisible was produced this week by Delaney Hall, music by Swan Real, sound mix by Ameeta Ganatra, fact-checking by Graham Hacia.
Kurt Kohlstedt is our digital director. The rest of the team includes Vivian Le,
Joe Rosenberg, Christopher Johnson, Emmett FitzGerald, Lasha Madan, Jayson De Leon, Martín Gonzalez, Sofia Klatzker,
and me, Roman Mars.
We are part of the Stitcher and SiriusXM podcast family,
now headquartered six blocks north in the Pandora building
in beautiful
uptown Oakland, California.
You can find the show and join discussions
about the show on Facebook.
You can tweet at me at Roman Mars
and the show at 99piorg.
We're on Instagram and Reddit too.
You can find links to other Stitcher shows I love,
as well as every past episode of 99PI at 99pi.org.
Thanks again to the Robert Wood Johnson Foundation for underwriting support of this special episode.
Keep an eye out for each episode in this four-part series, The Future of Dota-Dota, and remember,
if you have a hunch about the future, share it at shareyourhunch.org.