CyberWire Daily - Contact tracing as COVID-19 aid. [Research Saturday]
Episode Date: April 25, 2020
Successful containment of the Coronavirus pandemic rests on the ability to quickly and reliably identify those who have been in close proximity to a contagious individual. Mayank Varia from Boston University describes how his team suggests an approach based on using short-range communication mechanisms, like Bluetooth, that are available in all modern cell phones. The research can be found here: Anonymous Collocation Discovery: Harnessing Privacy to Tame the Coronavirus. Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
You're listening to the CyberWire Network, powered by N2K. Like many of you, I was concerned about my data being sold by data brokers. So I decided to try Delete.me.
I have to say, Delete.me is a game changer. Within days of signing up, they started removing my
personal information from hundreds of data brokers. I finally have peace of mind knowing
my data privacy is protected. Delete.me's team does all the work for you with detailed reports
so you know exactly what's been done. Take control of your data and keep your private life private. Go to JoinDeleteMe.com slash N2K and use promo code N2K at checkout.
The only way to get 20% off is to go to JoinDeleteMe.com slash N2K and enter code N2K at checkout.
That's JoinDeleteMe.com slash N2K, code N2K.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities and solving some of the hard problems of
protecting ourselves in a rapidly evolving cyberspace.
Thanks for joining us.
And now, a message from our sponsor, Zscaler, the leader in cloud security.
Enterprises have spent billions of dollars on firewalls and VPNs,
yet breaches continue to rise, with an 18% year-over-year increase in ransomware attacks
and a record $75 million ransom payout in 2024.
These traditional security tools expand your attack surface with public-facing IPs
that are exploited by bad actors more easily than ever with AI tools. It's time to rethink your
security. Zscaler Zero Trust plus AI stops attackers by hiding your attack surface, making
apps and IPs invisible, eliminating lateral movement, connecting users only to specific apps,
not the entire network, continuously verifying every request based on identity and context.
Simplifying security management with AI-powered automation.
And detecting threats using AI to analyze over 500 billion daily transactions.
Hackers can't attack what they can't see.
Protect your organization with Zscaler Zero Trust and AI.
Learn more at zscaler.com slash security.
We started with a group of three of us at Boston University,
Professors Ari Trachtenberg, Ron Canetti, and myself.
That's Mayank Varia.
He's a research associate professor of computer science at Boston University.
The research we're discussing today is titled
The PACT Protocol Specification.
Soon after the social distancing protocols were put into place,
after all of the stay-at-home orders were put into place,
we started thinking about what might happen in a future world where those orders eventually
start to become lifted, subject, of course, to public health guidance, and people start moving
about in the world again, and what might be done as a society from a health perspective to try to reduce the risk of the spread of the coronavirus
in this future world. And based on our reading and talking to people in the healthcare space,
epidemiologists, et cetera, it sounds like from a healthcare perspective, there's sort of two
categories of approaches to dealing with the spread of a disease. One approach is a general quarantine,
which is the state of the world right now. Everybody stays at home. And the other is a
targeted quarantine, where if you identify people who are susceptible, who may have been diagnosed
or likely to have coronavirus, you can ask them to self-quarantine rather than the entire population.
And one of the challenges with the coronavirus specifically is that the disease is something
that you could potentially have and be able to transmit before you even realize, in the sense
before you're symptomatic. And so the question is, how can you find a method to learn as early as possible whether you might be at higher risk of having acquired the coronavirus? Today, when somebody is diagnosed with coronavirus, then they, together with medical health professionals, will try to
identify all of the people with whom they have come into close contact. And that is a very
important process, a great thing that the medical community is doing. But just like everything else
in the healthcare space nowadays, they're becoming stressed. They're reaching capacity with the number of people who are reporting being diagnosed
with this disease. So the question that we had that started off our work was, if there is a world
where we want some kind of automated system that can supplement the existing manual contact tracing
efforts, what could an automated computerized system look like? And how could
we use our own specialty of security and privacy research to ensure that if there is
an automated contact tracing system, it is as privacy preserving as possible and also has as
strong of integrity as possible against the threats of people uploading spurious information.
So how can we ensure privacy and authenticity of any information in any kind of automated contact tracing system?
So PACT stands for Private Automated Contact Tracing.
Let's dig into the protocol together.
Again, let's start at the high level.
What are you all proposing here?
So at a high level, what we're proposing is a system
that would allow people through their cell phones
or any other kind of electronic device that they may carry with them,
like a wearable, et cetera, to identify in a privacy-preserving manner when they come into close contact with
somebody else and their smartphone or wearable device or whatnot. So rather than
tracking, say, absolute locations, or rather than using GPS or something that would measure where
everyone is actually moving around town,
which is very sensitive data that's very privacy invasive and is kind of overkill for this
question.
Instead, we just want to know if two people come into close contact, they want to somehow
exchange some piece of information, some random number that they can use to identify that
singular encounter.
So when two phones or two devices come into close contact, close proximity
within the CDC and World Health Organization guidelines,
they exchange a random number, a long random number
that just identifies that single contact
and has nothing to do with anything else in the world.
It does not have to do with their names or phone numbers or any other kind of identifier. And it doesn't have to do with any
future or prior encounter with other people. And the reason for exchanging these private
random numbers is so that if one of these two people from this close contact, if one of the
two people is later diagnosed with the coronavirus, then they can report in a private way to the other person that this event has occurred.
So they can share this private number and that will be an indicator to the other person in that contact event that I've now come into contact with somebody who has later been diagnosed with the coronavirus. And so then you can take the appropriate healthcare precautions, such as self-quarantining to see
if symptoms develop, etc. So on the technical side, you're proposing using low-power Bluetooth?
Yes, that's right. So the intent is to use any kind of short band radio system that only operates over small distances.
A popular example that is commonly available in a lot of different consumer electronics like smartphones is Bluetooth.
Another potential use would be something like NFC, but there's some challenges with NFC in the sense that it's not always available for
applications to use on all consumer devices. And also its range may be, in fact, too short.
There's other signals like Wi-Fi radios or even cellular connections, but they have very long
distances over which they communicate. And so the goal is to try to find a ubiquitous radio that's already in a lot of consumer devices
that operates at approximately the scale of the recommended guidance for what constitutes
close proximity.
Although the Bluetooth radios operate at a longer distance than the recommended guidance
is for recording close contacts.
And so there's a lot of ongoing research by both our team and many others around the world
to try to figure out how you can estimate
whether the Bluetooth signal strength is large enough
that it can constitute a close proximity,
say within two meters,
as per many of the usual healthcare guidances
and for a sufficiently long period of time.
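The PACT specification itself doesn't fix a distance-estimation method; as a rough illustration of the kind of estimation being researched here, the sketch below uses the common log-distance path loss model. The function names, the -59 dBm reference power, and the thresholds are assumptions for illustration, not values from the spec:

```python
def estimate_distance_m(rssi_dbm, tx_power_dbm=-59, path_loss_exponent=2.0):
    """Estimate distance in meters from a BLE RSSI reading using the
    log-distance path loss model. tx_power_dbm is the expected RSSI at
    1 meter (device-specific; -59 dBm is a common ballpark)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

def is_close_contact(rssi_samples, threshold_m=2.0, min_fraction=0.8):
    """Flag an encounter as close contact if most RSSI samples over the
    encounter window imply a distance within the threshold."""
    if not rssi_samples:
        return False
    within = sum(1 for r in rssi_samples if estimate_distance_m(r) <= threshold_m)
    return within / len(rssi_samples) >= min_fraction
```

In practice, as the interview notes, calibrating this per device model is exactly the hard, ongoing part of the research.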
Now, there are three main components here that we're dealing with. You've got your chirping
layer, your tracing layer, and your interaction with medical professionals. Can we go through
each of those one at a time and explain what's going on?
Sure. So at the first layer, when two devices come into close proximity, each device will send the other a random number.
Actually, your device, I should say, is sending out. Your device is transmitting all the time because you don't know necessarily whether anybody else is within close
proximity to hear it. So your device will send this out all the time, but the number will change.
Every single time you send, it will be a different number so as to prevent anybody from doing any
kind of long-term tracking of your location. So it's not a persistent identifier for you across time and space. It's just a one-time number, this chirp. And if somebody
happens to be in close proximity to you, then they will record that information. And similarly,
their device would also be sending you a different chirp or random number that your device would be
recording. So that's the first stage when two people come into close contact.
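This chirping layer can be sketched as follows. The class and field names are my own, and the chirp length is an assumption rather than the spec's value; each device broadcasts a fresh, unlinkable random number per interval and records, locally only, what it hears:

```python
import secrets

CHIRP_BYTES = 28  # assumed length for illustration; the PACT spec defines its own sizes

class ChirpBroadcaster:
    """Broadcast a fresh random chirp each interval; retain only locally."""
    def __init__(self):
        self.sent = []  # chirps we have broadcast, stored on-device only

    def next_chirp(self) -> bytes:
        # A new random value every interval, so no chirp links to any other
        # chirp and none serves as a persistent identifier.
        chirp = secrets.token_bytes(CHIRP_BYTES)
        self.sent.append(chirp)
        return chirp

class ChirpListener:
    """Record chirps heard from nearby devices, locally only."""
    def __init__(self):
        self.heard = []

    def on_receive(self, chirp: bytes):
        self.heard.append(chirp)
```

Nothing in either class ever leaves the device at this stage; transmission to anyone else only enters the picture at the later diagnosis-and-upload step.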
And these numbers are being stored locally?
These numbers are purely being stored locally.
They're not transmitted anywhere.
And in fact, we're working to make sure
that that information is stored,
protected using encryption at rest
so that even your own device
does not even have those numbers until
the second stage occurs. So even you do not remember your own metadata until the second
stage occurs, which is if one of the two people involved in that particular contact is later
diagnosed with COVID-19, then they, together with their certified healthcare professionals,
there's a process by which they can upload information to a publicly accessible database.
It's a public database, so they're uploading information that is not personally sensitive to them.
It's not like their name or anything like that.
It's uploading information that would allow the other
person from that interaction to realize that they have come into contact with someone who is
diagnosed with COVID-19. So let's say I'm someone who's just been going about my business and have
not been diagnosed with having COVID-19. My device is gathering, it's collecting chirps, it's generating
chirps. And then so someone that I've crossed paths with has been verified as being infected.
If all of my information is being stored on my phone, how is that information that's being
stored locally, how's that going to interact with the larger database for me to be notified?
So in terms of you being notified,
let me describe sort of a few iterations
of how that system might work.
So I'll start with a simple example
that is more or less the flavor
of what we're going for.
And then I'll add some extra features
to get extra privacy
and integrity
protections. So the simplest way to understand how the system works
is all of these devices are just sending out these chirps, these completely random temporary numbers.
And if you happen to come into contact with somebody who later is diagnosed with COVID-19, they, together with their medical
professionals, could upload to a database the random number, the chirp that you had received.
So they sent you a chirp in the past, which your device recorded just locally, purely on your own
device. And then when they later get diagnosed with COVID-19, they upload that chirp to the
database. And then you can just download a copy of that entire database of all people's chirps of the diagnosed patients,
and you can compare it against your own local database. So this would allow you just to do a
match by saying, okay, I just want to do an equality check of does anybody's chirps from
the uploaded database match with my local device?
So that's like a way to do this check.
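That equality check against the downloaded database can be sketched in a few lines (the function name is hypothetical; the real spec adds more structure on top):

```python
def find_exposures(heard_chirps, uploaded_chirps):
    """Return the chirps we heard locally that also appear in the public
    database of chirps uploaded on behalf of diagnosed patients.
    A non-empty result means a recorded encounter with a diagnosed person."""
    uploaded = set(uploaded_chirps)  # set membership for fast lookup
    return [c for c in heard_chirps if c in uploaded]
```

Crucially, the matching runs entirely on the user's own device against a downloaded copy of the database, so the server never learns who matched.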
But there are some potential concerns here.
For instance, the size of this data set is going to be very large.
So just from the standpoint of downloading this,
like everybody is chirping these random numbers all the time.
And so if they have to send out a lot of them, that could potentially be an issue.
Also, a potential actor, a bad actor,
might upload somebody else's chirps.
So somebody who you never came into close proximity with
might actually try to upload the chirps of someone
with whom you did come into close proximity,
which would generate panic and false alarms.
So the actual system that we have in order to resolve these issues
effectively uses a one-way function, like a cryptographic hash
that can be computed only in the forward direction, but not the backward direction.
These chirps are actually generated as a cryptographic hash of some random number.
So the person that you interacted with actually has a number that they choose for that interaction
that is not the one that they send to you.
They actually only send to you a cryptographic hash of this number in their own phone.
And then if they later are diagnosed with COVID-19, they upload the pre-image to the chirp.
So what they upload is basically a proof
that they were the person who generated the chirp
in the first place,
that only the phone and the device
that actually generated the chirp has this information.
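A minimal sketch of this commit-and-reveal idea, assuming SHA-256 as the one-way function (the PACT spec defines its own primitives and formats):

```python
import hashlib
import secrets

def make_chirp():
    """Sender keeps a secret seed; only its hash goes over the air."""
    seed = secrets.token_bytes(32)          # stays on the sender's device
    chirp = hashlib.sha256(seed).digest()   # the value actually broadcast
    return seed, chirp

def verify_upload(claimed_seed, heard_chirp):
    """A receiver checks that an uploaded seed really is the pre-image of a
    chirp it heard. Only the device that generated the chirp knows the seed,
    which blocks replaying chirps merely overheard from someone else."""
    return hashlib.sha256(claimed_seed).digest() == heard_chirp
```

Because the hash cannot be inverted, hearing a chirp gives an attacker no way to produce the seed, so they cannot upload it on someone else's behalf.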
So that sort of addresses
some of these kinds of trolling attacks
or scaring attacks where people might try to
scare other people that they haven't actually come into contact with. And finally, we wanted
to make sure that our algorithm, our protocol, the system that we created, this PACT system,
provides people with full autonomy and choice for if they are diagnosed with COVID-19, that they
have full control and choice over what they choose to upload. So for
instance, maybe they have a particularly sensitive event that they're going to, something that they
don't want any information ever recorded or any information ever chirped. So the specification
calls for a snooze operation to be built into the application so that somebody who is walking around and decides that they want
to momentarily pause the system can do so. Additionally, even later, even after the fact,
if they have sent out some chirps and then there's a particular subset that they choose they don't
want to upload, they can choose not to upload some fraction of these chirps, these random numbers.
And that procedure by which they choose what to upload and what not to upload
also has privacy protections in the sense of nobody else will know your decision
as to how to protect your own healthcare information.
So how to protect the information about what you choose to disclose to others.
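A toy sketch of the snooze and selective-upload behavior just described (all names are illustrative, not from the spec):

```python
class ChirpLog:
    """Local log supporting snooze (pause chirping) and selective upload."""
    def __init__(self):
        self.snoozed = False
        self.entries = []  # (seed, chirp) pairs this device generated

    def record(self, seed, chirp):
        # While snoozed, nothing is broadcast or logged at all.
        if not self.snoozed:
            self.entries.append((seed, chirp))

    def seeds_to_upload(self, exclude):
        """On diagnosis, the user chooses what to share; any chirp in
        `exclude` is withheld, and nobody can tell what was withheld."""
        return [seed for seed, chirp in self.entries if chirp not in exclude]
```

Since chirps are mutually unlinkable random values, withholding a subset leaves no detectable gap for anyone inspecting the upload.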
To be clear, there's no location data being saved along with any of this?
There is no location data being saved within the application itself.
So it means that people with whom you have never come into contact will have no idea what these random numbers correspond to.
Now, the people with whom you have come into contact could potentially remember that I received a chirp,
say you and I came into close contact.
So I could remember I came into contact with you
at this date, at this time.
Now, the application will not store that.
The application is not going to remember any of that.
It purposely does not have an interest in keeping this kind of metadata around. But potentially, a bad actor
who's trying to use the system in order to track people's movements, they might try to do that. So
a bad actor might try to remember the time and location and person associated with the particular
chirp, which is why we have several mechanisms to try to
limit the ability for any of the information in the system, even by potentially malicious actors,
to be used to re-identify any of your tracking movements. So if a bunch of people decide to try
to get together and try to reconstruct your movement history, they will not be able to do so because, well, so first for people who do
not have COVID-19, who never upload anything to the database, they will not be able to track your
movements because you're just sending out totally random numbers at every single time. All of your
chirps have nothing to do with any other chirp. If you are a person who is diagnosed with COVID-19
and upload something to the data set,
the situation is a little bit more complicated because the mere fact that you're informing
other people that you have contracted the coronavirus could potentially,
just that one fact, independent of anything else about how the system works,
tells them some information about who it is, right?
So they know that somebody with whom they've come into close contact has acquired the coronavirus.
So for instance, if you and I have been, you know, self-quarantining for 14 days or more,
and then say this particular interview was a face-to-face interview, like this is like our
only time that we've ever come into close contact with anybody in the last several weeks. And then later on, you get an alert from any kind of system,
automated, manual, whatever, any kind of system that says that you have come into close proximity
with someone who has the coronavirus, then you would know that it was me, not because of anything
about the details of how the system works or doesn't work, but because the actual system itself, the actual concept of providing
this idea of contact tracing itself could potentially reveal information, especially
if you have not come into contact with many people. You can infer who it might have been.
And that's an important message I want to get across, and it influences the decisions on how and when to deploy any kind of contact tracing system,
and the scope for which we may want any of this kind of technology to be used.
So just like any other kind of technology, it's important to design the system with an eye for
protecting against mission creep, to make sure that it's actually
targeted towards solving an important current epidemic. And then afterward, the system should
gracefully go away. In some sense, the system we've built, and I should say many other academics have proposed very similar systems to ours, has a very nice graceful degradation property in the sense that no
information is provided about anything related to any kind of contacts or interactions unless or
until somebody uploads to this data set the chirps associated with having contracted the coronavirus, which means that in a future world, hopefully soon, in which the coronavirus epidemic is behind us, in which the disease has gone away,
then in that world, there would be no information ever uploaded to the database ever again. And then the system would naturally no longer be revealing anything about anybody's movements.
So there is information that is revealed to the contacts of people who've contracted coronavirus,
namely the mere fact that you have come into contact with someone who has the coronavirus.
And that information could potentially be something that will somewhat reduce the healthcare privacy
of people who have the coronavirus,
but only to their immediate contacts,
only to the people that they've come
into close proximity with and not to the general public,
not to any kind of remote system that wants to try to learn this in aggregate, at large scale for the entire population, just to the contacts with whom you've come into
close proximity. Help me understand, because it seems to me like in terms of fighting a disease
like this, the folks who are trying to track its progress, how it makes its
way through populations, location data is going to be very helpful for them. So is a system like
this relying on sort of a secondary location reporting system where presumably if I were to
find out that I'd been in contact with someone, my next step would be to speak with my doctor,
and then perhaps my doctor would be the one to report to the powers that be that, hey,
we have a case here, and here's where it is. Yes, that's right. So two sort of comments here.
First of all, within our system, the actual event of uploading these chirps to this public registry is something that you have to do in concert with a certified health authority.
And in the PACT specification, we go into some level of detail as to how to ensure that only information that is certified by a public health authority ever gets uploaded to this dataset, which is mostly to try to restrict the ability
of the system to be used for these kinds of trolling
or scaring attacks where people upload spurious information.
But I think your question also gets
to an even more important point,
which is that this system is meant not to be
some kind of substitute or replacement
or technology replacing anything
about the existing healthcare system.
It's only meant to solve a small part of the response
to the coronavirus epidemic,
which is to help certified health professionals
to do the impressive work that they're already doing
in terms of helping individual patients to treat them
and to help get a better understanding of the spread of the disease. So this is not meant to replace any aspect of what's currently happening. It's
meant to provide more information to you as a person who's maybe concerned about whether you've
come into contact with someone with the coronavirus and to your healthcare professionals, to your
personal physician, to make more informed decisions about your own healthcare.
So it needs to work in conjunction with the way that everything else, the rest of the both the personal healthcare responses, like your personal physician and, you know, a public
health response at a nation level or state level, et cetera, what would happen.
Now, we've seen quite a bit of attention in the media lately that Apple and Google announced a
collaborative effort of doing something that to me sounds similar to what your efforts are here.
Is that indeed the case? And how do you see these varying systems that have been proposed sort of meeting together to have, you know, I guess the ideal situation would be to have one standard system that can interoperate?
Yes, that is indeed the case. It's very encouraging news about Apple and Google's work, their joint endeavor in the space. Their specification, at least at the level of the information that is currently publicly available, is largely very similar to our approach for the PACT team and
to the approaches of many other researchers throughout the world who have proposed very similar systems.
There are some very small, slight differences
between our system and some of the other researchers' systems
and the Apple-Google thing.
They are largely trying to solve the same problem
of automated contact tracing
and have largely the same privacy and integrity guarantees,
which is very encouraging.
So when we first started this project
back three or four weeks ago,
our intent at the time was, you know,
let's try to build, like,
let's try to design the strongest possible
privacy protections that we can
into a system that's also incredibly simple to understand,
so that it's easy for the world to understand
the kind of privacy protections it provides, and very easy to build, because we want to
build on such short timeframe.
So, you know, that was our initial focus.
And so we were focused on sort of ease of development and deployment and understanding.
And now with this news that Apple and Google are working to build the same thing, I agree with you: the last thing we want to do is to confuse or fracture the base of people who might use these systems. Interoperability is one of the outcomes that we want. If we walk past each other
and I'm chirping using protocol A, but you're chirping using protocol B, and we don't understand
each other, or we're looking at different data in different databases, that would be a problem.
So at this point, we're looking to see what is it that we can do to assist the ongoing efforts by
others, technology companies and other researchers and governments in other
countries who are all looking at sort of building and deploying these applications. So we're looking
to see how we can provide even stronger systems that provide even greater security and privacy
and integrity goals. So maybe not for the initial version, because the initial versions, we wanted
the protocol to be so simple that it was easy to build in short order to address the current epidemic. But sort of now we're thinking,
sort of looking ahead, what is it that we can do to provide even stronger assurances?
Furthermore, we're looking at, can we build a prototype application of the same kind of thing
as the Apples and Googles and other countries' governments are building, so that we can
understand what are the potential pitfalls that occur when the rubber meets the road, when you actually implement this thing,
and how does the specification hold up when it's running on a complicated device like a smartphone,
which has other sensors, which has other programs running at the same time. So we're trying to make
sure we can try to understand that as soon as possible so as to provide guidance to the teams that are building this out for production to provide, you know,
as soon as possible, you know, information to them about the kinds of concerns that they should think
about when implementing. Because if, you know, one of the lessons we've learned over and over again
in the security and privacy community is that you can have an idea that looks all well and good on
paper, but until and unless you implement it, you don't know necessarily where there's some subtle
issue that could go wrong, some kind of side channel attack, some kind of potential implementation
flaw to be wary of, etc. We think that our best role going forward is to try to see proactively
what are the kinds of issues that the other folks might run into
as they build and deploy and maintain these kinds of systems
so that we can provide actionable guidance to them.
Do you have any sense for what kind of timeline you're on in terms of testing this and making it available?
So I think that there are already research prototypes of this software available. Both our team is actively working on building a prototype right now, and I think there are other research teams around the world that already have some open source software deployments.
It's sort of our view that having one common spec for the world is a useful goal to move toward for the reasons that you were asking previously for interoperability reasons. But there can be value in software diversity of different implementations of this spec, not necessarily all deployed in practice, but all ready to be deployed in practice.
In case there's any issue identified with one, it's good for all of us to be trying to build independent implementations so that we can
understand better where this could go wrong because of the tight timeline constraints to get
this idea out there. You know, it's sort of better to maybe, given the state of the coronavirus
epidemic, it can be better for us as a society to be sort of, you know,
quote unquote, wasting time in parallel by having many people do work in parallel in order to save
time in sequence to get the ideas out there as soon as possible. That's our perspective.
I think that based on my current understanding of how other technology companies and academic
research teams are moving forward, there are active field trials like small scale experiments of using this technology in use today and starting to be spun up around the world.
And my understanding as to the technology companies like Apple and Google is that they may have something built into either their operating systems or an application that they have built for download on their app stores,
maybe within a timeframe of, say, a month or so, to build and test and vet that.
That's Mayank Varia from Boston University.
The research is titled The PACT Protocol Specification.
We'll have a link in the show notes.
And now a message from Black Cloak.
Did you know the easiest way for cyber criminals to bypass your company's defenses
is by targeting your executives and their families
at home. Black Cloak's award-winning digital executive protection platform secures their
personal devices, home networks, and connected lives. Because when executives are compromised
at home, your company is at risk. In fact, over one-third of new members discover they've already
been breached. Protect your executives and their families 24-7, 365, with Black Cloak.
Learn more at blackcloak.io.
The CyberWire Research Saturday is proudly produced in Maryland
out of the startup studios of DataTribe,
where they're co-building the next generation of cybersecurity teams and technologies.
Our amazing CyberWire team is Elliot Peltzman, Puru Prakash, Stefan Vaziri, Kelsey Bond,
Tim Nodar, Joe Kerrigan, Carol Terrio, Ben Yellen, Nick Valecki, Gina Johnson, Bennett Moe,
Chris Russell, John Petrick, Jennifer Iben, Rick Howard, Peter Kilpie, and I'm Dave Bittner. Thanks for listening.