CyberWire Daily - Contact tracing as COVID-19 aid. [Research Saturday]

Episode Date: April 25, 2020

Successful containment of the Coronavirus pandemic rests on the ability to quickly and reliably identify those who have been in close proximity to a contagious individual. Mayank Varia from Boston University describes how his team suggests an approach based on using short-range communication mechanisms, like Bluetooth, that are available in all modern cell phones. The research can be found here: Anonymous Collocation Discovery: Harnessing Privacy to Tame the Coronavirus

Transcript
Starting point is 00:00:00 You're listening to the Cyber Wire Network, powered by N2K. Hello, everyone, and welcome to the CyberWire's Research Saturday.
Starting point is 00:01:36 I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Starting point is 00:03:04 We started with a group of three of us at Boston University, Professors Ari Trachtenberg, Ran Canetti, and myself. That's Mayank Varia. He's a research associate professor of computer science at Boston University. The research we're discussing today is titled The PACT Protocol Specification. Soon after the social distancing protocols were put into place,
Starting point is 00:03:41 after all of the stay-at-home orders were put into place, we started thinking about what might happen in a future world where those orders eventually start to become lifted, subject, of course, to public health guidance, and people start moving about in the world again, and what might be done as a society from a health perspective to try to reduce the risk of the spread of the coronavirus in this future world. And based on our reading and talking to people in the healthcare space, epidemiologists, et cetera, it sounds like from a healthcare perspective, there's sort of two categories of approaches to dealing with the spread of a disease. One approach is a general quarantine, which is the state of the world right now. Everybody stays at home. And the other is a
Starting point is 00:04:33 targeted quarantine, where if you identify people who are susceptible, who may have been diagnosed or likely to have coronavirus, you can ask them to self-quarantine rather than the entire population. And one of the challenges with the coronavirus specifically is that the disease is something that you could potentially have and be able to transmit before you even realize, in the sense before you're symptomatic. And so the question is, how can you find a method to learn as early as possible whether you might be at higher risk of having acquired the coronavirus? If someone is diagnosed with coronavirus, then they, together with medical health professionals, will try to identify all of the people with whom they have come into close contact. And that is a very important process, a great thing that the medical community is doing. But just like everything else in the healthcare space nowadays, they're becoming stressed. They're hitting up to capacity with a lot of the people who are reporting being diagnosed
Starting point is 00:05:47 with this disease. So the question that we had that started off our work was, if there is a world where we want some kind of automated system that can supplement the existing manual contact tracing efforts, what could an automated computerized system look like? And how could we use our own specific speciality of security and privacy research to ensure that if there is an automated contact tracing system, it is as privacy preserving as possible and also has as strong of integrity as possible against the threats of people uploading spurious information. So how can we ensure privacy and authenticity of any information in any kind of automated contact tracing system? So PACT stands for Private Automated Contact Tracing.
Starting point is 00:06:43 Let's dig into the protocol together. Again, let's start at the high level. What are you all proposing here? So at a high level, what we're proposing is a system that would allow people through their cell phones or any other kind of electronic device that they may carry with them, like a wearable, et cetera, to identify in a privacy-preserving manner when they come into close contact with somebody else and their smartphone or wearable device or whatnot. So to use, rather than
Starting point is 00:07:17 tracking, say, absolute locations, or rather than using GPS or something that would measure where everyone is actually moving around town, which is very sensitive data that's very privacy invasive and is kind of overkill for this question. Instead, we just want to know if two people come into close contact, they want to somehow exchange some piece of information, some random number that they can use to identify that singular encounter.
Starting point is 00:07:47 So when two phones or two devices come into close contact, close proximity within the CDC and World Health Organization guidelines, they exchange a random number, a long random number that just identifies that single contact and has nothing to do with anything else in the world. It does not have to do with their names or phone numbers or any other kind of identifier. And it doesn't have to do with any future or prior encounter with other people. And the reason for exchanging these private random numbers is so that if one of these two people from this close contact, if one of the
Starting point is 00:08:21 two people is later diagnosed with the coronavirus, then they can report in a private way to the other person that this event has occurred. So they can share this private number and that will be an indicator to the other person in that contact event that I've now come into contact with somebody who has later been diagnosed with the coronavirus. And so then you can take the appropriate healthcare precautions, such as self-quarantining to see if symptoms develop, etc. So on the technical side, you're proposing using low-power Bluetooth? Yes, that's right. So the intent is to use any kind of short band radio system that only operates over small distances. A popular example that is commonly available in a lot of different consumer electronics like smartphones is Bluetooth. Another potential use would be something like NFC, but there's some challenges with NFC in the sense that it's not always available for applications to use on all consumer devices. And also its range may be, in fact, too short. There's other signals like Wi-Fi radios or even cellular connections, but they have very long
Starting point is 00:09:36 distances over which they communicate. And so the goal is to try to find a ubiquitous radio that's already in a lot of consumer devices that operates at approximately the scale of the recommended guidance for what constitutes close proximity. Although the Bluetooth radios operate at a longer distance than the recommended guidance is for recording close contacts. And so there's a lot of ongoing research by both our team and many others around the world to try to figure out how you can estimate whether the Bluetooth signal strength is large enough
Starting point is 00:10:12 that it can constitute a close proximity, say within two meters, as per many of the usual healthcare guidances and for a sufficiently long period of time. Now, there are three main components here that we're dealing with. You've got your chirping layer, your tracing layer, and your interaction with medical professionals. Can we go through each of those one at a time and explain what's going on? Sure. So at the first layer, when two devices come into close proximity, each device will send each other one a random number.
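The proximity question raised above (is the Bluetooth signal strong enough, for long enough, to count as a close contact?) can be illustrated with a minimal sketch. It assumes a log-distance path-loss model, which is a common textbook approximation and not something the interview specifies. The calibration values (`rssi_at_1m_dbm`, `path_loss_exponent`) and the five-minute duration threshold are hypothetical; only the two-meter figure comes from the conversation.

```python
# Illustrative sketch only: the model and all constants except the
# two-meter radius are assumptions, not part of the PACT specification.

def estimate_distance_m(rssi_dbm, rssi_at_1m_dbm=-59.0, path_loss_exponent=2.0):
    """Estimate distance from received signal strength.

    Uses the log-distance path-loss model:
        rssi = rssi_at_1m - 10 * n * log10(distance)
    solved for distance. Both default parameters are hypothetical
    calibration values that would differ per device and environment.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


def is_close_contact(samples, max_distance_m=2.0, min_duration_s=300.0):
    """Return True if the samples contain an unbroken run of readings within
    max_distance_m that spans at least min_duration_s.

    samples: list of (timestamp_seconds, rssi_dbm), sorted by timestamp.
    """
    run_start = None
    for timestamp, rssi in samples:
        if estimate_distance_m(rssi) <= max_distance_m:
            if run_start is None:
                run_start = timestamp  # a new run of nearby readings begins
            if timestamp - run_start >= min_duration_s:
                return True
        else:
            run_start = None  # the run is broken by a far-away reading
    return False
```

In practice signal strength varies with phone model, orientation, and obstacles such as walls or pockets, which is why the speaker describes this estimation as ongoing research.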
Starting point is 00:11:05 Actually, your device, I should say, is sending out. Your device is transmitting all the time because you don't know necessarily whether anybody else is within close proximity to hear it. So your device will send this out all the time, but the number will change. Every single time you send, it will be a different number so as to prevent anybody from doing any kind of long-term tracking of your location. So it's not a persistent identifier for you across time and space. It's just a one-time number, this chirp. And if somebody happens to be in close proximity to you, then they will record that information. And similarly, their device would also be sending you a different chirp or random number that your device would be recording. So that's the first stage when two people come into close contact. And these numbers are being stored locally?
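The chirping behavior just described can be sketched in a few lines. This is the simple version of the idea, under stated assumptions: the 16-byte chirp length and the `Device` class with its method names are hypothetical, chosen only for illustration.

```python
import secrets

CHIRP_BYTES = 16  # assumed length; the interview only says "a long random number"


class Device:
    """Hypothetical model of one phone taking part in the chirping layer."""

    def __init__(self):
        self.sent = []   # chirps this device has broadcast (kept locally)
        self.heard = []  # chirps received from nearby devices (kept locally)

    def broadcast(self):
        """Create a fresh random chirp for every transmission, so that no
        chirp can be linked to any earlier or later one."""
        chirp = secrets.token_bytes(CHIRP_BYTES)
        self.sent.append(chirp)
        return chirp

    def receive(self, chirp):
        """Record a chirp heard from a device in close proximity."""
        self.heard.append(chirp)


# Two devices that pass each other exchange one chirp in each direction.
alice, bob = Device(), Device()
bob.receive(alice.broadcast())
alice.receive(bob.broadcast())
```

Nothing in this sketch leaves either device: both logs are local, matching the answer that follows.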
Starting point is 00:11:50 These numbers are purely being stored locally. They're not transmitted anywhere. And in fact, we're working to make sure that that information is stored, protected using encryption to be protected at rest so that even your own device does not even have those numbers until the second stage occurs. So even you do not remember your own metadata until the second
Starting point is 00:12:15 stage occurs, which is if one of the two people involved in that particular contact is later diagnosed with COVID-19, then they, together with their certified healthcare professionals, there's a process by which they can upload information to a publicly accessible database. It's a public database, so they're uploading information that is not personally sensitive to them. It's not like their name or anything like that. It's uploading information that would allow the other person from that interaction to realize that they have come into contact with someone who is diagnosed with COVID-19. So let's say I'm someone who's just been going about my business and have
Starting point is 00:12:59 not been diagnosed with having COVID-19. My device is gathering, it's collecting chirps, it's generating chirps. And then so someone that I've crossed paths with has been verified as being infected. If all of my information is being stored on my phone, how is that information that's being stored locally, how's that going to interact with the larger database for me to be notified? So in terms of you being notified, let me describe sort of a few iterations of how that system might work. So I'll start with a simple example
Starting point is 00:13:37 that is more or less the flavor of what we're going for. And then I'll add some extra features to get extra privacy and integrity protections. So the simplest version of the simplest way to understand how the system works is all of these devices are just sending out these chirps, these completely random temporary numbers. And if you happen to come into contact with somebody who later is diagnosed with COVID-19, they, together with their medical
Starting point is 00:14:06 professionals, could upload to a database the random number, the chirp that you had. So they sent you a chirp in the past, which your device recorded just locally, purely on your own device. And then when they later get diagnosed with COVID-19, they upload that chirp to the database. And then you can just download a copy of that entire database of all people's chirps of the diagnosed patients, and you can compare it against your own local database. So this would allow you just to do a match by saying, okay, I just want to do an equality check of does anybody's chirps from the uploaded database match with my local device? So that's like a way to do this check.
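The equality check in this simple version amounts to a set intersection between the chirps stored on the phone and the downloaded database. A minimal sketch, with a hypothetical function name and made-up byte strings standing in for real chirps:

```python
def find_exposures(heard_chirps, published_chirps):
    """Return the locally recorded chirps that also appear in the public
    database of chirps uploaded by diagnosed patients.

    The comparison runs entirely on the user's own device; nothing about
    the local log is sent anywhere.
    """
    return set(heard_chirps) & set(published_chirps)


# Example with placeholder values: one of the two chirps heard locally
# was later published by a diagnosed patient.
local_log = [b"chirp-A", b"chirp-B"]
public_db = [b"chirp-B", b"chirp-C", b"chirp-D"]
exposures = find_exposures(local_log, public_db)
```

A non-empty result tells the user that at least one recorded contact was later diagnosed, without revealing who it was.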
Starting point is 00:14:48 But there are some potential concerns here. For instance, the size of this data set is going to be very large. So just from the standpoint of downloading this, like everybody is chirping these random numbers all the time. And so if they have to send out a lot of them, that could potentially be an issue. Also, a potential actor, a bad actor, might upload somebody else's chirps. So somebody who you never came into close proximity with
Starting point is 00:15:15 might actually try to upload the chirps of someone with whom you did come into close proximity, which would generate panic and false alarms. So the actual system that we have in order to resolve these issues effectively uses a one-way function, like a cryptographic hash that can be computed only in the forward direction, but not the backward direction. And in order to, these chirps are actually generated as a cryptographic hash of some random number. So the person that you interacted with actually has a number that they choose for that interaction
Starting point is 00:15:51 that is not the one that they send to you. They actually only send to you a cryptographic hash of this number in their own phone. And then if they later are diagnosed with COVID-19, they upload the pre-image to the chirp. So they upload some, it's basically a proof that they were the person who generated the chirp in the first place, that only the phone and the device that actually generated the chirp has this information.
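The one-way function idea can be sketched as follows. SHA-256 and the 32-byte secret length are assumptions made for illustration; the interview only says "a cryptographic hash", and the function names here are hypothetical.

```python
import hashlib
import secrets


def new_secret():
    """Pick the private number for one interaction; it never leaves the
    phone unless the owner is diagnosed and chooses to upload it."""
    return secrets.token_bytes(32)  # assumed length


def chirp_from_secret(secret):
    """The value actually broadcast: a one-way hash of the secret.
    SHA-256 is an assumed choice of hash function."""
    return hashlib.sha256(secret).digest()


def matching_secrets(heard_chirps, published_secrets):
    """Hash every published pre-image and keep those whose hash matches a
    chirp this device recorded."""
    heard = set(heard_chirps)
    return [s for s in published_secrets if chirp_from_secret(s) in heard]


# A diagnosed patient uploads the pre-image of a chirp they sent earlier.
secret = new_secret()
chirp = chirp_from_secret(secret)
assert matching_secrets([chirp], [secret]) == [secret]

# Someone who merely overheard the chirp cannot produce its pre-image, so
# re-uploading the chirp itself produces no match.
assert matching_secrets([chirp], [chirp]) == []
```

Because only the device that chose the secret can produce a value that hashes to the chirp, a bystander who recorded the chirp cannot upload it on someone else's behalf.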
Starting point is 00:16:19 So that sort of addresses some of these kinds of trolling attacks or scaring attacks where people might try to scare other people that they haven't actually come into contact with. And finally, we wanted to make sure that our algorithm, our protocol, the system that we created, this PACT system, provides people with full autonomy and choice for if they are diagnosed with COVID-19, that they have full control and choice over what they choose to upload. So for instance, maybe they have a particularly sensitive event that they're going to, something that they
Starting point is 00:16:51 don't want any information ever recorded or any information ever chirped. So the specification calls for a snooze operation to be built into the application so that somebody who is walking around and decides that they want to momentarily pause the system can do so. Additionally, even later, even after the fact, if they have sent out some chirps and then there's a particular subset that they choose they don't want to upload, they can choose not to upload some fraction of these chirps, these random numbers. And that procedure by which they choose what to upload and what not to upload also has privacy protections in the sense of nobody else will know your decision as to how to protect your own healthcare information.
Starting point is 00:17:39 So how to protect the information about what you choose to disclose to others. To be clear, there's no location data being saved along with any of this? There is no location data being saved within the application itself. So it means that people with whom you have never come into contact will have no idea what these random numbers correspond to. Now, the people with whom you have come into contact could potentially remember that I received a chirp, say you and I came into close contact. So I could remember I came into contact with you at this date, at this time.
Starting point is 00:18:19 Now, the application will not store that. The application is not going to remember any of that. It purposely does not have an interest in keeping this kind of metadata around. But potentially, a bad actor who's trying to use the system in order to track people's movements, they might try to do that. So a bad actor might try to remember the time and location and person associated with the particular chirp, which is why we have several mechanisms to try to limit the ability for any of the information in the system, even by potentially malicious actors, to be used to re-identify any of your tracking movements. So if a bunch of people decide to try
Starting point is 00:18:58 to get together and try to reconstruct your movement history, they will not be able to do so because, well, so first for people who do not have COVID-19, who never upload anything to the database, they will not be able to track your movements because you're just sending out totally random numbers at every single time. All of your chirps have nothing to do with any other chirp. If you are a person who is diagnosed with COVID-19 and upload something to the data set, the situation is a little bit more complicated because the mere fact that you're informing other people that you have contracted the coronavirus could potentially, just that one fact, independent of anything else about how the system works,
Starting point is 00:19:41 tells them some information about who it is, right? So they know it's someone that somebody with whom they've come into close contact has acquired the coronavirus. So for instance, if you and I have been, you know, self-quarantining for 14 days or more, and then say this particular interview was a face-to-face interview, like this is like our only time that we've ever come into close contact with anybody in the last several weeks. And then later on, you get an alert from any kind of system, automated, manual, whatever, any kind of system that says that you have come into close proximity with someone who has the coronavirus, then you would know that it was me, not because of anything
Starting point is 00:20:22 about the details of how the system works or doesn't work, but because the actual system itself, the actual concept of providing this idea of contact tracing itself could potentially reveal information, especially if you have not come into contact with many people. You can infer who it might have been. And that's an important message I want to get across, and it influences the decisions on how and when to deploy any kind of contact tracing system, and the scope for which we may want any of this kind of technology to be used. So just like any other kind of technology, it's important to design the system with an eye for protecting against mission creep, to make sure that it's actually targeted towards solving an important current epidemic. And then afterward, the system should
Starting point is 00:21:13 gracefully go away, which in some sense, the system we've built has this nice graceful degradation property that the system that I should say we've built and many other academics have proposed very similar systems to ours. They have a very nice graceful degradation property in the sense that no information is provided about anything related to any kind of contacts or interactions unless or until somebody who has later contracted the coronavirus uploads to this data set the chirps that they've sent, which means that in a future world, hopefully soon, in which the coronavirus epidemic is behind us, in which the disease has gone away,
Starting point is 00:22:00 then in that world, there would be no information ever uploaded to the database ever again. And then the system would naturally no longer be revealing anything about anybody's movements. So there is information that is revealed to the contacts of people who've contracted coronavirus, namely the mere fact that you have come into contact with someone who has the coronavirus. And that information could potentially be something that will somewhat reduce the healthcare privacy of people who have the coronavirus, but only to their immediate contacts, only to the people that they've come
Starting point is 00:22:37 into close proximity with and not to the general public, not to any kind of remote system that wants to try to like learn this in aggregate, in large scale for the general public, not to any kind of remote system that wants to try to learn this in aggregate, in large scale for the entire population, just to the contacts with whom you've come into close proximity. Help me understand, because it seems to me like in terms of fighting a disease like this, the folks who are trying to track its progress, how it makes its way through populations, location data is going to be very helpful for them. So is a system like this relying on sort of a secondary location reporting system where presumably if I were to
Starting point is 00:23:20 find out that I'd been in contact with someone, my next step would be to speak with my doctor, and then perhaps my doctor would be the one to report to the powers that be that, hey, we have a case here, and here's where it is. Yes, that's right. So two sort of comments here. First of all, within our system, the actual event of uploading these chirps to this public registry is something that you have to do in concert with a certified health authority. And in the PACT specification, we go into some level of detail as to how to ensure that only information that is certified by a public health authority ever gets uploaded to this dataset, which is mostly to try to restrict the ability of the system to be used for these kinds of trolling or scaring attacks where people upload spurious information. But I think your question also gets
Starting point is 00:24:16 to an even more important point, which is that this system is meant not to be some kind of substitute or replacement or technology replacing anything about the existing healthcare system. It's only meant to solve a small part of the response to the coronavirus epidemic, which is to help certified health professionals
Starting point is 00:24:36 to do the impressive work that they're already doing in terms of helping individual patients to treat them and to help get a better understanding of the spread of the disease. So this is not meant to replace any aspect of what's currently happening. It's meant to provide more information to you as a person who's maybe concerned about whether you've come into contact with someone with the coronavirus and to your healthcare professionals, to your personal physician, to make more informed decisions about your own healthcare. So it needs to work in conjunction with the way that everything else, the rest of the both the personal healthcare responses, like your personal physician and, you know, a public
Starting point is 00:25:20 health response at a nation level or state level, et cetera, what would happen. Now, we've seen quite a bit of attention in the media lately that Apple and Google announced a collaborative effort of doing something that to me sounds similar to what your efforts are here. Is that indeed the case? And how do you see these varying systems that have been proposed sort of meeting together to have, you know, I guess the ideal situation would be to have one standard system that can interoperate? news about Apple and Google's work, their joint endeavor in the space. It's very encouraging news and their specification, at least at the level of the information that is currently publicly available, their specification is largely very similar to our approach for the PACT team and to the approaches of many other researchers throughout the world who have proposed very similar systems. There are some very small, slight differences
Starting point is 00:26:28 between our system and some of the other researchers' systems and the Apple-Google thing. They are largely trying to solve the same problem of automated contact tracing and have largely the same privacy and integrity guarantees, which is very encouraging. So when we first started this project
Starting point is 00:26:47 back three or four weeks ago, our intent at the time was, you know, let's try to build, like, let's try to design the strongest possible privacy protections that we can into a system that's also incredibly simple to understand, so that it's easy for the world to understand the kind of privacy protections it provides, and very easy to build, because we want to
Starting point is 00:27:10 build on such short timeframe. So, you know, that was our initial focus. And so we were focused on sort of ease of development and deployment and understanding. And now with this news that Apple and Google are working to build the same thing, I agree with you that the goal should be not to, you know, we don't want what the last thing we want to do is to confuse or fracture the base of people who like, you know, might outcomes that we want. If we walk past each other and I'm chirping using protocol A, but you're chirping using protocol B, and we don't understand each other, or we're looking at different data in different databases, that would be a problem. So at this point, we're looking to see what is it that we can do to assist the ongoing efforts by others, technology companies and other researchers and governments in other
Starting point is 00:28:05 countries who are all looking at sort of building and deploying these applications. So we're looking to see how we can provide even stronger systems that provide even greater security and privacy and integrity goals. So maybe not for the initial version, because the initial versions, we wanted the protocol to be so simple that it was easy to build in short order to address the current epidemic. But sort of now we're thinking, sort of looking ahead, what is it that we can do to provide even stronger assurances? Furthermore, we're looking at, can we build a prototype application of the same kind of thing as the Apples and Googles and other countries' governments are building, so that we can understand what are the potential pitfalls that occur when the rubber meets the road, when you actually implement this thing,
Starting point is 00:28:49 and how does the specification hold up when it's running on a complicated device like a smartphone, which has other sensors, which has other programs running at the same time. So we're trying to make sure we can try to understand that as soon as possible so as to provide guidance to the teams that are building this out for production to provide, you know, as soon as possible, you know, information to them about the kinds of concerns that they should think about when implementing. Because if, you know, one of the lessons we've learned over and over again from in the security and privacy community is you can have an idea that looks all well and good on paper, but until and unless you implement it, you don't know necessarily where there's some subtle issue that could go wrong, some kind of side channel attack, some kind of potential implementation
Starting point is 00:29:35 flaw to be wary of, etc. We think that our best role going forward is to try to see proactively what are the kinds of issues that the other folks might run into as they build and deploy and maintain these kinds of systems so that we can provide actionable guidance to them. Do you have any sense for what kind of timeline you're on in terms of testing this and making it available? So I think that there are already research prototypes of this software available. Both our team is actively working on building a prototype right now, and I think there are other research teams around the world that already have some open source software deployments. It's sort of our view that having one common spec for the world is a useful goal to move toward for the reasons that you were asking previously for interoperability reasons. But there can be value in software diversity of different implementations of this spec, not necessarily all deployed in practice, but all ready to be deployed in practice. In case there's any issue identified with one, it's good for all of us to be trying to build independent implementations so that we can
Starting point is 00:30:50 understand better where this could go wrong because of the tight timeline constraints to get this idea out there. You know, it's sort of better to maybe, given the state of the coronavirus epidemic, it can be better for us as a society to be sort of, you know, quote unquote, wasting time in parallel by having many people do work in parallel in order to save time in sequence to get the ideas out there as soon as possible. That's our perspective. I think that based on my current understanding of how other technology companies and academic research teams are moving forward, there are active field trials like small scale experiments of using this technology in use today and starting to be spun up around the world. And my understanding as to the tech technology companies like Apple and Google is that maybe they may have something built into either their operating systems or an application that they have built for download on their app stores,
Starting point is 00:31:48 maybe within a timeframe of, say, a month or so, to build and test and vet that. That's Mayank Varia from Boston University. The research is titled The PACT Protocol Specification. We'll have a link in the show notes.
Starting point is 00:32:31 The CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies.
Starting point is 00:33:10 Our amazing CyberWire team is Elliott Peltzman, Puru Prakash, Stefan Vaziri, Kelsey Bond, Tim Nodar, Joe Carrigan, Carole Theriault, Ben Yelin, Nick Veliky, Gina Johnson, Bennett Moe, Chris Russell, John Petrik, Jennifer Eiben, Rick Howard, Peter Kilpe, and I'm Dave Bittner. Thanks for listening.
