ACM ByteCast - Kristian Lum - Episode 4
Episode Date: August 6, 2020
In this episode of ACM ByteCast, Rashmi Mohan welcomes Kristian Lum to the podcast. Lum is part of the research faculty at the University of Pennsylvania's CIS Department. Previously, she was Lead Statistician at the Human Rights Data Analysis Group (HRDAG), where she led the project on criminal justice in the United States. She's widely known for her work on algorithmic fairness and predictive policing and is a key organizer of the ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT). Lum discusses her transition from math and statistics into computer science, and how her lab work in bioinformatics expanded her interest into social issues. They touch on the sensitive nature of data and privacy and gray areas in criminal justice data collection. Lum mentions some applications of her work, advising the NYC Mayor's Office of Criminal Justice and partnering with the ACLU. She also describes a very timely project accounting for the time lag between COVID-19 infection and death. Finally, she traces her winding, fascinating career path from academic to industry and back.
 Transcript
    
                                         This is ACM ByteCast, a podcast series from the Association for Computing Machinery,
                                         
                                         the world's largest educational and scientific computing society.
                                         
                                         We talk to researchers, practitioners, and innovators
                                         
                                         who are at the intersection of computing research and practice.
                                         
                                         They share their experiences, the lessons they've learned,
                                         
                                         and their own visions for the future of computing.
                                         
                                         I am your host, Rashmi Mohan.
                                         
                                         The use of technology for social good is something we all dream of.
                                         
    
                                         Our guest today has lived that dream for many years.
                                         
                                         Kristian Lum has worked at the Human Rights Data Analysis Group as lead statistician, focusing
                                         
                                         her efforts on the uses of machine learning within the criminal
                                         
                                         justice system.
                                         
                                         Kristian, welcome to ACM ByteCast.
                                         
                                         Thanks so much for having me.
                                         
                                         I'd like to lead with a question that I ask all my guests.
                                         
                                         Please introduce yourself and talk about what you currently do and give us some insight
                                         
    
                                         into what drew you into the field of computer science.
                                         
                                         Yeah, so actually, you're right that I have been the lead statistician
                                         
                                         for the Human Rights Data Analysis Group for quite some time. But as of about a month ago,
                                         
                                         I joined the research faculty at the University of Pennsylvania in their computer and information
                                         
                                         science department. So that's kind of an exciting update to my affiliation. So what got me into
                                         
                                         computer science? You know, I started out in math and statistics, both, and there really is quite a
                                         
                                         bit of overlap between statistics and machine learning. And so I don't really even necessarily
                                         
                                         see there being these really stark disciplinary boundaries. This sort of just seemed like a
                                         
    
                                         natural transition based on the types of things I was working on and the sorts of affiliations
                                         
                                         that other people had in computer science relative to statistics and machine learning, this seemed like a good fit.
                                         
                                         But operationally, I still definitely take a lot of the viewpoints from my training as a statistician and use those in my work, even in computer science.
                                         
                                         Great. And the interest in computing in itself in order to sort of, you know, get into math, science and statistics.
                                         
                                         How did that come about? Is that something that was there from your school days? What was your first introduction to
                                         
                                         it? Oh, yeah. So, you know, I think, like a lot of people who start out on sort of a math stats path,
                                         
                                         I started out, you know, just in school doing the sorts of, you know, problem sets and things that
                                         
                                         people do in college and high school or whatever. And there wasn't a whole lot of computing. But then in college, as I realized how much I enjoyed actually applying the things I was
                                         
    
                                         learning in my statistics courses to real data, it became fairly clear that computing was an
                                         
                                         important aspect of being able to actually learn things from data, except in fairly toy type of
                                         
                                         problems. And so, yeah, I think that's where that came from, just sort of marrying the interest
                                         
                                         in statistics with actually wanting to learn something from data. So the component of applied
                                         
                                         statistics is really what got me headed in this area. Got it. Yeah. And, you know, I mean, I think
                                         
                                         it's very interesting that you say applied statistics because using data in the criminal
                                         
                                         justice system is a very unique application for somebody who comes from even computing or math and
                                         
                                         stats.
                                         
    
                                         How did you chance upon this field? What is it that sparked the interest for you?
                                         
                                         Yeah, you know, like a lot of my career, it's been a little bit of a winding path. And I think of this as a little bit serendipitous. So back in, I think, so 2012, 2014, I was working in this lab
                                         
                                         at Virginia Tech in the Virginia Bioinformatics Institute. This is going to seem like a little bit
                                         
                                         of an aside, but I promise I'm coming back to criminal justice here. And I was, as it turns
                                         
                                         out to be fairly relevant these days, working in this lab where we did micro simulations.
                                         
                                         So simulating things like epidemic outbreaks. And my job there was I was really focused on how you
                                         
                                         build a realistic population of agents that can pass the disease around to each other in
                                         
                                         ways that are realistic. And as I was working there, I started thinking about
                                         
    
                                         how my interest expanded beyond those sorts of applications and other sorts of application areas
                                         
                                         we were working on into the sort of like social problems. So how can we use these sorts of methods
                                         
                                         to understand social problems? And so one of the things I was thinking about was applying the methods that we were using
                                         
                                         to model incarceration as a contagious disease.
                                         
                                         And so that was really my sort of entree into this area was it was really a fairly direct
                                         
                                         move in the sense that I was applying the same sorts of methods that we were using to
                                         
                                         simulate things like infectious disease, the spread of infectious disease through a population, to think about what sorts of social
                                         
                                         influence can cause close associates of people who are incarcerated to themselves become incarcerated
                                         
    
                                         and looking at how racial disparities in sentence lengths could be playing a more important role
                                         
                                         than we previously thought in driving racial disparities in incarceration rates in the population.
                                         
                                         That's a whole project to explain, so I won't go into all of the details, but that was really kind
                                         
                                         of the beginning of my work in that area. So then years after that, I was working at the Human Rights
                                         
                                         Data Analysis Group. And this is a human rights organization, and we do quantitative analysis
                                         
                                         for projects that are pertinent to human rights issues. And a lot of the human rights community
                                         
                                         spend a lot of their time looking outward. So thinking about issues and things that are
                                         
                                         happening outside of their own countries. As we were talking about it, we were thinking,
                                         
    
                                         you know, we have a lot of stuff going on here at home that is certainly relevant to human rights.
                                         
                                         And so we should be trying to take this lens of human rights and statistical data analysis and
                                         
                                         all of these things and apply them to the issues that we're seeing at home. And so we started out by looking at policing and we've sort of moved our way into the criminal
                                         
                                         justice system more broadly in our work there. Wow. And the kind of data that you're looking at,
                                         
                                         I mean, this is again, maybe this is just my naivety, but do you find that the data that you
                                         
                                         have is sufficient, both within the US as well as outside, for you to actually draw meaningful insights from it?
                                         
                                         Yeah, that's actually a great question.
                                         
                                         So when I started out doing this work, I was really having to rely only on data you could find on the internet.
                                         
    
                                         So luckily, a lot of cities and states are making criminal justice relevant data available on the internet for people to do
                                         
                                         their own analysis. And so it wasn't like there was a complete lack of data and we had to,
                                         
                                         you know, build things up from nothing. But it was more difficult to find the sorts of information
                                         
                                         that we needed to do really solid analysis. Now, as time has gone on, and I've gotten
                                         
                                         deeper into this field, I have been more able to come up with data sharing agreements
                                         
                                         with various institutions
                                         
                                         within the criminal justice system to have access to more private data that you're not
                                         
                                         just going to find on the internet for good reason. And so, yeah, I actually do think for
                                         
    
                                         the most part, I do have the data that is necessary to do the sorts of analyses that I'd like to do.
                                         
                                         But of course, there's always things that aren't measured or things you would like to be able to understand better. But
                                         
                                         yeah, I'd say by and large, I do now based on having built up a little bit of a career in this
                                         
                                         area and having made connections in this area, so that I can get that sort of access to data,
                                         
                                         again, under fairly strong data sharing agreements and things like that.
                                         
                                         I mean, that's a great segue sort of into what I was definitely going to get at, which is when we talk about data, then, you know, of course,
                                         
                                         privacy can't be very far behind. Even in areas such as like personalization and shopping, you
                                         
                                         know, there is a sense of intrusion that you feel if the recommendations that you get are too close
                                         
    
                                         to what you're studying, you know, your browsing patterns, etc. What kind of concerns have you
                                         
                                         dealt with in this specific realm with regard to data privacy?
                                         
                                         And are there any special considerations that you need to make? Yeah, you know, so a lot of the data
                                         
                                         that you're going to find if you do start working in this area is going to be personally identifying
                                         
                                         information, right? And so you have to take any sort of precautions that you would do there with
                                         
                                         any other sort of data where you have that sort of personally identifiable information. So it's sensitive information, right? It's information about people's criminal history.
                                         
                                         In some cases, it's information about interviews that have been conducted with the people
                                         
                                         while they've been incarcerated. It's all sorts of stuff that really shouldn't entirely be public.
                                         
    
                                         And so there's really good reasons for it to be private. And of course, we take all the precautions
                                         
                                         that are necessary to make sure that that remains private as we're
                                         
                                         using it. I think when we're talking about data from the criminal justice system, we also get
                                         
                                         into some really kind of like different ethical gray areas where some of the data that we have
                                         
                                         is actually a matter of public record and you can kind of find it out there in the world. It's not
                                         
                                         really super private in the sense that, you know, it is sort of on the public record. And so you can just get it via FOIAs or however. But there are questions about how appropriate it is, I think, to publish that data with the
                                         
                                         individual's names in it, even though it's there. It sort of makes it easier for people to find
                                         
                                         information that's not really relevant to the analysis itself, but could years down the road,
                                         
    
                                         I think, be embarrassing for the person or cause them more difficulty getting a job.
                                         
                                         If it's sort of out in areas that are more easily searchable and findable, say published in an academic journal or
                                         
                                         especially some sort of open access sort of paper. And so I think you get into this sort of strange
                                         
                                         gray area where there is data you can get where people are individually identified, but it is a
                                         
                                         matter of public record. So what do you do with that, right? Do you have any sort of obligation to not further put that person's name out there as sort of like tarring them as a
                                         
                                         criminal for life, right? I think these are areas that we need to think a lot about.
                                         
                                         Right. No, absolutely. And one of the other things that I wanted to touch upon, which you had
                                         
                                         mentioned previously, is while you go through and review this data and actually perform analysis on
                                         
    
                                         it,
                                         
                                         you're sort of starting to draw insights from it. I have sort of a two-part question. One is,
                                         
                                         if you feel like the data is not complete enough, what has been your experience in terms of influencing organizations to either collect more data or share more data with you?
                                         
                                         And the second question I had is, as you draw these insights and uncover these, you know, potential biases that you may see,
                                         
                                         how receptive are they to your findings? How receptive are they to actually take it and
                                         
                                         make changes within their organization based on what you find? Yeah. All right. So for the first
                                         
                                         question, how successful have I been in getting people to collect more data? Not at all.
                                         
                                         So that's a really easy one to answer there. You know, there's a lot
                                         
    
                                         of momentum around what sorts of things are collected and how they're stored. And I certainly
                                         
                                         can't go redesign all the systems that people use to store data. And so I haven't even really tried
                                         
                                         on that front. When it comes to the reception, you know, I think it's really mixed. And it
                                         
                                         really depends who you're talking about. So a lot of my work is sort of done through this critical lens that points out how unfairness in the criminal justice system ends up being
                                         
                                         inherited by models that are built on that data. And so in a lot of ways, the critiques I have of
                                         
                                         the models are critiques of the system itself, which I think can be tricky. What I've found is
                                         
                                         that the reception really depends on who you're
                                         
                                         talking about. But I've been surprised at how much people have been willing to listen and how much
                                         
    
                                         I've been invited in to various conversations to sort of make these critiques heard in places that
                                         
                                         I didn't expect to have that opportunity. So yeah, it's a mixed bag, I think.
                                         
                                         Interesting. Have you at all had the opportunity to observe maybe or follow
                                         
                                         like a certain organization over a period of time, like from a time that you sort of make
                                         
                                         suggestions or you're invited in to provide your opinion to see that, you know, they've
                                         
                                         possibly made changes and you've had an opportunity to sort of evaluate the data coming
                                         
                                         out again. Have you had that sort of a timeline to review the data? So I wouldn't say I've been involved in anything that's happened for quite that long.
                                         
                                         One example of something I've been involved in was this research advisory council with the
                                         
    
                                         mayor's office of criminal justice in New York City. And this group of researchers was convened
                                         
                                         to help give advice. We didn't ultimately end up being able to say you should do this or should
                                         
                                         not do that, but just give, or you can do this or you can't do that. But it was just to give advice on the redesign of a new risk assessment tool that they were rolling out and have since rolled out in New York City to assess the likelihood that someone who's been arrested will fail to appear for a court date if they are released. That was just rolled out fairly recently. So we're not really at the stage where I'm seeing data to be able to evaluate how it's performing now. And I'm not even
                                         
                                         sure I will have access to that. But I did get to see the whole design process from a fairly early
                                         
                                         stage. And it was definitely interesting. And I did feel like in several cases, they were listening
                                         
                                         to the people who are there representing points of view that weren't necessarily like traditional
                                         
                                         in terms of how risk assessments should be made. That's very, very heartening to hear. I'd like to switch gears a bit,
                                         
                                         Kristian, and go back to something that you said initially, which is that when you started
                                         
    
                                         your career, you were looking at spread of infectious diseases. Given the times that we're
                                         
                                         in right now, I'd love to hear more about, I know you're working on something related to the global health situation with COVID-19. Could you tell us a little bit more
                                         
                                         about that? Yeah, I guess, you know, I've got several projects teed up on this. It might be
                                         
                                         because I'm one of those people who, when presented with anxiety-inducing situations, just work myself
                                         
                                         into the ground. So that's probably, I should probably be embarrassed by the amount of things
                                         
                                         I have going on this, given the short timeline we're talking about. But one of the
                                         
                                         projects I'm really excited about is working with some epidemiologists. So an old colleague of mine,
                                         
                                         Eric Lofgren, who's at Washington State University, and a new colleague through this project,
                                         
    
                                         Nina Fefferman, who's at the University of Tennessee, to model the spread of COVID in jails.
                                         
                                         And not just in jails, but in jails and in the
                                         
                                         communities, because jails are a fairly porous barrier between the community and the jail
                                         
                                         itself. So people are constantly coming and going from jails. And so the top line finding that we
                                         
                                         have there is that if you can reduce transmission in jails, that has spillover effects into the
                                         
                                         community. So you can also reduce infections in the community as well.
                                         
                                         And so, you know, what sort of interventions could reduce transmissions in jails? Well, you could do things like arrest fewer people, expedite release, things like that. And so we
                                         
                                         have this model, a sort of standard epidemiological model with some additional bells and whistles
                                         
    
                                         attached to it. We're actually modeling the flow of people in and out of jail and within jail to their court dates
                                         
                                         and back that looks at how the spread of COVID in jails can have an impact on the whole community.
                                         
                                         So that's something that should be coming out within the next, what's a Friday? So early next
                                         
                                         week, I would think. And I'm really excited about that project. I should also mention,
                                         
                                         and this was a huge oversight to not say this in the beginning when I mentioned the other two
                                         
                                         researchers I'm working with. I'm also working with some researchers from the ACLU, Aaron Horowitz and Brooke Watson on this project.
                                         
                                         So that's been a really fantastic collaboration that has just been all consuming for like the past three weeks or a month.
                                         
                                         And the final thing that I'm pretty excited about on this topic is I'm working on a project estimating the parameters of epidemiological models.
                                         
    
                                         So like an SEIR model or an SIR model, while accounting for time lags in the data. So the
                                         
                                         idea here is that normally these types of models end up getting fit to the case count. So how many
                                         
                                         infections have we counted and over what period of time or what's the time series look like?
                                         
                                         But in this case, because in the United States and elsewhere, there's this lack of testing,
                                         
                                         right? The numbers that we see for the case counts are at least as much a reflection of how tests
                                         
                                         are being distributed as they are a reflection of the number of cases.
                                         
                                         And so we were thinking, what data do we think is fairly trustworthy that we can use?
                                         
                                         And we were thinking, well, you know, probably the number of deaths is more trustworthy.
                                         
    
                                         So the number of COVID-19 attributable deaths, that's probably something we should be building
                                         
                                         a model on instead.
                                         
                                         And other research groups have come to this conclusion as well.
                                         
                                         But the problem with this is that you have this time lag from the time of infection to
                                         
                                         the time of death.
                                         
                                         People don't typically die within a day of catching the disease.
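As a rough illustration of the lag being described, here is a minimal sketch, not the project's actual model. It assumes a simple discrete-time SIR model, a made-up infection-to-death delay distribution, a made-up fatality ratio, and a Poisson likelihood for daily deaths. All function names and parameter values are hypothetical.

```python
import math


def simulate_sir(population, beta, gamma, initial_infected, days):
    """Discrete-time SIR model; returns new infections for each day."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    new_infections = []
    for _ in range(days):
        infections_today = beta * s * i / population
        recoveries_today = gamma * i
        s -= infections_today
        i += infections_today - recoveries_today
        new_infections.append(infections_today)
    return new_infections


def expected_deaths(new_infections, fatality_ratio, delay_pmf):
    """Spread each day's infections forward in time to get expected deaths.

    delay_pmf[k] is the assumed probability that a fatal infection
    leads to death exactly k days after infection.
    """
    deaths = [0.0] * len(new_infections)
    for day, infected in enumerate(new_infections):
        for lag, prob in enumerate(delay_pmf):
            if day + lag < len(deaths):
                deaths[day + lag] += fatality_ratio * infected * prob
    return deaths


def poisson_log_likelihood(observed, expected):
    """Log-likelihood of observed daily deaths under a Poisson assumption."""
    total = 0.0
    for y, mu in zip(observed, expected):
        mu = max(mu, 1e-12)  # guard against log(0)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


# Illustrative values only: deaths occur 14 to 18 days after infection.
DELAY_PMF = [0.0] * 14 + [0.1, 0.2, 0.4, 0.2, 0.1]
FATALITY_RATIO = 0.01


def death_curve(beta):
    """Expected daily deaths for a given transmission rate (hypothetical setup)."""
    infections = simulate_sir(1_000_000, beta, 0.1, 10, 120)
    return expected_deaths(infections, FATALITY_RATIO, DELAY_PMF)


# Pretend the "observed" deaths came from beta = 0.3, then score two candidates.
observed_deaths = [round(d) for d in death_curve(0.3)]
ll_true = poisson_log_likelihood(observed_deaths, death_curve(0.3))
ll_wrong = poisson_log_likelihood(observed_deaths, death_curve(0.2))
print(ll_true > ll_wrong)
```

The point of the sketch is only that the death series is compared against lagged infections, so the infection curve can be scored without trusting the case counts.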
                                         
                                         So we're working on building a model that incorporates that type of time lag explicitly into the likelihood of the model so that we can come up with new ways to estimate

                                         the epidemiological curves, given the reality that one, we don't really trust the case count data, so we have to base the model on something else.
                                         
    
                                         And two, there's this non-negligible time lag between what we get to see and what we're trying to estimate.

                                         You know, I mean, given the situation that all of us are in right now, and, you know, I think we're all sort of armchair data
                                         
                                         experts. I mean, I think we're consuming this data at a mind-blowing pace, right, both in terms of
                                         
                                         news as well as information like this, that everybody wants to know more, everybody is sort
                                         
                                         of anxious about it. So I think the work that you're doing is simply fascinating. But this idea that you mentioned of the count not being a reliable metric, is that something that you have seen? Is this primarily
                                         
                                         in the case of infectious diseases? Or are there other areas that also have this sort of challenge
                                         
                                         of not being able to get a reliable count?

                                         Yeah, so you're actually asking a question that
                                         
                                         goes back to most of my work at the Human Rights Data Analysis Group. So I would say the bread and butter of
                                         
                                         that organization and really a lot of the work that gets done out of that is understanding
                                         
    
                                         undercounts in other types of situations. So what we focus on there is casualty estimation. So in
                                         
                                         the time of some sort of violence, it can be really difficult to
                                         
                                         collect reliable data on the number of people who've been killed. And what we found in our
                                         
                                         experience there is that when local organizations, government organizations, NGOs, whatever, are
                                         
                                         trying to collect data on the number of people who've been killed, it doesn't end up being a
                                         
                                         representative or complete sample. And that's not to say they're not doing an excellent job because
                                         
                                         all of the people who are trying to collect this data in times of conflict are doing
                                         
                                         incredible work under really, really difficult situations. But the reality is, just like I was
                                         
    
                                         saying with the COVID-19 case counts, that it's as much a reflection of distribution of tests as it
                                         
                                         is the number of cases. What we find in data on deaths in times of conflict is it's just as much
                                         
                                         a reflection of where resources are being allocated to do that sort of data collection
                                         
                                         as it is a reflection of how many people have been killed. And so the work I've been doing
                                         
                                         at the Human Rights Data Analysis Group for the past, I think, something like five years,
                                         
                                         has been developing methods to take lists of the names of people who've been killed from
                                         
                                         various on-the-ground organizations and apply statistical models to come up with an estimate
                                         
                                         of the number of people who weren't recorded by any of the organizations. So the sort of dark
                                         
    
                                         number, the number of people who ended up being completely unrecorded. And this goes back to some
                                         
                                         ecological models that originally were developed for animal populations. So estimating the size of, say, fish in a lake, you catch fish in a lake, you tag them, you throw them back in,
                                         
                                         and then you see how much overlap there is in your second catch with your first catch.
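For reference, the textbook two-catch version of this idea is the Lincoln-Petersen estimator. The symbols here are introduced for illustration and do not come from the conversation: n_1 is the number of fish tagged in the first catch, n_2 is the size of the second catch, and m is the number of tagged fish found in the second catch. If the tagged share of the second catch roughly matches the tagged share of the whole lake, then:

```latex
\frac{m}{n_2} \approx \frac{n_1}{N}
\quad\Longrightarrow\quad
\hat{N} = \frac{n_1 \, n_2}{m}
```

A small overlap m relative to the catch sizes pushes the estimate of the total population N upward, and a large overlap pulls it down toward what was already seen.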
                                         
                                         This might seem like a little bit of a harsh comparison when we're talking about estimating
                                         
                                         human deaths. And of course, we don't like to imply that there's some sort of similarity there
                                         
                                         in terms of the seriousness. But when it comes down to the statistical modeling, the methods are fairly similar in the sense that they are based on
                                         
                                         looking at the overlaps among the lists of names of people who have been killed. So if one
                                         
                                         organization collects the names of a bunch of people and another organization collects another
                                         
    
                                         list of the names of people who've been killed, you can look at the overlaps among those lists.
                                         
                                         Normally, we wouldn't do it with only two lists, because that requires you to assume something like the lists are independent,
                                         
                                         which we don't believe. But as you get more and more lists, you can relax some of the assumptions
                                         
                                         you have to make. And sort of, again, to go back to, you know, the simple example, which again,
                                         
                                         this is just a simple example here. It's not actually how it's done. The intuition here is
                                         
                                         if there's a whole lot of overlap between the lists, well, that means
                                         
                                         you probably got most of the people because both collection efforts captured almost
                                         
                                         all the same names. If there's not a whole lot of overlap, well, then you think the universe of
                                         
    
                                         individuals out there who've been killed is probably much greater than what you're observing,
                                         
                                         even on the union of the lists that you're looking at. Again, it gets more complicated,
                                         
                                         and there's all sorts of different methods you can apply to account for capture heterogeneity, to account for dependence between the lists, all sorts of things
                                         
                                         like that. That ends up being, I think, a really interesting area to do research.
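The two-list intuition above can be sketched in a few lines. This is only the simple example, which as noted is not how the estimation is actually done: it assumes the two lists are independent, that every person is equally likely to be recorded, and that records match exactly across lists. It uses Chapman's bias-corrected form of the Lincoln-Petersen estimator, and the function name and record identifiers are hypothetical placeholders.

```python
def two_list_estimate(list_a, list_b):
    """Estimate the total population size from the overlap of two lists.

    Uses Chapman's bias-corrected Lincoln-Petersen estimator, which
    assumes independent lists and equal capture probabilities.
    """
    a = set(list_a)
    b = set(list_b)
    overlap = len(a & b)   # people recorded on both lists
    observed = len(a | b)  # people recorded on at least one list
    estimated_total = (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1
    return {
        "observed": observed,
        "overlap": overlap,
        "estimated_total": estimated_total,
        "estimated_unrecorded": estimated_total - observed,
    }


# Placeholder record identifiers, not real data.
first_list = [f"record_{k:03d}" for k in range(1, 61)]

# Little overlap with the first list: suggests many unrecorded people.
low_overlap_list = [f"record_{k:03d}" for k in range(41, 101)]

# Heavy overlap with the first list: suggests most people were recorded.
high_overlap_list = [f"record_{k:03d}" for k in range(6, 66)]

low = two_list_estimate(first_list, low_overlap_list)
high = two_list_estimate(first_list, high_overlap_list)

print(low["overlap"], round(low["estimated_unrecorded"], 1))
print(high["overlap"], round(high["estimated_unrecorded"], 1))
```

With the same list sizes, the low-overlap pair yields a much larger estimated unrecorded count than the high-overlap pair, which is the intuition described above. Handling dependence between lists or capture heterogeneity needs more than two lists and more elaborate models.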
                                         
                                         Well, I'm curious to ask though, I mean, in this case of the counting being inaccurate,
                                         
                                         what would you say is the biggest risk of that? Is it mostly our response to some of
                                         
                                         these situations that might not be as urgent? Like what would, in your observation, what is
                                         
                                         the biggest challenge with this and how big of a problem is it? Yeah, I mean, so I think that really
                                         
    
                                         gets at why do we do this, right? Like what's the point? So in my work with the Human Rights Data
                                         
                                         Analysis Group, in many cases, we have partnered with truth commissions where the idea is that after some sort of violent conflict, in order for the
                                         
                                         peacemaking process to move forward, there really needs to be some sort of shared narrative about
                                         
                                         what happens and sort of shared understanding about who is committing atrocities and against
                                         
                                         whom. And so, you know, at the sort of the most base level, just even understanding the magnitude
                                         
                                         of the conflict, I think can help with that process of really acknowledging all the harm that had been done.
                                         
                                         And the other important, I think, reason to do this is if you can come up with different estimates disaggregated, say, by space or time or any sort or perpetrator or victim, then you can come up with a better understanding of the dynamics of the conflict.
                                         
                                         So for example, if you find that some minority group was much, much more likely to have been
                                         
    
                                         killed, but not that much more likely to have been recorded, that can hint at hiding of those
                                         
                                         sorts of killings, and it can also support claims of genocide. Another aspect of this is you can
                                         
                                         look at what sort of policy interventions, what their impact was. So for example, if there's some sort of policy intervention and you find that after that intervention,
                                         
                                         the number of recorded killings goes down, it's really useful to understand,
                                         
                                         was it the case that the number of killings actually went down or was it the case that
                                         
                                         people became too afraid to report them? So the mechanisms to report safely were also
                                         
                                         decreased. So being able to disentangle the reporting from the actual level of violence,
                                         
                                         I think, is really important for that sort of retrospective policy analysis.
                                         
    
                                         Absolutely. And I know that you, you know, your work in this field also extends to sort of academic
                                         
                                         as well as, you know, industry setting in the sense that I know you help organize the ACM
                                         
                                         Fairness, Accountability and Transparency Conference or the ACM FAccT Conference.
                                         
                                         What is the main sort of goal of that event? And you know, what kind of attendees do you sort of
                                         
                                         attract? I mean, are there a lot of people working on the kind of problems that you're
                                         
                                         trying to solve? And is this sort of a gathering of those similar and like minds?
                                         
                                         Yeah, so this is this is something I'm really proud of. I'm really proud of seeing this community
                                         
                                         really grow over the last three something years. So the goal of this conference is mostly
                                         
    
                                         encapsulated in the name. It's to study fairness, accountability, and transparency in socio-technical
                                         
                                         systems. And so this is meant to attract attendees from a whole variety of disciplines. And in fact,
                                         
                                         it does. So we have people from law, from policy, from statistics and machine learning, from sociology, from all sorts of different types
                                         
                                         of studies, philosophy, who present their work there. It's highly interdisciplinary.
                                         
                                         The other thing that I think is really neat about this conference is we have avenues for people who
                                         
                                         don't publish in the ways that academics traditionally do. So via, say, like an eight-page
                                         
                                         conference paper, there are also ways that they can have content in the conference.
                                         
                                         So we have people from advocacy organizations, policy organizations, and they'll present
                                         
    
                                         things like tutorials or these sort of interactive sessions. I participated in this really interesting
                                         
                                         session in this past one, deconstructing your memories of how you've thought about designing a sort of fair model. And so we have people from all across the spectrum,
                                         
                                         all sorts of different institutional types, all sorts of academic disciplines. And it's really
                                         
                                         a place for people to think really deeply about the impacts of technology on society. And I think
                                         
                                         it's really important to have, you know, all different sorts of perspectives
                                         
                                         represented when we're thinking about those issues. So that's the community I'm really proud of.
                                         
                                         Well, that sounds amazing. And I mean, I think that intersection of folks from, you know,
                                         
                                         various disciplines is so critical to have these sort of key conversations and actually take
                                         
    
                                         forward some of these issues, especially, I mean, given that it is a conference that's run

                                         the sort of foundation of technology, I think it's amazing that you're able to attract folks from all of these various
                                         
                                         disciplines who really need to have a say in these matters. Yeah, absolutely. And one key contingent
                                         
                                         that I did leave out and shouldn't have is industry. There's also an industry presence
                                         
                                         there as well, which I think is, it's also important to be talking to those folks.
                                         
                                         Absolutely. Yeah. Speaking of folks from industry,
                                         
                                         a lot of our listeners are actually practitioners, potentially young professionals,
                                         
    
                                         who also would love to hear more from you about your career, right? What's your
                                         
                                         journey been like? How did you navigate your career? You come from a math and stats background
                                         
                                         into computer science, and that transition is fairly seamless. But we'd love to hear any unique stories that you may have to share from your career journey.
                                         
                                         Sure. My career has been a little bit of a winding path, in fact, more than most people's,
                                         
                                         I think. So the beginning of it starts out pretty straightforward. I went to Rice University,
                                         
                                         and I started out studying math. And about halfway through, I realized statistics,
                                         
                                         this is a thing for me. And so I went straight from there to grad school at Duke where I did a PhD in statistics.
                                         
                                         But by the time I was done with that, I was feeling fairly burned out. I was 25
                                         
    
                                         and felt like a little bit of an adventure. So I took a postdoc down in Rio de Janeiro, Brazil
                                         
                                         and ended up going down there by myself at first. And after a few months, it wasn't
                                         
                                         really working out. I was
                                         
                                         ending up spending more time trying to deal with the bureaucracy than I was getting real research
                                         
                                         done. So I ended up, I just kind of left, but I ended up spending the rest of the year down in
                                         
                                         Brazil, kind of taking a break and reevaluating where I wanted my career to go and what I was
                                         
                                         going to do. I also traveled around Europe for a little bit. So I had a good bit of a breather
                                         
                                         after grad school and after the beginning of the postdoc that I ended up leaving early.
                                         
    
                                         To back up a little bit chronologically, about halfway through grad school, I ended up sending
                                         
                                         a cold email to the founder of the Human Rights Data Analysis Group, Patrick Ball,
                                         
                                         because about halfway through grad school, I'd been talking to a friend about how I was really
                                         
                                         excited about all the work I was doing.
                                         
                                         And I was doing most of the things, just building models and coming up with algorithms for estimating those models, you know, sort of standard methodological statistics sort of things.
                                         
                                         I was really interested in finding a way to make those skills have some sort of social impact.
                                         
                                         So I talked to a friend who had heard Patrick give a talk.
                                         
                                         I think she was in law school and he gave a talk there.
                                         
    
                                         She's like, you should really talk to this guy.
                                         
                                         So I looked him up. I was really interested in his work. He was doing a lot of this
                                         
                                         population estimation for casualties. So estimating the number of deaths and conflicts like I just
                                         
                                         talked about. And I asked her for his email and she didn't have it. So I was like, well, okay.
                                         
                                         So I think it took me about 10 Google pages in and I found his email on like the last page of
                                         
                                         somebody's CV. And I was like, you know, what the heck? I'll just cold email him and see what happens. And so this is about halfway through grad school. I cold emailed him and I was
                                         
                                         like, Hey, really like your work. I think I could help in these ways. What do you think? And he's
                                         
                                         like, why don't you come out for the summer? And we ended up going down to Colombia and working on
                                         
    
                                         a project down there for those three months. And I sort of stayed in the orbit of HRDAG after that.
                                         
                                         So that's an important detail going forward. So anyway,
                                         
                                         after I came back from this sort of like year of just taking some time off and thinking about what I wanted to do, I ended up landing at the lab that I talked about earlier where we were doing
                                         
                                         micro simulation things. And I was designing synthetic populations to do those sorts of
                                         
                                         simulations. After about two years there, I decided that it was better for me to move back
                                         
                                         down to Durham to be with my husband because we were living separately. And just sort of for
                                         
                                         personal reasons, it was too hard to be separate for so long. So I ended up moving back down to
                                         
                                         Durham. But at the time, we had a really good friend from grad school who had just started a
                                         
    
                                         startup. So our friend Wes McKinney had started this company called Datapad, which was
                                         
                                         doing basically like cloud-based big data visualization analytics sort of stuff, and they
                                         
                                         needed a data scientist. So I signed on there as employee, I think, number 10, and spent several
                                         
                                         months there before the company got acquired. So I also got some experience in, yeah, in sort of
                                         
                                         like Silicon Valley startup world. And I was traveling
                                         
                                         back and forth from North Carolina, where I was living to San Francisco to work on that.
                                         
                                         But then after that, so I told you this is a winding story, right? So sorry.
                                         
                                         It's fascinating.
                                         
    
                                         But then after that, gosh, this must have been like 2015 or so. And I'm sure I could get the
                                         
                                         exact dates, but I don't think that's
                                         
                                         important for the story. Yeah, so it was around 2015. After that all ended, I ended up staying
                                         
                                         and doing tech consulting just because I'd made a bunch of connections in that area while I was
                                         
                                         doing this sort of startup thing. So I did some data science consulting for some small startups,
                                         
                                         like one was called Treasure Data. I also did some data science consulting for larger companies.
                                         
                                         One of those is called eBay, where we were looking at building an anomaly
                                         
                                         detection system. And I also spent about half my time back with HRDAG because I felt like that was
                                         
    
                                         sort of where my heart had been since grad school. I'd really felt like that was something I was
                                         
                                         really interested in pursuing again. And so I was sort of splitting my time doing all of that.
                                         
                                         And then I think sometime around 2016, I ended up, it was a lot to be doing a bunch of different
                                         
                                         types of consulting.
                                         
                                         It's a lot to keep track of when you're doing things that are just so different.
                                         
                                         And so I ended up joining full time at HRDAG.
                                         
                                         I think it was sometime around 2016 and stayed there until just now, about a month ago when
                                         
                                         I moved over to the University of Pennsylvania.
                                         
    
                                         So a whole lot of different types of things from,
                                         
                                         you know, the sort of standard trajectory, then into a research lab type of environment,
                                         
                                         then to a startup, then to sort of being an independent data science consultant,
                                         
                                         and then sort of back to the nonprofit world and now back to academia. So it's kind of this
                                         
                                         full circle type of journey.
                                         
                                         Yeah, that's great, Kristian, because you're
                                         
                                         like sort of the poster child for somebody that we, you know, we think about when we say
                                         
                                         transition from academia to industry and back is actually very challenging, or just even
                                         
                                         working across those lines seems to be very difficult in many cases. But your journey has
                                         
    
                                         been proof of the fact that it can be done. You know, I think it's especially inspiring,
                                         
                                         because you started out this entire journey with a cold email. And sometimes we're so afraid to make those
                                         
                                         decisions and say, should I even reach out? And I think your example is a great way for us to think
                                         
                                         about this and say, you know, what is the worst that can happen? You won't hear back.
                                         
                                         Yeah, that's about it.
                                         
                                         That's great. So, Kristian, we're sort of running out of time, but I'd love to hear more
                                         
                                         about like, you know, now that we're all cooped up in our respective homes and sheltered in place
                                         
                                         in many locations, what do you do in your free time if you have any?
                                         
    
                                         You know, I don't have a whole lot of free time. So what I do these days, since we are all cooped
                                         
                                         up, is, well, I would be doing this anyway. Let's be honest. I'd be doing this anyway. I hang out
                                         
                                         with my daughter. I have a 16-month-old daughter who takes up a whole lot of time and energy right now, but I
                                         
                                         wouldn't have it any other way. She is just a delight and a joy. And I don't need to go on on
                                         
                                         that because I think probably most parents feel that way. And I could go on for hours about how
                                         
                                         wonderful she is. And that would just be beside the point and probably kind of annoying. Some of
                                         
                                         my other hobbies though, which I don't get a lot of time to do these days, maybe in a couple of years, I really like rock climbing,
                                         
                                         mostly indoors, some outdoors, especially when I was at Virginia Tech, I used to really like to go
                                         
    
                                         rock climbing outdoors in the River Gorge. And the other thing that's kind of weird, I guess,
                                         
                                         is I really like sewing costumes. So I used to, and I plan to revive this again, go to the
                                         
                                         Renaissance Fair at least once a year and
                                         
                                         pretty much always in a new costume. And so that was sort of my yearly project: sewing something
                                         
                                         brand new for me, my friends, my husband, any number of people who wanted to go.
                                         
                                         Wow. Yeah, that is a unique hobby. Have you been sewing any masks? Because I know a lot of us have.
                                         
                                         I plan to. I have not yet, but I definitely plan to.
                                         
                                         Got it. Well, it's been amazing, Kristian, talking to you. For our final bite, I'd love to
                                         
                                         hear what is it that you're most excited about in your field of computing over the next five or so
                                         
    
                                         years?
                                         
                                         Yeah, I feel like I haven't really had, as you can tell from my "what was my career like" answer,
                                         
                                         I haven't really had much of a plan. I've sort of always just followed wherever I felt there
                                         
                                         was something interesting to do at the time. And so I think that's probably what's going to
                                         
                                         keep happening for the next five years. I'll probably just continue following whatever's,
                                         
                                         whatever piques my interest as long as I can, you know, have a job that lets me do that. So
                                         
                                         yeah, I don't know. Right now COVID-19 is really interesting. Fairness is really interesting. That is fairness
                                         
                                         in machine learning. There's all sorts of things that are really interesting and we'll see. We'll
                                         
                                         see what happens.
                                         
                                         Great. Thank you so much for talking to us. It's been an absolute pleasure
                                         
    
                                         to host you on our show. Thank you for taking the time to speak with ACM ByteCast.
                                         
                                         Thanks so much for having me.
                                         
                                         ACM ByteCast is a production of the Association for Computing Machinery's Practitioners Board. To learn more about ACM and its activities, visit acm.org. For more information about this and other episodes, please visit our website at learning.acm.org slash ByteCast.
                                         
                                         That's learning.acm.org slash B-Y-T-E-C-A-S-T.
                                         
