Microsoft Research Podcast - AI Testing and Evaluation: Learnings from genome editing
Episode Date: June 30, 2025
In this episode, Alta Charo, emerita professor of law and bioethics at the University of Wisconsin–Madison, joins Sullivan for a conversation on the evolving landscape of genome editing and its regulatory implications. Drawing on decades of experience in biotechnology policy, Charo emphasizes the importance of distinguishing between hazards and risks and describes the field's approach to regulating applications of technology rather than the technology itself. The discussion also explores opportunities and challenges in biotech's multi-agency oversight model and the role of international coordination. Later, Daniel Kluttz, a partner general manager in Microsoft's Office of Responsible AI, joins Sullivan to discuss how insights from genome editing could inform more nuanced and robust governance frameworks for emerging technologies like AI.
Transcript
Welcome to AI Testing and Evaluation,
Learnings from Science and Industry.
I'm your host, Kathleen Sullivan.
As generative AI continues to advance,
Microsoft has gathered a range of experts from genome editing to
cybersecurity to share how
their fields approach evaluation and risk assessment.
Our goal is to learn from their successes and
their stumbles
to move the science and practice of AI testing forward. In this series, we'll explore how these
insights might help guide the future of AI development, deployment, and responsible use.
Today I'm excited to welcome R. Alta Charo, the Warren P. Knowles Professor Emerita of
Law and Bioethics at the University of Wisconsin–Madison, to explore testing and risk assessment in
genome editing.
Professor Charo has been at the forefront of biotechnology policy and governance for
decades, advising former President Obama's transition team on issues of medical research
and public health, as well as serving as a senior policy advisor at the Food and Drug Administration.
She consults on gene therapy and genome editing for various companies and organizations, and
has held positions on a number of advisory committees, including for the National Academy
of Sciences.
Her committee work has spanned women's health, stem cell research, genome editing, biosecurity, and more.
After our conversation with Professor Charo, we'll hear from Daniel Kluttz, a partner general manager in Microsoft's Office of Responsible AI,
about what these insights from biotech regulation could mean for AI governance and risk assessment,
and his team's work governing sensitive AI uses and emerging technologies. Alta, thank you so much for being here today.
I'm a follower of your work and have really been looking forward to our conversation.
It's my pleasure. Thanks for having me.
Alta, I'd love to begin by stepping back in time a bit before you became a leading figure in bioethics and legal policy.
You've shared that your interest in science was really
inspired by your brother's interest in the topic and that your upbringing really
helped shape your perseverance and resilience. Can you talk to us about
what put you on the path to law and policy? I think it's true that many of us
are strongly influenced by our families and certainly my family had kind of a sciencey techy orientation. My father
was a refugee escaping the Nazis. And when he finally was able to start working in the
United States, he took advantage of the GI Bill to learn how to repair televisions and
radios, which were really just coming in in the 1950s. So he was kind of technically oriented.
My mother retrained from being a talented amateur artist to becoming a math teacher.
And not surprisingly, both my brothers began to aim toward things like engineering and
chemistry and physics.
And our form of entertainment was to watch PBS or Star Trek.
And so the interest comes from that background coupled with, in the 1960s, this enormous surge of interest in the so-called nature versus nurture debate
about the degree to which we are destined by our biology or shaped by our environments.
It was a heady debate and one that perfectly combined the two interests in politics and science.
For listeners who are brand new to your field of genome editing,
can you give us what I'll call a 90-second survey of the space in perhaps plain language
and why it's important to have a framework for ensuring its responsible use?
Well, you know, genome editing is both very old and very new.
At base, what we're talking about is a way to either delete sections of the genome,
our collection of genes, or to add things or to alter what's there. The goal is simply to be able to
take what might not be healthy and make it healthy, whether it's a plant, an
animal, or a human. Many people have compared it to a word processor where
you can edit text by swapping things in and out. You could change the letter G to the letter H in every word,
and in our genomes, you can do similar kinds of things.
But because of this,
we have a responsibility to make sure that whatever we change
doesn't become dangerous,
and that it doesn't become socially disruptive.
Now, the earliest forms of genome editing were very inefficient and so we didn't worry that
much.
But with the advances that were spearheaded by people like Jennifer Doudna and Emmanuelle
Charpentier, who won the Nobel Prize for their work in this area, genome editing has become
much easier to do.
It's become more efficient.
It doesn't require as much sophisticated laboratory equipment.
It's moved from being something that only a few people can do
to something that we're going to be seeing in our junior high school biology labs.
And that means you have to pay attention to who's doing it, why are they doing it, what
are they releasing, if anything, into the environment, what are they trying to sell,
and is it honest and is it safe?
How would you describe the risks?
And are there, you know, sort of specifically inherent risks in the technology itself, or
do those risks really emerge only when it's applied in certain contexts like CRISPR in agriculture or CRISPR for human therapies?
Well, to answer that, I'm going to do something that may seem a little picky, even pedantic, but I'm going to distinguish between hazards and risks.
So there are certain intrinsic hazards. That is, there are things that can go wrong.
You want to change one particular gene
or one particular portion of a gene,
and you might accidentally change something else,
a so-called off-target effect.
Or you might change something in a gene,
expecting a certain effect,
but not necessarily anticipating
that there's going to be an interaction
between what you changed and what was there, a gene-gene interaction that might have an
unanticipated kind of result, a side effect essentially.
So there are some intrinsic hazards, but risk is a hazard coupled with the probability that
it's going to actually create something harmful.
And that really depends upon the application.
If you are doing something that is making a change in a human being that is going to
be a lifelong change, that enhances the significance of that hazard.
It amplifies what I call the risk because if something goes wrong,
then its consequences are greater. It may also be that in other settings, what you're doing is
going to have a much lower risk because you're working with a more familiar substance, your
predictive power is much greater, and it's not going into a human or an animal or
into the environment. So I think that you have to say that the risk and the benefits, by the way,
all are going to depend upon the particular application.
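To make the hazard-versus-risk framing concrete, here is a minimal illustrative sketch in Python. The severity and probability values, and the application labels, are hypothetical, not from the conversation; the point is only that the same intrinsic hazard yields different risk in different applications.

```python
# Illustrative sketch: risk couples an intrinsic hazard (severity)
# with the probability it actually causes harm in a given application.
# All numbers below are hypothetical.

def risk_score(severity: float, probability: float) -> float:
    """Risk = hazard severity weighted by probability of harm."""
    return severity * probability

off_target_severity = 0.8  # hypothetical severity of an off-target edit

# The same hazard carries different risk depending on context.
applications = {
    "heritable human edit (lifelong change)": 0.30,
    "contained lab cell line (no release)": 0.02,
}

for context, probability in applications.items():
    print(f"{context}: risk = {risk_score(off_target_severity, probability):.3f}")
```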
Yeah. I think on this point of application, there's many players involved in that, right? Like,
we often hear about this puzzle of who's actually responsible for ensuring safety and
a reasonable balance between risks and benefits or hazards and benefits, to quote you.
Is it the scientists, the biotech companies, government agencies?
And then, if you could touch upon as well: given the nature of genome editing
risks, how do those responsibilities get divvied up?
Well, in the 1980s, we had a very significant policy discussion about whether we should
regulate the technology, no matter how it's used or for whatever purpose,
or if we should simply fold the technology in with all the other technologies that we currently have
and regulate its applications, the way we regulate applications generally. And we went for the second,
the so-called coordinated framework. So what we have in the United States is a system in which
if you use genome editing in purely laboratory-based work, then you will be regulated the way we regulate laboratories.
There's also, at most universities because of the way the government works with this,
something called Institutional Biosafety Committees, IBCs. If you want to do research that involves
recombinant DNA and modern biotechnology, including but not limited to genome editing, you
have to go first to your IBC, and they look
and see what you're doing to decide if there's a danger there that you have not anticipated that
requires special attention. If what you're doing is going to get released into the environment,
or it's going to be used to change an animal that's going to be in the environment, then there
are agencies that oversee the safety of our environment, predominantly
the Environmental Protection Agency and the U.S. Department of Agriculture.
If you're working with humans and you're doing medical therapies, like you're doing the gene
therapies that have just been developed for things like sickle cell anemia, then you have
to go through a very elaborate regulatory process that's overseen by the Food and Drug Administration
and also, at the research stages, overseen locally by institutional review boards that
make sure the people who are being recruited into research understand what they're getting
into, that they're the right people to be recruited, etc.
So we do have this kind of Jenga game of regulatory agencies.
And on top of all that, most of this involves professionals who've had to be licensed in
some way.
There may be state laws specifically on licensing.
If you are dealing with things that might cross national borders, there may be international
treaties and agreements that cover this.
And of course the insurance industry plays a big part because they decide whether or not
what you're doing is safe enough to be insured. So all of these things
come together in a way that is not at all easy to understand if you're not kind of
working in the field. But the bottom line thing to remember, the way to really think
about it is we don't regulate genome editing. We regulate the things that use genome editing.
Yeah, that makes a lot of sense. Actually, maybe just following up a little bit on this
notion of a variety of different, particularly like government agencies being involved, you
know, in this multi-stakeholder model, where do you see gaps today that need to be filled
with some of the pros and cons to keep in mind?
Just as we think about distributing these systems at a global level, what are some of
the considerations you are keeping in mind on that front?
Well, certainly there were times where the way the statutes were written that govern
the regulation of drugs or the regulation
of foods did not anticipate this tremendous capacity we now have in the area of biotechnology
generally or genome editing in particular. And so you can find that there are times where
it feels a little bit ambiguous and the agencies have to figure out how to apply their existing rules. So an example, if you're
going to make alterations in an animal, right, we have a system for regulating drugs, including
veterinary drugs. But we didn't have something that regulated genome editing of animals.
But in a sense, genome editing of an animal is the same thing as using a
veterinary drug. You're trying to affect the animal's physical constitution in some fashion. And it took a long
time within the FDA to sort of work out how the regulation of veterinary drugs would apply if you think about
the genetic construct that's being used to alter the animal as the same thing as injecting a chemically based drug. And on that basis they now
know: here's the regulatory path, here are the tests you have to do, here are the
permissions you have to get, here's the surveillance you have to do after it
goes on the market. Even there, sometimes it was confusing: what happens when it's
not the kind of animal you're thinking about when you think about animal drugs?
Like, we think about pigs and dogs, but what about mosquitoes?
Because there, you're really thinking more about pests. If you're editing the mosquito so that it can't, for example, transmit dengue fever, it feels more like a public health measure than
a drug for the mosquito itself, and it kind of fell in between the agencies that
possibly had jurisdiction, and it took a while for the USDA, the Department of Agriculture,
and the Food and Drug Administration to work out an agreement about how they would share
this responsibility. So you do get those kinds of areas in which you have at least ambiguity.
We also have situations where, frankly, the fact that some things can move across national
borders means you have to have a system for harmonizing or coordinating national rules.
If you want to, for example, genetically engineer mosquitoes so they can't
transmit dengue, mosquitoes have a tendency to fly. Now, they can't fly very far. That's good.
That actually makes it easier to control. But if you're doing work that's right near a border,
then you have to be sure that the country next to you has the same rules for whether it's permitted
to do this and how to surveil what you've done in order to be sure that you got the results you wanted to get
and no other results. And that also is an area where we have a lot of work to be done
in terms of coordinating across government borders and harmonizing our rules.
Yeah, I mean, you've touched on this a little bit, but there is such a striking balance
between advancing technology and ensuring public safety, and sometimes I think it feels just like you're
walking a tightrope where, you know, if we clamp down too hard, we'll stifle innovation,
and if we're too lax, we risk some of these unintended consequences.
And on a global scale, like you just mentioned as well, how has the field of genome editing
found its balance?
It's still being worked out, frankly.
But it's finding its balance application by application.
So in the United States, we have two very different approaches to regulating things
that are going to go into the market.
Some things can't be marketed until they've gotten an approval from the government.
So you come up with a new drug, you can't sell that until it's gone through FDA approval.
On the other hand, for most foods that are made up of familiar kinds of things, you can
go on the market and it's only after they're on the market that the FDA can act to withdraw it if a problem arises.
So basically we have either pre-market controls, you can't go on without permission, or post-market controls.
We can take you off the market if a problem occurs.
How do we decide which one is appropriate for a particular application?
It's based on our experience. New drugs typically are both less familiar
than existing things on the market and also have a higher potential for injury if they
in fact are not effective or they are in fact dangerous and toxic. If you have foods, even
bioengineered foods that are basically the same as foods that are already here, it can go on the market with notice but without a prior approval.
But if you create something truly novel, then it has to go through a whole long process.
And so that is the way that we make this balance.
We look at the application area and we're just now seeing in the Department of Agriculture a new
approach on some of the animal editing, again, to try and distinguish between things that are
simply a more efficient way to make a familiar kind of animal variant and those things that are
genuinely novel and to have a regulatory process that is more rigid, the more unfamiliar it is, and the more that we
see a risk associated with it.
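As a purely illustrative sketch of that decision logic (the function and flags below are hypothetical; the real agency rules are far more nuanced), the pre-market versus post-market split might look like this:

```python
# Hypothetical sketch of the pre-market vs. post-market split described
# above. Real regulatory criteria are far richer than two booleans.

def oversight_path(is_familiar: bool, high_injury_potential: bool) -> str:
    """Route an application to pre-market or post-market oversight."""
    if not is_familiar or high_injury_potential:
        # e.g., a new drug: cannot be marketed without prior approval
        return "pre-market approval required"
    # e.g., a food equivalent to ones already on the market:
    # go to market with notice; withdraw later if a problem arises
    return "post-market controls (notice, surveillance, possible withdrawal)"

print(oversight_path(is_familiar=False, high_injury_potential=True))   # new drug
print(oversight_path(is_familiar=True, high_injury_potential=False))   # familiar food
```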
I know we're at the end of our time here, and maybe just a quick lightning-round
question.
For students, young scientists, lawyers, or maybe even entrepreneurs listening who are
inspired by your work, what's the single piece of advice you give them if they're interested in policy, regulation, the ethical side of things in genomics or other fields?
I'd say be a bio-optimist and read a lot of science fiction because it expands your
imagination about what the world could be like.
Is it going to be a world in which we're now going to be growing our buildings
instead of building them out of concrete?
Is it going to be a world in which our plants will glow in the evening
so we don't need to be using batteries or electrical power from other sources,
but instead our environment is adapting to our needs? You know, expand your imagination
with a sense of optimism about what could be and see ethics and regulation not as an obstacle,
but as a partner to bringing these things to fruition in a way that's responsible and helpful
to everyone. Wonderful. Well, Alta, this has been just an absolute pleasure.
So thank you.
It was my pleasure.
Thank you for having me.
Now I'm happy to bring in Daniel Kluttz.
As a Partner General Manager in Microsoft's Office of Responsible AI, Daniel leads the
group's Sensitive Uses and Emerging Technologies program.
Daniel, it's great to have you here.
Thanks for coming in.
It's great to be here, Kathleen.
Yeah.
So maybe before we unpack Alta Charo's insights,
I'd love to just understand the elevator pitch here.
What exactly is the Sensitive Uses and Emerging Technologies
program, and what was the impetus for establishing it?
Yeah, so the Sensitive Uses and Emerging Technologies program sits within our Office of Responsible
AI at Microsoft, and inherent in the name there are two real core functions.
There's the Sensitive Uses and Emerging Technologies.
What does that mean?
Sensitive Uses, think of that as Microsoft's internal consulting and oversight function
for our higher risk,
most impactful AI system deployments. And so my team is a team of
multidisciplinary experts who engage, in sort of a white-glove way,
with product teams at Microsoft that are designing, building, and deploying
these higher risk AI systems. And where that sort of consulting journey culminates is in a set of bespoke requirements
tailored to the use case of that given system that really implement and apply our more standardized,
generalized requirements that apply across the board.
Then the emerging technologies function of my team faces a little bit further out, trying to look around corners
to see what new and novel and emerging risks are coming out of new AI
technologies, with the idea that we work with our researchers, our engineering partners, and of course product
leaders across the company, understand where Microsoft is going
with those emerging technologies, and develop rapid, quick-fire, early-steer
guidance that implements our policies ahead
of that formal internal policy-making process, which
can take a bit of time.
So it's designed to both afford that innovation speed
that we like to optimize for at Microsoft,
but also integrate our responsible AI commitments and our AI principles into
emerging product development.
That segues really nicely, actually, as we met with Professor Charo and she was talking
about the field of genome editing and governing at the application level.
I'd love to just understand how similar or not is that to managing the risks of AI in
our world?
Yeah, I mean, Professor Charo's comments were music to my ears because, you know, where
we make our bread and butter, so to speak, in our team is in applying oversight to use cases.
AI systems, especially in this era of generative AI, are almost inherently multi-use, dual-use.
And so what really matters is how you're going to apply
that more general purpose technology,
who's going to use it, and in what domain
it's going to be deployed,
and then tailor that oversight to those use cases.
Try to be risk proportionate.
Professor Charo talked a little bit about this,
but if it's something that's been done before
and it's just a new spin on an old thing, maybe we're not so concerned about
how closely we need to oversee and gate that application of that technology.
Whereas if it's something new and novel or some new risk that might be posed by that
technology, we take a little bit closer look and we are overseeing that in a more sort
of high-touch way.
Maybe following up on that, how do you define a sensitive use or, maybe,
like, a high-impact application? And once that's labeled, what happens? Like, what
kind of steps kick in from there? Yeah, so we have this sensitive uses program
that's been at Microsoft since 2019. I came to Microsoft in 2019 when we were
starting this program
in the Office of Responsible AI. It had actually been incubated in Microsoft Research with
our Aether community of colleagues who are experts in socio-technical approaches to responsible
AI as well. Once we put it in the Office of Responsible AI, I came over. I came from academia.
I was a researcher myself.
At Berkeley, right?
At Berkeley, that's right. Yeah, sociologist by training and a lawyer in a past life.
But that does help sort of bridge those fields for me.
But sensitive uses: we force all of our teams, when they're envisioning their system design, to think about
whether the reasonably foreseeable use or misuse of
the system that they're developing in practice
could result in three really major sorts of risk. One is, could that deployment result in a consequential
impact on someone's legal position or life opportunity? Another category we have is could
that foreseeable use or misuse result in significant psychological
or physical injury or harm?
And then the third really ties in with a long-standing commitment we've had to human rights
at Microsoft.
And so could that system and its reasonably foreseeable use or misuse result in human
rights impacts and injurious consequences
to folks along different dimensions of human rights.
Once you decide it might, we have a process
for reporting that project into my office.
And we will triage that project, working with the product team,
for example, and our responsible AI champs community,
which are folks who are dispersed
throughout the ecosystem of Microsoft
and educated in our Responsible AI program, and then determine, okay, is it in scope for
our program?
If it is, we say, okay, we're going to go along for that ride with you.
And then we get into that whole sort of consulting arrangement that then culminates in this set
of bespoke use-case-based requirements applying our AI principles.
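As a very rough illustration of that intake logic, here is a hypothetical sketch in Python. The type and category names paraphrase the three risk types described above; this is not Microsoft's actual tooling or process.

```python
# Hypothetical triage sketch paraphrasing the three sensitive-use risk
# categories described above. Not Microsoft's actual implementation.

from dataclasses import dataclass

@dataclass
class ProjectAssessment:
    # Could foreseeable use or misuse consequentially affect someone's
    # legal position or life opportunity?
    affects_legal_position_or_life_opportunity: bool
    # Could it result in significant psychological or physical injury or harm?
    risks_significant_injury_or_harm: bool
    # Could it result in human rights impacts?
    risks_human_rights_impact: bool

def in_scope_for_sensitive_uses(a: ProjectAssessment) -> bool:
    """A hit on any category routes the project to review and, ultimately,
    bespoke use-case-based requirements."""
    return (a.affects_legal_position_or_life_opportunity
            or a.risks_significant_injury_or_harm
            or a.risks_human_rights_impact)

project = ProjectAssessment(True, False, False)
if in_scope_for_sensitive_uses(project):
    print("Report to the sensitive-uses program for triage and consulting.")
```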
That's super fascinating. What are some of the approaches in the governance of genome editing
that you're maybe seeing happening in AI governance, or maybe just, like, bubbling up in conversations
around it? Yeah, I mean, I think we've learned a lot from fields like genome editing that
Professor Charo talked about, and others. And again, it gets back to this sort of risk-proportionate approach.
It's a balancing test.
It's a trade off of trying to sort of foster innovation and really look for the beneficial
uses of these technologies.
I appreciated her speaking about that.
What are the intended uses of the system, right?
And then getting to, okay, how do we balance trying to, again,
foster that innovation in a very fast-moving space,
a pretty complex space and a very unsettled space,
in contrast to other sorts of professional fields
or technological fields that have a long history
and are relatively settled from an oversight
and regulatory standpoint.
This one is not.
And for good reason.
It is still developing.
And I think there are certain oversight and policy regimes
that exist today that can be applied.
Professor Charo talked about this
as well, where maybe you have certain policy and oversight
regimes that, depending on how that
technology is applied, apply there versus some horizontal, overarching
regulatory sort of framework. And I think that applies from an internal governance
standpoint as well. Yeah, that's a great point. So what isn't being explored from
genome editing that, you know, maybe we think could be useful to AI governance,
or, as we think about the evolving
frameworks, what maybe we should be taking into account from what Professor Charo shared
with us?
So, one of the things I've thought about and took from Professor Charo's discussion was
she had just this amazing way of framing up how genome editing regulation is done.
And she said, you know, we don't regulate genome editing,
we regulate the things that use genome editing.
And while it's not a one-to-one analogy with the AI space,
because we do have this sort of very general model level
distinction versus application layer and even platform layer
distinctions, I think it's fair to say,
we don't regulate AI writ large.
We regulate the things that use AI, in a very similar way. And that's how we think of our
internal policy and oversight process at Microsoft as well. And maybe there are things that we
regulated and oversaw internally in the first instance, the first time we saw it come
through, and it graduates
into more of a programmatic framework for how we manage that.
So one good example of that is some of our higher risk AI systems that we offer out of
Azure at the platform level.
When I say that, I mean APIs that you call, which developers can then build their own applications
on top of.
We were really deep in evaluating and assessing mitigations on those platform systems in the first instance.
But we also graduated them into what
we call our limited access AI services program.
And some of the things that Professor Charo discussed
really resonated with me.
She had this moment where she was mentioning,
you want to know who's using your tools
and how they're being used. And these are the same concepts. We want to have trust in our
customers, we want to understand their use cases, and we want to apply technical
controls that sort of enforce those use cases or give us signal post-deployment
that use cases are being done in a way that may give us some level of concern
to reach out and understand what those use cases are. Yeah, you're hitting on a great point. And I love this kind of layered approach
that we're taking and that Alta highlighted as well. Maybe to double
click a little bit just on that post-market control and what we're
tracking kind of once things are out and being used by our customers, how do we
take some of that deployment data and bring it back
in to maybe even better inform upfront governance,
or just how we think about some of
the frameworks that we're operating in?
It's a great question.
The number one thing is for us at Microsoft,
we want to know the voice of our customer.
We want our customers to talk to us.
We don't want to just understand telemetry and data.
But it's really getting out there and understanding from our customers and not just our customers. I would say our
stakeholders is maybe a better term because that includes civil society
organizations, it includes governments, it includes all of these sort of
non-customer actors that we care about and that we're trying to sort of
optimize for as well. It includes end users of our enterprise customers. If we
can gather data about how our products are being used, and try to understand
maybe areas we didn't foresee in how customers or users might be using those
things, we can tune those systems to better align with both what customers
and users want and our own AI principles and policies and programs.
Daniel, before coming to Microsoft, you led social science research
and socio-technical applications
of AI-driven tech at Berkeley.
What do you think some of the biggest challenges are
in defining and maybe even just kind of measuring
at like a societal level, some of the impacts of AI
more broadly?
Measuring social phenomena is a difficult thing.
And one of the things that as social scientists you're very interested in is scientifically
observing and measuring social phenomena.
Well, that sounds great.
It also sounds very high level and daunting.
What do we mean by that?
You know, it's very easy to say that you're collecting data and you're measuring, I don't
know, trust in AI, right?
That's a very fuzzy concept.
It is a concept that we want to get to, but we have to unpack that and we have to develop
what we call measurable constructs.
What are the things that we might observe that could
give us an indication toward what is a very fuzzy and general concept? And
there's challenges with that everywhere and I'm extremely fortunate to work at
Microsoft with some of the world's leading socio-technical researchers and
some of these folks, very steeped in measurement
theory, literally PhDs in these fields, who are thinking about how to both measure and allow for a scalable way
to do that at a place the size of Microsoft.
And that means trying to develop frameworks that are scalable and repeatable and put into our platform, which
then serves our product teams.
Are we providing as a platform a service to those product teams that they can plug in
and do their automated evaluations at scale as much as possible?
And then go back in over the top and do some of your more qualitative targeted testing
and evaluations.
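As a toy sketch of that measurement idea (the constructs, values, and weights below are all hypothetical, not an actual framework), a fuzzy concept like "trust in AI" might be decomposed into observable constructs and aggregated:

```python
# Toy sketch: decomposing a fuzzy concept ("trust in AI") into measurable
# constructs. All constructs, values, and weights are hypothetical.

observed = {
    "suggestion_acceptance_rate": 0.62,  # share of AI suggestions accepted as-is
    "repeat_usage_rate": 0.71,           # share of users returning week over week
    "self_reported_reliance": 0.55,      # mean survey response on a 0-1 scale
}

weights = {
    "suggestion_acceptance_rate": 0.4,
    "repeat_usage_rate": 0.2,
    "self_reported_reliance": 0.4,
}

# Aggregate the constructs into one indicator. The point is not the number
# itself but that each input is concretely observable, so the evaluation
# can be automated, repeated, and run at scale.
trust_indicator = sum(weights[k] * observed[k] for k in observed)
print(f"Composite 'trust in AI' indicator: {trust_indicator:.2f}")
```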
Yeah, it makes a lot of sense.
Before we close out, if
you're game for it, maybe we do a quick lightning round, just 30-second answers
here. Favorite real-world sensitive use case you've ever reviewed? Oh gosh. Well,
this is where I get to be the social scientist. Define "favorite," Kathleen.
Most memorable? Most painful? Let's do most memorable.
We'll do most memorable. You know, I would say the most memorable project I worked on
was when we rolled out the new Bing chat, which is no longer called Bing chat, because that was the
first really big cross-company effort to deploy GPT-4, which was the next step up in AI innovation from
our partners at OpenAI. And I really value working hand-in-hand with engineering teams and with
researchers. And that was us at our best and really sort of turbocharged the model that we have.
Wonderful. What's one of the most overused phrases that you have in your AI governance meetings?
If I hear "we need to get aligned" or "we need to align on this" one more time.
Right.
But, you know, it's said for a reason, and I think it sort of speaks to that collaborative nature.
That's one that comes to mind.
That's great.
And then maybe last one, what are you most excited about in the next, I don't know, let's say three months?
This world is moving so fast.
You know, the pace of innovation, as you just said,
is just staggering, is unbelievable.
And sometimes it can feel overwhelming in my space.
But what I'm most excited about is how we are building up
this emerging technologies program I mentioned. In my team, as a sort of formal
program, it's relatively new, and I really enjoy being able to take a step back and think
a little bit more about the future and a little bit more holistically. And I love working
with engineering teams and sort of strategic visionaries who are thinking about what we're
doing a year from now or five years from now or even 10 years from now. And I get to be a part of those
conversations, and that really gives me energy and helps keep me grounded
and not just dealing with the day-to-day
and, you know, various fire drills that you may run.
It's thinking strategically and having that foresight about what's to come
and it's exciting. Great. Well, Daniel, just thanks so much for being here. I've had such a wonderful discussion with
you. And I think the thoughtfulness in our discussion today, I hope resonates with our
listeners. And again, thanks to Alta for setting the stage and sharing her really amazing,
insightful thoughts here as well. So thank you. Thank you, Kathleen. I appreciate it. It's been fun.
And to our listeners, thanks for tuning in. You can find resources related to this podcast in the show notes.
And if you want to learn more about how Microsoft approaches AI governance,
you can visit microsoft.com slash rai.
See you next time!