Microsoft Research Podcast - The AI Revolution in Medicine, Revisited: The reality of generative AI in the clinic

Episode Date: March 20, 2025

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series—The AI Revolution in Medicine, Revisited—Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Dr. Christopher Longhurst and Dr. Sara Murray, leading experts in healthcare AI implementation, join Lee to discuss the current state and future of AI in clinical settings. Longhurst, chief clinical and innovation officer at UC San Diego Health and executive director of the Jacobs Center for Health Innovation, details his healthcare system's collaboration with Epic and Microsoft to integrate GPT into their electronic health record system, offering clinicians support in responding to patient messages. Murray, chief health AI officer at UC San Francisco Health, discusses AI’s integration into clinical workflows, the promise and risks of AI-driven decision-making, and how generative AI is reshaping patient care and physician workload.

Learn more:
Large Language Models for More Efficient Reporting of Hospital Quality Measures
Generative artificial intelligence responses to patient messages in the electronic health record: early lessons learned
The Chief Health AI Officer — An Emerging Role for an Emerging Technology
AI-Generated Draft Replies Integrated Into Health Records and Physicians' Electronic Communication
Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum
The AI Revolution in Medicine: GPT-4 and Beyond

Transcript
Starting point is 00:00:00 The workload on healthcare workers in the United States has increased dramatically over the past 20 years, and in the worst way possible. Far too much of the practical day-to-day work of healthcare has evolved into a crushing slog of filling out and handling paperwork. GPT-4 indeed looks very promising as a foundational technology for relieving doctors of many of the most taxing and burdensome aspects of their daily jobs. This is The AI Revolution in Medicine, Revisited. I'm your host, Peter Lee. Shortly after OpenAI's GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and
Starting point is 00:00:54 medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right and what did we get wrong? In this series, we'll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here. What I read there at the top is a passage from Chapter 2 of the book, which captures part of what we're going to cover in this episode.
Starting point is 00:01:34 In our book, we predicted how AI would be leveraged in the clinic. Some of those predictions I felt were slime dunks. For example, AI being used to listen to doctor-patient conversations and write clinical notes. There were already early products coming out in the world not using generative AI that were doing just that. But other predictions we made were bolder, for instance, on the use of generative AI as a second set of eyes to look over the shoulder of a doctor or nurse or a patient and spot mistakes.
Starting point is 00:02:03 In this episode, I'm pleased to welcome Dr. Chris Longhurst and Dr. Sarah Murray to talk about how clinicians in their respective systems are using AI, their reactions to it, and what's ahead. Chris is the Chief Clinical and Innovation Officer at UC San Diego Health, and he is also the Executive Director of the Joan and Orin Jacobs Center for Health Innovation. He's in charge of UCSD Health's digital strategy, including the integration of new technologies from bedside to bench and reaching across UCSD Health,
Starting point is 00:02:34 the School of Medicine, and the Jacobs School of Engineering. Chris is a board certified pediatrician and clinical informaticist. Sarah is vice president and chief health AI officer at UC San Francisco Health. Sarah is an internal medicine specialist and associate professor of clinical medicine. A doctor, a professor of medicine,
Starting point is 00:02:54 and a strategic health system leader, she builds infrastructure and governance processes to ensure that UCSF's deployment of AI, including both AI procured from companies as well as AI power tools developed in-house, are trustworthy and ethical. I've known Chris and Sarah for years and what's really impressed me about their work and frankly the work of all the guests we'll have on the show is that they've all done something significant to advance the use of AI in healthcare. Here's my
Starting point is 00:03:23 conversation with Dr. Chris Longhurst. Chris, thank you so much for joining us today. Peter, it's a pleasure to be here. I really appreciate it. We're going to get into what's happening in the clinic with AI, but I think we need to find out a little bit more about you first. I introduced you as a person with a fancy title, Chief Clinical and Innovation Officer. What is that exactly, and how do you spend a typical day at work? Well, I have a little bit of a unicorn job because my portfolio includes information
Starting point is 00:04:00 technology and I'm a recovering CIO after spending seven years in that role. It also includes quality patient safety, case management, and the office of our chief medical officer. And so I'm really trying to unify our mission to deliver highly reliable care with these new tools in a way that allows us to transform that care. One good analogy I think is about the game, right? Our job is not only to play the game and win the game using the existing tools, but also to change the game by leveraging these new tools and showing the rest of the country how that can be done.
Starting point is 00:04:33 And so as you're doing that, I can understand you're, of course, you're working at a very kind of senior executive level, but when I've visited you at UCSD Health, you're also working with clinicians, doctors and nurses all the time. In a way, I viewed you as connective tissue between these things. Is that accurate?
Starting point is 00:04:57 Well, sure. We've got several physicians who are part of the executive team who are also continuing to practice. I think that's one of the ways in which doctors on the executive team can bring value is being that connective tissue, being the ears on the ground and a little dose of reality. Well, in fact, that reality is really what I want to delve into.
Starting point is 00:05:19 But I just want to, before getting into that, talk a little bit about AI and your encounters with AI. And I think we have to do it in two stages, because there is AI and machine learning and data analytics prior to the rise of generative AI, and then, of course, after. And so tell us a little bit about what got you into health informatics and AI to begin with.
Starting point is 00:05:47 Well, Peter, I know that you play video games, and I did too for many years. So I was an early John Carmack id software, Castle Wolfenstein and Doom fan. And that kept me occupied because I lived out in the country on 50 acres of almond trees. And so it was computer gaming that first got me into computers, but during medical school, I decided to pursue graduate work in this field called health informatics. And actually my master's thesis was using machine learning to help identify and distinguish innocent from pathologic heart murmurs in children. And I worked with Dr. Nancy Reed at UC Davis who had programmed using Lisp, a really fancy
Starting point is 00:06:29 tool to do exactly that. And I will tell you that if I never see another parentheses in Lisp code again, it'll be too soon. So I spent a solid year on that. No, no, but you should wear that as a badge of honor. And I will guess that no other guest on this podcast series will have programmed in Lisp. So kudos to you. Well, it was a lot of work and I learned a lot, but as you can imagine, it wasn't highly
Starting point is 00:06:55 successful at the time. And fast forward, we've had lots of traditional machine learning kind of activities using discrete data for predictive analytics to help predict flow in the hospital and even sepsis, which we can talk about. But as you said, the advent of generative AI in the fall of 2022 was a real game changer. Well, you have this interest in technology. And in fact, I do know you as a fairly intensely geeky person. Really, I think maybe that's one reason why we've been attracted to each other. But you also got drawn into medicine. Where did that come from? So my father was a practicing cardiologist and scientist. He was MD-PhD trained, and he really shared with me both a love of medicine but also science.
Starting point is 00:07:45 I worked in his lab for three summers and it was during college I decided I wanted to apply to medical school because the human side of the science really drew me in. But my father was the one who really identified it was important to cross train and that's why I decided to take time off to do that master's degree in health informatics and see if I could figure out how to take two disparate fields and Really combine them into one. I actually went down to Stanford to become a pediatrician because they have a standalone children's hospital It's one of the best in the country and I still practice Pediatrics and see newborns and it's a passion for me and part of my identity
Starting point is 00:08:21 Yeah, I'm just endlessly Fascinated and impressed with people who can span these two worlds in the way that you've done. So now, 2022 in November, CHAT GPT gets released to the world. And then a few months later, GPT-4. And then, of course, in the last two years, so much has happened. But what was your first encounter with what we now know of as generative AI?
Starting point is 00:08:49 So, I remember when ChatGPT was released and some of my computer science type of nerd friends, we were on text threads with a lot of mind-blowing emojis. But when it really hit medicine was when I got a call right after Thanksgiving in 2022 from my colleague. He was playing with ChatGPT and he said to me, Chris, I've been feeding it patient questions and you wouldn't believe the responses. And he emailed some of the examples to me and my mind was blown. And so that's when I became one of the reviewers on the paper that was published in April of 2023 that showed not only could chat GPT help answer questions from patients in a high quality way, but it also expressed a tremendous amount of empathy. And in fact, in our review, the clickbait headlines that came out of the paper were that the chat bot was both higher quality and more empathetic than doctors.
Starting point is 00:09:46 But that wasn't my takeaway at all. In fact, I'll take my doctors any day and put them against your chat bot. If you give them an hour to Google and construct a really long, thoughtful response, to me, part of the takeaway was that this was really an opportunity to improve efficiency and save time. And so I called up our colleagues at Epic, I think it was right around December of 2022. And I said, Sumit, have you seen this? I'd like to share some results with you. And I showed them the data from our paper
Starting point is 00:10:16 before we had actually had it published. And he said, well, that's great because we're working with Peter Lee and the team at Microsoft to integrate GPT into Epic. And so of course, that's how we became one of the first two sites in the country to roll out GPT inside our electronic health record to help answer or help draft answers to patient questions. You know, one thing that's worth emphasizing in the story that you've just told is that
Starting point is 00:10:43 there is no other major health system that has been confronting the reality of generative AI longer than UC San Diego Health and I think largely because of your drive and early adoption and many listeners of this podcast will know what EPIC is but many will not and so it's worth worth saying that Epic is a very important creator of electronic health records system. And of course, UC San Diego Health uses Epic to store all of the clinical data for its patients. And then Sumit is of course Sumit Rana,
Starting point is 00:11:22 who is president at Epic. So in partnership with Epic, we decided to tackle a really important challenge in healthcare today, which is particularly since the pandemic and the increase in virtual and telehealth care, our clinicians get more messages than ever from patients. But answering those asynchronous messages is an unreimbursed, non-compensated activity
Starting point is 00:11:46 that can often take time after hours, what we call pajama time for our doctors. And in truth, health systems that have thought through this, most of the answers are not actually generated by the doctors themselves. Many times it's mid-level providers, protocol schedulers, other things, because the questions can be about anything from rescheduling an appointment to a medication refill. They don't all require doctors. When they do, it's a more complicated question and sometimes can require a more complicated answer. And in many cases, the clinicians will see a long complex question and rather than typing an answer, they'll say, you know, this is complicated. Why don't you schedule a visit with me so we can talk about it more?
Starting point is 00:12:30 Yeah. So now you've made a decision to contact people at Epic to what posit the idea that AI might be able to make responding to patient queries easier, is that the story here? That's exactly right. And Sumit knew well that this is a challenge across many organizations. This is not unique to UC San Diego or Stanford. And there's been a lot of publications about it. It's even been in the lay press.
Starting point is 00:13:02 So our hypothesis was that using GPT to help draft responses for doctors would save them time, make it easier, and potentially result in higher quality more empathetic answers to patients. And so now the thing that I was so impressed with is you actually did a carefully controlled study to try to understand how well does that work. So tell us a little bit first about the results of that study, but then how you set it up. Sure. Well, first, I want to acknowledge something
Starting point is 00:13:36 you said at the beginning, which is one of my hats is the executive director of the Joan and Erwin Jacobs Center for Health Innovation. And we're incredibly grateful to the Jacobs for their gift, which has allowed us to not only implement AI as part of hospital operations, but also to have resources that other health systems may not have to be able to study outcomes.
Starting point is 00:13:59 And so that really enabled what we're going to talk about here. Right. By the way, one of the things I was personally so fascinated by is, of course, in our book, we speculated that things like after visit notes to patients, responding to patient queries might be something that happens. And you, at the same time we were writing the book,
Starting point is 00:14:25 were actually actively trying to make that real, which is just incredible. And for me, and I think my co-author is pretty affirming. I think you guys were really prescient in your vision. The book is tremendous. I have a signed copy of Peter's book, and I recommend it for all your listeners. All right, so now what have you found about generative AI?
Starting point is 00:14:45 Well, first, to understand what we found, you have to understand how we built it. And so Stanford and UC San Diego really collaborated with Epic on designing what this would look like. So, Doctor gets that patient message. We feed some information to GPT that's not only the message, but also some information about the patient, their problems and medications and past medical and surgical history and that sort of thing. message, but also some information about the patient, their problems and medications and past medical and surgical history and that sort of thing. Is there a privacy concern that patients should be worried about when that happens? Yeah, that's a really good question.
Starting point is 00:15:14 There's not because we're operating in partnership with Epic and Microsoft in a HIPAA compliant cloud. And so that data is not only secure and private, but that's our top priority is keeping it that way. Great. So once we feed that into GPT, of course, we very quickly get a draft message that we could send to a patient. But we chose not to just send that message to a patient. So part of our AI governance is keeping a human in the loop, and there's two buttons that allow that clinician to review the message. One button says, edit draft message,
Starting point is 00:15:52 and the other button says, start new blank message. So there's no button that says just send now, and that really is illustrative of the approach that we took. The second thing, though, that we chose to do, I think, is really interesting from a conversation standpoint, is that our AI governance, as they were looking at this, said, you know, AI is new and novel, it can be scary to patients, and if we want to maximize trust with our patients, we should maximize transparency. And so anytime a clinician uses the button that says edit draft response, we automatically append something in the message that says This message was automatically generated and reviewed and edited by your doctor
Starting point is 00:16:34 We felt strongly that was the right approach and we've had a lot of positive feedback And so we'll want to get into you know, how good these messages are whether they're issues with bias or hallucination. But before doing that, on this human in loop, this was another theme in our book. And in fact, we recommended this. But there were other health systems around the country that were also later experimenting with similar ideas. And they some have taken different approaches. In fact, you know, it's, it's a very, very, very around the country that were also later experimenting with similar ideas. And some have taken different approaches. In fact, as time has gone on, if anything, it seems like it's become a little bit less clear,
Starting point is 00:17:15 this sort of labeling idea. Has your view on this evolved at all over the last two years? First of all, I'm glad that we did it. I think it was the right choice for University of California. And in fact, the other four UC sites are all doing this as well. There is variability across the organizations that are using this functionality.
Starting point is 00:17:35 And as you suggest, there's tens of thousands of physicians and hundreds of thousands, if not millions, of patients receiving these messages. And it's been highlighted a bit in the press. I can tell you that talking about our approach to transparency, one of our lawmakers in the state of California heard about this and actually proposed a bill that
Starting point is 00:17:52 was signed into legislation by our governor so that effective January 1, any communication with patients that uses AI has to be disclosed with those patients. And so there is some thought that this is perhaps the right approach. I don't think that it's a perfect approach though. We're using AI in more and more ways, and it's not as if we're gonna be able to disclose
Starting point is 00:18:13 every single time that we're doing it to prioritize, you know, scheduling for the sickest patients or to help operationally on billing or something else. And so I think that there are other ways we need to figure it out. But we have called on national societies and others to try to create some guidelines around this because we should be as transparent as we can with our patients. Obviously, one of the issues, and we highlighted this a lot in our book, is the problem of hallucination.
Starting point is 00:18:48 book is the problem of hallucination. And surely this must be an issue when you're having AI draft these notes to patients. What have you found? We were worried about that when we rolled it out. And what we found is not only were there very few hallucinations, in some cases our doctors were learning from the GPT. And I can give you an example when a patient who had had a visit wrote their doctor afterwards and said, Doc, I've been thinking a lot about what we discussed and quitting smoking marijuana. And the GPT draft reply said something to the effect of,
Starting point is 00:19:20 that's great news, here's a bunch of evidence on how smoking marijuana can harm your lungs and cause other effects. And by the way, since you live in the state of California, here's the marijuana quitters helpline. And the doctor who was sitting there called me up to tell me about it. And I said, well, is there a marijuana quitters helpline in the state of California?
Starting point is 00:19:42 And he said, I didn't know. So I Googled it. And yeah, there is. And so that's an example of the GPT actually having more information than a primary care clinician might have. And so there are cases clearly where the GPT can help us increase the quality.
Starting point is 00:20:00 In addition, some of the feedback that we've been getting both anecdotally and now measuring is that these draft responses do carry that tone of empathy that Dr. Ayers and I saw in the original manuscript. And we've heard from our clinicians that it's reminding them to be empathetic, because you don't always have that time when you're hammering out a quick short message, right? I think the thing that we've observed and we've discussed this also is exactly that reminding thing. You know, there might be in the encounter between a doctor and patient, maybe a conversation about, I don't know, about going to a football game for
Starting point is 00:20:36 the first time. That could be part of the conversation, but in a busy doctor's life when writing a note, you might forget about that. And of course, an AI has the endless ability to remember that it might be friendly to send well wishes. Exactly right, Peter. In fact, one of the findings in Dr. Errer's manuscript that didn't get as much attention but I think is really important was the difference in length between the responses. So I was one of the putatively blinded reviewers, but as I was looking at the questions and
Starting point is 00:21:14 answers it was really obvious which ones were the chat bot and which ones were the doctors, because the chat bot was always three or four paragraphs and the doctor was three or four sentences, right? It's about time. And so we saw that in the results of our study. All right, so now let's get into those results. OK. Well, first of all, my hypothesis
Starting point is 00:21:35 was that this would help us save time. And I was wrong. It turns out a busy primary care clinician might get about 30 messages a day from patients, and each one of those messages might take about 30 seconds to type a quick response, a two-sentence response, a dot phrase, a macro. Your labs are normal, no need to worry. I'll call you if anything comes up.
Starting point is 00:22:01 After we implemented the AI tool, it still took about 30 seconds per message to respond. But we saw that the responses were two to three times longer on average, and they carried a more empathetic tone. And our physicians told us it decreased cognitive burden, which is not surprising because any of you who have written know that it's much easier to edit somebody else's copy than it is to face a blank screen, right? That's why I like to be senior author, not lead author. And so the tool actually helped quite a bit, but it didn't help in the ways that we had
Starting point is 00:22:38 expected necessarily. There are some other sites that have now found a little bit of time savings, but it's really nominal overall. The Stanford study that was done at the same time, and we actually had some shared co-authors, measured physician burnout using a validated survey, and they saw a decrease in measured physician burnout. And so there are clear advantages to this, and we're still learning more. In fact, we've now rolled this out not only to all of our physicians, but to all of our nurses who help answer those messages in many different
Starting point is 00:23:10 clinics. And one of the things that we're finding, and Dr. C.T. Lin at University of Colorado recently published, is that this tool might actually help those mid-level providers even more, because it's really good at protocolized responses. I mentioned at the beginning, some of the questions that come to the physicians may be more the edge cases that require a little bit less protocolized kind of answers. And so as we get into academic subspecialties like gynecology, oncology, the GPT might not be dishing up a draft message that's quite as useful. But if you're a nurse in obstetrics and you're getting very routine pregnancy questions,
Starting point is 00:23:48 it could save a ton of time. And so we've rolled this out broadly. I want to acknowledge the partnership with Seth Hain and the team at Epic who've just been fantastic. And we're finding all sorts of new ways to integrate the GPT tools into our electronic health record as well. Yeah, certainly the doctors and nurses
Starting point is 00:24:04 that I've encountered that have access to this feature, they just, they don't want to give it up. But it's so interesting that it actually doesn't really save time. Is that a problem? Because of course, there seems to be a workforce shortage in healthcare, need to lower costs and have greater
Starting point is 00:24:27 efficiencies. How do you think about that? Great question. There are so many opportunities, as you've kind of mentioned. I mean, health care is full of waste and inefficiency, and I'm super bullish on how these generative AI tools are going to help us reduce some of that inefficiency. Everything from revenue cycle to our call centers to operations efficiency, I think, can be positively impacted. Those things make more resources available for clinicians and others. When we think about saving clinicians time, I don't think it's necessarily communicating
Starting point is 00:25:01 with patients where you want to save that time actually. I think what we want to do is we want to offload some of those administrative tasks that take a lot of time for our physicians. So we've measured pajama time in our doctors. And on average, a busy primary care clinician can spend one to two hours after clinic doing things. But only about 15 minutes is answering messages from patients. Actually, the bulk of the time after hours is documenting the notes that are required from those visits.
Starting point is 00:25:34 And those notes are used for a number of different purposes, not only communicating to the next doctor who sees the patient, but also for billing purposes and compliance purposes and medical legal purposes. So another really exciting area is AI scribes. Yeah. And so we'll get into scribes and actually other possibilities. I wonder though about this empathy issue because as computer scientists,
Starting point is 00:25:59 we know that you can fall into traps if you anthropomorphize these AI systems or any machine. So in this study, how was that measured and how real do you think that is? So in the study, you'll see anecdotal or qualitative evidence about empathy. We have a follow-up study that will be published soon where we've actually measured empathy using some more quantitative tools. And there is no doubt that the chatbot-generated drafts are coming through with more empathy. And we've heard this from a number of our doctors, so it's not surprising. Here's one of the more surprising things, though. I published a paper last year with Dr. Sally Baxter, one of our ophthalmologists, and she
Starting point is 00:26:42 actually looked at messages with a negative tone. It turns out, not surprisingly, healthcare can be frustrating, and stressed patients can send some pretty nasty messages to their care teams. And you can imagine being a busy, tired, exhausted clinician and receiving a bit of a nasty gram from one of your patients
Starting point is 00:27:05 can be pretty frustrating. And the GPT is actually really helpful in those instances in helping draft a pretty empathetic response when I think the human instinct would be a pretty nasty one. I should probably use it in my email, Peter. And is the patient experience, the actually lived experience of patients when they receive these notes, are you absolutely convinced and certain that they are also benefiting from this empathetic tone? I am.
Starting point is 00:27:37 In fact, in our paper, we also found that the messages going to patients that had been drafted with the AI tool were two to three times longer that the messages going to patients that had been drafted with the AI tool were two to three times longer than the messages going to patients that weren't using the drafts. And so it's clear there's more content going, and that content is either contributing to a greater sense of empathy and relationship among the patients as well as the clinicians, and or in some cases that content may
Starting point is 00:28:05 be educating the patients or even reducing the need for follow-up visits. Yeah, so now I think an important thing to share with the audience here is health care, of course, is a very highly regulated industry for good reasons. There are issues of safety and privacy that have to be guarded very, very carefully and thoroughly. And for that reason, clinical studies oftentimes
Starting point is 00:28:32 have very carefully developed controls and randomization setups. And so to what extent was that done in this case? Because here, it's not like you're testing a new drug. It's something that's a little fuzzier, isn't it? Yeah, that's right, Peter. And credit to the lead author, Dr. Ming-Tai Seale, we actually did randomize. And so that's unusual in these types of studies. We actually got IRB exemption to do this as a randomized QI study. And it was a crossover study because all the doctors wanted the functionality. So what we tested was the early adopters
Starting point is 00:29:11 versus the late adopters. And we compared at the same time the early adopters to those who weren't using the functionality, and then later the late adopters to the folks that weren't using the functionality. And in that type of study, you might also, depending on how the randomization is set up, also have to have doctors some days using it
Starting point is 00:29:29 and some days not having access. Did that also happen? It did, but it wasn't on a day-to-day basis. It was more a month-to-month basis. And what kind of conversation do you have with a doctor that might be attached to a technology and then be told for the next month you don't get to use it.
Starting point is 00:29:46 The good news is because of a doctor's medical training, they all understood the need for it and the conversation was sort of, hey, we're going to need you to stop using that for a month so that we can compare it, but we'll give it back to you afterwards. Okay, great. All right, so now we made some other predictions. So we talked about responding to patient. You briefly mentioned clinical note taking. We also made guesses about other types of paperwork,
Starting point is 00:30:13 filling out prior authorization requests or referral letters, maybe for a doctor to refer to a specialist. We even made some guesses about a second set of eyes on medications, on various treatment options, diagnoses. What of these things have happened and what hasn't happened at least in your clinical experience? Your guesses were spot on. And I would say almost all of them have already happened and are happening today at UC San Diego and many other health systems. We have a HIPAA compliant GPT instance that can be used for things like generating patient letters, generating referral letters, even generating patient education with
Starting point is 00:30:57 patient-friendly language. And that's a common use case. The second set of eyes on medications is something that we're exploring but have not yet rolled out. One of the areas I'm really excited about is reporting. So Johns Hopkins did a study a couple years ago that showed an average academic medical center our size spends about $5 million annually just reporting on quality measures that are regulatory requirements. And that's about accurate for us.
Starting point is 00:31:26 We published a paper just last fall showing that large language models could help to pre-populate quality data for things like sepsis reporting in a really effective way. It was like 91% accurate. And so that's a huge time savings and efficiency opportunity, again, allows us to redeploy those quality staff. We're now looking at things like how do we use large language models to review charts for peer review to help ensure ongoing accuracy and mitigate risk. I'm really passionate about the whole space of using AI to improve quality and patient safety in particular.
Starting point is 00:32:00 Your readers may be familiar with the famous report 1999 to air as human that suggests 100,000 Americans die on an annual basis from medical errors. And unfortunately, the data shows we really haven't made great progress in 25 years. But these new tools give us the opportunity to impact that in a really meaningful way. This is a turning point in healthcare. Dr. Kahn Yeah, medication errors, actually all manner of medical errors, I think, has been just such a frustrating problem. And I think this gives us some new hope.
Starting point is 00:32:34 Well, let's look ahead a little bit. And just to be a little bit provocative, one question that I get asked a lot by both patients and clinicians is, will AI replace doctors sometime in the future? What are your thoughts? So the Pat response is AI won't replace doctors, but AI will replace doctors who don't use AI. And the implication there, of course,
Starting point is 00:32:59 is that a doctor using AI will end up being a more effective practitioner than a doctor who doesn't. And I think that's absolutely true from a medical legal standpoint. What is standard of care today? And what is standard of care five or 10 years from now will be different.
Starting point is 00:33:16 And I think there will be a point where doctors who aren't using AI regularly would almost be unconscionable. Yeah, I think there are already some areas where we've seen this happen. My favorite example is with the technology of ultrasound, where if you're a gynecologist or some part of internal medicine, there are some diagnostic procedures
Starting point is 00:33:39 where it would really be malpractice not to use ultrasound. Whereas in the late 1950s, the safety and also the doctor training to read ultrasound images were all called into question. And so let's look ahead two years from now, five years from now, 10 years from now. And on those three timeframes, what do you think, based on the practice of medicine today, what doctors and nurses are doing in clinic every day today, what do you think the biggest differences will be
Starting point is 00:34:14 two years from now, five years from now, and 10 years from now? Great question, Peter. So first of all, 10 years from now, I think that patients will be still coming to clinic, doctors will still be seeing them. Hopefully we'll have more house calls and care occurring outside the clinic with remote monitoring and things like that.
Starting point is 00:34:34 But the most important part of healthcare is the humanism. And so what I'm really excited about is AI helping to restore humanism in medical care, because we've lost some of it over the last 20, 30 years as health care has become more corporate. So in the next two to five years, some things I expect to see is AI baked into more workflows. So AI scribes are going to become incredibly commonplace. I also think that there are huge opportunities to use those scribes to help reduce errors in diagnosis. So five or seven years from now, I think that when you're speaking to your physician about your symptoms and other things, the scribe is going to be developing a differential diagnosis and helping recommend not only the right follow-up tests
Starting point is 00:35:27 or imaging, but even the physical exam findings that the doctor might want to look for in particular to help make a diagnosis. Dirty secret in health care, Peter, is that 50% of doctors are below average. That's just math, and I think that the AI can help raise all of our doctors. So it's like Lake Wobegon, they're all above average.
Starting point is 00:35:48 It has important implications for the workforce, as you were saying. Do we need all visits to be with primary care doctors? Will mid-level providers augmented by AI be able to do as great a job as many of our physicians do? I think these are unanswered questions today that need to be explored. And then there was a really stimulating editorial in New York Times recently by Dr. Eric Topol. And he was waxing philosophic about recent study
Starting point is 00:36:17 that showed AI could interpret X-rays with 90% accuracy. And radiologists actually achieve about 72% accuracy. The study looked at how do the radiologists do with AI? Working together. And they got about 74% accuracy. So the doctors didn't believe the AI. They thought that they were in the right. And the inference that Eric took that I agree with
Starting point is 00:36:41 is that rather than always looking for ways to combine the two, we should be thinking about those tasks that are amenable to automation that could be offloaded with AI so that our physicians are focused on the things that they're great at, which is not only the humanism and health care, but a lot of those edge cases we talked about. So let's take mammogram screening as an example, chest XA screening.
Starting point is 00:37:02 There's going to be a point in the next five years where all first reads are being done by AI, and then it's a subset of those that are positive that need to be reviewed by physicians. And that helps free up radiologists to do a lot of other things that we need them to do. Wow. That is really just such a great vision for the future.
Starting point is 00:37:22 And I call some of this the flip, you know, where even patient expectations on the use of technology flips from fear and uncertainty to, you know, you would try to do this without the technology. And I think you just really put a lot of color and detail on that. Well, Chris, thank you so much for this. On that groundbreaking paper from April 2023, we'll put a link to it. It's a really great thing to read. And of course, you've published
Starting point is 00:37:52 extensively since then. But I can't thank you enough for just all the great work that you're doing. It's really changing medicine. Peter, can't thank you enough for the opportunity to be here today and the partnership with Microsoft to make this all possible I always love talking to Chris because he really is a prime example of an important breed of doctor a Doctor who has clinical experience, but is also a world-class tech geek. You know, it's surprising to me, and pleasantly so, that the traditional gold standard of randomized trials that Chris has employed can be used to assess the viability of generative AI, not just for things like medical diagnoses, but even for seemingly mundane things like writing email notes to patients. The other surprise is that the use of AI, at least in the in-basket task, which involves doctors
Starting point is 00:38:51 having to respond to emails from patients, doesn't seem to save much time for doctors, even though the AI is drafting those notes. Doctors seem to love the reduced cognitive burden, and patients seem to appreciate the greater detail and friendliness that AI provides, but it's not yet a big time saver. And of course, the biggest surprise out of the conversation with Chris was his celebrated paper back two years ago now on the idea that AI notes are perceived by patients as being more empathetic than notes written by human doctors. Wow. Let's go ahead to my conversation with Dr. Sarah Murray.
Starting point is 00:39:38 Sarah, I'm thrilled you're here. Welcome. Thank you so much for having me. You have actually a lot of roles. And I know that's not so uncommon for people at the leading economic medical institutions. But I think for our audience, understanding what a chief health AI officer does,
Starting point is 00:39:58 an associate professor of clinical medicine, what does it all mean? And so to start, when you talk to someone say, like your parents, how do you describe your job? You know, how do you spend a typical day at work? So first and foremost, I do always introduce myself as a physician because that's how I identify that's the that's how I trained. But in my current role, I, as the chief health AI officer, I'm really responsible for the vision and strategy for how we use trustworthy AI at scale to solve the biggest problems
Starting point is 00:40:34 in our health system. And so I think there's a couple key important points about that. One is that we have to be very careful that everything we're doing in healthcare is trustworthy, meaning it's safe, it's ethical, it's doing what we hope it's doing, and it's not causing any unexpected harm. And then, second, we really wanna be doing things that affect the population at large
Starting point is 00:41:01 of the patients we're taking care of. And so I think if you look historically at what's happened with AI in healthcare, you've seen little studies here and there, but nothing broadly affecting or transforming how we deliver care. And I think now that we're in this generative AI era, we have the tools to start thinking about how we're doing that. And so that's part of my role. And I'm assuming a chief health AI officer is not a role that has been around for a long time. Is this fairly new at UCSF or has this particular job title been around? No, it's a relatively new role actually. I
Starting point is 00:41:40 came into this role about 18 months ago. I am the first chief health AI officer at UCSF. And I actually wrote the paper defining the role with Dr. Ashley Beezy, Dr. Chris Longhurst, Dr. Karen Deepsingh, and Dr. Bob Wachter, where we discuss what is this role in health care, why do we actually need it now, and what is this person accountable for? And I think it's very important that as we roll these technologies out in health systems, we have someone who's really accountable for thinking about, you know, whether we're selecting the right tools and whether they're being used in the right ways to impact our patients. It's so interesting because I would say in the old days, like five years ago, information technology in a hospital or health system setting might be under the control
Starting point is 00:42:34 and responsibility of a chief information officer, a CIO or an IT chief. Or if it's maybe some sort of medical device technology integration, maybe it's some engineering type of leader, a chief technology officer. But you're different. And in fact, the role that I think I would credit you with sort of making the blueprint for seems different because it's actually doctors practicing clinicians who tend to inhabit these roles. Is there a reason why it's different that way?
Starting point is 00:43:10 Like a typical CIO is not a clinician. Yeah. So I actually I report to our CIO and I think that there's a recognition that you need a clinician who really understands and practices how the tools can be deployed effectively. So it's not enough to just understand the technology, but you really have to understand the use cases. And I think when you're seeing physician, chief health AI officers pop up around the country, it's because they're people who both understand the technology, not to the level you do, obviously, but to some sufficient level, and then understand how to use these tools in clinical care
Starting point is 00:43:51 and where they can drive value and what the risks are in clinical care and that type of thing. And so I think it'd be hard for it not to be some type of clinician in this role. So I'm going to want to get into what's really happening in clinic. And before that, I've been asking our guests about their stages of AI grief, as I like to put it.
Starting point is 00:44:15 And for most people, I've been talking about the experiences and encounters with machine learning and AI before CHAT GPT and then afterwards. And so can you tell us a little bit about how did you get into AI in the first place and what were your first encounters like? Yeah. So I actually started out as a health services researcher and this was before we had electronic health records when we were still writing our notes on carbon copy in the elevators. And a lot of the data we used was actually from claims data, and that was the kind of rich data source at the time, but as you know, that was very limited.
Starting point is 00:45:00 And so when we went live with our electronic health record, I realized there was this tremendous opportunity to really use rich clinical data for research. And so I initially started collaborating with folks down at Stanford to do machine learning, to identify rare diseases like lupus and the electronic health record, but quickly realized there was this real gap in the health system for using data in an actionable
Starting point is 00:45:25 way. And so I built what was our initially our advanced analytics team grew into our data science team and is now our health AI team as our ability to use the data in more sophisticated ways evolved. But if we think about just the pre-generative era and my first encounter with AI or at least AI deployment in healthcare, you know, we initially, gosh, it was probably eight or nine years ago where we got access through our EHR vendor to some initial predictive tools. And these are relatively simple tools, but they were predicting things we care about in healthcare, like who's not going to make it to a clinic visit or how long patients are going to stay in the hospital.
Starting point is 00:46:05 And so there's a lot of interest in predicting who might not make it to a clinic visit, because we have big access issues with it being difficult for patients to get appointments. And the idea was that if you knew who wouldn't show, you could actually put someone else in that slot, and it's called overbooking. And so when we looked at the initial model,
Starting point is 00:46:27 it was striking to me how risky it was for vulnerable patient populations, because immediately it was obvious that this model was likely to overbook people by race, by body weight, by things that are clearly protected patient characteristics. And so we did a lot of work initially with that model and a lot of education around how these tools could be biased. But the risk existed. And as we continued to look at more of these models, we found there were a lot of issues
Starting point is 00:47:00 with trustworthiness. You know, there was a length of stay prediction model that MIT was able to outperform with a pair of dice. And when I talked to other systems about not implementing this model, folks said, but it must be useful a little bit. And I was like, actually, if the dice is better, it's not useful at all. And so there was very little out there to frame this, but we quickly realized we have to start putting something together because there's a lot of hype and there's a lot of hope, but there's also a lot of risk here. And so that was my pre-generative moment. You know, just before
Starting point is 00:47:36 I get to your post-generative moment, this story that you told, I sometimes refer to it as story that you told, I sometimes refer to it as the healthcare IT world's version of irrational exuberance. Because I think one thing that I've learned and I have to say I've been guilty personally as a techie, you look at some of the problems that the world of healthcare faces and to a techie first encountering this, a lot of it looks like common sense. Of course, we can build a model and predict these things. And you sort of don't understand some of the realities, as you've described, that make this complicated. And at the same time, from healthcare professionals, I sometimes think they look at all of this dazzling machine learning magic and also are kind of overly optimistic that it can
Starting point is 00:48:27 solve so many problems. And it does create this danger, this irrational exuberance that both sides kind of get into a reinforcing cycle where they're too quick to adopt technologies without thinking through the implications more carefully. I don't know if that resonates with you at all. Yeah, oh totally. I think there's a real educational opportunity here because it's the you don't know what you don't know phenomenon. And so I do think there is a lot of work in healthcare to be done around people understanding the strengths and limitations of these tools because they're not magic,
Starting point is 00:49:05 but they are perceived to be magic. And likewise, I think the tech world often doesn't understand how healthcare is practiced and doesn't think through the risks in the same way we do, right? So I know that some of the vulnerable patients who might have been overbooked by that algorithm are the people who I most need to see in clinic and are the people who would be most slighted
Starting point is 00:49:29 if that they show up and the other patient shows up and now you have been overworked clinician. But I just think those are stages further down the pathway of utilization of these algorithms that people don't think of when they're initially developing them. And so one of the things we actually think, you know, require in our AI oversight process is when folks come to the table with the tool,
Starting point is 00:49:54 they have to have a plan for how it's going to be used and operationalized. And a lot of things die right there, honestly, because folks have built a cool tool, but they don't know who's going to use it in a clinic, who the clinical champions are, how it'll be acted on. And you can't really evaluate whether these tools are trustworthy unless you've thought through all of that. Because you can imagine using the same algorithm in dramatically different ways, right? If you're using the no-show model to do targeted outreach and send people a free lift, if they have transportation issues, that's going to have very different outcomes in overbooking folks. It's so interesting, and I'm going to want to get back to this topic
Starting point is 00:50:32 because I think it also speaks to the challenges of how do you integrate technologies into the daily workflow of a clinic. And I know this is something you think about a lot. But let's get back now to my original question about your AI moments. So now November 2022, chat GPT happens. And what is your encounter with this new technology? Yeah, so I used to be on med Twitter.
Starting point is 00:51:02 I still am, actually. It's just not as active anymore. But I would say, you know, med Twitter. I still am actually. It's just not as active anymore. But I would say, you know, med Twitter went crazy after Shout GPT was initially released. And it was largely filled with catchy poems. And people, you know, having fun. Yeah, exactly. I still use poems and people having fun trying to make it hallucinate.
Starting point is 00:51:26 And so, you know, I went, I was guilty of that as well. And so one of the things I initially did was I asked it to do something crazy. So I asked it, draft me a letter for a prior authorization request for a drug called a Pixaban, which is a blood thinner to treat insomnia. And if you practice clinical medicine, you know that we would never use a blood thinner to treat insomnia. But it wrote me such a compelling letter that I actually went back to PubMed and I made sure that I wasn't missing anything, like some unexpected side effect. I wasn't missing anything and in fact it was hallucination.
Starting point is 00:52:06 So at that moment I said this is very promising technology but this is still a party trick. A few months later I went and did the exact same prompt and I got a lecture instead of a draft about how it would be unethical and unsafe for me to draft such a request. And so I realized these tools were rapidly evolving, and the game was just gonna be changing very quickly. I think the other thing that we've never seen before is the deployment of a technology at scale, like we have with AI Scribes. So this is a technology that scale like we have with AI Scribes. So this is a technology
Starting point is 00:52:45 that was in its infancy, you know, two years ago, and is now largely a commodity deployed at scale across many health systems, a very short period of time. There's been no government incentives for people to do this. And so it clearly works well enough to be used in clinics. And I think these tools, you know, like AI Scribes have the opportunity to really undo a lot of the harm that the electronic health record implementations were perceived to have caused. What is a Scribe, first off?
Starting point is 00:53:20 Yeah, so AI Scribes, or as we're now calling them, AI Assistants or as we're now calling them AI assistants or ambient assistants are tools that essentially listen to your clinical interaction. We record them with the permission of a patient with consent and then they draft a clinical note and they can also draft other things like the patient instructions and the idea is those drafts are very helpful to clinicians and they can they have to review them and edit them, but it saves a lot of the furious typing that was previously happening during patient encounters.
Starting point is 00:53:55 We have been talking also to Chris Longhurst, your colleague at UC San Diego, and he mentions also the importance of having appropriate billing codes in those notes, which is yet another burden. Of course, when Kerry, Zach, and I wrote our book, we predicted that AI scribes would get better and would find wider use because of the improvement in technology. Let me start by asking, do you yourself use an AI scribe?
Starting point is 00:54:28 So I do not use it yet because I'm an inpatient doctor and we have deployed them to all ambulatory clinic doctors because that's where the technology is tried and true. So we're looking now to deploy it in the inpatient setting but we're doing very initial testing. And what are the reasons for not integrating it into the inpatient setting, but we're doing very initial testing. And what are the reasons for not integrating it into the inpatient setting? Well, there's two things, actually.
Starting point is 00:54:51 Most inpatient documentation work, I would say, is follow-up documentation. And so you're often taking your prior notes and making small changes to it as you change the care from day to day. And so the tools are just – all of the companies are working on this, but right now they don't really incorporate your prior documentation or note when they draft your note for today. The second reason is that a lot of the decision making that we do in the inpatient setting is asynchronous with the patient. So we'll often have a conversation in the morning
Starting point is 00:55:26 with the patient in their room, and then I'll see some labs come back, and I'll make decisions and act on those labs and give the patient a call later and let them know what's going on. And so it's not a very succinct encounter. And so the technology is gonna have to be a little bit different to work in that case, I think.
Starting point is 00:55:43 Right, and so these are distinct workflows from the ambulatory setting where it is the classic you're sitting with a patient in an exam room, having an encounter. Exactly. And all your decisions are made there. And I would say it's also different from nursing. We're also looking at deploying these tools to nurses. But a lot of their documentation is in something called flow sheets. They write in columns, you know, specific numbers. And so for them to use these tools, they'd have to start saying to the patient,
Starting point is 00:56:13 sounds like your pain is a five, your blood pressure is 120 over 60. And so those are different workflows they'd have to adopt to use the tools. So you've been in the position of having to oversee the integration of AI scribes into UCSF Health. From your perspective, how were clinical staff actually viewing all of this? So I would say clinical staff are largely very excited, receptive, and would like us to move faster. And in fact, I gave a town hall to UCSF and all of the comments were, when is this coming for APPs? When is this coming for allied health professionals?
Starting point is 00:56:56 And so people want this across healthcare, it's not just doctors. But at the same time, you know, I think there's a technology adoption curve. And about half of our ambulatory clinicians have signed up, and about a third of them are now using the tool. And so we are now doing outreach to figure out who is not using it, why aren't they using it, and what can we do to increase adoption, or are there true barriers that we need to help folks overcome?
Starting point is 00:57:26 And when you do these things, of course, there are risks. And as you were mentioning several times before, we're really concerned about hallucinations, about trustworthiness. So what were the steps that you took at UCSF to make these integrations happen? Yeah. So we have an AI oversight process for all tools with AI that come into our healthcare system,
Starting point is 00:57:55 regardless of where they're coming from. So industry tools, internally developed tools, and research tools come through the same process. And we have a committee that is quite multidisciplinary. We have health system leaders, data scientists, bioethicists, researchers, health equity experts. And through our process, we break down the AI life cycle into a couple of key places where these tools come for committee review.
Starting point is 00:58:22 And so for every AI deployment, we expect people to establish performance metrics, fairness metrics, and we help them with figuring out what those things should be. We were also fortunate to receive a donation to build an AI monitoring platform, which we're working on now at UCSF. We call it our Impact Monitoring Platform for AI and Clinical Care, IMPACT. And AI scribes is actually our first use case. And so on that platform, we have a metric adjudication process where we've established, you know, what do we really care about for our health system executive leaders? What do we really care about for ensuring safety and trustworthiness? And then,
Starting point is 00:59:10 you know, what are our patients going to want to know? Because we want to be also, we want to also be transparent with our patients about the use of these tools. And so we have processes for doing all this work. I think the challenge is actually how we scale these processes as more and more tools come through because as you could imagine a lot of conversation with a lot of stakeholders to figure out what and how we measure things right now. And so there's so much to get into there but I actually want to zoom in on the actual experience that doctors, nurses, and patients are having. And, you know, do you find that AI is meeting expectations? Is it making a difference, positive or negative, in people's lives? And what kinds of potential surprises are people encountering?
Starting point is 01:00:00 Mm-hmm. So we're collecting data in a couple of ways. We're first surveying clinicians before and after their experience, and we are hearing from folks that they feel like their clinic work is more manageable, that they're more able to finish their documentation in a timely fashion. And then we're looking at actual metrics that we can extract from the EHR around how long people are spending doing things, and that data is largely aligning with what people are reporting, although the caveat is they're not saving enough time for us to have them see more patients. And so we've been very explicit at UCSF around making it clear that this is a tool to improve
Starting point is 01:00:41 experience and not to improve efficiency. So we're not expecting people to see more patients as a result of using this tool. We want their clinic experience to be more meaningful. But then the other thing that's interesting that folks share is this tremendous relief of cognitive burden that folks feel when using this tool. So they may have been really efficient before. You know, they could get all their work done. They could type while they were talking to their patient. But they didn't actually, you know, get to look at their patients eye to eye
Starting point is 01:01:12 and have the meaningful conversation that people went into medicine for. And so we're hearing that as well. And I think one of the things that's going to be important to us is actually measuring that moving forward. And that is matched by some of the feedback we're getting from patients. So we have quotes from patients where they've said, you know, my doctor is using this new tool. And it's amazing. We're just having eye to eye conversations. Keep using it. So I think that's really important. I've been pushing my own primary care doctor to get into this because I really depend on her.
Starting point is 01:01:50 I love her dearly, but I'm always looking at her back as she's typing at a computer during our encounters. So Sara, while we're talking about efficiency, and at least the early evidence doesn't show clear efficiency gains, it does raise the question of how or why health systems, many of which are financially not swimming
Starting point is 01:02:16 in money, could adopt these things. And then we could also imagine that there are even more important applications in the future that might require quite a bit of expense for developers as well as procurers of these things. What's your point of view on, I guess, what we would call the ROI question about AI? I think this is a really challenging area, because return on investment is very important
Starting point is 01:02:52 to health systems that are trying to figure out how to spend a limited budget to improve care delivery. And so I think we've started to see a lot of small use cases that suggest this technology is likely to be beneficial. So there are use cases that you may have heard of from Dr. Longhurst around drafting responses to patient messages, for example, where we've seen that this technology is helpful but doesn't get us all the way there. And that's because these technologies
Starting point is 01:03:30 are actually quite expensive. And when you want to process a large amount of data, that data is measured in tokens, and tokens cost money. And so I think one of the challenges is that when we envision the future of healthcare, we're not really envisioning the expense of querying the entire medical record through a large language model. And we're going to have to build systems, from a technology standpoint, that can do that work in a more affordable way for us to be able to deliver really high value use cases to clinicians that involve
Starting point is 01:04:07 processing that. And so those are use cases like summarizing large parts of the patient's medical record, providing really meaningful clinical decision support that takes into account the patient's entire medical history. We haven't seen those types of use cases really come into being yet, largely because, you know, they're technically a bit more complex to do well and they're expensive. But they're completely feasible.
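Her point about tokens is easy to make concrete with back-of-the-envelope arithmetic. The figures below are illustrative assumptions, not actual vendor prices or real chart sizes:

# Rough cost of pushing a full patient chart through a large language model.
# Assumptions (not real prices): ~4 characters per token, and a
# hypothetical $5 per million input tokens.

CHARS_PER_TOKEN = 4              # rule of thumb for English text
PRICE_PER_M_INPUT_TOKENS = 5.0   # dollars, assumed for illustration

def query_cost(chart_chars: int, queries: int = 1) -> float:
    tokens = chart_chars / CHARS_PER_TOKEN
    return tokens / 1_000_000 * PRICE_PER_M_INPUT_TOKENS * queries

chart_chars = 4_000_000  # a complex chart: notes, labs, reports (assumed)
print(f"one full-chart query: ${query_cost(chart_chars):.2f}")
print(f"daily full-chart queries for 500 inpatients: ${query_cost(chart_chars) * 500:,.2f}")

At those assumed rates, a single full-chart query costs a few dollars, and one pass per day over a 500-bed hospital's charts runs to thousands of dollars, which is exactly the affordability gap she describes.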
Starting point is 01:04:36 You know, what you're saying really resonates so strongly. From the tech industry's perspective, one way that problem manifests itself is that shareholders in big tech companies like ours are paying a high premium, a high multiple on the share price, because they're expecting our revenues to grow at very spectacular, double-digit rates. But that isn't obviously compatible with how the healthcare business works. It doesn't grow at 30% year-over-year or anything like that. And so the question is how to make these things make financial sense for all comers.
Starting point is 01:05:21 And it's sort of part and parcel also with the problem that sometimes efficiency gains in healthcare just translate into heavier caseloads for doctors, which isn't obviously the best outcome either. And so in a way, I think it's another aspect of the work on impact and trustworthiness when we think about technology at all in healthcare. I think that's right. I think, you know, if you look at the difference between the AI scribe market and the rest of the summarization work that's largely happening within the electronic health record, in the AI scribe market you have a lot of independent companies, and they are all competing to be the best.
Starting point is 01:06:06 And so because of that, we're seeing the technology get more efficient, cheaper. There's just a lot of investment in that space. Whereas the electronic health record providers, they're also invested in really providing us with these tools, but it's not their main priority. They're delivering an entire electronic health record, and they also have to do it in a way that is affordable for all kinds of health systems, big health systems like UCSF and smaller settings. And so there's a real tension, I think, between delivering good enough tools and truly transformative tools. So I want to go back for a minute to this idea of cognitive burden that you described.
Starting point is 01:06:54 When we talk about cognitive burden, it's often in the context of paperwork, right? There are maybe referral letters, after-visit notes, all of these things. How do you see these AI tools progressing with respect to that stream of different administrative tasks? These tools are gonna continue to be optimized to do more and more tasks for us. So with AI scribes, for example,
Starting point is 01:07:18 we're starting to look at whether it can draft the billing and coding information for the clinician, which is a tedious task with many clicks. These tools are poised to start pending orders based on the conversation. Again, a tedious task. All of this with clinician oversight, but I think as we move from them being AI scribes to AI assistants, it's going to be like a helper on the side for clinicians doing more and more work so they can really focus
Starting point is 01:07:49 Yeah, since you mentioned AI assistants, and that's such an interesting word, it does connect with something that was apparent to us even, you know, as we were writing the book, which is this phenomenon that these AI systems might make mistakes. They might be guilty of making biased decisions or showing bias. And yet
Starting point is 01:08:22 they at the same time seem incredibly effective at spotting other people's mistakes or other people's biased decisions. And so is there a point where these AI scribes do become AI assistants, that they're sort of looking over a doctor's shoulder and saying, hey, did you think about something else? Or hey, maybe you're wrong about a certain diagnosis. I mean, absolutely. You're just really talking about combining technologies that already exist into a more streamlined clinical care
Starting point is 01:09:00 experience, right? So, and I already do this when I'm on rounds, I'll kind of give the case to ChatGPT if it's a complex case, and I'll say, here's how I'm thinking about it, are there other things? And it'll give me additional ideas that are sometimes useful and sometimes not, but often useful. And I'll integrate them into my conversation about the patient. I think all of these companies are thinking about that. How do we integrate more clinical decision making into the process? I think it's just, you know, healthcare is always a little bit behind the technology
Starting point is 01:09:37 industry in general, to say the least. And so it's kind of one step at a time. And all of these use cases need a lot of validation. There are regulatory issues. And so I think it's going to take time for us to get there. Should I be impressed or concerned that the chief health AI officer at UC San Francisco Health is using ChatGPT off label? Well, actually, every time I go on service,
Starting point is 01:10:08 I encourage my residents to use it, because I think we need to learn how to use these technologies. And when our medical education leaders start thinking about how we teach students to use these, well, we don't know how to teach students to use them if we're not using them ourselves, right? And so I've learned a lot about what I perceive the strengths and limitations of the tools to be. But, you know, one
Starting point is 01:10:32 of the things that we've learned is, and you've written about this in your book, but the prompting really matters. And so I had a resident ask it for a differential for abnormal liver tests. But in asking for that differential, there is a key important blood finding, something called eosinophilia. It's a type of blood cell that was mildly, mildly elevated, and they didn't know it. So they didn't give it in the prompt. And as a result, they didn't get the right differential. But it wasn't actually Chachi PT's fault. It just didn't get the right information because the trainee didn't recognize the right information.
Starting point is 01:11:16 And so I think there's a lot to learn as we practice using these tools clinically. So I'm not ashamed of it. Well, in fact, I think my coauthor Carrie Goldberg would find what you said really validating, because in our book, she actually wrote this fictional account of what it might be like in the future, and this medical resident was also using the chatbot off label for pretty much the same kinds of purposes. And it's these kinds of things that, you know, it seems like might be coming next. I mean, medicine, the practice of medicine, is a very imperfect science.
Starting point is 01:11:55 And so, you know, when we have a difficult case, I might sit in the workroom with my colleagues and run it by people. And everyone has different thoughts and opinions on, you know, things I should check for. And so I think this is just one other resource where you can kind of run cases, obviously just reviewing all of the outputs yourself. All right, so we're running short on time.
Starting point is 01:12:18 And so I want to be a little provocative at the end here. And since we've gotten into AI assistants, two questions. First off, do we get to a point in the near future when it would be unthinkable, and maybe even bordering on malpractice, for a doctor not to use AI assistance in his or her daily work? So it's possible that we see that in the future. We don't see it right now, and that's part of the reason we don't force this on people. So we see AI scribes or AI assistants as a tool
Starting point is 01:12:54 we offer to people to improve their daily work, because we don't have sufficient data that the outcomes are markedly better from using these tools. I think there is a future where specific tools do actually improve outcomes, and then you should be incentivized, either through, you know, CMS or other systems, to ensure that we're delivering the standard of care. But we're not yet at the place where any of these tools are the standard of care, which would mean they have to be used to practice good medicine. And I think I would say that it's the work
Starting point is 01:13:33 of people like you that would make it possible for these things to become standard of care. And so now, final provocation. It must have crossed your mind through all of this, the possibility that AI might replace doctors in some ways. What are your thoughts? I think we're a long way from that happening, honestly. And even when I talk to my colleagues in radiology about this, whom I, as an internist, perceive might be the most replaceable, there are a million reasons why that's not the case. And so I think these tools are going to augment our work. They're going to help us streamline access for patients. They're going to maybe change what clinicians have to do, but I don't
Starting point is 01:14:22 think they're going to fully replace doctors. There's just too much complexity and nuance in providing clinical care for these tools to do that work fully. Yeah, I think you're right. And actually, you know, I think there's plenty of evidence, because in the history of modern medicine, we actually haven't seen technology replace human doctors. Maybe you could say that we don't use barbers for bloodletting anymore because of technology. But I think, as you say, we're at least a long way away. Yeah. Sara, this has been just a great conversation, and thank you for the great work that you're doing and for being so open with us about your personal use of AI, but also how you see the adoption of AI in our health system.
Starting point is 01:15:08 Thank you. It was really great talking with you. I get so much out of talking to Sara. Every time, she manages to get me refocused on two things: the quality of the user experience and the importance of trust in any new technology that is brought into the clinic. I felt like there were several good takeaways from the conversation. One is that she really validated some predictions that Carrie, Zach, and I made in
Starting point is 01:15:37 our book. First and foremost, that automated note-taking would be a highly desirable and practical reality. The other validation is Sara revealing that even she uses ChatGPT as a daily assistant in her clinical work, something that we guessed would happen in the book, but we weren't really sure about, since health systems are oftentimes very locked down when it comes to the use of technological tools. And, of course, maybe the biggest thing about Sara's work is her role in defining a new
Starting point is 01:16:08 type of job in healthcare, the health AI officer. This is something that Carrie, Zach, and I didn't see coming at all, but in retrospect, makes all the sense in the world. Taken together, these two conversations really showed that we were on the right track in the book. AI has made its way into day-to-day life and work in the clinic, and both doctors and patients seem to be appreciating it. I'd like to extend another big thank you to Chris and Sara for joining me on the show and sharing their insights.
Starting point is 01:16:45 And to our listeners, thank you for coming along for the ride. We have some really great conversations planned for the coming episodes. We'll delve into how patients are using generative AI for their own healthcare, the hype and reality of AI drug discovery, and more. We hope you'll continue to tune in. Until next time.
