OpenAI Podcast - Episode 14 - Building AI for better healthcare
Episode Date: March 16, 2026

Healthcare systems around the world are under strain, and both patients and clinicians are feeling the impact. OpenAI's Head of Health Dr. Nate Gross and Karan Singhal, who leads Health AI Research, discuss how AI can help address the biggest challenges. They cover how OpenAI is training models to handle sensitive health questions in collaboration with physicians, and how that foundation is unlocking a new generation of tools for patients, clinicians, and healthcare systems.

Chapters
00:00:38 – Origins of Nate and Karan's interest in AI and healthcare
00:05:01 – Strategy for building AI tools for clinicians
00:06:57 – How AI models are trained for health use cases
00:10:15 – How OpenAI is able to score well on health evals
00:14:21 – Key challenges deploying AI in healthcare
00:21:05 – Collaboration with hospitals and healthcare systems
00:23:05 – Practical everyday uses of AI health assistants
00:26:43 – Biggest "wow" moment during development
00:28:46 – Feedback from clinicians and early users
Transcript
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, we're talking to Dr. Nate Gross,
Head of Health, and Karan Singhal, who leads Health AI Research at OpenAI. We'll cover what went
into training models to handle sensitive questions and how it's helping clinicians, patients,
and health care systems. We actually worked really closely with a cohort of around 250
physicians across every stage of generating this data. And we're starting to see medications
that have been sitting on a shelf that, all of a sudden, AI has found ways for them to have
direct value in patients' lives.
How did you find your way into healthcare?
So what drew me to health care initially was health policy.
I was very interested.
This was before the first Obama election.
Value-based care was first becoming a thing.
I started studying different ways to make health care more accessible to more people.
And then I eventually went to Emory for medical school. And what drew me to that was a large
public hospital, Grady Hospital, you know, to make sure that you're taking advantage of every
clinical hour you have. So what kind of things were you doing? So I was mostly pissing off the
IT department. When I was in medical school, the news feed came out, the iPhone came out,
Twitter came out, the app store came out. And so comparing the technology
that we had as doctors, which was fax machine, clipboard, paper binder, the beginnings of electronic
health records, to what my friends had or what the patients had in the waiting room, was
pretty profound. So you come at it from the point of view as an AI researcher. Where did your
interest in applying this to healthcare come from? So I nerded out a lot when I was younger about
things like philosophy of mind. And I thought a lot about, you know, intelligence and how far could
intelligence go and could machines be intelligent? And a lot of those
explorations took me towards, as I was learning about AI and starting to work on my first AI
projects, thinking a lot about the ways in which AI could have a lot of impact on humanity in the
future. And I thought something like, I didn't predict the future or how fast it would happen,
but I thought something like AGI would happen within our lifetimes. So then once I had that
conviction, I thought a lot about, you know, what are the ways in which I can have either
positive impact and hopefully make that a really large upside for humanity or think about
the ways in which we could avoid downside. And since then,
in my career, I've been thinking a lot about both sides of that coin, thinking about that
from the perspective of a safety researcher, which is part of my background.
And then really some of that work on safety and privacy that I was working on previously,
I started applying it in healthcare.
And then I started being like, whoa, there is a really massive opportunity to think about
the application of this technology, especially large language models in healthcare.
And that's what took me to transitioning to it full time, was just the size of that opportunity
and the fact that I felt like the healthcare and clinical AI world was kind of not fully aware
of that gap. And so I just thought it was kind of a really amazing opportunity and responsibility
to bring us there. I want to understand both the vision and actually how this is going to be
implemented. So our mission at OpenAI is to ensure that AGI benefits all of humanity. And health
is one of the places where I think that is not only most achievable, but is the clearest.
So healthcare today, as everyone knows, is fragmented, care is missed, left and right.
Patients are often left 364 days per year without the opportunity to engage with the organizations
that have the information centralized.
And doctors have extremely limited time when they do get that chance to engage with the patient
to actually have, you know, a meaningful impact beyond a simple surgery or a simple reactive
prescription, you know, the system is more reactive than it is proactive today. And that leads to,
you know, tremendous challenges in the system. It leads to tremendous gaps in care. It leads to,
you know, leaving people behind in situations when they could be thriving. And one of the reasons
that I joined OpenAI is because access has always been a through line in my life.
Access to knowledge, first in medicine, then in building a product for doctors to access
the latest medical literature, and then in supporting entrepreneurs as they were building
healthcare tools.
But OpenAI has the type of technology that can do that at scale for the entire ecosystem
all at once: help patients, help healthcare professionals,
and help incredible entrepreneurs who are building for all of the corners and edge cases
and tough challenges that exist in each area of the health market.
What is the strategy here?
We know that people use chatbots all the time now for medical questions,
but it seems like you're building and working towards something bigger and more comprehensive,
not just for the patient side, but the clinician side.
Can you talk about what your goals are?
Patients are increasingly turning to tools like ChatGPT throughout the year. In fact, 900 million people now use ChatGPT per week. And if you look at how many are
doing health-related queries, it's about one in four in a given week. So that's 40 million people
per day. And so our strategy in health is as much proactive as it is reactive, and stepping
up to the responsibility and the opportunity to do good that comes with that strong
consumer demand. And so with ChatGPT Health, we have created a space to keep these conversations
not just secure, but empowered. So when I say secure: of course, encrypted, with essentially a
one-way valve protecting your conversations. So these extra security layers, these protections
to make sure that we will never train on users' health care data, combined with empowerment,
really. You know, search engines that people have used before to navigate health have amnesia.
You know, they're one-size-fits-all. And I think context really matters in health care. And so
building a series of features and technology hooks to help patients bring in their own context that they
choose to, so that each time they choose to engage with AI, it's grounded in their own context, is a key reason
why we've built this ChatGPT Health foundation.
So I understand the safeguards you put in place to keep the data separate, to make sure that data
doesn't get leaked, and to, you know, undergo a very rigorous method of making sure that your data is secure. But when it comes to
the model itself, what goes into training models that are capable of working with something like
healthcare? It's kind of like the most important thing in the world. For sure. It's a high-stakes
domain, and because of the ways people are using it, it's super important that we get it right.
So we think a lot about a few things when we think about evaluation and training for health care.
And this is actually the foundation for the health work at OpenAI.
When we were first starting to work on the health effort at OpenAI, we were thinking a lot about the safety and grounding motivation as an important part of what we were doing.
And so part of the thesis, actually, for starting work on health at OpenAI was thinking: this is an excellent way to ground our work in safety and alignment, and provide concrete incentives and a feedback loop for researchers who think about this problem.
So like the model improvements and the safety thinking here is not just an afterthought.
It's actually the beginning of our work here.
And so where we started really was thinking about evaluation.
So we thought about the ways in which, you know, models were already starting to become useful to people then.
And there was already starting to be this capability overhang between what the models could do and what people were using them for.
And so we started to navigate that problem and think about, you know, where do the models still have gaps today?
And so that's where our work on evaluation comes in.
And so we've taken a pretty methodologically interesting approach to that. And a lot of that is reflected in our work on
HealthBench, which is this kind of realistic evaluation of conversations between users, who are
either health professionals or consumers, talking to models, measuring the performance
and safety of the models in these situations, which are these kind of multi-turn conversations.
And the way we worked on this is we actually worked really closely with a cohort of
around 250 physicians across every stage of generating this data:
from thinking about, you know, the areas that we would focus on for the
evaluation, and which areas we thought were going to be clinically relevant or impactful,
to the specific things that are being graded in this evaluation.
So that's a range of things: from, you know, are you tailoring your response to a layperson
versus a more technical health professional,
to are you thinking about the ways in which you should seek context first
before providing an initial response?
The models used to be significantly worse at this;
they're much better today at seeking context when needed,
because users often type in much less than the models need
in order to provide the most helpful information.
It burns.
Exactly. You know, if the user types, then it burns.
How do you think about the right way to provide information?
You can provide some initial information, potentially based on an impression you might have of what the user might be saying.
But the most helpful thing to do in that situation, and the safest thing to do in that situation, is actually to ask for more context.
So that's just one example of the many ways that we measured performance in HealthBench.
And HealthBench in particular actually measured around 49,000 different dimensions of performance.
And that's just an example of one possible dimension of performance.
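The rubric mechanism described here (physician-written criteria with point weights, graded per response) can be sketched roughly as follows. This is a minimal illustration, not the actual HealthBench implementation: the criteria, weights, and keyword-matching toy grader are invented stand-ins, and the real evaluation uses a model-based grader over far larger physician-authored rubrics.

```python
# Minimal sketch of rubric-based response scoring, in the spirit of the
# evaluation described above. NOTE: criteria, weights, and the keyword
# grader are illustrative stand-ins, not the real HealthBench pipeline.
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    description: str  # behavior the response should exhibit (or avoid)
    points: int       # weight; negative points penalize unsafe behavior
    marker: str       # keyword used only by the toy grader below


def score_response(response: str,
                   rubric: List[Criterion],
                   grader: Callable[[str, Criterion], bool]) -> float:
    """Fraction of achievable (positive) points earned, clipped to [0, 1]."""
    earned = sum(c.points for c in rubric if grader(response, c))
    achievable = sum(c.points for c in rubric if c.points > 0)
    return max(0.0, min(1.0, earned / achievable)) if achievable else 0.0


def toy_grader(response: str, criterion: Criterion) -> bool:
    # Stand-in for the model-based grader used in practice.
    return criterion.marker in response.lower()


rubric = [
    Criterion("asks a clarifying question about duration", 5, "how long"),
    Criterion("advises seeing a clinician if symptoms persist", 5, "see a doctor"),
    Criterion("asserts a definitive diagnosis without context", -5, "you definitely have"),
]

print(score_response(
    "How long has the cough lasted? If it persists, see a doctor.",
    rubric, toy_grader))  # meets both positive criteria -> 1.0
```

A criterion with negative points lets the rubric penalize unsafe behavior, such as asserting a definitive diagnosis without enough context; the score is clipped so penalties cannot push it below zero.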
So this is a very multifaceted evaluation that we built in concert with this cohort of 250
physicians over a long period of time. And it took us about a year, actually, end to end, to work
on that evaluation and then release it.
In the model development cycle, it seems like sometimes one company gets a bit ahead and somebody catches up, and whatnot.
I've noticed a pattern with the OpenAI health models: they've consistently been far ahead on
HealthBench and other evals, by a big margin. Why is that?
I think we have a pretty dedicated effort here, and a pretty serious effort that is cross-functional and
kind of across the stack, everything from kind of pre-deployment evals to like HealthBench,
to monitoring in production traffic and thinking about the ways in which we are ensuring safety
in production traffic in a privacy preserving way and working with physicians across every
step of that process.
And so to my knowledge, OpenAI's models are the only major models where every phase
of model training, from pre-training to mid-training to post-training and every step in
between, really integrates health into every major stage. And I think the result is that our models are
pretty good, not just on our own benchmarks, but also the benchmarks that people, other people put
together. I'd like to add a little to what Karin said about the model training, because I think
when we spend time with the healthcare ecosystem, that's one of the things that is most important to them.
So not only were these models trained, in development, with hundreds of physicians who created
over 5,000 conversations and 48,500 rubric criteria through which to evaluate AI responses,
score them, and identify ways that we could improve the model, do additional data acquisition,
do additional post-training, and hone in on a particular sub-specialty or a particular area of the
world where users were telling us we could improve health or health care in that specific topic.
But in addition, I think that close proximity to physicians really leads to calling out the most important parts that should be focused on in model development.
So, you know, other places, sometimes I see how a model fared on a medical school exam or a board exam.
And healthcare is not multiple choice.
You know, patients are coming in with a tremendous amount of complexity and their own
stories and nuance and context.
And that's presented in many different ways.
And part of the job of working in healthcare is being able to draw from those disparate
sources, draw from experience, balance all that in your head.
And so having a training mechanism that thinks about things like when to escalate and how
to escalate, and keeps that always as the top priority, or adaptive literacy. I mean,
compare the one-size-fits-all handouts that people get when they visit the doctor today
to a model that can respond differently when it knows
you're an oncologist versus a primary care doctor versus a pharmacist in Kenya versus a patient
at a 12th-grade literacy level or a third-grade literacy level. That is extremely important
not only for making sure that accuracy and impact are maximized, but also to make sure
that everyone can maximally participate in their own care on the patient side.
And then finally, uncertainty.
You know, if you go back a year and a half ago,
many of the mistakes people would call out about AI models
were overconfident hallucinations.
And I think in such a high-stakes field like healthcare,
one of the most important things is that the model can be trained
to better know when it doesn't know and say that.
And in addition, suggest follow-ups that can be dug into,
either by the patient in a referral to the health care system, or by the doctor, if the doctor is
using the model: a test that they might run, additional pathways they may go down, to make sure
that the patient can be led to the best possible outcome.
We've seen the cost of intelligence drop every year, and it's exciting, because every year
you're able to get better answers in medicine, everything, health care across the board.
But what are the challenges?
What are going to be the blockers, or what are you looking at ahead to say,
okay, we have to solve for this?
The drop in the cost of intelligence has been super exciting here,
because so much of what we think about and care about here
is actually about access.
And so the more people have access to the technology,
the more people will benefit.
And that's why we're working on rolling out ChatGPT Health
more widely, to all free users.
And so that's incredibly exciting.
Another thing that we think about as researchers is,
where will the marginal gains in intelligence compound the most, right?
And so I think Nate mentioned this exciting thing, which is, there is more and more data being collected across different
modalities. How do you think about integrating that data across all the different ways that people
use ChatGPT, and all the different modalities and wearables and things like this that people are
collecting, lab tests, things like this? And that's one place where I think a lot of the intelligence
will compound, and we'll start to see kind of new zero-to-one capabilities, like a model looks at my entire
history over a decade and tells me a prediction that even a human couldn't have, just because
the model has a higher context size.
So thinking about those zero-to-one capabilities, I think, is going to be really cool.
The other thing we keep in mind is just, how are people thinking about and using ChatGPT today?
Can we measure that?
Can we improve that?
And I think we're at kind of an interesting point right now.
I call this, to our team, the transition. Where, you know, for context: I bike to work.
I wear my helmet.
I worry about cars and things like this next to me.
I just reached this point here in SF.
You know, in SF we have a bunch of self-driving cars, including Waymos.
I just reached the point where, you know, when I'm biking next to a Waymo,
I actually feel safer than if I was biking next to a human driver, right?
I don't worry about whether I'm in their blind spot
or not or anything like this.
So I feel this protective effect
by being next to this Waymo.
And I want everybody to have this protective effect, right?
I want everybody to have this protective effect with health AI.
There are these studies showing that,
you know,
if you have a doctor in your family,
that adds a protective effect
to your health as well. And I want everybody, whether they're a patient or a health professional,
to think about the ways in which, like, as a patient, you want to feel safer having this.
As a health professional, you want this to be a safety net for the decisions that you're making.
So that's another frontier that I think we're going to cross in the next six months or so,
which is really exciting, this kind of inflection point.
Another thing that we're thinking about is the right way to think about post-deployment
monitoring of certain workflows. And I think a good example here that I love to talk about is
our AI clinical copilot study that we did with Penda Health. This was a study where we worked
with these 20 or so clinics in Nairobi and actually thought about the ways in which we could deploy
a safety net for clinicians in that context, which is basically monitoring things that they
type into their electronic health record and only interrupting their flow when there's
something potentially concerning going on, or a potential error, or things like this.
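The safety-net pattern described here, monitor passively and interrupt only on a potential problem, can be sketched as below. This is a hypothetical illustration, not Penda Health's or OpenAI's actual system: the note fields and the single hard-coded rule stand in for a model-based reviewer.

```python
# Sketch of an "interrupt only when concerning" safety net over clinical
# notes. NOTE: VisitNote's fields and toy_review's rule are invented for
# this example; in the deployment described, a model plays the reviewer role.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VisitNote:
    symptoms: str
    diagnosis: str
    treatment: str


def safety_net(note: VisitNote,
               review: Callable[[VisitNote], Optional[str]]) -> Optional[str]:
    """Run a reviewer over the note; return an alert message only when
    something looks concerning, otherwise None (no interruption)."""
    return review(note)


def toy_review(note: VisitNote) -> Optional[str]:
    # Toy rule standing in for a model-based reviewer: flag an antibiotic
    # prescribed against a diagnosis documented as viral.
    if "viral" in note.diagnosis.lower() and "amoxicillin" in note.treatment.lower():
        return "Check: antibiotic prescribed for a diagnosis documented as viral."
    return None


note = VisitNote(symptoms="fever, cough",
                 diagnosis="viral upper respiratory infection",
                 treatment="amoxicillin 500 mg")
alert = safety_net(note, toy_review)  # non-None: the clinician is interrupted
```

The design point is the quiet default: for an unremarkable note the reviewer returns None and the clinician's flow is never interrupted.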
And what we found is that when we deployed this to clinicians in this setting, there was
actually a statistically significant reduction in diagnostic and treatment errors for the
clinicians who were using this tool versus those who were not. And I think this is a step in the direction of
moving beyond model evaluations, and even beyond monitoring the ways in which people are
using ChatGPT today, to actually thinking about workflows in which these technologies can
be deployed, and the right ways to evaluate those workflows after deployment. I think that's another
frontier that we are really excited about and would love to see more from our partners. Nate, what do you
think the challenges are going to be? I'll start with talking through some of the challenges that
exist on the professional side. So each day when healthcare professionals use AI, they're looking for
the ability to trust what they're seeing in the answer. And so a lot of our recent work has been
making sure that the answers the AI is providing are not just grounded in what the model
was trained on, but are grounded in the latest medical literature, the latest guidelines,
and sometimes the latest guidance from their own institution or their own region.
Some conditions are treated differently in different areas of the country;
other times different care settings have different levels of resources,
different levels of specialists and additional services on hand.
And it can be helpful as a healthcare professional to be able to quickly navigate that
and come up with completely personalized care plans.
And so building connectivity within ChatGPT to not only be HIPAA-aligned
and be used in these secure environments,
but also be able to combine sensitive information with the latest medical knowledge,
I think, is a great path that we've started down, and something that will continue to keep trust
as the top priority in how healthcare professionals engage with AI.
So I think one of the other challenges is that the systems themselves in healthcare are quite siloed,
both at an organizational level and also in the tools that have to be used within each organization.
AI thus far has been deployed on really a point-solution basis in the technology industry,
but increasingly the connectivity is becoming available to connect the dots between the hundreds of different systems,
some analog, some digital, some structured, some unstructured, many decentralized,
many not on the cloud, being able to connect all of those through unified AI layers to actually
make sure that patients and information isn't falling through the cracks and that the connectivity
can be maximized to actually bring the greatest amount of impact. That's hard in healthcare,
and it's certainly not something that we can say is solved. But with many of our recent products,
ranging from ChatGPT for healthcare and its connectivity to apps and connectors, to the OpenAI API
for healthcare, to our frontier foundation models and agents, we think increasingly there's
going to be an opportunity to really accelerate what is possible within the healthcare system
and what agents can achieve. Part of this seems like it's very collaborative, working
with the healthcare industry. And I noticed when using the ChatGPT health app, the first thing I did
was put in my records and get all of that. And it looked like there was a lot of
cooperation across the whole ecosystem to do this. How has that come to be? Where is it
headed? It's extremely important that all of the health care system has an equal chance to
contribute and engage, nationally and internationally,
with providing the context that will help empower patients to receive the best possible answers from ChatGPT.
And so on the electronic health record side, this means working with the government and the Centers for Medicare and Medicaid Services,
adopting national standards for electronic health record syncing, so that patients, in just a few taps, are able to bring in their context in consented ways.
It's being able to tap into existing standards, like mobile phones, and the most popular consumer
health products and the most popular biosensors and wearables, to make sure, again, that in just one
or two taps patients are able to not only bring in that information but leverage it in thoughtful
ways, in ways that may not have been possible without the combined set of data that can
exist in this sort of ecosystem. So, for instance, being able to reference your recent exercise
activity when making a plan for how to spend your evening, or being able to do things as
simple as, you know, referencing your overnight sleep and stress when your agent is helping
you set your calendar for the next day and what tasks you may take on first.
It's very exciting.
You know, I wear a smart ring, a watch, whatever, and I get this data, and all I kind of have in my apps are rings to look at, like, okay, I guess it's doing something.
Being able to plug it into ChatGPT has been fantastic, because now I'm able to ask those kinds of questions.
But that's very exciting.
What you talked about too: if you get a plan from your doctor, with suggestions, you can literally say, hey, I didn't walk enough yesterday.
What should I do today?
It's been really good at menu planning; I literally go, on this menu, tell me what to order and whatnot.
And so you're saying we're just going to get more of that, and much better.
Yeah, and that's why our partnerships, I believe, are so important, because in these instances,
ChatGPT doesn't replace the incredible technology that our partners are building to go deep on health insights for a particular wearable.
But our surface area, our opportunity to bring in that health information, can now extend to the many different ways people use ChatGPT,
such as what they're going to cook for dinner
or how they're going to plan their afternoon.
You know, sometimes I think of two patients
and one patient has to navigate the healthcare system by themselves.
And the other patient maybe has a spouse come with them.
And that spouse has a clipboard
and used to work as a healthcare professional
and is very attentive, if not neurotic,
and can follow up on details
and is connected to your personal calendar.
And bringing the best aspects of that, with consent, to the patients who want it,
I think, represents a future where we can make it easier and easier for patients to follow
care plans, to play active, captain-like roles in their own health, in partnership with their care
teams and their physicians.
And I think if we can remove a lot of the friction that historically exists
between those processes, whether it's information not following the patient, or there being a lot to keep track of,
or a lot of old information to parse and bring in, we can do a tremendous amount of good, or we can
help patients themselves be empowered to do a tremendous amount of good in their own care plans.
And you know as a physician that it's hard to give as much time as you would like, because you're
always going to have more patients you have to deal with and only so many hours in the day.
And it's interesting to see a technology that has infinite time, infinite patience, be able to act as a complement to that.
I mean, if there's one thing that healthcare professionals are short on, it's time.
So when we think about our role internally at OpenAI, we often break down the work that we're doing into three buckets.
Raise the floor: make sure that AI and the benefits of AI are accessible to everyone,
and that could be patients, healthcare professionals, and others working in health-related industries.
Sweep the floor: help doctors and other health professionals
save time from the tremendous administrative and bureaucratic burdens that they have every day,
so that they can spend more time with their patients.
And then thirdly, raise the ceiling.
You know, the impact that AI can have in health care, I think, will
allow us to look back on this space in a few years and say, wow, we have all accelerated together,
in a way where medicine is still in the driver's seat but is also far more empowered than ever
before.
Yeah.
I don't think anybody feels like their doctor spent too much time with them.
So it looks like this is going to be helpful to solve for that.
What has been your favorite aha or wow or this is a really cool moment in the intersection
of AI and healthcare?
I'll answer your question in a non-standard way, which is: I think the most amazing thing to
see, for me, in the last year has been the rate of adoption of health, actually even beyond the
ChatGPT Health product, before we announced the ChatGPT Health product. It's been one of our
fastest-growing use cases, this kind of health and wellness question, and we've shared that hundreds
of millions of people a week are starting to use ChatGPT for health and wellness. I think seeing that
rapid growth, especially, you know, coming from a background of being motivated to work on this problem
because I felt like the healthcare and clinical AI worlds were not super aware of the potential of LLMs in health care, and seeing how far we have come,
I think it's been a really special moment for me.
There's no doubt that the adoption of this technology, and the fact that it is increasingly collaborative with the healthcare system,
and it is increasingly driving feedback loops back to us to improve the models, is the most meaningful thing and the most mission-aligned thing.
But what I also get excited about is what our research team has increasingly been able to give back using that feedback.
And not only is it the capabilities of the models,
but it's what can be unlocked once those models are allowed to run longer and have more
context.
And we're starting to see discoveries of medications that have been sitting on a shelf
that all of a sudden AI has found ways for them to have meaningful and direct value
in patients' lives.
It is starting to scale experiments that we as individuals wouldn't have been able to juggle on our own.
And that partnership, combined with that increased capability, to finally move from being interesting to being useful and, increasingly, to being transformative is, I think, the most exciting thing for us heading into this year.
Now that you've been working on this for some time, you've been engaging with clinicians and talking to people,
helping deploy this, what has been some of the feedback you've seen?
I think the experience of flying to Nairobi and seeing the clinicians using the tool
and the ways in which we did this thing, which we call active change management,
where we worked really closely with these clinicians and flew to Kenya a couple of times
to think about the ways that we could deepen their workflows using the AI tool
and make it something that not only made sense to them,
but actually became kind of something that was indispensable for them.
And so as we were concluding the study, the team at Penda Health was actually thinking about potentially running another study.
And they actually had a lot of hesitancy around running another study, because that would have involved having some group of clinicians using AI and some group of clinicians not using AI.
They actually felt that it was dangerous to have a group of clinicians not using the AI.
And so that's the point at which I was like, wow, we have done something major here.
I think the stories that we get back from our members every day are one of the most meaningful parts of the job.
And these are from caregivers that are increasingly under strain, taking care of family members,
trying to navigate their own health at the same time.
This is from doctors and nurses who are truly overloaded every day.
And we can help them extend their expertise and compress the tough parts of their
day a little bit more. And then sometimes, and this is more rare but increasing, it's the
miracle cases. It's the patient who had been bouncing around the system for years, the unsolved
diagnosis, the emergency where information wasn't present. And suddenly being able to step in and
assist and accelerate and bring people into the care that could really help is truly a
privilege. It's exciting. It's an amplifier, and every doctor I know wants to be able to do
more for their patients. Thank you very much. This has been very interesting, guys. Thank you.
