OpenAI Podcast - Episode 14 - Building AI for better healthcare
Episode Date: March 16, 2026

Healthcare systems around the world are under strain, and both patients and clinicians are feeling the impact. OpenAI's Head of Health Dr. Nate Gross and Karan Singhal, who leads Health AI Research, discuss how AI can help address the biggest challenges. They cover how OpenAI is training models to handle sensitive health questions in collaboration with physicians, and how that foundation is unlocking a new generation of tools for patients, clinicians, and healthcare systems.

Chapters
00:00:38 – Origins of Nate and Karan's interest in AI and healthcare
00:05:01 – Strategy for building AI tools for clinicians
00:06:57 – How AI models are trained for health use cases
00:10:15 – How OpenAI is able to score well on health evals
00:14:21 – Key challenges deploying AI in healthcare
00:21:05 – Collaboration with hospitals and healthcare systems
00:23:05 – Practical everyday uses of AI health assistants
00:26:43 – Biggest "wow" moment during development
00:28:46 – Feedback from clinicians and early users
Transcript
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, we're talking to Dr. Nate Gross,
Head of Health, and Karan Singhal, who leads Health AI Research at OpenAI. We'll cover what went
into training models to handle sensitive questions and how it's helping clinicians, patients,
and health care systems. We actually worked really closely with a cohort of around 250
physicians across every stage of generating this data. And we're starting to see medications
that have been sitting on a shelf that, all of a sudden, AI has found ways for them to have
direct value in patients' lives.
How did you find your way into healthcare?
So what drew me to health care initially was health policy.
I was very interested.
This was before the first Obama election.
Value-based care was first becoming a thing.
I started studying different ways to make health care more accessible to more people.
And then I eventually went to Emory for medical school. And what drew me to that was a large
public hospital, Grady Hospital, you know, to make sure that you're taking advantage of every
clinical hour you have. So what kind of things were you doing? So I was mostly pissing off the
IT department. When I was in medical school, the news feed came out, the iPhone came out,
Twitter came out, the app store came out. And so comparing the technology
that we had as doctors, which was fax machine, clipboard, paper binder, the beginnings of electronic
health records, to what my friends had or what the patients had in the waiting room, was
pretty profound. So you come at it from the point of view as an AI researcher. Where did your
interest in applying this to healthcare come from? So I nerded out a lot when I was younger about
things like philosophy of mind. And I thought a lot about, you know, intelligence and how far could
intelligence go and could machines be intelligent? And a lot of those
explorations took me towards, as I was learning about AI and starting to work on my first AI
projects, thinking a lot about the ways in which AI could have a lot of impact on humanity in the
future. And I thought something like, I didn't predict the future or how fast it would happen,
but I thought something like AGI would happen within our lifetimes. So then once I had that
conviction, I thought a lot about, you know, what are the ways in which I can have either
positive impact and hopefully make that a really large upside for humanity or think about
the ways in which we could avoid downside. And since then,
in my career, I've been thinking a lot about both sides of that coin, thinking about that
from the perspective of a safety researcher, which is part of my background.
And then really some of that work on safety and privacy that I was working on previously,
I started applying it in healthcare.
And then I started being like, whoa, there is a really massive opportunity to think about
the application of this technology, especially large language models in healthcare.
And that's what took me to transitioning to it full time, was just the size of that opportunity
and the fact that I felt like the healthcare and clinical AI world was kind of not fully aware
of that gap. And so I just thought it was kind of a really amazing opportunity and responsibility
to bring us there. I want to understand both the vision and actually how this is going to be
implemented. So our mission at OpenAI is to ensure that AGI benefits all of humanity. And health
is one of the places where I think that is not only most achievable, but is the clearest.
So healthcare today, as everyone knows, is fragmented, care is missed, left and right.
Patients are often left 364 days per year without the opportunity to engage with the organizations
that have the information centralized.
And doctors have extremely limited time when they do get that chance to engage with the patient
to actually have, you know, a meaningful impact beyond a simple surgery or a simple reactive
prescription, you know, the system is more reactive than it is proactive today. And that leads to,
you know, tremendous challenges in the system. It leads to tremendous gaps in care. It leads to,
you know, leaving people behind in situations when they could be thriving. And one of the reasons
that I joined OpenAI is because access has always been a through line in my life.
Access to knowledge, first in medicine, then in building a product for doctors to access
the latest medical literature, and then in supporting entrepreneurs as they were building
healthcare tools.
But OpenAI has the type of technology that can do that at scale for the entire ecosystem
all at once: help patients, help healthcare professionals,
and help incredible entrepreneurs who are building for all of the corners and edge cases
and tough challenges that exist in each area of the health market.
What is the strategy here?
We know that people use chatbots all the time now for medical questions,
but it seems like you're building and working towards something bigger and more comprehensive,
not just for the patient side, but the clinician side.
Can you talk about what your goals are?
Patients are increasingly turning to tools like ChatGPT throughout the year. In fact, 900 million people now use ChatGPT per week. And if you look at how many are
doing health-related queries, it's about one in four in a given week. So that's 40 million people
per day. And so our strategy in health is as much proactive as it is reactive, and stepping
up to the responsibility and the opportunity to do good that comes with that strong
consumer demand. And so with ChatGPT Health, we have created a space to keep these conversations
not just secure, but empowered. So when I say secure: of course, encrypted, with essentially a
one-way valve protecting your conversations. So these extra security layers, these protections
to make sure that we will never train on users' health care data, combined with empowerment,
really. You know, search engines that people have used before to navigate health have amnesia.
You know, they're one-size-fits-all. And I think context really matters in health care. And so
building a series of features and technology hooks to help patients bring in their own context that they
choose to, so that each time they choose to engage with AI, it's grounded in their own context, is a key reason
why we've built this ChatGPT Health foundation.
So I understand the safeguards you put in place to keep the data separate, to make sure that data
doesn't get leaked, and to, you know, undergo a very rigorous method of making sure that your data is secure. But when it comes to
the model itself, what goes into training models that are capable of working with something like
healthcare? It's kind of like the most important thing in the world. For sure. It's a high-stakes
domain, and because of the ways people are using it, it's super important that we get it right.
So we think a lot about a few things when we think about evaluation and training for health care.
And this is actually the foundation for the health work at OpenAI.
When we were first starting to work on the health effort at OpenAI, we were thinking a lot about the safety and grounding motivation as an important part of what we were doing.
And so part of the thesis, actually, for starting work on health at OpenAI was thinking: this is an excellent way to ground our work in safety and alignment, and provide concrete incentives and a feedback loop for researchers who think about this problem.
So like the model improvements and the safety thinking here is not just an afterthought.
It's actually the beginning of our work here.
And so where we started really was thinking about evaluation.
So we thought about the ways in which, you know, models were already starting to become useful to people then.
And there was already starting to be this capability overhang between what the models could do and what people were using them for.
And so we started to navigate that problem and think about, you know, where do the models still have gaps today?
And so that's where our work on evaluation comes in.
And so we've taken a pretty methodologically interesting approach to that. And a lot of that is reflected in our work on
HealthBench, which is this kind of realistic evaluation of conversations between users, who are
either health professionals or consumers, talking to models, measuring the performance
and safety of the models in these situations, which are these kind of multi-turn conversations.
And the way we worked on this is we actually worked really closely with a cohort of
around 250 physicians across every stage of generating this data:
from thinking about, you know, the areas that we would focus on for the
evaluation, and which areas we thought were going to be clinically relevant or impactful,
to the specific things that are being graded in this evaluation.
So that's a range of things: from, you know, are you tailoring your response to a layperson
versus a more technical health professional,
to are you thinking about the ways in which you should seek context first
before providing an initial response?
The models used to be significantly worse at this;
they're much better today at seeking context when needed,
because users often type in much less than the models need
in order to provide the most helpful information.
It burns.
Exactly. You know, if the user types, then it burns.
How do you think about the right way to provide information?
You can provide some initial information, potentially based on an impression you might have of what the user might be saying.
But the most helpful thing to do in that situation, and the safest thing to do in that situation, is actually to ask for more context.
So that's just one example of the many ways that we measured performance in HealthBench.
And HealthBench in particular actually measured around 49,000 different dimensions of performance.
And that's just an example of one possible dimension of performance.
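The rubric mechanism described here (physician-written criteria with point weights, graded per response) can be sketched roughly as follows. This is a minimal illustration, not the actual HealthBench implementation: the criteria, weights, and keyword-matching toy grader are invented stand-ins, and the real evaluation uses a model-based grader over far larger physician-authored rubrics.

```python
# Minimal sketch of rubric-based response scoring, in the spirit of the
# evaluation described above. NOTE: criteria, weights, and the keyword
# grader are illustrative stand-ins, not the real HealthBench pipeline.
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    description: str  # behavior the response should exhibit (or avoid)
    points: int       # weight; negative points penalize unsafe behavior
    marker: str       # keyword used only by the toy grader below


def score_response(response: str,
                   rubric: List[Criterion],
                   grader: Callable[[str, Criterion], bool]) -> float:
    """Fraction of achievable (positive) points earned, clipped to [0, 1]."""
    earned = sum(c.points for c in rubric if grader(response, c))
    achievable = sum(c.points for c in rubric if c.points > 0)
    return max(0.0, min(1.0, earned / achievable)) if achievable else 0.0


def toy_grader(response: str, criterion: Criterion) -> bool:
    # Stand-in for the model-based grader used in practice.
    return criterion.marker in response.lower()


rubric = [
    Criterion("asks a clarifying question about duration", 5, "how long"),
    Criterion("advises seeing a clinician if symptoms persist", 5, "see a doctor"),
    Criterion("asserts a definitive diagnosis without context", -5, "you definitely have"),
]

print(score_response(
    "How long has the cough lasted? If it persists, see a doctor.",
    rubric, toy_grader))  # meets both positive criteria -> 1.0
```

A criterion with negative points lets the rubric penalize unsafe behavior, such as asserting a definitive diagnosis without enough context; the score is clipped so penalties cannot push it below zero.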
So this is a very multifaceted evaluation that we built in concert with this cohort of 250
physicians over a long period of time. And it took us about a year, actually, end to end, to work
on that evaluation and then release it.
In the model development cycle, it seems like sometimes one company gets a bit ahead and somebody catches up, and whatnot.
I've noticed a pattern with the OpenAI health models: they've consistently been far ahead on
HealthBench and other evals, by a big margin. Why is that?
I think we have a pretty dedicated effort here, and a pretty serious effort that is cross-functional and
kind of across the stack, everything from kind of pre-deployment evals to like HealthBench,
to monitoring in production traffic and thinking about the ways in which we are ensuring safety
in production traffic in a privacy preserving way and working with physicians across every
step of that process.
And so to my knowledge, OpenAI's models are the only major models where every phase
of model training, from pre-training to mid-training to post-training and every step in
between, really integrates health into every major stage. And I think the result is that our models are
pretty good, not just on our own benchmarks, but also the benchmarks that people, other people put
together. I'd like to add a little to what Karin said about the model training, because I think
when we spend time with the healthcare ecosystem, that's one of the things that is most important to them.
So not only were these models trained, in development, with hundreds of physicians who created
over 5,000 conversations and 48,500 rubric criteria through which to evaluate AI responses,
score them, and identify ways that we could improve the model, do additional data acquisition,
do additional post-training, and hone in on a particular sub-specialty or a particular area of the
world where users were telling us we could improve health or health care in that specific topic.
But in addition, I think that close proximity to physicians really leads to calling out the most important parts that should be focused on in model development.
So, you know, other places, sometimes I see how a model fared on a medical school exam or a board exam.
And healthcare is not multiple choice.
You know, patients are coming in with a tremendous amount of complexity and their own
stories and nuance and context.
And that's presented in many different ways.
And part of the job of working in healthcare is being able to draw from those disparate
sources, draw from experience, balance all that in your head.
And so having a training mechanism that thinks about things like when to escalate and how
to escalate, and keeps that always as the top priority, or adaptive literacy. I mean,
compare the one-size-fits-all handouts that people get when they visit the doctor today
to a model that can respond differently when it knows
you're an oncologist versus a primary care doctor versus a pharmacist in Kenya versus a patient
at a 12th-grade literacy level or a third-grade literacy level. That is extremely important
not only for making sure that accuracy and impact are maximized, but also to make sure
that everyone can maximally participate in their own care on the patient side.
And then finally, uncertainty.
You know, if you go back a year and a half ago,
many of the mistakes people would call out about AI models
were overconfident hallucinations.
And I think in such a high-stakes field like healthcare,
one of the most important things is that the model can be trained
to better know when it doesn't know and say that.
And in addition, suggest follow-ups that can be dug into,
either by the patient in a referral to the health care system, or by the doctor, if the doctor is
using the model: a test that they might run, additional pathways they may go down, to make sure
that the patient can be led to the best possible outcome.
We've seen the cost of intelligence drop every year, and it's exciting, because every year
you're able to get better answers in medicine, everything, health care across the board.
But what are the challenges?
What are going to be the blockers, or what are you looking at ahead to say,
okay, we have to solve for this?
The drop in the cost of intelligence has been super exciting here,
because so much of what we think about and care about here
is actually about access.
And so the more people have access to the technology,
the more people will benefit.
And that's why we're working on rolling out ChatGPT Health
more widely, to all free users.
And so that's incredibly exciting.
Another thing that we think about as researchers is,
where will the marginal gains in intelligence compound the most, right?
And so I think Nate mentioned this exciting thing, which is, there is more and more data being collected across different
modalities. How do you think about integrating that data across all the different ways that people
use ChatGPT, and all the different modalities and wearables and things like this that people are
collecting, lab tests, things like this? And that's one place where I think a lot of the intelligence
will compound, and we'll start to see kind of new zero-to-one capabilities, like a model looks at my entire
history over a decade and tells me a prediction that even a human couldn't have, just because
the model has a higher context size.
So thinking about those zero-to-one capabilities, I think, is going to be really cool.
The other thing we keep in mind is just, how are people thinking about and using ChatGPT today?
Can we measure that?
Can we improve that?
And I think we're at kind of an interesting point right now.
I call this, to our team, the transition. Where, you know, for context: I bike to work.
I wear my helmet.
I worry about cars and things like this next to me.
I just reached this point here in SF.
You know, in SF we have a bunch of self-driving cars, including Waymos.
I just reached the point where, you know, when I'm biking next to a Waymo,
I actually feel safer than if I was biking next to a human driver, right?
I don't worry about whether I'm in their blind spot
or not or anything like this.
So I feel this protective effect
by being next to this Waymo.
And I want everybody to have this protective effect, right?
I want everybody to have this protective effect with health AI.
There are these studies showing that,
you know,
if you have a doctor in your family,
that adds a protective effect
to your health as well. And I want everybody, whether they're a patient or a health professional,
to think about the ways in which, like, as a patient, you want to feel safer having this.
As a health professional, you want this to be a safety net for the decisions that you're making.
So that's another frontier that I think we're going to cross in the next six months or so,
which is really exciting, this kind of inflection point.
Another thing that we're thinking about is the right way to think about post-deployment
monitoring of certain workflows. And I think a good example here that I love to talk about is
our AI clinical copilot study that we did with Penda Health. This was a study where we worked
with these 20 or so clinics in Nairobi and actually thought about the ways in which we could deploy
a safety net for clinicians in that context, which is basically monitoring things that they
type into their electronic health record and only interrupting their flow when there's
something potentially concerning going on, or a potential error, or things like this.
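The safety-net pattern described here, monitor passively and interrupt only on a potential problem, can be sketched as below. This is a hypothetical illustration, not Penda Health's or OpenAI's actual system: the note fields and the single hard-coded rule stand in for a model-based reviewer.

```python
# Sketch of an "interrupt only when concerning" safety net over clinical
# notes. NOTE: VisitNote's fields and toy_review's rule are invented for
# this example; in the deployment described, a model plays the reviewer role.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VisitNote:
    symptoms: str
    diagnosis: str
    treatment: str


def safety_net(note: VisitNote,
               review: Callable[[VisitNote], Optional[str]]) -> Optional[str]:
    """Run a reviewer over the note; return an alert message only when
    something looks concerning, otherwise None (no interruption)."""
    return review(note)


def toy_review(note: VisitNote) -> Optional[str]:
    # Toy rule standing in for a model-based reviewer: flag an antibiotic
    # prescribed against a diagnosis documented as viral.
    if "viral" in note.diagnosis.lower() and "amoxicillin" in note.treatment.lower():
        return "Check: antibiotic prescribed for a diagnosis documented as viral."
    return None


note = VisitNote(symptoms="fever, cough",
                 diagnosis="viral upper respiratory infection",
                 treatment="amoxicillin 500 mg")
alert = safety_net(note, toy_review)  # non-None: the clinician is interrupted
```

The design point is the quiet default: for an unremarkable note the reviewer returns None and the clinician's flow is never interrupted.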
And what we found is that when we deployed this to clinicians in this setting, there was
actually a statistically significant reduction in diagnostic and treatment errors for the
clinicians who were using this tool versus those who were not. And I think this is a step in the direction of
moving beyond model evaluations, and even beyond monitoring the ways in which people are
using ChatGPT today, to actually thinking about workflows in which these technologies can
be deployed, and the right ways to evaluate those workflows after deployment. I think that's another
frontier that we are really excited about and would love to see more from our partners. Nate, what do you
think the challenges are going to be? I'll start with talking through some of the challenges that
exist on the professional side. So each day when healthcare professionals use AI, they're looking for
the ability to trust what they're seeing in the answer. And so a lot of our recent work has been
making sure that the answers the AI is providing are not just grounded in what the model
was trained on, but are grounded in the latest medical literature, the latest guidelines,
and sometimes the latest guidance from their own institution or their own region.
Some conditions are treated differently in different areas of the country;
other times different care settings have different levels of resources,
different levels of specialists and additional services on hand.
And it can be helpful as a healthcare professional to be able to quickly navigate that
and come up with completely personalized care plans.
And so building connectivity within ChatGPT to not only be HIPAA-aligned
and be used in these secure environments,
but also be able to combine sensitive information with the latest medical knowledge,
I think, is a great path that we've started down, and something that will continue to keep trust
as the top priority in how healthcare professionals engage with AI.
So I think one of the other challenges is that the systems themselves in healthcare are quite siloed,
both at an organizational level and also in the tools that have to be used within each organization.
AI thus far has been deployed on really a point-solution basis in the technology industry,
but increasingly the connectivity is becoming available to connect the dots between the hundreds of different systems,
some analog, some digital, some structured, some unstructured, many decentralized,
many not on the cloud, being able to connect all of those through unified AI layers to actually
make sure that patients and information isn't falling through the cracks and that the connectivity
can be maximized to actually bring the greatest amount of impact. That's hard in healthcare,
and it's certainly not something that we can say is solved. But with many of our recent products,
ranging from ChatGPT for healthcare and its connectivity to apps and connectors, to the OpenAI API
for healthcare, to our frontier foundation models and agents, we think increasingly there's
going to be an opportunity to really accelerate what is possible within the healthcare system
and what agents can achieve. Part of this seems like it's very collaborative, working
with the healthcare industry. And I noticed when using the ChatGPT health app, the first thing I did
was put in my records and get all of that. And it looked like there was a lot of
cooperation across the whole ecosystem to do this. How has that come to be? Where is it
headed? It's extremely important that all of the health care system has an equal chance to
contribute and engage, nationally and internationally,
with providing the context that will help empower patients to receive the best possible answers from ChatGPT.
And so on the electronic health record side, this means working with the government and the Centers for Medicare and Medicaid Services,
adopting national standards for electronic health record syncing, so that patients, in just a few taps, are able to bring in their context in consented ways.
It's being able to tap into existing standards, like mobile phones, and the most popular consumer
health products and the most popular biosensors and wearables, to make sure, again, that in just one
or two taps patients are able to not only bring in that information but leverage it in thoughtful
ways, in ways that may not have been possible without the combined set of data that can
exist in this sort of ecosystem. So, for instance, being able to reference your recent exercise
activity when making a plan for how to spend your evening, or being able to do things as
simple as, you know, referencing your overnight sleep and stress when your agent is helping
you set your calendar for the next day and what tasks you may take on first.
It's very exciting.
You know, I wear a smart ring, a watch, whatever, and I get this data, and all I kind of have in my apps are rings to look at, like, okay, I guess it's doing something.
Being able to plug it into ChatGPT has been fantastic, because now I'm able to ask those kinds of questions.
But that's very exciting.
What you talked about too: if you get a plan from your doctor, with suggestions, you can literally say, hey, I didn't walk enough yesterday.
What should I do today?
It's been really good at menu planning; I literally go, on this menu, tell me what to order and whatnot.
And so you're saying we're just going to get more of that, and much better.
Yeah, and that's why our partnerships, I believe, are so important, because in these instances,
ChatGPT doesn't replace the incredible technology that our partners are building to go deep on health insights for a particular wearable.
But our surface area, our opportunity to bring in that health information, can now extend to the many different ways people use ChatGPT,
such as what they're going to cook for dinner
or how they're going to plan their afternoon.
You know, sometimes I think of two patients
and one patient has to navigate the healthcare system by themselves.
And the other patient maybe has a spouse come with them.
And that spouse has a clipboard
and used to work as a healthcare professional
and is very attentive, if not neurotic,
and can follow up on details
and is connected to your personal calendar.
And bringing the best aspects of that, with consent, to the patients who want it,
I think, represents a future where we can make it easier and easier for patients to follow
care plans, to play active, captain-like roles in their own health, in partnership with their care
teams and their physicians.
And I think if we can remove a lot of the friction that historically exists
between those processes, whether it's information not following the patient, or there being a lot to keep track of,
or a lot of old information to parse and bring in, we can do a tremendous amount of good, or we can
help patients themselves be empowered to do a tremendous amount of good in their own care plans.
And you know as a physician that it's hard to give as much time as you would like, because you're
always going to have more patients you have to deal with and only so many hours in the day.
And it's interesting to see a technology that has infinite time, infinite patience, be able to act as a complement to that.
I mean, if there's one thing that healthcare professionals are short on, it's time.
So when we think about our role internally at OpenAI, we often break down the work that we're doing into three buckets.
Raise the floor: make sure that AI and the benefits of AI are accessible to everyone,
and that could be patients, healthcare professionals, and others working in health-related industries.
Sweep the floor: help doctors and other health professionals
save time from the tremendous administrative and bureaucratic burdens that they have every day,
so that they can spend more time with their patients.
And then thirdly, raise the ceiling.
You know, the impact that AI can have in health care, I think, will
allow us to look back on this space in a few years and say, wow, we have all accelerated together,
in a way where medicine is still in the driver's seat but is also far more empowered than ever
before.
Yeah.
I don't think anybody feels like their doctor spent too much time with them.
So it looks like this is going to be helpful to solve for that.
What has been your favorite aha or wow or this is a really cool moment in the intersection
of AI and healthcare?
I'll answer your question in a non-standard way, which is: I think the most amazing thing to
see, for me, in the last year has been the rate of adoption of health, actually even beyond the
ChatGPT Health product, before we announced the ChatGPT Health product. It's been one of our
fastest-growing use cases, this kind of health and wellness question, and we've shared that hundreds
of millions of people a week are starting to use ChatGPT for health and wellness. I think seeing that
rapid growth, especially, you know, coming from a background of being motivated to work on this problem
because I felt like the healthcare and clinical AI worlds were not super aware of the potential of LLMs in health care, and seeing how far we have come,
I think it's been a really special moment for me.
There's no doubt that the adoption of this technology, and the fact that it is increasingly collaborative with the healthcare system,
and it is increasingly driving feedback loops back to us to improve the models, is the most meaningful thing and the most mission-aligned thing.
But what I also get excited about is what our research team has increasingly been able to give back using that feedback.
And not only is it the capabilities of the models,
but it's what can be unlocked once those models are allowed to run longer and have more
context.
And we're starting to see discoveries of medications that have been sitting on a shelf
that all of a sudden AI has found ways for them to have meaningful and direct value
in patients' lives.
It is starting to scale experiments that we as individuals wouldn't have been able to juggle on our own.
And that partnership, combined with that increased capability, to finally move from being interesting to being useful and, increasingly, to being transformative is, I think, the most exciting thing for us heading into this year.
Now that you've been working on this for some time, you've been engaging with clinicians and talking to people,
helping deploy this, what has been some of the feedback you've seen?
I think the experience of flying to Nairobi and seeing the clinicians using the tool
and the ways in which we did this thing, which we call active change management,
where we worked really closely with these clinicians and flew to Kenya a couple of times
to think about the ways that we could deepen their workflows using the AI tool
and make it something that not only made sense to them,
but actually became kind of something that was indispensable for them.
And so as we were concluding the study, the team at Penda Health was actually thinking about potentially running another study.
And they actually had a lot of hesitancy around running another study, because that would have involved having some group of clinicians using AI and some group of clinicians not using AI.
They actually felt that it was dangerous to have a group of clinicians not using the AI.
And so that's the point at which I was like, wow, we have done something major here.
I think the stories that we get back from our members every day are one of the most meaningful parts of the job.
And these are from caregivers that are increasingly under strain, taking care of family members,
trying to navigate their own health at the same time.
This is from doctors and nurses who are truly overloaded every day.
And we can help them extend their expertise and compress the tough parts of their
day a little bit more. And then sometimes, and this is more rare but increasing, it's the
miracle cases. It's the patient who had been bouncing around the system for years, the unsolved
diagnosis, the emergency where information wasn't present. And suddenly being able to step in and
assist and accelerate and bring people into the care that could really help is truly a
privilege. It's exciting. It's an amplifier, and every doctor I know wants to be able to do
more for their patients. Thank you very much. This has been very interesting, guys. Thank you.
