Microsoft Research Podcast - The AI Revolution in Medicine, Revisited: How AI is reshaping the future of healthcare and medical research
Episode Date: June 12, 2025
Technologists Bill Gates and Sébastien Bubeck discuss the state of generative AI in medicine, how access to “medical intelligence” might help empower people across healthcare, and how AI’s accelerating improvements are likely to affect both delivery and discovery.
Transcript
In The Little Black Bag, a classic science fiction story, a high-tech doctor's kit
of the future is accidentally transported back to the 1950s into the shaky hands of
a washed-up alcoholic doctor.
The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice
gratifyingly heroic medicine.
The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine, powerful when it
was written nearly 75 years ago and still so today.
What would be the AI equivalent of that little black bag? At this moment when new capabilities
are emerging, how do we imagine them into medicine? This is The AI Revolution in Medicine, Revisited.
I'm your host, Peter Lee.
Shortly after OpenAI's GPT-4 was publicly released,
Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in
Medicine to help educate the world of healthcare and medical research about
the transformative impact this new generative AI technology could have. But
because we wrote the book when GPT-4 was still a secret, we had to speculate. Now,
two years later, what did we get right and what did we get wrong?
In this series, we'll talk to clinicians, patients, hospital administrators, and others
to understand the reality of AI in the field and where we go from here.
The book passage I read at the top is from Chapter 10, The Big Black Bag. In imagining AI in medicine,
Carey, Zak, and I included in our book two fictional accounts.
In the first, a medical resident consults
GPT-4 on her personal phone
as the patient in front of her crashes.
Within seconds, it offers
an alternate response based on recent literature.
In the second account, a 90-year-old woman with several chronic conditions is
living independently and receiving near-constant medical support from an AI aide.
In our conversations with the guests we've spoken to so far,
we've caught a glimpse of these predicted futures, seeing how clinicians and
patients are actually using AI today and how developers are leveraging
the technology in the healthcare products and services they're creating.
In fact, that first fictional account isn't so fictional after all, as most of the doctors
in the real world actually appear to be using AI, at least occasionally, and sometimes much
more than occasionally to help in their daily clinical work.
And as for the second fictional account, which is more of a science fiction account,
it seems we are indeed on the verge of a new way of delivering and receiving health care,
though the future is still very much open.
As we continue to examine the current state of AI in healthcare and its potential to transform the field, I'm pleased to welcome Bill Gates and Sébastien Bubeck.
Bill may be best known as the co-founder of Microsoft, having created the company with
his childhood friend Paul Allen in 1975. He's now the founder of Breakthrough Energy,
which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking
nuclear energy and
science technologies. He also chairs the world's largest philanthropic organization,
the Gates Foundation, and focuses on solving a variety of health challenges around the globe
and here at home. Sébastien is a research lead at OpenAI. He was previously a distinguished
scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family
of small language models known as Phi.
While at Microsoft, he also co-authored the discussion-provoking 2023 paper, Sparks of
Artificial General Intelligence, which presented the results of early experiments with GPT-4
conducted by a small team from Microsoft Research.
Here's my conversation with Bill Gates and Sébastien Bubeck.
Bill, welcome.
Thank you.
Seb?
Yeah, hi, Peter. Nice to be here.
You know, one of the things that I've been doing just to get the conversation warmed up is to talk
about origin stories. And what I mean by origin stories is, you know, what was the first contact
that you had with large language models or the concept of generative AI that convinced you or made you think that something really important
was happening.
And so Bill, I think I've heard the story about the time when the OpenAI folks, Sam
Altman, Greg Brockman, and others, showed you something, but could we hear from you what
those early encounters were like and what was going through your mind?
Well, I'd been visiting OpenAI soon after it was created
to see things like GPT-2 and to see the little arm they had
that was trying to match human manipulation
and looking at their games like Dota
that they were trying to get as good as human play.
And honestly, I didn't think the language model stuff they were doing,
even when they got to GPT-3, would show the ability to learn.
You know, in the same sense that a human reads a biology book
and is able to take that knowledge and access it,
not only to pass a test, but also to create new medicines.
And so my challenge to them was that if their LLM could get a five on
the Advanced Placement biology test, then I would say, okay,
it took biologic knowledge and encoded it in an accessible way,
and that I didn't expect them to do that very quickly,
but it would be profound.
And it was only about six months
after I challenged them to do that,
that an early version of GPT-4,
they brought up to a dinner at my house,
and in fact, it answered most of the questions that night very well.
The one that it got totally wrong,
because it was so good, we kept thinking,
you know, we must be wrong.
It turned out it was a math weakness that, you know,
we later understood that that was an area,
weirdly, of incredible weakness of those early models.
But that was when I realized, okay,
the age of cheap intelligence was at its beginning.
Yeah.
So I guess it seems like you had something similar to me
in that my first encounters,
I actually harbored some skepticism.
Is it fair to say you were skeptical before that?
Well, the idea that we've figured out
how to encode and access knowledge in this very deep sense
without even understanding the nature of the encoding,
that is a bit weird.
We have an algorithm that creates the computation,
but even say, okay, where is the president's birthday
stored in there?
Where is this fact stored in there?
The fact that even now, when we're playing around
and getting a little bit more sense of it, it's
opaque to us what the semantic encoding is. It's kind of amazing to me. I thought the
invention of knowledge storage would be an explicit way of encoding knowledge, not an
implicit statistical training.
Yeah. Yeah. All right. So, Seb, on the same topic,
as we say at Microsoft,
I got pulled into the tent because this was a very secret project.
Then I had the opportunity to select a small number of researchers in
MSR to join and start investigating this thing seriously.
The first person I pulled in was you.
What was that?
And so what were your first encounters?
Because I actually don't remember what happened then.
Oh, I remember it very well.
My first encounter with GPT-4 was in the meeting
with the two of you, actually.
But my kind of first contact, the first moment
where I realized that something was happening
with generative AI was before that.
And I agree with Bill that I also wasn't too impressed by GPT-3.
I thought that it was kind of very naturally mimicking the web, sort of parroting what
was written there in a nice way, still in a way which seemed very impressive, but it
wasn't really intelligent in any way.
But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this
was the first image generation model, DALL-E 1.
So that was in 2021.
And I will forever remember the press release of OpenAI, where they had this prompt of an
avocado chair.
And then you had this image of the avocado chair. And what really shocked me is that
clearly the model
kind of understood what is
a chair, what is an avocado, and was able
to merge those concepts.
So this was really to me the first
moment where I saw some understanding
in those models.
Just to get the time right, that was before I pulled you into that.
That was before. That was like a year before. And now I will tell you how we went from that moment
to the meeting with the two of you and GPT-4.
So once I saw this kind of understanding,
I thought, OK, fine.
It understands concepts, but it's still not able to reason.
As Bill was saying, it cannot learn from your document.
It cannot reason.
So I set out to try to prove that.
This is what I was in the business of at the time, trying to prove things in
mathematics. So I was trying to prove that basically autoregressive
transformers could never reason. So I was trying to prove this, and after a year of
work I had something reasonable to show. And so I had the meeting with the
two of you, and I had this example where I wanted to say there is no way that
an LLM is going to be able to do X. And then as soon as, I don't know
if you remember, Bill, but as soon as I said that, you said, oh, but wait a second,
I had, you know, the OpenAI crew at my house recently, and they showed me a new
model. Why don't we ask this new model this question? And we did, and it solved
it on the spot. And that really, honestly, just changed my life.
Like, you know, I had been working for a year
trying to say that this was impossible
and just right there it was shown to be possible.
One of the very first things I got interested in
because I was really thinking a lot about healthcare
was healthcare and medicine.
And I don't know if the two of you remember,
but I ended up doing a lot of tests.
I ran through step one and step two
of the US Medical Licensing Exam,
did a whole bunch of other things.
I wrote this big report,
it was, I can't remember, a couple hundred pages,
and I needed to share this with someone.
There weren't too many people I could share with,
so I sent, I think, a copy to you, Bill,
I sent a copy to you, Seb. I hardly slept for about a week putting that
report together, and yeah, I kept working on it. But I was far from alone, and I think
everyone who was in the tent, so to speak, in those early days was going through something
pretty similar.
All right. So I think, of course, a lot of what I put in the report also ended
up being examples that made it into the book. But the main purpose of this conversation
isn't to reminisce about or indulge in those reminiscences, but to talk about what's happening
in healthcare and medicine. And, you know, as I said, we wrote this book,
we did it very, very quickly.
So you helped, Bill. You provided a review
and some endorsements.
But honestly, we didn't know what we were talking about
because no one had access to this thing.
And so we just made a bunch of guesses.
So really the whole thing I wanted to probe
with the two of you is,
now with two years of experience out in the world,
what do we think is happening today?
Is AI actually having an impact, positive or negative,
on healthcare and medicine?
And what do we now think is going to happen
in the next two years, five years, or 10 years?
And so I realize it's a little bit too abstract
to just ask it that way, so let me just try to narrow
the discussion and guide us a little bit.
The kind of administrative and clerical work,
paperwork around healthcare,
and we made a lot of guesses about that.
That appears to be going well, but Bill, I know we've discussed this sometimes that you
think there ought to be a lot more going on.
Do you have a viewpoint on how AI is actually finding its way into reducing paperwork?
Well, I'm stunned.
I don't think there should be a patient-doctor meeting
where the AI is not sitting in
and both transcribing,
offering to help with the paperwork
and even making suggestions,
although the doctor will be the one
who makes the final decision about
the diagnosis and whatever prescription gets done.
It's so helpful.
When that patient goes home and their son who wants to understand what happened has
some questions, that AI should be available to continue that conversation.
And the way you can improve that experience
and streamline things and involve the people who advise you,
I don't understand why that's not more adopted
because there you still have the human in the loop
making that final decision,
but even for follow-up calls
to make sure the patient did things, to understand if
they have concerns, and knowing when to escalate back to the doctor,
the benefit is incredible.
And that thing is ready for prime time.
That paradigm is ready for prime time in my view.
Yeah, there are some good products,
but it seems like the number one use right now,
and we kind of got this from some of the previous guests
in previous episodes, is the use of AI
just to respond to emails from patients.
Does that make sense to you?
Yeah. So maybe I want to second what Bill was saying, but maybe take a step back first. You know, two years ago, like the concept of clinical scribes, which is one of the things
that we're talking about right now, it would have sounded, in fact, it sounded two years ago,
borderline dangerous because everybody was worried about hallucinations.
What happens if you have this AI listening in and then it transcribes something wrong?
Now two years later, I think it's mostly working.
And in fact it is not yet fully adopted, you're right,
but it is in production, it is used in many, many places.
So this rate of progress is astounding
because it wasn't obvious that we would be able
to overcome those obstacles of hallucination.
It's not to say that hallucinations are fully solved,
but at least in this closed system they are.
Now, I think more generally what's going on
in the background is that there is something
that we, that certainly I underestimated,
which is this management overhead.
So I think the reason why this is not adopted everywhere
is really a training and teaching aspect.
People need to be taught those systems,
how to interact with them.
And one example that I really like,
a study that recently appeared,
where they tried to use ChatGPT for diagnosis,
and they were comparing doctors without and with
ChatGPT.
And the amazing thing, so this was a set of cases where the accuracy of the doctors alone
was around 75%.
ChatGPT alone was 90%.
So that's already kind of mind blowing.
But then the kicker is that doctors with ChatGPT was 80%.
Intelligence alone is not enough.
It's also how it's presented, how you interact with it.
And ChatGPT is, it's an amazing tool.
Obviously, I absolutely love it.
But it's not, you don't want the doctor to have to type in, you know, prompts and use
it that way.
It should be, as Bill was saying, kind of running continuously in the background, sending you notifications.
And you have to be really careful at the rate
at which those notifications are being sent.
Because if they are too frequent,
then the doctor will learn to ignore them.
So all of those things matter, in fact,
at least as much as the level of intelligence of the machine.
One of the things I think about, Bill, in that scenario
that you described, doctors
do some thinking about the patient when they write the note.
So I'm always a little uncertain whether it's actually, you wouldn't necessarily want to
fully automate this, I don't think.
Or at least there needs to be some prompt to the doctor to make sure that the
doctor puts some thought into what happened in the encounter with the patient.
Does that make sense to you at all?
At this stage, you know, I'd still put the onus on the doctor to write the conclusions
and the summary and not delegate that. The trade-offs you make a little bit are somewhat dependent
on the situation you're in.
If you're in Africa where most people never meet
a real doctor their entire life,
the idea of being able to have some of this advice
and diagnosis is extremely advantageous
because you're comparing it to nothing.
So yes, the doctor's still going to have to do a lot of work, but just the quality of
letting the patient and the people around them interact and ask questions and have things
explained, that alone is such a quality improvement,
it's mind blowing.
So since you mentioned Africa,
and of course this touches on the mission
and some of the priorities of the Gates Foundation,
and this idea of democratization of access
to expert medical care.
What's the most interesting stuff going on right now?
Are there people in organizations or technologies that are impressing you or that you're tracking?
Yeah, so the Gates Foundation has given out a lot of grants to people in Africa doing education,
agriculture, but more healthcare examples than anything.
And the way these things start off, they often start out either being
patient-centric in a narrow situation, like, okay, I'm a pregnant woman, talk to me.
Or I have infectious disease symptoms, talk to me.
Or they're connected to a health worker, where they're helping that worker get their job done.
And we have lots of pilots out in both of those cases.
The dream would be eventually to have the thing the patient consults be so broad that
it's like having a doctor available who understands the local things.
We're not there yet, but over the next two or three years, particularly given the worsening
financial constraints against African health systems, where the withdrawal of money has
been dramatic, figuring out how to take this, what I sometimes call free intelligence, and
build a quality health system around that, we will have to be more radical in low-income
countries than any rich country is ever going to be.
Also, there's maybe a different regulatory environment. So some of those things maybe are easier because right now, I think the world hasn't figured
out how to and whether to regulate, let's say, an AI that might give a medical diagnosis
or write a prescription for medication.
Yeah, I think one issue with this, and it's also slowing down the deployment of AI in
healthcare more generally, is a lack of proper benchmarks.
Because you know, you were mentioning the USMLE, for example, that's a great test to
test human beings and their knowledge of healthcare and medicine.
But it's not a great test to give to an AI.
It's not asking the right question.
So finding what are the right questions to test whether an AI system is ready to give a diagnosis in a constrained setting, that's a very, very important
direction which, to my surprise, is not yet accelerating at the rate I was hoping for.
Okay, so that gives me an excuse to get more now into the core AI tech because something
I've discussed with both of you is this issue of what are the
right tests.
And you both know the very first test I give to any new spin of LLM is I present a patient,
the results, a mythical patient, the results of my physical exam, my mythical physical
exam, maybe some results of some initial labs, and
then I present our proposed differential diagnosis.
And if you're not in medicine, differential diagnosis you can just think of as a prioritized
list of the possible diagnoses that fit with all that data.
And in that proposed differential, I always intentionally make two mistakes. I make a textbook technical error in one of the possible elements of the differential
diagnosis and I have an error of omission.
And you know, I just want to know, does the LLM understand what I'm talking about? And
all the good ones out there do now.
But then I want to know, can it spot the errors?
And then most importantly, is it willing to tell me I'm wrong?
I've made a mistake.
That last piece seems really hard for AI today.
And so let me ask you first,
because at the time of this taping, of course,
there was a new spin of GPT-4o last week
that became overly sycophantic.
In other words, it was actually prone in that test of mine
not only to not tell me I'm wrong, but it actually praised me for the creativity
of my differential. What's up with that?
Yeah, I guess it's a testament to the
fact that training those models is still more of an art than a science. So it's a
difficult job. Just to be clear with the audience, we have rolled back that
version of GPT-4o. So now we don't have the sycophant version out there. Yeah, no, it's a
really difficult question. It has to do, as you said, it's very technical. It has to do with the
post-training and how, like, where do you nudge the model? So you know there is this very classical,
by now, technique called RLHF,
where you push the model in the direction of a certain reward model.
So the reward model is just telling the model, you know, what is good,
and what behavior is good, what behavior is bad.
But this reward model is itself an LLM.
And Bill was saying at the very beginning of the conversation
that we don't really understand how those LLMs deal with concepts,
like, you know, where is the capital of France located, things like that.
It's the same thing for this reward model.
We don't know why it says that it prefers one output to another.
And whether this is correlated with some sycophancy is, you know, something that we discovered
basically just now, that if you push too hard in optimization on this reward model, you
will get a sycophant model.
So it's kind of what I'm trying to say is we became too good at what we were doing and
we ended up in fact in a trap of the reward model.
I mean, you do want, it's a difficult balance because you do want models to follow your
desires.
It's a very difficult, very difficult balance.
So this brings up then the following question for me, which is the extent to which we think
we'll need to have specially trained models for things.
So let me start with you, Bill.
Do you have a point of view on whether we will need to, quote unquote, take AI models
to med school, have them specially trained. If you're going to deploy something
to give medical care in underserved parts of the world, do we need to do something special
to create those models?
We certainly need to teach them the African languages and the unique dialects so that
the multimedia interactions are very high quality.
We certainly need to teach them the disease prevalence
and unique disease patterns,
like neglected tropical diseases and malaria.
So we need to gather a set of facts
that somebody trying to go for a US customer base
wouldn't necessarily have that in there.
Those two things are actually very straightforward
because the additional training time is small.
I'd say for the next few years,
we'll also need to do reinforcement learning
about the context of being a doctor
and how important certain behaviors are.
Humans learn over the course of their life to some degree
that I'm in a different context and the way I behave
in terms of being willing to criticize or be nice.
How important is it? Who's here?
What's my relationship to them?
Right now, these machines don't have that broad social experience,
and so if you know it's going to be used for health things,
a lot of reinforcement learning of the very best humans
in that context would still be valuable.
Eventually, the models will, having read all the literature
of the world about good doctors, bad doctors,
it'll understand as soon as you say,
I want you to be a doctor diagnosing somebody,
all of the implicit reinforcement
that fits that situation will be there.
And so I hope three years from now,
we don't have to do that reinforcement learning,
but today for any medical context,
you would want a lot of data to reinforce tone,
willingness to say things when there might be something significant at stake.
Yeah, so something Bill said kind of reminds me of another thing that I think
we missed, which is the context also, and the specialization
also pertains to different, I guess, what we still call modes. Although I don't know if the idea
of multimodal is the same as it was two years ago. But what do you make of all of the hubbub
around, in fact, within Microsoft Research, this is a big deal,
but I think we're far from alone,
medical images and vision, video, proteins and molecules,
cell, cellular data and so on.
Yeah, okay, so there is a lot to say about everything
in the last couple of minutes.
Maybe on the specialization aspect,
I think there is hiding behind this really fundamental scientific question of whether
eventually we have a singular AGI that kind of knows everything and you can just explain
your own context and it will just get it and understand everything. That's one vision.
I have to say, I don't particularly believe in this vision.
In fact, we humans are not like that at all.
I think, hopefully, we are general intelligences, yet we have to specialize a lot.
And I did myself a lot of RL, reinforcement learning, on mathematics.
That's what I did, I spent a lot of time doing that.
And I didn't improve on other aspects, you know, in fact, probably degraded on other aspects.
So it's, I think it's an important example to have in mind.
I think I might disagree with you on that though, because
like doesn't a model have to see both good science and bad science in order to be able
to gain the ability to discern between the two?
Yeah, no, that absolutely.
I think there is value in seeing the generality, in having a very broad base, but then you
kind of specialize on verticals.
And this is where also open weights models, which we haven't talked about
yet, are really important because they allow you to provide this broad base to everyone
and then you can specialize on top of it.
So we have about three hours of stuff to talk about,
but our time is actually running low. So I think there's a more provocative question. It's almost a silly question,
but I need to ask it of the two of you,
which is, is there a future where AI replaces doctors
or replaces medical specialties that we have today?
So what does the world look like, say, five years from now?
Well, it's important to distinguish
healthcare discovery activity
from healthcare delivery activity.
We focus mostly on delivery.
I think it's very much within the realm of possibility
that the AI is not only accelerating healthcare discovery,
but substituting for a lot of the roles of,
I'm an organic chemist or I run various types of assays.
I can see those which are testable output type jobs,
which are still very high value.
I can see some replacement in those areas before the doctor.
The doctor still understanding the human condition and
long term dialogues, they've had a lifetime of reinforcement of that,
particularly when you get into areas like mental health.
So I wouldn't say in five years,
either people will choose to adopt it.
But it will be profound that there'll be this nearly free intelligence
that can do follow-up,
that can help you make sure you went through different possibilities.
And so I'd say, yes, we'll have doctors,
but I'd say healthcare will be massively transformed
in its quality and in efficiency by AI in that time period.
Is there a comparison, useful comparison, say between doctors and say programmers, computer
programmers, or doctors and lawyers?
Programming is another one that has kind of a mathematical correctness to it.
So the objective function that you're trying to reinforce to
as soon as you can understand the state machines,
you can have something that's checkable, that's correct.
So I think programming, which is weird to say,
that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy,
physical presence and social understanding in them.
Yeah, by the way, I fully expect in five years that AI will produce mathematical
proofs that are checkable for validity, easily checkable, because they'll be written in a
proof checking language like Lean or something, but will be so complex that no human mathematician
can understand them.
I expect that to happen.
I can imagine in some fields like cellular biology we could have the same
situation in the future because the molecular pathways of chemistry,
biochemistry of human cells or living cells is as complex as any mathematics.
And so it seems possible that we may be in a state where in wet lab we see,
oh yeah, this actually works, but no one can understand why.
Yeah, absolutely.
I mean, I think I really agree with Bill's distinction
of the discovery and the delivery.
And indeed, the discovery is when you can check things,
and at the end there is an artifact that you can verify.
You can run the protocol in the wet lab
and see whether you produce what you wanted. So I absolutely agree with that. And in fact, you know, we don't have to talk five
years from now. I don't know if you know, but just recently there was a paper that was
published on a scientific discovery using o3-mini. So this is really amazing. And you
know, just very quickly, just so people know, it was about this statistical physics model,
the frustrated Potts model, which has to do with coloring.
And basically, the case of three colors, like more than two colors, was open for a long
time.
And o3 was able to reduce the case of three colors to two colors, which is just astounding.
And this is now.
This is happening right now.
So this is something that I personally didn't expect it would happen so quickly, and it's
due to those reasoning models.
Now on the delivery side, I would add something more to it for the reason why doctors and
in fact lawyers and coders will remain for a long time.
And it's because we still don't understand how those models generalize.
Like at the end of the day, we are not able to tell you
when they are confronted with a really new novel situation,
whether they will work or not.
Nobody is able to give you that guarantee.
And I think until we understand this generalization better,
we're not gonna be willing to just let this system
in the wild without human supervision.
But don't human doctors, human specialists,
so for example, a cardiologist sees a patient in a certain way
that a nephrologist or an endocrinologist might not?
That's right.
But another cardiologist will understand and kind of
expect a certain level of generalization from their peer.
And this, we just don't have it with AI models.
Now, of course, just to exactly, you're exactly right,
that generalization is also hard for humans.
Like, if you have a human trained for one task
and you put them into another task, then you don't,
you often don't know, but you have other examples.
So if you have two humans that were trained on a task
and you put them on another one, then you can expect that they will do the same
on the other task.
Okay.
You know, the podcast is focused on
what's happened over the last two years.
But now I'd like one provocative prediction
about what you think the world of AI and medicine
is going to be at some point in the future.
You pick your time frame.
I don't care if it's two years or 20 years from now.
But what do you think will be different about AI and medicine in that future than today?
Yeah, I think the deployment is going to accelerate soon.
Like, we're really not missing very much.
There is this enormous capability overhang.
Like even if progress completely stops,
with current systems, we can do a lot more
than what we're doing right now.
So I think this will, this has to be realized,
you know, sooner rather than later.
And I think it's probably dependent on these benchmarks
and proper evaluation
and tying this with regulation. So these are things that take time in human society and
for good reason. But now we already have two years, you know, give it another two years
and it should be really...
Will AI prescribe your medicines? Write your prescriptions?
I think yes. I think yes.
Okay, Bill?
Well, I think the next two years will have massive pilots.
And so the amount of use of the AI still in a co-pilot type mode, you know, we should
get millions of patient visits, you know, both in general medicine and in the mental
health side as well.
And I think that's going to build up both the data and the confidence
to give the AI some additional autonomy.
Are you going to let it talk to you at night when you're panicked about your mental health
with some ability to escalate?
And I've gone so far as to tell politicians
with national health systems that if they deploy AI
appropriately, that the quality of care,
the overload of the doctors,
the improvement in the economics will be enough
that their voters will be stunned
because they just don't expect this.
And they could be reelected just on this one thing
of fixing what is a very overloaded
and economically challenged health system
in these rich countries.
My personal role is gonna be to make sure
that in the poor countries, there isn't some lag; in fact, in many cases that we'll be more
aggressive because we're comparing to having no access to doctors at all.
I think whether it's India or Africa, there'll be lessons that are globally valuable because we need medical intelligence and,
you know, thank God, AI is going to provide a lot of that.
Well, on that optimistic note, I think that's a good way to end.
Bill, I really appreciate all of this.
I think the most fundamental prediction we made in the book is that
AI would actually find its way into the practice of medicine and I think that that at least has
come true. Maybe in different ways than we expected but it's come true and I think it'll only get
So thanks again, both of you.
Yeah, thanks, you guys.
Thank you, Peter. Thanks, Bill.
I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.
With Bill, I'm always amazed at how practically minded he is. He's really thinking about the nuts and bolts of what AI might be able to do for people.
And his thoughts about underserved parts of the world, the idea that we might actually
be able to empower people with access to expert medical knowledge, I think is both inspiring
and amazing.
And then Seb, Sébastien Bubeck, he's just absolutely a brilliant
mind. He has a really firm grip on the deep mathematics of artificial intelligence and
brings that to bear in his research and development work. And where that mathematics takes him isn't
just into the nuts and bolts of algorithms, but into philosophical questions about the nature of
intelligence.
One of the things that Sébastien brought up was the state of evaluation of AI systems.
And indeed, he was fairly critical in our conversation.
But of course, the world of AI research and development is just moving so fast. And indeed, since we recorded our conversation,
OpenAI, in fact, released a new evaluation metric
that is directly relevant to medical applications,
and that is something called HealthBench.
And Microsoft Research also released a new evaluation
approach or process called ADeLe.
HealthBench and ADeLe are examples of new approaches to evaluating AI models
that are less about testing their knowledge and ability to pass multiple choice exams,
and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical
healthcare or biomedical research settings.
These are examples of really important good work that speak to how well AI models work
in the real world of healthcare and biomedical research and how well they can collaborate
with human beings in those settings.
You know, I asked Bill and Seb to make some predictions
about the future.
My own answer, I expect that we're
going to be able to use AI to change how we diagnose
patients, change how we decide treatment options.
If you're a doctor or a nurse and you encounter a patient,
you'll ask questions, do a physical exam, call out for labs, just like you do today.
But then you'll be able to engage with the AI based on all of that data and just ask, you know,
based on all the other people who have gone through the same experience, who have similar data,
how are they diagnosed? How are they treated? What were their outcomes?
And what does that mean for the patients I have right now?
Some people call it the patients-like-me paradigm.
And I think that's going to become real because of AI within our lifetimes.
That idea of really grounding the delivery in healthcare and medical practice through data and intelligence,
I actually now don't see any barriers
to that future becoming real.
I'd like to extend another big thank you
to Bill and Sébastien for their time
and to our listeners, as always, it's a pleasure to have you along for the ride.
I hope you'll join us for our remaining conversations as well as a second co-author roundtable with
Carey and Zak.
Until next time.