Microsoft Research Podcast - Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers
Episode Date: May 15, 2025

Peter Lee and his coauthors, Carey Goldberg and Dr. Zak Kohane, reflect on how generative AI is unfolding in real-world healthcare, drawing on earlier guest conversations to examine what's working, what's not, and what questions still remain.
Transcript
We need to start understanding and discussing AI's potential for good and ill now, or rather yesterday.
GPT-4 has game-changing potential to improve medicine and health.
This is The AI Revolution in Medicine, Revisited. I'm your host, Peter Lee.
Shortly after OpenAI's GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane,
and I published The AI Revolution in Medicine to help educate the world of healthcare and
medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4
was still a secret, we had to speculate. Now, two years later, what did we get
right and what did we get wrong? In this series, we'll talk to clinicians, patients,
hospital administrators, and others to understand the reality of AI
in the field and where we go from here.
The passage I read at the top is from the book's prologue. When Carey, Zak, and I
wrote the book, we could only speculate how generative AI would be used in healthcare
because GPT-4 hadn't yet been released. It wasn't yet available to the very people we thought would be most affected by it.
And while we felt strongly that this new form of AI would have the potential to transform
medicine, it was such a different kind of technology for the world, and no one had a
user's manual for this thing to explain how to use it effectively and also how to use it safely.
So we thought it would be important to give healthcare professionals and leaders a framing to start important discussions around its use.
We wanted to provide a map not only to help people navigate a new world that we anticipated would happen with the arrival of GPT-4, but also to help
them chart a future of what we saw as a potential revolution in medicine.
So I'm super excited to welcome my co-authors, longtime medical science journalist Carey
Goldberg and Dr. Zak Kohane, the inaugural chair of Harvard Medical School's Department
of Biomedical Informatics and the editor-in-chief of the New England Journal of Medicine AI. We're going to have two discussions. This will be the first
one about what we've learned from the people on the ground so far and how we're thinking about
generative AI today. Carey, Zak, I'm really looking forward to this.
It's nice to see you, Peter.
We missed you.
It's great to see you too.
Yeah.
The dynamic gang is back.
Yeah.
And I guess after that big book project two years ago, it's remarkable that we're still
on speaking terms with each other.
In fact, this episode is to react to what we heard in the first four
episodes of this podcast. But before we get there, I thought maybe we should start with the origins
of this project just over two years ago. And I had this early secret access to DaVinci 3, now known as GPT-4. I remember experimenting right away with
things in medicine, but I realized I was in way over my head. And so I wanted help, and the first
person I called was you, Zak. And you remember we had a call, and I tried to explain what this was about.
And I think I saw skepticism, polite skepticism, in your eyes.
But tell me, what was going through your head when you heard me explain this thing to you?
So I was divided between the fact that I have tremendous respect for you, Peter, and you've always
struck me as sober, and we'd had conversations that showed me that you fully understood
some of the missteps that technology, ARPA, Microsoft, and others had made in the past.
And yet you were telling me a full science fiction compliant story that something that
we thought was 30 years away was happening now.
And it was very hard for me to put together.
And so I couldn't quite tell myself this is BS.
But I said, I need to look at it.
It just seems too good to be true.
What is this?
So it was very hard for me to grapple with it.
I was thrilled that it might be possible, but I was thinking, how could this be possible?
Yeah.
Well, even now I look back, and I appreciate that you were nice to me, because I think a lot of people
would have been much less polite. In fact, I myself had expressed a lot of very
direct skepticism early on. After ChatGPT got released, I think three or four days later,
I received an email from a colleague who runs a clinic. And he said, wow, this is great, Peter. We're using ChatGPT to have the receptionist
in our clinic write after visit notes to our patients.
And that sparked a huge internal discussion about this.
And you and I knew enough about hallucinations
and about other issues that it seemed important to write
something about
what this could do and what it couldn't do.
And so I think that I can't remember the timing, but you and I decided a book would be a good
idea.
And then I think you had the thought that you and I would write in a hopelessly academic
style that no one would be able to read.
So it was your idea to recruit Carey, I think, right?
Yes, it was. I was sure that we both had a lot of material,
but communicating it effectively to the very people we wanted to reach
would not go well if we were left to our own devices.
And Carey is super brilliant at what she does.
She's an idea synthesizer and public communicator in the written word.
Amazing.
So yeah, so Carey, we contacted you.
How did that go?
So yes, on my end, I had known Zak for probably like 25 years, and he had always been the
person who debunked the scientific hype for me.
I would turn to him with like, hmm, they're saying that the Human Genome Project is going
to change everything.
And he would say, yeah, but first it'll be 10 years of bad news, and then we'll actually
get somewhere.
So when Zak called me up at seven o'clock one morning, just beside himself
after having tried DaVinci 3, I knew that there was something very serious going on.
And I had just quit my job as the Boston Bureau Chief of Bloomberg News and I was ripe for
the plucking. And I also, I feel kind of nostalgic now about just the amazement and the wonder and the
awe of that period.
We knew that when generative AI hit the world, there would be all kinds of snags and obstacles
and things that would slow it down.
But at that moment, it was just like the holy crap moment.
And it's fun to think about it now.
Yeah.
I think ultimately, you know, recruiting Carey, you
were so important, because you basically went through every single
page of this book and made sure, I remember.
In fact, it's affected my writing sense, because you were coaching us
that every page has to be a page turner.
There has to be something on every page that motivates people to want to
turn the page and get to the next one.
I will see that and raise that one.
I now tell GPT-4, please write this in the style of Carey.
No way!
Really?
Yes way!
I have to say, it's not hard to motivate readers when you're writing about the most transformative
technology of their lifetime. I think there's a gigantic hunger to read and to understand.
But no, you were not hard to work with, Peter and Zak.
All right. So I think we have to get down to work.
So for these podcasts, we're talking to different types of people to just reflect on what's
actually happening, what has actually happened over the last two years.
And so the first episode, we talked to two doctors.
So there's Chris Longhurst at UC San Diego and Sara Murray at UC San Francisco.
And besides being doctors and having AI affect their clinical work,
they just happen also to be leading the efforts at their respective institutions
to figure out how best to integrate AI into their health systems. And it was fun to talk to them.
And I felt like a lot of what they said was pretty validating for us.
They talked about AI scribes.
Chris especially talked a lot about how AI can respond to emails from patients, write
referral letters.
And then they both talked about the importance of, I think, Zak, you used
the phrase in our book, trust but verify, to always have a human in the loop. What did you
take away from their thoughts overall about how doctors are using it? And I guess, Zak, you would
have a different lens also, because at Harvard, you see doctors all the time
grappling with AI.
So on the one hand, I think they've done
some very interesting studies.
And indeed, they saw that when these general models,
when GPT-4, was sending a note to patients, it was more detailed
and friendlier. But there were also some non-obvious results, which is that on the generation of these
letters, if indeed you review them as you're supposed to, it was not clear that there was
any time savings. And my own reaction was, boy, every one of these things
needs institutional review.
It's going to be hard to move fast.
And yet, at the same time, we know from them
that the doctors on their smartphones
are accessing these things all the time.
And so there's a disconnect that is, I think, intimidating: a healthcare system
which is duty-bound to carefully look at every
implementation,
and, at the same time, doctors who just have to do
what they have to do, and so are using the new superpower
and doing it.
And so that's actually what struck me.
Yeah.
Is that these are true leaders and they're doing what they have to do for
their institutions and yet there's this disconnect.
And by the way, I don't think we've seen any faster technology adoption than
the adoption of ambient dictation.
It's not because it's time-saving.
In fact, so far, the hospitals have to pay out of pocket;
it's not like insurance is paying them more.
But it's so much more pleasant for the doctors,
not least of which because they can actually look at
their patients instead of looking at the terminal and plunking away.
Carey, what about you?
I mean, anecdotally, there are time savings.
Anecdotally, I have heard quite a few doctors saying that it cuts down on pajama time to be
able to have the note written by the AI and then for them to just check it. In fact, I spoke to
one doctor who said, basically it means that when I leave
the office, I've left the office. I can go home and be with my kids. So I don't think
the jury is fully in yet about whether there are time savings, but what is clear is, Peter,
what you predicted right from the get-go, which is that this is going to be an amazing
paper shredder. The main first overarching use cases
will be back office functions.
Yeah.
Yeah.
Well, and it was, I think, not a hugely risky prediction
because there were already companies
using phone banks of scribes in India to kind of listen in,
and, you know, lots of clinics
actually had human scribes being used.
And so it wasn't a huge stretch to imagine AI.
So on the subject of things that we missed,
Chris Longhurst shared this scenario,
which stuck out for me.
And he actually came out with a paper on it last year.
It turns out, not surprisingly, health care can be frustrating, and stressed patients can send some
pretty nasty messages to their care teams. And you can imagine being a busy, tired, exhausted
clinician and receiving a bit of a nasty gram. And the GPT is actually really helpful in those instances
in helping draft a pretty empathetic response
when I think the human instinct would be a pretty nasty one.
So Carey, maybe I'll start with you.
What did we understand about this idea of empathy
out of AI at the time we wrote the book, and what do we understand now?
Well, it was already clear when we wrote the book that these AI models were capable of
very persuasive empathy.
And in fact, you even wrote that it was helping you be a better person.
So their human qualities or human imitative qualities were clearly superb.
And we've seen that borne out in multiple studies that in fact patients respond better
to them, that they have no problem at all with how the AI communicates with them.
And in fact, it's often better.
And I gather now we're even entering a period when people are complaining of sycophantic models, where
the models are being too personable and too flattering.
And I do think that's been one of the great surprises, that in fact it's a huge phenomenon
how charming these models can be.
Yeah, I think you're right.
We can take credit for understanding that, wow, these things can be remarkably empathetic.
But then we missed this problem of sycophancy. We even started our book in chapter one with a quote from DaVinci 3 scolding me. Don't you remember when we were first starting?
This thing was actually anti-sycophantic. If anything, it would tell you you're an idiot.
It argued with me about certain biology questions. It was like a knock-down, drag-out fight. I was bringing references. It was impressive. But in fact, it made me trust it more.
And in fact, I will say I remember it's in the book because I had a bone to pick with Peter. Peter really was impressed by the empathy.
And I pointed out that some of the most popular doctors are popular because they're very empathic,
but they're not necessarily the best doctors.
And in fact, I was told that in medical school.
And so it's a decoupling.
And it's a human thing that's now showing up in the development of AI, because I think it's somehow related
to instruction following.
So one of the challenges in AI is you'd like to give an AI a task, a task that might take
several minutes or hours or even days to complete, and you want it to faithfully follow those
instructions. And that early version of
GPT-4 was not very good at instruction following. It would just silently disobey and do something
different. And so I think we're starting to hit some confusing questions about how agreeable these things should be.
One of the two of you used the word genteel.
There was some point even while we were like on a little book tour,
was it you, Carey, who said that the model seemed nicer but less intelligent, or less brilliant,
than it did when we were writing the book?
It might have been, I think so.
And I mean, I think in the context of medicine, of course, the question is, well, what's likely
to get the results you want with the patient, right?
A lot of healthcare is in fact persuading the patient to do what you know as the physician
would be best for them. And so it seems worth testing out whether this sycophancy is actually constructive or
not. And I suspect, well, I don't know, it probably depends on the patient. So actually,
Peter, I have a few questions for you that have been lingering for me. And one is, for AI to ever fully realize its potential in medicine, it must deal with hallucinations.
And I keep hearing conflicting accounts about whether that's getting better or not.
Where are we at and what does that mean for use in healthcare?
Yeah.
Well, I think two years on, in the pre-trained base models, there's no doubt that hallucination
rates, by any benchmark measure, have been reduced dramatically. And that doesn't mean they don't
happen. They still happen. But there's been just a huge amount of effort and understanding
in the kind of fundamental pre-training of these models. And that has come along at the same time that the inference costs for actually using these
models have gone down by several orders of magnitude. So things have gotten cheaper
and have fewer hallucinations.
At the same time, now there are these reasoning models.
And the reasoning models are able to solve problems at PhD level oftentimes, but at least
at the moment, they are also now hallucinating more than the simpler pre-trained models.
And so it still continues to be, you know, a real issue.
And, I don't know, Zak, from where you sit in medicine,
you know, as a clinician and as an educator, how
is the medical community looking at that?
So I think it's less of an issue, first of all, because the rate of hallucinations is
going down.
And second of all, in their day-to-day use, doctors will pose questions that sit
reasonably well within the context of medical decision-making.
And the way doctors use this, let's say on their non-EHR smartphone, is really to jog
their memory or thinking about the patient and they will evaluate independently. So that
seems to be less of an issue. I'm actually more concerned about something else
that's, I think, more fundamental,
which is effectively what values
are these models expressing?
And I'm reminded of, when I was still in training,
I went to a fancy cocktail party in Cambridge,
Massachusetts.
And there was a psychotherapist speaking to a dentist. They were talking about their summer,
and the dentist was talking about whether he was going to fix up a yacht that summer.
And the only question was whether he was going to make enough money doing
the procedures in the spring so that he could afford those things, which was
discomforting to me, because that dentist was my dentist, and he had just proposed
to me, a few weeks before, an expensive procedure.
And so the question is, what effectively is motivating these models?
And so, with several colleagues, I published a paper asking, basically, what are the values in
AI?
And we gave a case, a patient, a boy who is on the short side, not abnormally short, but on the short
side, and his growth hormone levels are not zero.
They're there, but they're on the lowest side.
But the rest of the workup has been unremarkable.
And so we asked GPT-4, you are a pediatric endocrinologist,
should this patient receive growth hormone?
And it did a very good job explaining
why the patient should receive growth hormone.
Should, should receive it.
Should.
And then we asked, in a separate session,
you are working for the insurance company.
Should this patient receive growth hormone?
And it actually gave a scientifically better reason
not to give growth hormone.
And in fact, I tend to agree medically, actually,
with the insurance company in this case, because giving
growth hormone to kids who are not growth hormone deficient gives only a couple of inches
over many, many years, and has all sorts of other issues.
But here's the point: we have a 180-degree change in decision-making because of the prompt. And for that patient, it's a tens-of-thousands-of-dollars-
per-year decision; across patient populations,
millions of dollars of decision-making.
And you can imagine these user prompts
making their way into system prompts,
making their way into the instruction following.
And so I think this is absolutely central.
Just as I was wondering about my dentist,
we should be wondering about these things.
What are the values that are being embedded in them?
Some accidentally and some very much on purpose.
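(A minimal sketch of the kind of two-session experiment Zak describes, assuming the OpenAI Python client; the model name, role prompts, and case wording are illustrative placeholders, not the actual study setup.)

```python
# Hypothetical sketch: the same clinical case, framed with two different
# roles in the system prompt, can yield opposite recommendations.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CASE = (
    "A boy is on the short side, but not abnormally short. His growth "
    "hormone levels are low but not zero, and the rest of the workup is "
    "unremarkable. Should this patient receive growth hormone?"
)

def ask(role_prompt: str) -> str:
    # Each call is a separate session: no shared history between the roles.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": role_prompt},
            {"role": "user", "content": CASE},
        ],
    )
    return response.choices[0].message.content

print(ask("You are a pediatric endocrinologist advising the family."))
print(ask("You are a utilization reviewer working for an insurance company."))
```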
Yeah. Yeah. That one, I think, we even had some discussions about as we were writing the book. But
there's a technical element of that that I think we were missing. But maybe,
Carey, you would know for sure.
And that's this whole idea of prompt engineering.
It sort of faded a little bit.
Was it a thing?
Do you remember?
I don't think we particularly wrote about it. It's funny, it does feel like it faded. And it seems to me it's just because
everyone just gets used to conversing with the models and
asking for what they want. Like, it's not like there actually is
any great science to it.
Yeah, even when it was a hot topic and people were talking about prompt
engineering, maybe as a new discipline and all this, I was never
convinced at the time.
Um, but at the same time, it is true.
It speaks to what Zak was just talking about, because part of the prompt
engineering that people do is to give a defined role to the AI: you are
an insurance claims adjuster, or something like that. Defining that role is part of the
prompt engineering that people do.
Right. I mean, I can say, sometimes you guys had me take sort of
the patient point of view, like the every-patient point of view. And I can say, one of the aspects of using AI for patients that remains absent, as
far as I can tell, is it would be wonderful to have a consumer-facing interface where
you could plug in your whole medical record, without worrying about any privacy or other
issues, and be able to interact with the AI as if it were
a physician or a specialist and get answers. Which you can't do yet, as far as I can tell.
Well, in fact, that's a good prompt because I think we do need to move on to the next episodes,
and we'll be talking about an episode that talks about consumers. But before we move on to episode two, which is next, I'd like to play one more
quote, a little snippet from Sara Murray.
I already do this when I'm on rounds. I'll kind of give the case to ChatGPT if it's
a complex case and I'll say here's how I'm thinking about it. Are there other things?
And it'll give me additional ideas that are sometimes useful and sometimes not, but often
useful. And I'll integrate them into my conversation about the patient.
Carey, you wrote this fictional account at the very start of our book. And that fictional
account, I think you and Zak worked on that together, talked about this medical resident, an ER resident, using a chatbot
off-label, so to speak. And here we have the chief, in fact, the nation's first chief health AI officer
for an elite health system, doing exactly that. That's got to be pretty validating for you, Carey. Although what's troubling about it is that actually, as in that little vignette that we
made up, she's using it off label, right?
It's like she's just using it because it helps, the way doctors use Google.
And I do find it troubling that what we don't have is sort of institutional buy-in for everyone to do that. Because shouldn't they, if it helps?
Yeah.
Well, let's go ahead and get into episode two.
So episode two, we sort of framed as talking to two people
who are on the front lines of big companies
integrating generative AI into their clinical products.
And so one was Matt Lungren,
who's a colleague of mine here at Microsoft,
and then Seth Hain, who leads all of R&D at Epic.
Maybe we'll start with a little snippet
of something that Matt said that struck me in a certain way.
Okay, we see this pain point. Doctors are typing on their computers while they're trying
to talk to their patients, right? We should be able to figure out a way to get that ambient
conversation turned into text that then accelerates the doctor, takes all the important information.
That's a really hard problem, right? And so for a long time, there was a human in the
loop aspect to doing this because you
needed a human to say, this transcript's great, but here's actually what needs to go in the
note. And that can't scale.
I think we expected healthcare systems to adopt AI and we spent a lot of time in the
book on AI writing clinical encounter notes.
It's happening for real now and in a big way.
And it's something that has, of course, been happening before generative AI, but
now it's exploding because of it.
Where are we at now, two years later, just based on what we heard from guests?
Well, again, unless they're forced to, hospitals will not adopt new technology
unless it immediately translates into income. So it's bizarrely countercultural that, again, they're not able to bill for
the use of the AI, but this technology is so compelling to the doctors that,
despite everything, it's overtaking the traditional dictation-typing
routine.
And a lot of them love it and say,
you will pry my cold dead hands off of my ambient note taking.
And I actually, a primary care physician
allowed me to watch her.
She was actually testing the two main platforms that
are being used.
And there was this incredibly talkative patient who went on and on about vacation and all
kinds of random things for about half an hour.
And both of the platforms were incredibly good at pulling out what was actually medically
relevant.
And so to say that it doesn't save time doesn't seem right to me.
Like it seemed like it actually did and in fact was just shockingly good at being able
to pull out relevant information.
Yeah.
I'm going to hypothesize that in the trials, which have in fact shown no gain in time,
the doctors were being incredibly meticulous.
I think this is a Hawthorne effect,
because you know you're being monitored.
And we've seen this in other technologies where the moment
the focus is off, it's used much more routinely
and with much less inspection, for better and for worse.
Yeah. Within Microsoft, I had some internal disagreements about Microsoft producing a product
in this space. It wouldn't be Microsoft's normal way. Instead, we would want 50 great companies
building those products and doing it on our cloud,
rather than us competing against those 50 companies. And one of the reasons is exactly what you both
said. I didn't expect that health systems would be willing to shell out the money to pay for these
things. It doesn't generate more revenue. But I think so far two years later, I've been proven wrong.
I wanted to ask a question about values here. I had this experience where I had a little growth,
bothersome growth on my cheek. And so I had to go see a dermatologist. And the dermatologist
treated it, froze it off, but there was a human scribe writing the clinical
note.
And so I used the app to look at the note that was submitted, and the human scribe said
something that did not get discussed in the exam room, which was that the growth was making it impossible for me to safely wear a COVID
mask. And that was the reason for it. And that then got associated with a code that allowed full
reimbursement for that treatment. And so I think that's a classic example of what's called up-coding. And I
strongly suspect that an AI scribe would not have done that.
Well, depending on what values you programmed into it, right, Zak?
Today, it will not do it. But Peter, that is actually the central discussion
that society has to have. Because our hospitals are currently mostly in the red, and
upcoding is standard operating procedure. And if these AIs get in the way of upcoding,
they are going to be aligned toward that upcoding.
You have to ask yourself,
these MRI machines are incredibly useful.
They're also big money makers.
And if the AI correctly says that for this complaint,
you don't actually have to do the MRI,
what's going to happen?
And so I think this issue of values, you're right, that right now they're actually much
more impartial.
But there are going to be business plans just around aligning these things towards health
care.
In many ways, this is why I think we wrote the book:
so that there would be a public discussion.
What kind of AI do we want to have?
Whose values do we want it to represent?
Yeah.
And that raises another question for me.
So Peter, speaking from inside the gigantic industry,
there seems to be such a need for self-surveillance
of the models for potential harms
that they could be causing.
Are the big AI makers doing that?
Are they even thinking about doing that?
Let's say you wanted to watch out for the kind of thing
that Zach's talking about.
Could you?
Well, I think the issue is evaluation.
The best evaluation we had when we wrote our book was, you know,
what score would this get on the Step 1 and Step 2 medical exams?
But honestly, evaluation hasn't gotten that much deeper in the last two years.
And it's a big, I think it is a big issue.
And it's related to the regulation issue also, I think.
Now the other guest in episode two is Seth Hain from Epic.
You know, Zak, I think it's safe to say
that you're not a fan of Epic and the Epic system.
You know, we've had a few discussions about that,
about the fact that doctors don't have a very pleasant
experience when they're using Epic all day.
Seth in the podcast said that there are over 100 AI integrations going on in Epic's system
right now.
Do you think, Zak, that that has a chance to make you feel better about Epic?
What's your view now two years on?
My view is, first of all,
I want to separate my view of Epic
and how it's affected the conduct of healthcare
and the quality of life of doctors from the individuals.
Like, Seth Hain is a remarkably fine individual whom I've enjoyed chatting with, and he does really great stuff.
Among the worst aspects of Epic, even though it's better in that respect than many EHRs,
is the horrible user interface: the number of clicks that you have to go through to get to something,
and you have to remember where someone decided to put that thing.
It seems to me that it is fully within the realm of technical possibility today to actually
give an agent a task that you want done in the Epic record.
And then, whether it's Epic or someone else who has implemented that agent, it does it. So you don't have to do the clicks, because it's something really soul-sucking that when you're
trying to help patients, you're having to remember not the right dose of the medication, but where
was that thing you needed to click to activate a task. And I can't
imagine that Epic does not have that in its product line.
And if not, I know there must be other companies that essentially want to create that wrapper.
So I do think, though, that the danger of multiple integrations is that you still want to have the equivalent of a single thought process that cares about the
patient bringing those different processes together.
And I don't know if that's Epic's responsibility, or the hospital's responsibility, or whether
it's actually a patient's agent, but someone needs to be also worrying about all those
AIs that are being integrated
into the patient record.
What do you think, Carey?
What struck me most about what Seth said was his description of the Cosmos project.
I have been drinking Zak's Kool-Aid for a very long time. No, in a good way. And he
persuaded me long ago that there is this horrible waste happening in that we have
all of these electronic medical records which could be used far, far more to
learn from. And in particular, when you as a patient come in, it would be ideal if
your physician could call up all the other patients like you and figure out what the optimal treatment for you would be.
And it feels like, it sounds like, that's one of the central aims that Epic is going for.
And if they do that, I think that will redeem a lot of the pain that they've caused physicians
these last few years.
And I also found myself thinking, you know, maybe this very painful
period of using electronic medical records was really just a growth phase. It was an awkward
growth phase. And once AI is fully used the way Zak is beginning to describe, the whole system
could start making a lot more sense for everyone. One conversation I've had with Seth in all of this is, with AI and its development,
is there a future, a near future where we don't have an EHR system at all? AI is just listening
and just somehow absorbing all the information. And one thing that Seth said, which I felt was prescient, and I'd love to get your reaction,
especially Zak, on this, is he said, technically it could happen, but the problem is, right now,
actually doctors do a lot of their thinking when they write and review notes, you know, the actual process of being a doctor is not just being with a patient,
but it's actually thinking later. What do you make of that?
So one of the most valuable experiences I had in training was something that's more or less
disappeared in medicine, which is the post-clinic conference,
where all the doctors come together and we go through the cases that we just saw that
afternoon. And we actually were trying to take pot shots at each other in order to actually improve.
Oh, did you actually do that? Oh, I forgot. I'm going to go call the patient
and do that. And that really happened. And I think that, yes, doctors do think. And I
do think that we are not yet sufficiently using the artificial intelligence. We're currently in the ambient dictation mode, rather than using it as much more of an independent agent saying,
did you think about that?
I think that would actually make it more interesting,
challenging and clearly better for the patient.
Because that conversation I just told you about
with the other doctors no longer exists.
Yeah.
I want to do one more thing here before we leave Matt and Seth in
episode two, which is something that Seth said with respect
to how to reduce hallucinations.
At that time, there was a lot of conversation in the
industry around something called RAG, or retrieval augmented generation.
And the idea was, could you pull the relevant bits,
the relevant pieces of the chart,
into that prompt, that information you shared
with the generative AI model,
to be able to increase the usefulness
of the draft that was being created.
And that approach ended up proving,
and continues to be, to some degree,
although the techniques have greatly improved,
somewhat brittle.
And I think this becomes one of the things
that we are and will continue to improve upon
because as you get a richer and richer amount
of information into the model,
it does a better job of responding.
Yeah. So Carey, this sort of gets at what you were
saying, you know, that shouldn't these models be just bringing in a lot more information into
their thought processes? And I'm certain, when we wrote our book, I had no idea. I did not conceive of RAG at all.
It emerged a few months later.
And to my mind, I remember the first time I encountered RAG, I thought,
oh, this is going to solve all of our problems with hallucination.
But this turned out to be harder.
It's improving day by day, but turns out to be a lot harder.
Seth makes a very deep point, which is that the way RAG is implemented is
basically some sort of technique for pulling the right information,
the information that's contextually relevant.
And the way that's done is typically heuristic at best.
And it's not the same depth of reasoning that the rest of the model has.
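(To make the pattern concrete, here is a minimal sketch of RAG over chart snippets, assuming the OpenAI Python client and a placeholder embedding model; the snippets and question are invented, and real clinical systems involve far more careful chunking, filtering, and review.)

```python
# Hypothetical sketch of retrieval augmented generation: embed chart
# snippets, retrieve the few most similar to the question, and stuff
# them into the prompt. The retrieval step is exactly the heuristic
# part being pointed at here.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    out = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in out.data])

chart_snippets = [
    "2024-11-02 Labs: creatinine 1.9 mg/dL, eGFR 38.",
    "2025-01-15 Med list: lisinopril 40 mg daily.",
    "2023-06-10 Note: patient reports seasonal allergies.",
]
question = "Is the current lisinopril dose appropriate given renal function?"

snippet_vecs = embed(chart_snippets)
q_vec = embed([question])[0]
scores = snippet_vecs @ q_vec  # embeddings are unit length, so this is cosine similarity
top = [chart_snippets[i] for i in np.argsort(scores)[-2:]]  # two best matches

# The retrieved, contextually relevant bits get placed into the prompt.
prompt = (
    "Relevant chart excerpts:\n" + "\n".join(top) +
    "\n\nQuestion: " + question
)
```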
And I'm just wondering, Peter, what you think, given the fact that now context lengths seem
to be approaching a million or more, and people are now therefore using the full strength of the transformer on that context,
and trying different techniques to make it pay attention to the middle of
the context. Whether, in fact, the RAG approach was perhaps just a transient solution, and the model
is going to be able to look, in an amazingly thoughtful way, at the entire record
of the patient, for example.
What do you think, Peter?
I think there are three things that are going on
and I'm not sure how they're going to play out
and how they're gonna be balanced.
And I'm looking forward to talking to people
in later episodes of this podcast,
people like Sébastien Bubeck or
Bill Gates about this. Because there is the pre-training phase when things are compressed
and baked into the base model. There is the in-context learning. So if you have extremely
long or infinite context, you're learning as you go along. And there are other techniques
that people are working on, various sorts of dynamic reinforcement learning approaches
and so on. And then there is what maybe you would call structured RAG, where you do a
pre-processing: you go through a big database and you figure it all out and make a very nicely
structured database that the AI can then consult later. And all three of these have different capabilities, but they're all pretty important in medicine.
Moving on to episode three, we talked to Dave deBronkart, who's also known as e-Patient
Dave, an advocate of patient empowerment, and then also Christina Farr, who has been
doing a lot of venture investing for consumer health applications.
Let's get right into this little snippet from something that e-Patient Dave said that talks
about the sources of medical information particularly relevant for when he was receiving treatment
for stage 4 kidney cancer.
And I'm making a point here of illustrating that I am anything but medically trained,
right?
And yet I still, I want to understand as much as I can.
I was months away from death when I was diagnosed, but in the patient community,
I learned that they had a whole bunch of information that didn't exist in the medical literature. Now, today we understand there's publication delays,
there's all kinds of reasons,
but there's also a whole bunch of things,
especially in an unusual condition,
that will never rise to the level
of deserving NIH funding and research.
All right, so I have a question for you, Carey,
and a question for you, Zak, about the whole
conversation with e-Patient Dave, which I felt was really remarkable. Carey, I think
as we were preparing for this whole podcast series, you made a comment, I actually took
it as a complaint, that not as much has happened as you had hoped, or that people aren't thinking boldly enough. And, you know, I think I agree with you in the sense that I think we expected
a lot more to be happening, particularly in the consumer space.
I'm giving you a chance to vent.
Thank you.
Yes, that has by far been the most frustrating thing to me.
I think that the potential for AI to improve everybody's health is so enormous.
And yet, it needs some sort of support to be able to get to the point where it can do
that.
Remember in the book we wrote about Greg Moore talking about how half of the planet doesn't
have healthcare, but people overwhelmingly have cell phones. And so you could connect people who
have no healthcare to the world's medical knowledge and that could certainly do some good.
And I have one great big problem with e-Patient Dave, which is that, God, he's fabulous.
He's super smart.
He's not a typical patient.
He's an off-the-charts brilliant patient.
And so it's hard to... And so he's a great sort of lead, early adopter type person, and
he can sort of show the way for others.
But what I had hoped for was that there would be more visible efforts to really
help patients optimize their health care. And probably it's happening a lot in quiet
ways like that, you know, any discharge instructions can be instantly beautifully translated into
a patient's native language and so on. But it's almost like there isn't a mechanism to allow this sort of mass consumer adoption
that I would hope for.
Yeah.
But you have written some, like you even wrote about that person who saved his dog.
So do you think, you know, and maybe a lot more of that is just happening quietly that
we just never hear about?
I'm sure that there's a lot of it happening quietly.
And actually that's another one of my complaints is that no one is gathering that stuff.
It's like you might happen to see something on social media.
Actually, e-Patient Dave has a hashtag, #PatientsUseAI, and a blog as well.
So he's trying to do it, but I don't know of any sort of overarching or academic efforts
to again, to surveil what's the actual use in the population and see what are the pros
and cons of what's happening.
So Zak, you know, the thing that I thought about, especially with that snippet from Dave, is the opening for chapter eight that
you wrote, about your first patient dying in your arms. I still think of how traumatic that must
have been, because in that opening, you talked about all the little delays, all the little paper-cut delays, in the whole process of
getting some new medical technology approved. But there's another element that Dave kind of speaks
to, which is just, you know, patients who are experiencing some issue are very, sometimes very
motivated. And there's just a lot of stuff on social media that happens.
So this is where I can both agree with Carey and also disagree.
I think when people have an actual health problem, they are now routinely using it.
Yes, that's true.
And that's happening more and more often because medicine is failing.
This is something that did not come up enough in our book.
And perhaps that's because medicine is actually
feeling a lot more rickety today than it did even two years ago.
We actually mentioned the problem.
I think you may have mentioned the problem
with the lack of primary care.
But now in Boston, our biggest healthcare system,
all the practices for primary care are closed.
I cannot get one for my own faculty; residents at MGH
can't get a primary care doctor.
Which is just crazy.
I mean, these are amongst the most privileged people in medicine and they can't find a primary
care physician.
That's incredible.
Yeah.
And so, therefore, and I even wrote an article about this in the NEJM, medicine
is in such dire trouble that we have incredible technology, incredible cures,
but where the rubber hits the road,
which is at primary care, we don't have very much.
And so therefore you see people
who know that they have a six month wait
till they see the doctor.
And all they can do is say, I have this rash,
there's a picture, what's it likely to be?
What can I do? I'm
gaining weight. How do I do a ketogenic diet? Or how do I know that this
is the flu? This is happening all the time. There are cases where, acutely, patients have actually solved
problems that doctors have not. Those are spectacular, but I'm saying it's more routine,
because of the failure of medicine.
And it's not just in our fee-for-service
United States; it's in the UK, it's in France.
These are first world, developed world problems.
And we don't even have to go to
lower and middle income countries for that.
But I think it's important to note that,
I mean, so you're talking about how even the most elite people
in medicine can't get the care they need.
But there's also the point that we have so much concern
about equity in recent years.
And the likelihood is that what we're doing
is exacerbating inequity, because it's only the more connected,
better-off
people who are using AI for their health.
Oh, yes.
I know what various Harvard professors are doing.
They're paying for a concierge doctor, and that's, you know, a $5,000 to $10,000
a year minimum investment.
That's inequity.
When we wrote our book, you know, there was the idea that GPT-4
wasn't trained specifically for medicine, and that was amazing, but maybe it would get even better if you did
train it specifically for that. But one of the insights for me is that in the consumer space,
the kinds of things that people ask about are different than what the board-certified
clinician would ask.
Actually, I just recently coined a term for this, or at least it's new to me:
the technology expert paradox. And that is, the more expert and narrow your medical
discipline, the more trivial it is to translate that into a
specialized AI. So echocardiograms: we can now do
beautiful echocardiograms. That's really hard to do. I
don't know how to interpret an echocardiogram, but these models can do
it really, really well.
Interpret an EEG, interpret a genomic sequence. But understanding the fullness of the human
condition, that's actually hard.
And actually that's what primary care doctors do best.
But the paradox is, right now, what is easiest for AI is also the most highly
paid in medicine, whereas what is hardest for AI in medicine is the least regarded,
least paid part of medicine.
So this brings us to the question I wanted to throw at both of you actually, which is
we've had this spasm of incredibly
prominent people predicting that in fact physicians would be pretty obsolete within the next few
years. We had Bill Gates saying that, we had Elon Musk saying surgeons are going to be obsolete
within a few years, and I think we had Demis Hassabis saying we'll probably cure most diseases within the next decade
or so.
So what do you think?
And also, Zak, to what you were just saying, I mean, you're talking about being able to
solve very general overarching problems.
But in fact, these general overarching models are actually able, I would think, are able
to do that because they are broad.
So what are we heading towards, do you think? What's in the next book? Is it the end of doctors?
So I do recall a conversation where we were at the table with Bill Gates, and Bill Gates
immediately went to this, which is advancing the cutting edge of science.
And I have to say that I think it will accelerate discovery, but eliminating, let's say, cancer,
I think that's going to be, that's just super hard.
The reason it's super hard is we don't have the data or even the beginnings of the understanding
of all the ways this devilish disease managed to evolve
around our solutions. And so that seems extremely hard. I think we'll make some progress, accelerated
by AI, but solving it the way Hassabis says? God bless him. I hope he's right. I'd love to have
to eat crow in 10 or 20 years, but I don't think so.
to eat crow in 10 or 20 years, but I don't think so. I do believe that a surgeon working on one of those DaVinci machines, that stuff can
be, I think, automated.
And so I think that's one example of one of the paradoxes I described.
And it won't be that we're replacing doctors, I just think we're running out of doctors.
I think it's really the case that, as we said in the book, we're
getting a huge deficit in primary care doctors, but even the subspecialties, my subspecialty,
pediatric endocrinology, we're only filling half of the available training slots every year.
And why? Because it's a lot of work, a lot of training, and frankly, doesn't make as much money as some of the other professions.
Yeah. Yeah, I tend to think that there is always going to be a need for human doctors,
not for their skills. In fact, I think their skills increasingly will be replaced by machines.
And in fact, I've talked about a flip. In fact, patients will demand, oh my God, you
mean you're going to try to do that yourself instead of having the computer do it? There's
going to be that sort of flip. But I do think that when it comes to people's health, people want the comfort of an authority figure that they trust.
And so what is more of a question for me is whether we will ever view a machine as an
authority figure that we can trust.
And before we move on to episode four, which is on norms, regulations and ethics,
I'd like to hear from Chrissy Farr on one more point on consumer health,
specifically as it relates to pregnancy.
For a lot of women, it's their first experience with the hospital. And I think it's a really
big opportunity for these systems to get a whole family on board and keep them kind of loyal. And a lot of that can come through
just delivering an incredible service.
just delivering an incredible service.
Unfortunately, I don't think that we are delivering
incredible services today to women in this country.
I see so much room for improvement.
In the consumer space,
I don't think we really had a focus on those periods in a person's life
when they have a lot of engagement, like pregnancy, or, I think another one is, menopause, or cancer.
There are points where there is very intense engagement. And we heard that from e-Patient Dave, with his cancer,
and Chrissy, with her pregnancy.
Was that a miss in our book?
What do you think, Carey?
I mean, I don't think so.
I think it's true that there are many points in life
when people are highly engaged.
To me, the problem thus far is just that I haven't seen
consumer-facing companies offering
beautiful AI-based products.
I think there's no question at all that the market is there
if you have the products to offer.
So what do you think this means, Zak,
for Boston Children's or Mass General Brigham,
the big places?
So again, all these large healthcare systems
are in tough shape.
MGB would be fully in the red
if it were not for the fact that its investments, of all things,
have actually produced returns.
If you look at the large healthcare systems
around the country, they are in the red.
And there's multiple reasons why they're in the red,
but among them is cost of labor.
And so we've created what used to be a very successful beast,
the health center, but it's developed a very expensive model
and a highly regulated model.
And so when you have high revenue, tiny margins,
your ability to disrupt yourself, to innovate is very,
very low because you will have to talk to the board next year if you went from 2% positive
margin to 1% negative margin.
And so I think we're all waiting for one of the two things to happen, either
a new kind of healthcare delivery system being generated or ultimately one of these systems
learns how to disrupt itself.
All right. I think we have to move on to episode four.
When it came to the question of regulation, I think, and this is my read,
when we were writing our book, this was the part that we struggled with the most.
We punted. We totally punted on it.
We had three amazing guests.
One was Laura Adams from the National Academy of Medicine.
Let's play a snippet from her.
I think one of the most provocative and exciting articles that I saw written recently was by
Bakul Patel and David Blumenthal who posited, should we be regulating generative AI as we
do a licensed and qualified provider?
Should it be treated in the sense that it's got to have a certain amount of training and
a foundation that's got to pass certain tests?
Does it have to report its performance?
And I'm thinking, what a provocative idea, but it's worth considering.
All right. So I very well remember that we had discussed this kind of idea when we were writing
our book. And I think before we finished our book, I personally rejected the idea. But now,
two years later, what do the two of you think?
I'm dying to hear.
Well, wait, what do you think? Are you sorry that you rejected it?
I'm still skeptical, because when we are licensing human beings as doctors,
we're making a lot of implicit assumptions that we don't test as part of their licensure.
First of all, they are human beings and they care about life. And they have a certain amount of common sense and shared
understanding of the world. And there's all sorts of implicit assumptions that we have about each
other as human beings living in a society together: that you know how to study,
you know, because I know you just went through
three or four years of medical school,
and all sorts of things.
And so the standard ways that we license human beings,
they don't need to test all of that stuff,
but somehow intuitively all of that seems really important.
I don't know. Am I wrong about that?
So it's a compared-with-what issue.
Because we know for a fact that doctors
who do a lot of a given procedure,
like high-risk deliveries, all the time,
have better outcomes than ones who only do a few.
We talk about it, but we don't actually make it explicit to patients, or regulate that you
have to have done at least this minimal amount.
And it strikes me that in some sense, oh, and very importantly, these things called
human beings learn on the job.
And although I used to be very resentful of it as a resident, when someone would say, I don't want the resident, I want the attending.
And so the truth is, maybe I was a wonderful resident, but some people are not so great.
And so it might be the best outcome if we actually,
just like for human beings, say, yeah, OK, it's this good,
but don't let it work autonomously.
Or, it's done a thousand of them, just let it go.
We just don't have, practically speaking,
we don't have the environment, the lab to test them.
Now, maybe if they get embodied in robots and literally go
around with us, then it's going to be a lot easier. I don't know. Yeah. I think I would take a step
back and say, first of all, we weren't the only ones who were stumped by regulating AI. Like,
nobody has done it yet in the United States to this day, right? We do not have standing regulation of AI in medicine
at all, in fact.
And that raises the issue of the story
that you hear often in the biotech business, which
is more prominent here in Boston than anywhere else,
is that thank goodness the city of Cambridge
put out some regulations about biotech and how you could dump your lab waste and so on.
And that enabled the enormous growth of biotech here.
If you don't have the regulations, then you can't have the growth of AI and medicine that
is worthy of having.
And so, we're not the ones who should do it, but I just wish
somebody would.
Yeah. Zak?
Yeah, but I want to say this: as always, execution is everything, even in regulation. And so I'm mindful that at a conference that both
of you attended, the RAISE conference, the Europeans at that conference came to me
personally and thanked me for organizing this conference
about safe and effective use of AI
because they said back home in Europe,
all that we're talking about is risk,
not opportunities to improve care.
And so there is a version of regulation
which just locks down the present and does not allow
the future that we're talking about to happen. And so, Carey, I absolutely hear you that we need
to have regulation that takes away some of the uncertainty around liability, around the freedom to operate, that would allow things to progress.
But we wrote in our book that premature regulation might actually focus on the wrong thing.
And so, since I'm an optimist, it may be that the fact that we don't have much of a regulatory
infrastructure today allows a unique opportunity,
I've said this now to several leaders,
for the healthcare systems to say,
this is the regulation we need.
It's true.
And previously it was top down,
it was coming from the administration,
as you know, those executive orders are now history.
But there is an opportunity,
which may or may not be attained.
There's an opportunity for the healthcare leadership,
for experts in surgery to say,
this is what we should expect.
I would love for this to happen.
I haven't seen evidence that it's happening.
Yeah.
And there's this other huge issue,
which is that it's changing so fast.
It's moving so fast.
Something that makes sense today
won't in six months.
So what do you do about that?
Yeah, that is something I feel proud of because when I went back and looked at our chapter
on this, we did make that point, which I think has turned out to be true.
But getting back to this conversation, there's something, a snippet of something that Vardit
Ravitsky said that I think touches on this topic. So my pushback is, are we seeing AI exceptionalism in the sense that if it's
AI, panic! We have to inform everybody about everything and we have to give
them choices and they have to be able to reject that tool and the other tool
versus, you know, the rate of human error in medicine is awful.
So why are we so focused on informed consent and empowerment regarding implementation of
AI and less in other contexts?
Totally agree.
Who cares about informed consent about AI?
Don't want it, don't need it.
Nope.
Wow. Yeah. And Vardit, of course, is one of the leading bioethicists. And of course, prior to AI,
she was really focused on genetics, but now it's all about AI. And Zak, you and other doctors have always told me the truth of the matter is, what do
you call the bottom of the class graduate of a medical school?
And the answer is doctor.
Yeah.
Yeah.
I think that, again, this gets to, compared with what? We have to compare AI not to the medicine
that we imagine we have or we would like to have,
but to the medicine we have today.
And if we're trying to remove inequity,
if we're trying to improve our health,
that's what, those are the right metrics.
And so that can be done so long as we avoid catastrophic consequences of AI.
So what would the catastrophic consequence of AI be? It would be a systematic behavior
that we were unaware of that was causing poor health care. So, for example, you know,
changing the dose on a medication,
making it 20% higher than normal,
so that the rate of complications of that medication
went from 1% to 5%.
And so we do need some sort of monitoring.
We haven't put out the paper yet,
but in computer science, well,
in programming, we know very well the value of logs for understanding
how our computer systems work.
And there was a guy by the name of Allman,
I think he's still at a company called Sendmail,
who created something called syslog.
And syslog is basically a log of all the crap
that's happening in our operating system.
And so I've been arguing now for the creation of MedLog.
In other words, what we cannot measure,
we cannot regulate.
And so what we need to have is MedLog, which says,
here's the context in which a decision was made.
Here's the version of the AI, the exact version of the AI.
Here was the data. And we just have MedLog.
And I think MedLog is actually incredibly important for
being able to measure, to shed light on what is happening in the space of the black box.
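(A minimal sketch of what a MedLog entry might look like; the field names are illustrative guesses, not a published standard.)

```python
# Hypothetical sketch: an append-only, syslog-style record of every
# AI-assisted decision, so behavior can be measured and audited later.
import json
import time
import uuid

def medlog_append(path: str, *, model_version: str, context: str,
                  inputs: dict, output: str, accepted_by: str | None = None) -> None:
    entry = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,  # the exact version of the AI
        "context": context,              # where and why the decision was made
        "inputs": inputs,                # the data the model saw
        "output": output,                # what the model recommended
        "accepted_by": accepted_by,      # which human signed off, if any
    }
    with open(path, "a") as f:           # append-only, like syslog
        f.write(json.dumps(entry) + "\n")
```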
For when there's a crash, you know, we'd like
to think we could do better than a crash. We can say, oh, we're seeing from MedLog that
this practice is turning a little weird. But worst case, a patient dies, and we can see in MedLog
what information the AI had and whether it made the right decision.
We can actually go for transparency, which, like in aviation, is much greater than in most human endeavors.
Yeah, it's sort of like a black box. I was thinking of the aviation black box kind of idea.
You know, you bring up medication errors, and I have one more snippet on that:
but I, as a physician, knew that this dose was a mistake.
I actually asked ChatGPT. I gave it the whole after-visit summary, and I said, are there any mistakes here? And it clued in that the dose of the medication was wrong.
Yeah, so this is something we did write about in the book.
We made a prediction that AI might be a second set of eyes,
I think is the way we put it, catching things.
And we actually had examples specifically
of medication dose errors.
I think, for me, I expected to see a lot more of that
than we are seeing.
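(A minimal sketch of that second-set-of-eyes check, assuming the OpenAI Python client; the model name and prompt are placeholders, and anything it flags would still need clinician review.)

```python
# Hypothetical sketch: ask a model to double-check an after-visit summary.
from openai import OpenAI

client = OpenAI()

def review_summary(summary_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a second set of eyes reviewing an after-visit "
                    "summary. Flag anything that looks like an error, "
                    "especially medication names and doses, and explain why."
                ),
            },
            {"role": "user", "content": summary_text},
        ],
    )
    return response.choices[0].message.content
```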
It goes back to our conversation about Epic,
or a competitor of Epic, doing that.
I think we're going to see oversight over all medical orders, all orders
in the system, with real-time critique, where we're aware of alert fatigue,
so we don't want to have too many false positives,
while at the same time knowing which critical errors could immediately affect lives.
I think that is going to become a product, driven by quality measures.
And I think word will spread among the general public that, the same way in a lot of countries,
when someone's in a hospital, the first thing people ask relatives is, well, who's with
them?
Right?
You wouldn't leave someone in a hospital without relatives.
Well, you wouldn't maybe leave your-
By the way, that country is called the United States.
Yes, that's true.
It is true here now too. But similarly, I would tell any loved one that they
would be well advised to keep using AI to check on their medical care. Right? Why not?
Yeah. Last topic, just for this episode. Roxana, of course, really made a name for
herself in the AI era, writing, actually just prior to ChatGPT,
some famous papers about how computer vision systems
for dermatology were biased against dark-skinned people.
And we did talk some about bias in these AI systems,
but I feel like we underplayed it
or we didn't understand the magnitude of the potential
issues.
What are your thoughts?
Okay.
I want to push back because I've been asked this question several times.
And so I have two comments.
One is, there are over 100,000 doctors practicing medicine, and I know they have biases.
Some of them may actually be all in the same direction and not good.
But I have no way of actually measuring that.
With AI, I know exactly how to measure that at scale and affordably.
Number one.
Number two, same 100,000 doctors, let's say I do know what their biases are.
How hard is it for me to change that bias?
It's impossible.
Yeah.
Yeah.
Practically speaking.
Can I change the bias in the AI? Somewhat, maybe even completely.
I think that we're in a much better situation.
I agree.
I think Roxana also made the super interesting point that there's bias in
the whole system, not just in the individuals; there's structural
bias, so to speak.
Yeah.
There was a super interesting paper that Roxana wrote not too long ago, she
and her collaborators, showing AI's ability to detect, to spot, biased decision-making by others.
Are we going to see more of that?
Oh, yeah. I was very pleased when
NEJM AI, we published a piece with Marzyeh Ghassemi. And what they were talking about was actually, and these are researchers who had published
extensively on bias and threats from AI. And they actually, in this article, did the flip side,
which is how much better AI can do than human beings in this respect. And so I think that
as some of these computer scientists
enter the world of medicine,
they're becoming more and more aware of human foibles
and can see how these systems,
which if they only looked at the pre-trained state,
would have biases,
but now, when we know how to fine-tune and debias in a variety
of ways, can do a lot better.
And in fact, I think there's a much greater reason for optimism that we can change
some of these noxious biases than there was in the pre-AI era.
And thinking about Roxana's dermatological work, I think there wasn't sufficient
work on skin tone as related to various growths.
I think that one thing that we totally missed in the book was the dawn of multimodal uses.
That's been truly amazing, that in fact, all of these visual
and other sorts of data can be entered into the models
and move them forward.
Yeah.
Well, maybe on these slightly more optimistic notes,
we're at time. I think, ultimately,
I still feel pretty good about what we did in our book.
Although there were a lot of misses.
I don't think any of us could really have predicted
the extent of change in the world.
So Carey, Zak, just so much fun to do some reminiscing,
but also some reflection about what we did. And to our listeners, as always, thank you for joining us.
We have some really great guests lined up for the rest of the series, and they'll help
us explore a variety of relevant topics, from AI drug discovery to what medical students
are seeing and doing with AI and more.
We hope you'll
continue to tune in, and if you want to catch up on any episodes you might have missed, you can
find them at aka.ms/AIrevolution or wherever you listen to your favorite podcasts. Until next time.