Orchestrate all the Things - Data, analytics, machine learning, and AI in healthcare in 2021. Featuring Gradient Flow Principal Ben Lorica and John Snow Labs CTO David Talby
Episode Date: March 29, 2021
What do you get when you juxtapose two of the hottest domains today - AI and healthcare? A peek into the future, potentially. In 2020, few things went well and saw growth. Artificial intelligence was one of them, and healthcare was another one. Artificial intelligence remained on a steady course of growth and further exploration, perhaps because of the Covid-19 crisis. Healthcare was a big area for AI investment. Today, the results of a new survey focusing precisely on the adoption of AI in healthcare are being unveiled. We caught up with two of its architects: Gradient Flow Principal Ben Lorica, and John Snow Labs CTO David Talby, to discuss findings and the state of AI in healthcare. Article published on ZDNet.
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
What do you get when you juxtapose two of the hottest domains today,
AI and healthcare? A peek into the future, potentially.
In 2020, few things went well and saw growth.
Artificial intelligence was one of them and healthcare was another one.
Artificial intelligence remained on a steady course of growth and further exploration,
perhaps because of the COVID-19 crisis.
Healthcare was a big area for AI investment.
Today, the results of a new survey focusing precisely on the adoption of AI in healthcare
are being unveiled.
We caught up with two of its architects, Gradient Flow principal Ben Lorica
and John Snow Labs CTO David Talby to discuss findings and the state of AI in healthcare.
I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration
on Twitter, LinkedIn and Facebook.
So I'm Ben Lorica. I'm the principal at Gradient Flow and also the external
program chair of the NLP Healthcare Summit. And I would say our main motivation for this
survey was just to basically understand really how people are beginning to use AI and machine
learning in healthcare. And part of that obviously was timed
because we were having this conference focused on healthcare.
But David is the real healthcare expert.
I don't know about that, but...
My name is David Talby. I'm the CTO at John Snow Labs.
I have been working in the healthcare space for about a decade now.
I created Spark NLP, which is open source, and a product on top of it, Spark NLP for Healthcare.
Most of the work we've been doing in the past few years
is helping large healthcare and pharma organizations put the technology to good use.
Okay, thank you both for the introduction.
I guess we can start from the start, which is typically a good place.
So going through the survey findings, and actually maybe even before the findings, it may be a good idea if you say a few words about who responded to the survey.
I know it's mentioned there, but just for the benefit of people who may be listening.
So we targeted a bunch of channels. So we did online advertising,
both on search engines and social media networks. But we also did paid email blasts to healthcare
specific groups and lists. So it turns out there's a lot of large healthcare related groups on LinkedIn.
So we approached those groups and found respondents this way. So we did try to
target our ads as we were doing the survey to people who are in healthcare.
We ourselves also have people on our list who have declared that their industry is healthcare, biotech or pharma. So we also
did dedicated email blasts to people
in those industries.
Okay, thank you.
So going into your findings, basically,
you start by asking people about the use of various technologies.
And what caught my attention in your findings there
was that you seem to aggregate a number of different technologies.
And some of them I think you also refer to as foundational, such as data integration or business intelligence, and you aggregate those with what I would call more advanced techniques,
such as natural language processing or machine learning operations.
What puzzles me there is that, in my view, there's a kind of spectrum, a kind of continuum among these technologies.
They're not all exactly on the same level.
So some are foundational, like data integration, for example, and some others kind of build on that.
Data integration is like the key foundation on which all of the rest can be built.
So what really surprised me is how few people, I think,
responded that they actually have data integration figured out,
let's say, in their organizations.
I think the findings were something like 45%, not even 50%.
So what do you think of that?
I mean, is data integration really such a pain point in organizations?
So a couple of things.
So the label foundational technologies was just for the report in some ways, right?
So the question really was asking which of these technologies are you using or planning to use.
And George, the way I interpret the results is more ordinal than numerical in the following
sense.
I kind of interpret it more as people rank-ordering the importance, and not necessarily as the absolute number of respondents.
So from that perspective, I think it comes out kind of reasonable, right?
So people rank data integration as the most important thing.
NLP.
NLP was a bit surprising, but then this is healthcare, so there is a
lot of text. But I guess David can speak more to data integration within healthcare. And
I'm assuming since they do a lot of work in healthcare, this is something that he encounters
quite a bit.
Yeah. Happy to hear from David.
Can you hear me?
Yes.
Thank you.
Yeah. So George, you're making very good points.
And I think specifically within healthcare,
there are two points to consider here.
One is, and I think this is one thing the report shows, that things like NLP especially
are becoming more foundational in healthcare.
Because, you know, really for 10 years, what we did in healthcare, we deployed EMRs, right?
So basically we took what people were doing on paper and we put it in a computer, right?
Which is the same thing we did in the finance industry in the 80s, right?
And now what we're discovering is that, well, really all the interesting clinical
information is still in text, right?
There's very little that's actually structured, and in healthcare that's much more so than
in other industries.
So even if you want to ask fairly simple questions, okay, like, oh,
show me all the diabetic patients,
show me all the people who should be vaccinated for COVID,
show me what's happening to whatever,
patients with stage two cancer
or this kind of kidney failure over a year.
Today, the way you still do it,
very often you have humans,
you have nurses and doctors who sit down,
read things one by one,
because really the majority of the relevant clinical data
is in free text.
So the fact that now, really for the past two, three years,
we're starting to have algorithms
that at least match human specialist capability
in extracting this kind of information
is definitely a game changer.
And I think one of the things
that was a bit surprising in this survey
is that this is, I think,
on its way to becoming a foundational technology.
That's one thing.
The other thing I would say
in terms of data integration,
I think one of the things that we may be seeing here
is that the definition of what it means keeps moving.
Right?
Because, you know, a few years ago,
if you were able to send invoices and get paid, then you were good, right?
But now really, I mean, you know, we raised the bar, right?
So now, if I move from hospital to hospital,
I would expect my whole medical record to be there, right?
And you know what?
I would actually even expect documents to be there.
And it's not that crazy to expect that my images would be transferred as well, right? So people can see them.
So I think data integration is one of those things where, you know, when I talk to people in
the industry, they say, yeah, we are where we said we would be five years ago, okay? But the
goalposts have significantly moved now, right, in terms of people's expectations. So, no, we haven't figured it out, right?
And even when you get the data, look, you see it very often in use cases like preauthorizations with insurance companies, right?
So, of course, you send all the bills to insurance.
But for preauthorization, very often, yeah, you fill in a form with free text, right?
And you say why this patient needs to do a CT scan, because the X-ray is not good enough for the use case, right? It's all text, it's all free text.
So now there are a lot of these use cases where we can say it's not that people
are sending us faxes or paper, but when they send us the data, it's still a human
thing, or we need to find some way to automate it to actually be able to run a business process.
Okay, so I figure from what you're saying that
since you specifically referred to the absence of structured data,
so that perhaps what has happened in the healthcare industry
is that they kind of leapfrogged, let's say. So they went from, well, paper documents to electronic documents
without actually having gone through the intermediate stage of having databases,
which is what happened in many industries.
And that may also explain the prevalence of natural language processing in this industry.
Would that be the case?
I think it could very well be the case, yes.
Yeah, absolutely.
And I think we talked about that because there's been such a negative reaction
to EMRs from doctors, because of the high administrative burden, right?
Given the fact that an average doctor now spends three hours per day,
you know, typing things into the EMR,
it's fairly obvious there's huge reluctance.
Like you're not going to get doctors to fill in combo boxes and checkboxes.
Right. That's just not going to happen. Right.
There's, you know, there's a growing discontent there. Right.
And burnout.
So what we're turning to is the other approach:
let's just leave doctors doing what they're doing.
Can we use technology to extract from the free text
what we actually need, right?
And yes, so in a sense, definitely, yes, it's a leapfrog.
We're leapfrogging over that middle generation.
Okay, I see.
And that would also explain what seemed at first glance
like a relatively low percentage of adoption
for business intelligence,
which again, in other industries is,
you know, table stakes really.
But what you just said
could very well explain that as well.
So no databases, no BI as well.
Yeah, look, and of course,
I mean, there'll be applications in every healthcare organization, right? Because people use it to do
finances, to do operations, right, to order medications, to do all of that.
Or just how many patients we have, how many beds are open.
Yeah, exactly. If I don't know exactly how many patients are at
the hospital, right, you know, what do I need to bill them? Right. How much, you know, toilet paper do I need to order?
Right.
That's all classic BI, and that's structured data.
But a lot of the, you know, billing, like really, how much should I bill?
Today you still have a human coder who sits, reads and decides.
Right.
All clinical decisions, yeah, I need to read the notes, right?
If I need to decide when to invite you back, when to release you home, what kind of post-treatment,
it's all reading the notes, right?
So for all of that, yes, NLP is becoming, I think,
foundational there, basically to be able to restructure the data
so that you can enable all those use cases.
That's an interesting peculiarity, let's say, of this industry, which I was not aware of, so
it's always good to learn something. So moving a bit ahead with the findings of the survey that
you did, the other thing that piqued my interest was on the users of AI technologies.
So it seems from your findings that most AI applications are aimed at clinicians,
health care providers and then patients. I would assume that these are probably the biggest user
groups in this industry. So just in you know, in terms of numbers,
this probably makes sense.
The one thing that kind of got my attention there,
and for this I need to introduce the fact
that you segregate the organizations
that the survey respondents work for into three different
types.
So you have mature organizations, organizations that are kind of toying with the technology,
and organizations that are just starting out.
So in the mature category, what piqued my interest was that about 43% of people who responded said they're building AI applications aimed at drug development professionals.
And this is much, much higher than what you got from respondents in any other category of organizations. And the reason that I'm specifically interested in that
is because we have seen in the last couple of years
a kind of explosion, let's say,
in drug development aided by AI.
So I was wondering if you have any insights there
that could potentially explain this finding.
David?
Yes.
So, look, drug development, especially the very early stages of finding candidates,
yeah, it has become a software problem.
Absolutely.
And very quickly.
We can even see that with the COVID drugs, right?
Yeah, exactly.
Yes.
Yeah, yeah.
Because look, it used to be the case that people would, I mean, it still is the case,
right?
You do a PhD in biology or biochemistry.
You go to a drug company and all you do for 40 years is read those papers, academic
papers, and try to find correlations, right?
So you've got to say, oh, wait, we have this molecule, and we have someone at, whatever, MIT doing work on it, and we have this other drug that actually
uses the same molecule, and, you know, it has a different
name but the same biological mechanism. And then you do research on the biological mechanism, come back
and say, oh, this may actually work that way. Okay, what you can do now, and you can do this very effectively,
actually more effectively than even human experts,
right, you can say, look,
let's look at all the academic papers that are out there,
all the patents that are submitted, right,
all the investment disclosures, okay, every week.
And basically, I try to do exactly the same,
build those knowledge graphs automatically, right,
that say, oh, I have this drug, right?
And this drug is actually this molecule, right?
It's not a brand name, it's a molecule.
This molecule is part of this family.
It's known to have this biological mechanism
and these side effects, okay?
This side effect, you know, here it's negative,
but in other patients it's going to be positive.
So what if I just try this,
not at five milligrams, but at 40 milligrams, right?
Would this be effective, right?
So right now, yes, there's a whole number of companies
who are building now drug pipelines,
basically by looking at medical ontologies,
gene ontologies, gene products,
of course, academic literature,
and trying to uncover candidates.
Because the other thing that you have
is what the large pharma companies would do.
They would buy and license this, right?
So if you want to, you know,
if you're really just a software person,
you know, just a software person
and you want to make money this way,
you can get to a point
where you have even pre-phase one trials, okay?
But you can come and say,
look, I have this, I patented this.
I think this molecule has potential.
And by potential I mean it's 5% likely to succeed.
This is still a very high-risk thing.
But 5% is much, much higher than the 0.05% that you usually get at that stage.
And basically, you can get revenue just by selling it to Johnson & Johnson or Pfizer or one of the large pharma companies, and work with them that way. Usually it's some kind of revenue share, risk sharing component.
So I think really the leap here is that there was an industry that was so manual and so
human intensive, right, where really your competitive differentiator was, oh look, I have 500 PhDs, right,
to just do this day to day, right?
And it got to a point where really within three or four years,
you can kind of say, hey,
I can actually do better than all of them, almost,
with software, right?
So right now, really,
we are at the very initial stage of that,
where yes, you know,
everybody's rushing to kind of build those companies
and try that in different specialties.
And, George, there are also now more accessible
open source tools as well
for people who may not be experts in machine
learning,
who can start playing around with
some of the techniques that you hear about, like
deep learning, for example.
And there are
benchmarks now as well.
In computer vision, there's a famous benchmark
that really led to a lot of progress called ImageNet.
So there are now similar benchmarks in drug discovery as well.
Yep.
Okay.
Hey, it's super exciting and, honestly, it's super important to all of us.
Yeah.
All of what you said makes sense to me.
What I'm really not much wiser about is
whether you have any idea why there is this
specific peak, let's say, among mature
organizations, mature in terms of their use of AI, in their use of AI for this specific purpose.
Well, because maybe they've kind of addressed some of the low-hanging fruit, and so this is more of an advanced use case, David?
It could be.
And it could also be because some organizations
already had the business workflow manually,
so it's kind of an acceleration of a revenue stream they already have,
and they kind of know how to do it if they have the candidates.
That's another potential explanation.
And, you know, we did have the COVID thing, yeah, that kind of rushed everyone specifically
to work on that. But one thing to remember: while we were all focused on COVID,
we still have to cure cancer and Alzheimer's, right, and heart disease and kidney disease.
So, you know,
to say that this is a trillion dollar opportunity,
This is super important for everyone.
Okay.
And picking up on what you mentioned, Ben,
about, well, the kind of democratization,
let's say, of machine learning tools via open source.
This is another one of your findings in the survey.
What you basically found is that the use of open source and public cloud,
which I think often goes hand in hand, is prevalent among your respondents. And again, I find it kind of normal because I think this is what we see across the board
in most industries.
What I found interesting is the industry specific insights that you provided.
You seem to suggest that this has a lot to do with the fact that healthcare is
a regulated industry and therefore people are especially aware of not just being compliant with
regulations such as HIPAA for example but they also don't really want their data to leave their premises.
Or if they do, they want to be absolutely certain that it's, you know,
for example, the cloud provider is regulation compliant.
So I was wondering if you want to explain your domain-specific insights that may apply here.
So I don't have domain-specific insights because I'm not in healthcare.
I'll let David address that.
But one thing, George, that we did not mention in this report is we did have an earlier survey
last year with David and his crew on NLP in general, not healthcare.
And in that survey, we had enough healthcare respondents that we were able to examine some
of the responses from healthcare.
And one of the things I believe, David, as I recall it from that survey, the healthcare people, when they were evaluating NLP solutions,
they were very adamant about having control of being able to tune models,
having control of data, making sure data doesn't leave their premises.
Yes.
And they were not that keen on cloud providers from that perspective.
Okay.
Yes.
So what happens in healthcare?
Basically, for people who own patient data, first of all, often it's just illegal to share it without patient consent.
You could get in criminal trouble.
If you collaborate with Amazon and Google and you give them things to
improve their models, right, you have to notify your customers, sometimes each patient. The other thing is,
the data is very expensive. So today, for example, if a
pharma company does a deal with, say, Kaiser Permanente or
Blue Cross Blue Shield to get access to records to do research, for 50,000 patients, one-time access, a
six-month project, they may pay them, you know, half a million dollars for access to the data.
And, you know, completely de-identified and anonymized.
Yeah, exactly, completely de-identified, yes. Identified is just illegal, right?
So when AWS said, oh, yeah, just send us your notes, you know, to AWS Comprehend Medical, and then, by the way, we also use your notes to train our models, right, I mean, that was laughable to the healthcare and pharma industry. Right? That was like, and we pay you for the services? That's not how it works.
And all the cloud providers have to adjust for that.
So that's one thing that happens in healthcare.
People do not want to share data.
They're afraid.
And by the way, they're afraid for good reasons, right?
Because there are breaches.
And when there are breaches, they get publicity, right?
Because people care about their medical data being out there.
So yeah, definitely in this industry, look, first of all there's a lot of reluctance to share data with anyone. It's often illegal. No one would do it without
legal sign-off, without compliance sign-off, without security sign-off, and they shouldn't. The other
thing, look, as an industry, yeah, it's much more compliance aware, right?
This is not like, you know,
the e-commerce industry or social or gaming
where people just do stuff and, you know,
you kind of fix it and scale later, right?
This is the industry where, you know,
if you do something just to experiment with
and things go wrong, you know,
people end up in jail, right?
So, you know, so it's a very different industry from that perspective.
In that light, it makes much sense. From what you just described, for example, if
people in this industry are able to turn access to anonymized data into revenue, then they are completely disincentivized
from allowing toolmakers access to that same data.
It makes perfect sense.
Yep.
And one thing I would say, and I do apologize,
I had 30 meetings, I have to jump off.
One thing I would say that has also changed
with the advent of transfer learning is,
and that's a good thing,
you no longer need those large data sets
in order to train and tune models.
So I can tell you right now, I mean,
we've been working with this.
We can achieve state-of-the-art accuracy
in terms of peer-reviewed public academic benchmarks,
and also in real-world systems,
without the need for millions of patients.
So I think one of the good things that transfer learning
is giving us is that there's no longer this hard trade-off
between privacy and accuracy.
At least
within this niche industry.
Okay.
I think that would also be a good message to send.
Yeah. And with that, George,
I would leave you in Ben's super capable hands.
And thank you
very much.
Thank you very much for joining us, David.
Thank you.
Okay.
All right, we lost our healthcare domain expert,
but I'll try my best, George.
That's okay.
I mean, I'm sure we can manage.
Yeah, yeah.
Okay, so another thing
that we actually briefly referred to earlier was a kind of discrepancy, let's say,
that I noticed in the results you got in the types of data that people use.
And it seemed like there's much reliance on text for the reasons that David explained earlier
and not so much on the other data types.
So images and time series data a little bit.
And what I found a bit surprising was that video and audio were really underused, except again for respondents
who work for early stage organizations. And my interpretation of that finding would be that, well,
maybe there are startups in the industry that are actually looking to capitalize on this
untapped opportunity, on this untapped medium. Do you think that that makes
sense?
Yeah, and also the other thing to bear in mind is audio and video tend to be more advanced
capabilities, right? So video in particular, you have to have the ability to capture and store video streams.
I don't know how many healthcare organizations have that ability.
And then the ability to basically tune your models, because it's most likely that a lot of these computer vision models may not work out of the box for your very specific use case. So you'll need to work with a vendor that will allow you to really work with domain
experts to fine-tune your models. And I do talk to a lot of computer vision companies, and I think that most of the people I talk to, maybe this is a biased sample,
tend to work in other domains and not in healthcare.
So manufacturing is a big thing among computer vision companies, for example.
Yeah.
Yeah, there's images and then there's audio and video.
I would presume that for images, for things such as reading x-rays, for example, there probably is already.
There is, yeah. Like I said, those tools might even be built into your medical imaging technology because those medical imaging companies are starting to partner with startups that supply that computer vision and deep learning capability. But I don't think that anyone is at the point where
those systems are very widely deployed because like I said, it's also very domain and data
specific, right? So if it's a disease in a medical image, you will have to go in,
you will have to, first of all, label all of those images
so that you can train models.
So that work needs to be done up front.
And, you know, are those images digitized from the past, right?
Okay.
Moving on then to what you found on how people in this industry treat model validation.
This is a part of machine learning operations. And what you found, given what we have said so far
about the specifics of the industry, I guess also chimes with the point that David made earlier
on how people are aware of whether they want to work with vendors outside the industry or not.
And it seems that to a large extent, how people treat that reflects the maturity of the organization.
So organizations that are relatively new are more willing to defer model validation to vendors, while organizations that are more mature are not.
Yes, yeah. Not much of a surprise there, right? Because basically, the more you use machine
learning, the more you're likely to appreciate and understand limitations and challenges.
And the more likely you are to have processes for testing, validating, and monitoring models.
And you're more likely to build that capability in-house over time.
Yeah, I think it's an expected finding really.
Yeah, yeah. And you know, if you follow the whole
MLOps space, this is increasingly part of the workflow, right? So model experimentation, validation, testing, QA,
CI/CD for models. You tend to develop those capabilities over time,
and practices and processes.
Okay.
So overall, and to wrap up,
since I think we covered the main findings of the report,
and if you think that there's something that we should highlight and we haven't, feel free to do so.
But assuming we have covered the main findings,
what's the image that you see emerging from that?
So how do you see the use of AI developing in this industry?
I think just like all the other industries,
we're still in the early stages.
And I would say we're still scratching the surface.
Because I think outside of the most advanced technology companies, I think there's still a lot of digitization and kind of understanding about what the limitations of these models are,
what they can do, and what some of the pitfalls are.
And then in particular in healthcare, I think people are more likely to be more careful
than in other industries, right, As far as deploying these models,
first of all, set aside regulation, there's also safety.
So there's probably a higher premium
on reliability and safety.
And healthcare, I think, in particular, may be one of the more conscious industries. If you go back to the early days of the topic of model explainability,
the two industries that seem to always be giving presentations on these topics are financial
services and healthcare.
And then I think fairness is a topic that the entire machine learning community has come to understand.
And I think healthcare is not completely unaware of this topic.
In fact, I think, if anything, they're much more conscious about the need to make sure that their models are behaving in a reasonably unbiased way.
But yeah, so I think that, as you pointed out, George, there are data types where,
according to our survey, there's a lot of room for growth, right? So audio, video, images and so on and so forth.
Yeah and as you also mentioned earlier, specifically for these media
types, it's a number of things. So first, the domain expertise that needs to be applied,
and then also the fact that by nature, let's say,
machine learning on those media types is kind of different
and probably more demanding than it is on text.
Slightly more advanced topics.
But if you think about some of those media types,
and in this domain in particular,
it's much harder to find human labelers, right? Because
you need domain knowledge, really. It's not as simple as autonomous driving, where you can get
regular people to label images, you know, of stop signs and things like that. And also I think the one thing that we didn't cover in this survey is
challenges, like data-specific challenges. I would have been curious to ask them.
In particular, I think we sometimes have, as an industry, the ML and
big data industry, this stereotype that everyone is awash with data and, you
know, drowning in data, but really in some domains, and I would suspect in healthcare,
for rare diseases, you don't have that much data. So how do you use ML in that context?
Luckily, the one good thing, George, about healthcare is that I think that
in most scenarios, I would imagine these models are deployed with a human in the loop.
They're meant to augment human users.
The healthcare community is already attuned to the need to not go for full automation. So maybe they might be able to tackle
more challenging problems that way. So for example, take problems where you don't have enough
data to train a model that you can just unleash and trust. Well, but you don't do that anyway in
this domain. You always have a model suggest something to an expert
and then the expert still evaluates the recommendation
or the output of the model.
So maybe it might still work in those scenarios.
Yeah.
I mean, I don't know if you have plans of doing that survey again,
maybe next year.
Yeah, we hope to.
If you do, it may be interesting to ask these types of questions, for example,
if there is a human in the loop approach taken by respondents and so on.
Yeah, yeah, I would
suspect that, let's say, in computer vision
and medical imaging,
I don't think we're going to get
to the point where you just remove
the radiologist
for a variety of reasons,
regulatory and ethical reasons, right?
So you make them more productive, right?
You're probably right.
At least I hope you are.
Let's find out next year
and see if the data you get
corroborates that.
Right.
Okay, great.
It's been a pleasure.
Yes, great meeting you.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration
on Twitter, LinkedIn, and Facebook.