Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 2x07: Improving AI with Transfer Learning Featuring Frederic Van Haren
Episode Date: February 16, 2021. Productive use of AI requires the application of existing models to new applications through a process called transfer learning. In this episode, High-Performance Computing and AI expert Frederic Van Haren joins Stephen Foskett to discuss the topic of transfer learning and what it means, from voice recognition to autonomous driving and enterprise applications. Transfer learning is analogous to the way teachers impart knowledge and experience to their students, and represents a feedback loop that improves the model over time. This is a valuable concept for applications like language processing but requires a feedback mechanism or it is something of a dead end. One challenge for machine learning is that models do not truly understand the world the way people do, but they can fool us into thinking that they do because of their uncanny ability to match patterns the way we would. Over time, we all must develop a better understanding of this technology even as it is being widely deployed around us. Guests and Hosts: Frederic Van Haren, CTO and Founder of HighFens. Find Frederic on Twitter as @FredericVHaren. Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett. Date: 2/16/2021 Tags: @SFoskett, @FredericVHaren
Transcript
Welcome to Utilizing AI, the podcast about enterprise applications for machine learning,
deep learning, and other artificial intelligence topics. Each episode brings experts in enterprise
infrastructure together to discuss application of AI in today's data center. Today, we're
discussing transfer learning. Our guest, Frederic Van Haren, attended AI Field Day last year and will be at AI
Field Day again in May. Now, Frederic has a really interesting background in data, analytics,
and voice recognition. So, Frederic, why don't you tell us a little bit about yourself?
Sure. Well, first of all, thanks for having me. So, my name is Frederic Van Haren. I'm the CTO of HighFens, which does consulting and
services in the AI markets. And my background is really speech recognition. So for the longest time,
I've been running a large organization doing HPC and AI for the speech markets. I can be found on
Twitter. It's @FredericVHaren. And the website I run for the company is highfens.com. So that's H-I-G-H-F-E-N-S.com.
Thanks, Frederic. And I'm Stephen Foskett, publisher of Gestalt IT and organizer of Tech Field Day, including AI Field Day. You can find me on Twitter at @SFoskett, and I would love to hear from you.
So, Frederic, one of the things that sort of piqued my interest when talking to you was this
whole concept of transfer learning. This is not a subject that we've broached before on the podcast,
and I think that it might be really interesting for our audience to learn more. So maybe,
can you kick off just by explaining what is transfer learning? Right.
So you can imagine that when you start working on AI, that there's a lot of data you need
to collect and build models.
And that takes a very significant amount of time, not only from resources like people,
but also from a hardware perspective.
So imagine that there is a complex neural network with a trained model and you want to extend that model.
The way it works is, if you visualize the neural network, you cut off its tail end and replace it with a new, smaller neural network that you attach to the existing one. Then you can kind of adapt the model, if you wish, with just the limited data you need for that particular use case. So an example of that
would be: let's imagine that you have built a language model for American English. You spent millions and millions of CPU hours obtaining that model, and you would like to build an Alexa- or Siri-like assistant. In order to do that, you additionally need a wake-up word, right? The model needs to react or get activated once you say that particular wake-up word.
So what you could do is take that language model, cut off the tail end, and add a little neural network that just handles the wake-up keyword. You only use some of the data to build that little neural network, and in the end you have a much larger utility from the model. The big win and advantage for enterprises is that you don't have to start from scratch. So that's
why transfer learning is really interesting. I think that really is what we need because,
of course, most enterprises don't have a whole team of PhDs and the ability to create their own,
you know, model. Frankly, what you're describing sounds almost, you know,
analogous to the switch from custom written software to shrink wrap software in enterprise,
where you would, for example, buy a, you know, a license for an ERP system. That doesn't mean
that it's done. It doesn't mean that it's like ready to roll out, but it gives you a place to
start. Is that a way to describe transfer learning?
Yeah, it definitely is. I mean, it creates a level of modularity, right? If every AI project had to retrain the model from scratch, that would have multiple issues, since not everybody has access to the same amount of data. And as you know, collecting data is just one facet of it, right?
There's also the quality of the data and the ability to process the data.
So it is really a way to jumpstart an AI project.
And it also has the benefit that you can combine multiple models with transfer learning, right?
So you can really get results much faster
than you normally would. But I agree with the thinking behind your analogy.
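The tail-end swap Frederic describes (keep the expensive pretrained body frozen, train only a small new head on a little task-specific data) can be sketched roughly as below. Everything here is invented for illustration: the "pretrained" extractor is a stand-in function and the data is a toy dataset, not a real wake-word system.

```python
import math

# Hypothetical stand-in for a pretrained network: a frozen feature
# extractor whose weights we never touch. In a real project this would
# be the body of a large pretrained model (e.g. in PyTorch or Keras).
def frozen_features(x):
    # Fixed, pretrained transformation: we only call it, never train it.
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

# The small "tail end" we attach: a logistic-regression head with just
# three trainable parameters, standing in for the wake-word classifier.
w = [0.0, 0.0]
b = 0.0

def head(feats):
    z = w[0] * feats[0] + w[1] * feats[1] + b
    return 1.0 / (1.0 + math.exp(-z))  # probability of "wake word"

# Tiny labeled dataset for the NEW task only; the base is never retrained.
data = [([2.0, 1.0], 1), ([1.5, 0.5], 1), ([-2.0, -1.0], 0), ([-1.0, -2.0], 0)]

lr = 0.5
for _ in range(200):                       # train ONLY the head parameters
    for x, y in data:
        f = frozen_features(x)
        g = head(f) - y                    # gradient of log loss w.r.t. z
        w[0] -= lr * g * f[0]
        w[1] -= lr * g * f[1]
        b -= lr * g

for x, y in data:
    print(x, "->", round(head(frozen_features(x))), "expected", y)
```

In a real project the frozen part would be a pretrained PyTorch or Keras network and the head a few new layers, but the shape of the workflow is the same: millions of CPU or GPU hours stay amortized in the base, and only the small head needs your data.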
So in previous episodes, we've talked about how AI applications consist basically of three
components. You've got your model, you've got your feature
store, and then you've got your data. Is that really kind of analogous to what you're describing
here with transfer learning? Yes. I mean, the concepts are still the same. So the only difference
is that at some point you take something that you could consider as a finished model, and then you
kind of restart the whole process, right?
So there's still data collection,
there's still feature extraction,
there is still the feedback loop.
So all of these components are still valid.
It's just that you do it at a smaller scale
and you reuse somebody else's model.
So is this the model that you think enterprise AI applications are going to be using going forward, that they will be taking, you know, an existing model and then sort of applying it to a new set of data?
Yes, I'm actually convinced of that. Right now it's already happening. I think a lot of people that start working with AI don't start from scratch, right? If you look at public
clouds like Amazon and others, they deliver already models, pre-made models for even speech
recognition, right? I mean, speech recognition 10, 15 years ago was an incredibly challenging
problem. Nowadays, you can go to an Amazon and have access to a pre-built language model,
and if you want to add something to it or modify it or add some more training data,
I mean, the opportunities are incredible.
And I do think that also is one of the key components
for the growth of AI or the acceleration, I should say,
because with transfer learning, you can save yourself a lot of time. You might not be happy with the model you get, but if you have a model that is reasonably good, with transfer learning you can get results really fast and then rely on your feedback loop to improve your new model over time, right? I mean, AI is all about statistics, so nothing is a hundred percent.
You just have to understand what you have and what you can do, and transfer learning gives you that opportunity to bootstrap any AI project. And most of the enterprises I talk to don't say that they want transfer learning, because they don't have an understanding
of transfer learning, but it's clear that transfer learning is the solution for them.
Yeah, absolutely. And on your blog, you wrote about transfer learning. And one of the things
that you say is that this concept is essentially what we all do every day, that teachers transfer
their knowledge, they transfer sort of
the synthesis of what they've learned to their students. And many of those students, you know,
they may not have to learn everything from scratch, you know, that I mean, they can, they can
basically take that set of knowledge, that set of rules, that set of, you know, disciplinary
learning, and then they can apply that themselves to new things.
In many cases, they may not even know where this learning came from. They just need to know that
they can apply it to new information. Is that right? Right. We do the same thing every day,
with math, for example, which I like a lot. Every time you learn something new,
you rely on theories that have been proven before.
So you take that as a given.
And so once you understand and know how to work with those,
that's how you can make fast progress.
Same thing with the software frameworks, right?
Like PyTorch and Keras.
I mean, most people don't really understand what's
happening underneath. There's a lot of math, there's a lot of matrix operations, you know,
with the weights and the bias and all that stuff. I mean, not everybody needs to know the details
in order to be successful with AI, but it's moving fast, right? And that's what transfer learning allows you to do: it allows you to use something as a baseline, if you wish, and then take that baseline even further with your ideas without having to reinvent the wheel, so to speak.
And it also reflects the nature of computing today where,
you know, we can't assume that every AI application or AI endpoint is going to have the same compute resources.
So by using this concept, right, we could, you know, use centralized compute resources to build up the model and then we can deploy that model.
And it can continue to do learning, but it won't do maybe the sort of heavy lifting initial learning that it might have had to do in order to get to this point,
because, you know, we're delivering it partially baked, right?
Right. That's right.
Now, your background is in, well, at least how I first met you was in voice and language processing, right? And I think that this is really an interesting field
because that was maybe one of the first
machine learning applications that people encountered.
Is it fair to say that sort of dictation software
and things like that, I mean, are they AI?
And were they AI?
And are they blazing the trail for what we're doing today?
Right.
So when I started in the speech recognition business about 20 years ago,
I mean, it was all about writing code
and software algorithms
and there was no open source community really.
And I would call it compute centric, right?
So it was all about the MIPS
and how much processing you could do.
Although we realized that if you want to deliver a speech product, you know, math by itself is not
enough to understand people's voices. So we had to start collecting data. But in those days, 15 to 16 years ago, there was no way to collect data that would statistically represent the world population,
the voice of the world population. I mean, it's insane if you think about it. So what we decided
to do was to start collecting for particular verticals and starting with HPC, right? So you
scale everything. Then once you have your infrastructure available,
you start collecting data
and then you use HPC to scale your AI infrastructure.
But AI in the early days was all CPU based,
and CPUs were one socket with one core, right?
That's when we started.
The early days of speech recognition, we could do
limited speech recognition with a single compute, single server, but that was dedicated. There was
nothing else we could do. And then we kind of started to understand that if we relied heavily
on data and good data, that we could get decent results.
And that's when we started kind of focusing more and more on AI and the advantage of analyzing data.
But we had to build our own frameworks.
We had to write our own code.
A funny joke there is when we actually reached out to NVIDIA,
which today is the number one GPU resource.
When we contacted the NVIDIA organization, they basically told us, well, we don't really have anything for you.
Why don't you talk to our gamer division?
So we actually had a conversation with NVIDIA Munich about how to get consumer cards into the enterprise.
And then once the GPUs came into play, then you could really do millions and millions of calculations per second.
And that's where everything took off, right?
So speech together with GPUs, the ability to collect a lot of data, process a lot of data,
the fact that the hardware didn't cost that much anymore.
Well, relatively, I should say.
It all came down to the combination of HPC and AI.
And just to give you an idea,
we needed about 110 racks of equipment
just to cover about 40 language models on a permanent basis. Today you don't need that amount of hardware, for two reasons. One is that with open source frameworks,
you have a lot more intelligence in building the models. And secondly, using transfer learning
will help you get faster results without having to collect all the data.
But yes, speech recognition has been referred to as ground zero for modern AI.
A lot of the items that we had to do in speech are now very common and really open sourced, if you wish, to the community.
I think one of the interesting aspects there as well is that a lot of the work that you guys were
doing early on was not, you know, using basically what we would recognize as sort of deep learning
today. And so essentially, you spent all this time basically building a capability and then had it all get wiped away
by this new technology of machine learning and deep learning. Is that accurate? Is that
kind of how it felt at the time?
Yeah, I wouldn't call it wiped away. I mean, the thing is that machine learning and deep learning and AI are really just labels, right? We were doing all of these things; we were really working on algorithms and the math behind them. One thing I have to say is that most of the speech recognition innovation actually doesn't come from large enterprises but from universities. Universities around the world were basically saying, well, here's a new algorithm, but we don't have the processing capabilities to try it. Then students would write papers, and we would look at those papers, try them all out, and I would hire these people.
So from a label perspective, you could say that we were doing machine learning and deep learning. I wouldn't say that it has wiped out what we have done. I think it's still built on things we have done. And in some cases, new technology came in that replaced some pieces.
A good example is natural language processing. So think about that. So if you think about a
language model, a speech language model, the idea is to accurately guess what you're saying, right?
So you can never get 100%.
And to a certain degree, you don't have to, right?
So people, when you and I have a conversation,
and if you get 85 words out of the 100 words I'm saying,
you really didn't miss anything, right?
And so one thing we realized is that even with using AI for language models, trying to go from 80-plus to, let's say, 90 would need a significant amount of money and resources. And then NLP came along.
So the language model goes from audio to text, and now imagine that I could add context to the words that are coming out of the recognizer. By combining that context with the words coming out of the speech recognition engine, I can actually use NLP to increase the accuracy of the content, because I have the ability to bring context to the situation. So
let's take an example. I could say, for example, let's make a reservation for seven people
tomorrow at five. So there are pieces missing, right? And with context, I can increase the
accuracy of the results, meaning the system might know out of context that I go to the local Italian
restaurant around the corner. And so by bringing in context, I can increase the accuracy of the result of the recognition engine.
So you will see that there's a lot of focus nowadays on NLP as opposed to language models.
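One way to picture the context idea from the reservation example is rescoring the recognizer's candidate transcripts with a context model. The hypotheses, scores, and context vocabulary below are all invented for illustration; real systems combine acoustic, language, and context scores far more carefully.

```python
# Hypothetical n-best output from a speech recognizer: each hypothesis
# carries a recognizer score (higher is better). Phrases and scores are
# made up for this sketch.
hypotheses = [
    ("make a reservation for seven people tomorrow at five", 0.58),
    ("make a reservation for eleven people tomorrow at five", 0.55),
    ("make a recitation for seven people tomorrow at five", 0.57),
]

# Context the NLP layer can bring in, e.g. the user's habits (they often
# book a table at the local restaurant). Words consistent with that
# context get a small score boost.
context_vocab = {"reservation": 0.1, "seven": 0.05, "restaurant": 0.1}

def rescore(hyp, score):
    # Final score = recognizer score + sum of context boosts.
    boost = sum(context_vocab.get(w, 0.0) for w in hyp.split())
    return score + boost

best = max(hypotheses, key=lambda h: rescore(*h))
print(best[0])
```

The recognizer alone slightly prefers the garbled "recitation" reading over "reservation for eleven", but the context boost pulls the contextually plausible hypothesis to the top, which is the accuracy gain being described.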
So, you know, coming back to your point, has it erased everything we have done?
No, it's built on top of it, right? And that's also where transfer learning comes into play, because NLP is really a big challenge to solve by itself. All these things combined, it's more evolutionary, and in some cases revolutionary, than really considering what we did 15 years ago as ancient, right?
I think that's true for most technologies: some of the technology doesn't disappear.
It's just being reused, recycled, and built upon.
One of the things you mentioned in there too, I think that's important for this conversation
and for a lot of the conversations we've been having here on utilizing AI is this gap between, like
you said, like 80% effective, 90% effective.
Will we ever get to 100% effective?
And that's been one of the core questions that we've been approaching whenever we've
been talking about AI applications.
And, you know, it's a big problem because essentially, you know, for example, we talked
about autonomous driving quite a lot on the podcast.
And, you know, one of the things that has occurred to me is that it's not easy, but it's totally doable to develop a system that can drive a car, you know, in 80% of highway situations. Handling everything beyond that is much, much more difficult. And in fact, there's some contention that it might
even be impossible to develop an autonomous driving system that can drive anywhere at any
time. So is that really kind of what we're seeing as well in other fields of AI in terms of, you
know, natural language processing? You know, are we ever going to have a system that can understand everything people say?
Everything, I wouldn't say. But over time, I mean, the problem you're describing is control of your environment, right? So to talk about self-driving cars: if you always use the same layouts and the same highways, the system will learn, right? The thing about AI is that AI is not a one-stop shop. AI is continuous. The advantage of AI is that if it makes a mistake, and it realizes it made a mistake, it will improve the system over time. And it's not just self-driving cars, right?
I mean, on the other hand, I mean,
the example they always use is a self-driving car,
even if it's perfect for, you know,
dealing with people, bicycles, motorcycles, and so on.
What happens if there's an airplane
making a crash landing on the highway, right?
How do you deal with that?
Well, those are of course challenges. But to say that it will ever understand everything, or that the car will never ever make a mistake? I don't think so. I think, just like us humans, it will learn from its mistakes.
In the speech recognition world it's the same thing, right? What we used to do, and is still being done today in the speech market, is that by controlling the environment you can improve the model. An example would be: let's assume you sell a speech product to a bank. The language model you go in with is, let's call it, the generic model, meaning that it's not really targeted at the customers of the bank. But if you allow this model to be improved by the people who use it, the customers of the bank,
then you can get very, very high accuracy, right?
So the thing about speech recognition is you're trying to recognize a pattern.
You're not necessarily trying to recognize, you know,
American English or British English.
You're trying to recognize a pattern. So at Nuance, we had a desktop application that was very popular
with people that had speech disabilities.
And why is that?
It's because if the person that has a speech disability always says the same thing, not the way we would understand it, but consistent in the way they're saying it, you can actually train the product to recognize that as valid, right?
So it's more about consistency
as opposed to variety.
But as long as there is variety,
AI will never be perfect,
but will always make an attempt at learning.
And it's the learning component that actually I personally feel is more important to AI than anything else, right? I mean, say you build a language model and it works out of
the box for your environment. That's great, but does it work for other people? And if it doesn't
work for other people, does it self-learn or do you have to feed it even more data and make some
changes in order to make the model
work. And I think that is what is important for enterprises in AI: not to have a checkbox saying, I'm doing AI, or have a product that is based on AI, but the fact that you can learn from it. And that learning never stops. I mean, it's true for us. We never stop learning.
There's no time where we can say we know everything.
Well, at least I don't.
Well, I do wonder, though, if this lesson is being taken to heart by some of today's
AI applications, because frankly, I think learning and feedback is something that only gets lip service.
Many of the AI applications that we're starting to see in the enterprise space are, you know, kind of a dead-end street.
In other words, the thing is trained, you know, with data in the cloud initially, and then it's just sort of rolled out and added as a feature. It may be learning from a particular application, a particular, you know, deployment
of that application.
But I think customers are resistant to sending information back to the mothership to improve
the overall model.
And, you know, even in situations, you know, of upgrades and, you know, switching from
one product to another, it is very unlikely that some of these applications are going to be applying any of those lessons
forward.
Right.
Yeah, I think my definition of AI is, I mean, a lot of people talk about data, but for me,
it's about the self-learning.
I mean, if it's not self-learning, then using data doesn't qualify to be called AI, if you ask me.
It's the self-learning and learn from your mistakes.
I mean, it's just like us, right?
So the only way we can improve is by moving forward.
And if we make mistakes, we just learn from it.
But just living by rules and never changing those rules doesn't work for us, and it doesn't work for AI applications. And you will see that. Certainly in the speech market, instead of using one language model, we started to personalize, right? Because of the self-learning and the continuous learning, for certain customers we delivered a personalized language model. So the
way you can see it is: let's take the bank example again, where the bank has about a thousand customers and they are international, so you have no idea how successful your language model is going to be. The system assumes American English, so you can expect that the accuracy level for certain non-native speakers will be relatively low.
So you can start with a base model and then start adapting that base model, or cloning it if you want, for each individual. Then you can apply a mechanism where, if you see something for a particular individual that is useful for the general population, that information is looped back to the generic model. You can keep on updating the generic model while everybody has their own personalized model. And that is a system where a non-native speaker might go into the system with only 40% accuracy and rapidly get to the same level as a native speaker.
And that's really where AI comes into play, right? As opposed to just delivering the model and then walking away and saying, hey, you know, this is it. It works for native speakers.
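A minimal sketch of that clone-and-loop-back mechanism, with invented words and counts standing in for real language-model statistics: each user gets a personal overlay on a shared base, and corrections seen across enough users are promoted back into the base.

```python
from collections import Counter

# Generic (base) word-frequency model shared by everyone; counts invented.
generic = Counter({"balance": 50, "transfer": 40, "account": 60})

def personal_model(user_corrections):
    # Clone the base and layer the user's own corrections on top;
    # the shared base itself is not modified here.
    model = generic.copy()
    model.update(user_corrections)
    return model

# Per-user corrections observed in the field (also invented).
users = {
    "alice": Counter({"overdraft": 5}),
    "bob": Counter({"overdraft": 4, "iban": 2}),
}

alice = personal_model(users["alice"])
print(alice["overdraft"])  # alice's personalized model knows her word

# Loop-back: a correction seen across enough users is promoted into the
# generic model, so everyone benefits from individual learning.
PROMOTE_AT = 2
seen_by = Counter()
for corrections in users.values():
    for word in corrections:
        seen_by[word] += 1
for word, n_users in seen_by.items():
    if n_users >= PROMOTE_AT:
        generic[word] += sum(u[word] for u in users.values())

print(generic["overdraft"])  # promoted: seen by 2 users
print(generic["iban"])       # not promoted: seen by only 1 user
```

The threshold stands in for the judgment call in the transcript: only things "useful for the general population" flow back to the generic model, while purely personal quirks stay in the individual overlays.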
One more thing that you mentioned in there as well that I'd like to key in on is pattern matching.
And I think that many people who, when they think about artificial intelligence,
they jump to, you know, Mr. Data and a system that actually truly understands
things in the way that humans understand things, or at least approaches that. But of course,
that's not at all how machine learning works. Essentially, machine learning is only doing
pattern matching. It's building these pathways, it's building these connections, these statistical associations between inputs and outputs, and it's continually refining those based on the data that it encounters in the field.
But to say that a machine learning system truly knows things in the way we do is completely inaccurate. You know, I think that maybe
applications like language processing, in a way, they work counter to our true understanding of
this technology, because it seems like the system understands things. Because, you know, I say
something, and it's putting the words on the screen and the words
come out in the right order and they kind of match what I was intending to say. So it kind of fools
me into thinking this system understands me, that it truly understands what I'm saying, but it absolutely doesn't. You know, would you agree with that?
Yes, I actually have a good example about one of our head researchers, who was responsible for developing the algorithms, right?
And if you know about audio, you're talking about sine waves, right?
And so sound is just a concatenation of a bunch of sine waves.
And mathematically, you can isolate those sine waves from each other, and that's how you try to recognize the sine wave representation for, you know, the word "the" or "cat" and so on. The example I wanted to bring up is that those individuals left the speech business and went on to develop a device that you install in your home.
It measures and identifies the appliances in your home, because electricity is also sine waves, and your refrigerator has a certain pattern in how it consumes electricity, how it speaks, and so on.
So they basically took speech pattern recognition and applied it to electricity. I actually have a device at home where I can monitor and see which devices in my house are consuming electricity. And because it needs to learn, I am the one being asked by the application: hey, I recognized that between 9:00 a.m. and 9:02 a.m. a device consumed 230 watts, can you label it? Then you label it and go back.
It's just an example of how AI and pattern recognition for speech can also be applied to things that are completely non-speech related, but all the learnings and the methodologies are applicable.
I'm a big fan of claiming that the AI market is going horizontal, right? There are a lot of people saying it's all about the verticals, but I'm a big advocate for saying that a lot of the AI today can be applied across the verticals, horizontally as opposed to vertically. And I think that's where most of the innovation will come into play.
And that also comes back to transfer learning where transfer learning can be very useful.
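That electricity example is essentially nearest-pattern matching, the same idea used to match speech patterns, and it can be sketched in a few lines. The power traces below are made up; a real device works on much richer signatures than five wattage samples.

```python
# Invented power-draw templates (watts, sampled once per second) for a
# few labeled appliances. Labels come from the user, as in the story.
templates = {
    "fridge": [230, 230, 228, 231, 229],
    "kettle": [2000, 2100, 2050, 2080, 2020],
    "tv": [120, 118, 121, 119, 120],
}

def distance(a, b):
    # Sum of squared differences between two power traces.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def identify(trace):
    # Label an unknown trace with the closest known appliance signature.
    return min(templates, key=lambda name: distance(templates[name], trace))

# An unlabeled consumption event, e.g. the one seen between 9:00 and 9:02.
observed = [229, 231, 230, 229, 230]
print(identify(observed))
```

Whether the patterns are sine-wave components of audio or wattage over time, the recognition step is the same nearest-signature comparison, which is why the speech methodology transferred so directly.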
Well, you've given us a lot to think about here, Frederic.
Honestly, I'm gonna have to chew on this for a while,
but we do have to move on to the end of our podcast here.
One thing that we've been doing here in season two
of the Utilizing AI podcast
is springing a few kind of open-ended questions on our guests.
And I wonder if you're willing to play along with our little game.
Sure.
All right.
So as a reminder to the audience, he has not been prepared at all for these questions.
So we'll see what he comes up with. Now, we did mention
self-driving cars and the challenges of creating a car that can drive anywhere, anytime. But since
your background is in voice recognition, I'm going to give you the next question that I've
been throwing at folks here instead. So Frederic, how long will it take for us to have a voice, verbal, conversational AI that can pass the Turing test and fool an average person into thinking that it's speaking with another person?
I would say four years, max.
So that's coming pretty quickly now, right?
I think, well, there are two phases to that, right? There's you talking to the system, and the system doing a good job trying to understand what you're saying, and I think we're pretty good there. Where the technology was a bit lacking was text-to-speech,
which is the machine talking to you, which is certainly for me, I mean, it was easy to
recognize when it was a machine versus a human.
But nowadays, they actually started using AI as well for text-to-speech, you know, speaking
to you.
And I must say, I heard a demo from my colleagues,
ex-colleagues a few months ago.
And it was very, very difficult
to differentiate the human from a machine.
So I would say four years, tops, for productization in the enterprise.
But it's definitely coming very,
very close in the near future.
Great.
And I appreciate having somebody
who actually knows this particular area
weighing in on that
because I've been asking other folks,
if you want to listen to some of the episodes,
and we've gotten a wide variety of answers.
So let me take you outside your field of experience.
Basically the same question,
except video. When will we have video focused ML in the home that operates the same way as
audio based assistants like Siri and Alexa? In other words, when will we have cameras watching
us that kind of know and can adapt to what we're doing without us even saying it?
Oh, that's an interesting question.
I thought in some areas this was already happening, but yeah, I don't know. I think that's because it's outside of my area.
I would probably have a more pessimistic view on this. I would say, let's say, seven years away or so. Although, yeah, maybe it's a pessimistic view, seven.
But you see it on the horizon, that's what you're saying.
Yeah, I definitely see it. I think video, audio, all of that.
I mean, we're smart creatures,
but we're easily fooled, to be honest.
So I think it's just a matter of getting close enough
for us to go along.
I mean, yes.
All right, one more question then before we wrap here.
Are there any specific jobs or job roles that you see being completely eliminated by AI
in the next five years?
Well, I would think more in retail and robotics, you know, manual labor, like where cars are being built, being replaced by robots.
So I think those are the areas that will be hit the hardest.
And obviously, you know, just to give it a positive note, too, I mean, there will be
new jobs being created as well.
But I think the ones where labor will be replaced by AI and robots, I think those will be hit the hardest.
All right.
Well, thank you so much, Frederic.
It's been a wonderful conversation.
And I really learned a lot just from talking to you now.
Where can people connect with you and learn more about your
thoughts in big data and AI?
Right. So the company's website is probably the easiest way. It's highfens.com, that's H-I-G-H-F-E-N-S.com, slash blog for the blog. I have written in the past about NLP, about transfer learning, and other technologies related to AI. Some of it pretty basic AI, nothing too fancy, but at least to get the conversation going. And on Twitter, I'm @FredericVHaren.
Well, thank you so much for joining us again, Frederic.
It's been a lot of fun having you here.
And thank you listeners for joining us as well.
If you enjoyed this discussion,
please do subscribe, rate, and review the show.
That does actually help your favorite podcasts
to get some visibility.
And please do feel free to share the show with your friends.
This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes,
please go to utilizing-ai.com
or you can find us on Twitter at utilizing underscore AI.
Thanks a lot for listening and we'll see you next week.