Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 2x07: Improving AI with Transfer Learning Featuring Frederic Van Haren
Episode Date: February 16, 2021. Productive use of AI requires the application of existing models to new applications through a process called transfer learning. In this episode, High-Performance Computing and AI expert Frederic Van Haren joins Stephen Foskett to discuss the topic of transfer learning and what it means, from voice recognition to autonomous driving and enterprise applications. Transfer learning is analogous to the way teachers impart knowledge and experience to their students, and represents a feedback loop that improves the model over time. This is a valuable concept for applications like language processing but requires a feedback mechanism or it is something of a dead end. One challenge for machine learning is that models do not truly understand the world the way people do, but they can fool us into thinking that they do because of their uncanny ability to match patterns the way we would. Over time, we all must develop a better understanding of this technology even as it is being widely deployed around us. Guests and Hosts: Frederic Van Haren, CTO and Founder of HighFens. Find Frederic on Twitter as @FredericVHaren. Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett. Date: 2/16/2021 Tags: @SFoskett, @FredericVHaren
Transcript
Welcome to Utilizing AI, the podcast about enterprise applications for machine learning,
deep learning, and other artificial intelligence topics. Each episode brings experts in enterprise
infrastructure together to discuss application of AI in today's data center. Today, we're
discussing transfer learning. Our guest, Frederic Van Haren, attended AI Field Day last year and will be at AI
Field Day again in May. Now, Frederic has a really interesting background in data, analytics,
and voice recognition. So, Frederic, why don't you tell us a little bit about yourself?
Sure. Well, first of all, thanks for having me. So, my name is Frederic Van Haren. I'm the CTO of HighFens, which does consulting and
services in the AI markets. And my background is really speech recognition. So for the longest time,
I've been running a large organization doing HPC and AI for the speech markets. I can be found on
Twitter. It's @FredericVHaren. And the website I run for the company is highfens.com. So that's H-I-G-H-F-E-N-S.com.
Thanks, Frederic. And I'm Stephen Foskett, publisher of Gestalt IT and organizer of Tech Field Day, including AI Field Day. You can find me on Twitter at @SFoskett, and I would love to hear from you.
So, Frederic, one of the things that sort of piqued my interest when talking to you was this
whole concept of transfer learning. This is not a subject that we've broached before on the podcast,
and I think that it might be really interesting for our audience to learn more. So maybe,
can you kick off just by explaining what is transfer learning? Right.
So you can imagine that when you start working on AI, that there's a lot of data you need
to collect and build models.
And that takes a very significant amount of time, not only from resources like people,
but also from a hardware perspective.
So imagine that there is a complex neural network with a trained model and you want to extend that model.
The way it works is, if you visualize the neural network, you cut off its tail end and replace it with a new, smaller neural network that you attach to the existing one. Then you can kind of adapt the model, if you wish, with just the limited data you need for that particular use case. So an example of that
would be: let's imagine that you have built a language model for American English. You spent millions and millions of CPU hours obtaining that model, and you would like to build an Alexa- or Siri-like assistant. In order to do that, you additionally need a wake-up word, right? The model needs to react or get activated once you say that particular wake-up word.
So what you could do is take that language model, cut off the tail end, and add a little neural network that just handles the wake-up keyword. You only use some of the data to build that little neural network, and in the end you have a much larger utility from the model. The big win and advantage for enterprises is that you don't have to start from scratch. So that's
why transfer learning is really interesting. I think that really is what we need because,
of course, most enterprises don't have a whole team of PhDs and the ability to create their own,
you know, model. Frankly, what you're describing sounds almost, you know,
analogous to the switch from custom written software to shrink wrap software in enterprise,
where you would, for example, buy a, you know, a license for an ERP system. That doesn't mean
that it's done. It doesn't mean that it's like ready to roll out, but it gives you a place to
start. Is that a way to describe transfer learning?
Yeah, it definitely is. I mean, it creates a level of modularity, right? If every AI project had to retrain the model from scratch, that would have multiple issues, since not everybody has access to the same amount of data. And as you know, collecting data is just one facet of it, right?
There's also the quality of the data and the ability to process the data.
So it is really a way to jumpstart an AI project.
And it also has the benefit that you can combine multiple models with transfer learning, right?
So you can really get results much faster
than you normally would. But I agree with the thinking behind your analogy.
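The tail-end swap Frederic describes (keep the expensive pretrained body frozen, train only a small new head on a little task-specific data) can be sketched roughly as below. Everything here is invented for illustration: the "pretrained" extractor is a stand-in function and the data is a toy dataset, not a real wake-word system.

```python
import math

# Hypothetical stand-in for a pretrained network: a frozen feature
# extractor whose weights we never touch. In a real project this would
# be the body of a large pretrained model (e.g. in PyTorch or Keras).
def frozen_features(x):
    # Fixed, pretrained transformation: we only call it, never train it.
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

# The small "tail end" we attach: a logistic-regression head with just
# three trainable parameters, standing in for the wake-word classifier.
w = [0.0, 0.0]
b = 0.0

def head(feats):
    z = w[0] * feats[0] + w[1] * feats[1] + b
    return 1.0 / (1.0 + math.exp(-z))  # probability of "wake word"

# Tiny labeled dataset for the NEW task only; the base is never retrained.
data = [([2.0, 1.0], 1), ([1.5, 0.5], 1), ([-2.0, -1.0], 0), ([-1.0, -2.0], 0)]

lr = 0.5
for _ in range(200):                       # train ONLY the head parameters
    for x, y in data:
        f = frozen_features(x)
        g = head(f) - y                    # gradient of log loss w.r.t. z
        w[0] -= lr * g * f[0]
        w[1] -= lr * g * f[1]
        b -= lr * g

for x, y in data:
    print(x, "->", round(head(frozen_features(x))), "expected", y)
```

In a real project the frozen part would be a pretrained PyTorch or Keras network and the head a few new layers, but the shape of the workflow is the same: millions of CPU or GPU hours stay amortized in the base, and only the small head needs your data.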
So in previous episodes, we've talked about how AI applications consist basically of three
components. You've got your model, you've got your feature
store, and then you've got your data. Is that really kind of analogous to what you're describing
here with transfer learning? Yes. I mean, the concepts are still the same. So the only difference
is that at some point you take something that you could consider as a finished model, and then you
kind of restart the whole process, right?
So there's still data collection,
there's still feature extraction,
there is still the feedback loop.
So all of these components are still valid.
It's just that you do it at a smaller scale
and you reuse somebody else's model.
So is this the model that you think enterprise AI applications are going to be using going forward, that they will be taking, you know, an existing model and then sort of applying it to a new set of data?
Yes, I'm actually convinced of that. Right now it's already happening. I think a lot of people that start working with AI don't start from scratch, right? If you look at public
clouds like Amazon and others, they deliver already models, pre-made models for even speech
recognition, right? I mean, speech recognition 10, 15 years ago was an incredibly challenging
problem. Nowadays, you can go to an Amazon and have access to a pre-built language model,
and if you want to add something to it or modify it or add some more training data,
I mean, the opportunities are incredible.
And I do think that also is one of the key components
for the growth of AI or the acceleration, I should say,
because with transfer learning, you can save yourself a lot of time. You might not be happy with the model you get, but if you have a model that is reasonably good, with transfer learning you can get results really fast and then rely on your feedback loop to improve your new model over time, right? I mean, AI is all about statistics, so nothing is a hundred percent.
You just have to understand what you have and what you can do, and transfer learning gives you that opportunity to bootstrap any AI project. And most of the enterprises I talk to don't say that they want transfer learning, because they don't have an understanding
of transfer learning, but it's clear that transfer learning is the solution for them.
Yeah, absolutely. And on your blog, you wrote about transfer learning. And one of the things
that you say is that this concept is essentially what we all do every day, that teachers transfer
their knowledge, they transfer sort of
the synthesis of what they've learned to their students. And many of those students, you know,
they may not have to learn everything from scratch, you know, that I mean, they can, they can
basically take that set of knowledge, that set of rules, that set of, you know, disciplinary
learning, and then they can apply that themselves to new things.
In many cases, they may not even know where this learning came from. They just need to know that
they can apply it to new information. Is that right? Right. We do the same thing every day,
with math, for example, which I like a lot. Every time you learn something new,
you rely on theories that have been proven before.
So you take that as a given.
And so once you understand and know how to work with those,
that's how you can make fast progress.
Same thing with the software frameworks, right?
Like PyTorch and Keras.
I mean, most people don't really understand what's
happening underneath. There's a lot of math, there's a lot of matrix operations, you know,
with the weights and the bias and all that stuff. I mean, not everybody needs to know the details
in order to be successful with AI, but it's moving fast, right? And that's what transfer learning allows you to do: it allows you to use something as a baseline, if you wish, and then take that baseline even further with your ideas without having to reinvent the wheel, so to speak.
And it also reflects the nature of computing today where,
you know, we can't assume that every AI application or AI endpoint is going to have the same compute resources.
So by using this concept, right, we could, you know, use centralized compute resources to build up the model and then we can deploy that model.
And it can continue to do learning, but it won't do maybe the sort of heavy lifting initial learning that it might have had to do in order to get to this point,
because, you know, we're delivering it partially baked, right?
Right. That's right.
Now, your background is in, well, at least how I first met you was in voice and language processing, right? And I think that this is really an interesting field
because that was maybe one of the first
machine learning applications that people encountered.
Is it fair to say that sort of dictation software
and things like that, I mean, are they AI?
And were they AI?
And are they blazing the trail for what we're doing today?
Right.
So when I started in the speech recognition business about 20 years ago,
I mean, it was all about writing code
and software algorithms
and there was no open source community really.
And I would call it compute centric, right?
So it was all about the MIPS
and how much processing you could do.
Although we realized that if you want to deliver a speech product, you know, math by itself is not
enough to understand people's voices. So we had to start collecting data. But in those days, 15 to 16 years ago, there was no way to collect data that would statistically represent the world population,
the voice of the world population. I mean, it's insane if you think about it. So what we decided
to do was to start collecting for particular verticals and starting with HPC, right? So you
scale everything. Then once you have your infrastructure available,
you start collecting data
and then you use HPC to scale your AI infrastructure.
But AI in the early days was all CPU based,
and CPUs were one socket with one core, right?
That's when we started.
The early days of speech recognition, we could do
limited speech recognition with a single compute, single server, but that was dedicated. There was
nothing else we could do. And then we kind of started to understand that if we relied heavily
on data and good data, that we could get decent results.
And that's when we started kind of focusing more and more on AI and the advantage of analyzing data.
But we had to build our own frameworks.
We had to write our own code.
A funny joke there is when we actually reached out to NVIDIA,
which today is the number one GPU resource.
When we contacted the NVIDIA organization, they basically told us, well, we don't really have anything for you.
Why don't you talk to our gamer division?
So we actually had a conversation with NVIDIA Munich about how to get consumer cards into the enterprise.
And then once the GPUs came into play, then you could really do millions and millions of calculations per second.
And that's where everything took off, right?
So speech together with GPUs, the ability to collect a lot of data, process a lot of data,
the fact that the hardware didn't cost that much anymore.
Well, relatively, I should say.
It all came down to the combination of HPC and AI.
And just to give you an idea,
we needed about 110 racks of equipment
just to cover about 40 language models on a permanent basis. Today you don't need that amount of hardware, for two reasons. One is that with open source frameworks,
you have a lot more intelligence in building the models. And secondly, using transfer learning
will help you get faster results without having to collect all the data.
But yes, speech recognition has been referred to as ground zero for modern AI.
A lot of the items that we had to do in speech are now very common and really open sourced, if you wish, to the community.
I think one of the interesting aspects there as well is that a lot of the work that you guys were
doing early on was not, you know, using basically what we would recognize as sort of deep learning
today. And so essentially, you spent all this time basically building a capability and then had it all get wiped away
by this new technology of machine learning and deep learning. Is that accurate? Is that
kind of how it felt at the time?
Yeah, I wouldn't call it wiped away. I mean, the thing is that machine learning and deep learning and AI are really just labels, right? We were doing all of these things; we were really working on algorithms and the math behind them. One thing I have to say is that most of the speech recognition innovation actually doesn't come from large enterprises but from universities. Universities around the world were basically saying, well, here's a new algorithm, but we don't have the processing capabilities to try it. Then students would write papers, and we would look at those papers, try them all out, and I would hire these people.
So from a label perspective, you could say that we were doing machine learning and deep learning. I wouldn't say that it has wiped out what we have done. I think it's still built on things we have done. And in some cases, new technology came in that replaced some pieces.
A good example is natural language processing. So think about that. So if you think about a
language model, a speech language model, the idea is to accurately guess what you're saying, right?
So you can never get 100%.
And to a certain degree, you don't have to, right?
So people, when you and I have a conversation,
and if you get 85 words out of the 100 words I'm saying,
you really didn't miss anything, right?
And so one thing we realized is that even with using AI for language models, trying to go from 80-plus to, let's say, 90 would need a significant amount of money and resources. And then NLP came along.
So the language model goes from audio to text, and now imagine that I could add context to the words that are coming out of the recognizer. By combining that context with the words coming out of the speech recognition engine, I can actually use NLP to increase the accuracy of the content, because I have the ability to bring context to the situation. So
let's take an example. I could say, for example, let's make a reservation for seven people
tomorrow at five. So there are pieces missing, right? And with context, I can increase the
accuracy of the results, meaning the system might know out of context that I go to the local Italian
restaurant around the corner. And so by bringing in context, I can increase the accuracy of the result of the recognition engine.
So you will see that there's a lot of focus nowadays on NLP as opposed to language models.
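One way to picture the context idea from the reservation example is rescoring the recognizer's candidate transcripts with a context model. The hypotheses, scores, and context vocabulary below are all invented for illustration; real systems combine acoustic, language, and context scores far more carefully.

```python
# Hypothetical n-best output from a speech recognizer: each hypothesis
# carries a recognizer score (higher is better). Phrases and scores are
# made up for this sketch.
hypotheses = [
    ("make a reservation for seven people tomorrow at five", 0.58),
    ("make a reservation for eleven people tomorrow at five", 0.55),
    ("make a recitation for seven people tomorrow at five", 0.57),
]

# Context the NLP layer can bring in, e.g. the user's habits (they often
# book a table at the local restaurant). Words consistent with that
# context get a small score boost.
context_vocab = {"reservation": 0.1, "seven": 0.05, "restaurant": 0.1}

def rescore(hyp, score):
    # Final score = recognizer score + sum of context boosts.
    boost = sum(context_vocab.get(w, 0.0) for w in hyp.split())
    return score + boost

best = max(hypotheses, key=lambda h: rescore(*h))
print(best[0])
```

The recognizer alone slightly prefers the garbled "recitation" reading over "reservation for eleven", but the context boost pulls the contextually plausible hypothesis to the top, which is the accuracy gain being described.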
So, you know, coming back to your point, has it erased everything we have done?
No, it's built on top of it, right? And that's also where transfer learning comes into play, because NLP is really a big challenge to solve by itself. All these things combined, it's more evolutionary, and in some cases revolutionary, than really considering what we did 15 years ago as ancient, right?
I think that's true for most technologies: some of the technology doesn't disappear.
It's just being reused, recycled, and built upon.
One of the things you mentioned in there too, I think that's important for this conversation
and for a lot of the conversations we've been having here on utilizing AI is this gap between, like
you said, like 80% effective, 90% effective.
Will we ever get to 100% effective?
And that's been one of the core questions that we've been approaching whenever we've
been talking about AI applications.
And, you know, it's a big problem because essentially, you know, for example, we talked
about autonomous driving quite a lot on the podcast.
And, you know, one of the things that has occurred to me is that it's not easy, but it's totally doable to develop a system that can drive a car, you know, in 80% of highway situations. Handling everything beyond that is much, much more difficult. And in fact, there's some contention that it might
even be impossible to develop an autonomous driving system that can drive anywhere at any
time. So is that really kind of what we're seeing as well in other fields of AI in terms of, you
know, natural language processing? You know, are we ever going to have a system that can understand everything people say?
Everything, I wouldn't say. But over time, I mean, the problem you're describing is control of your environment, right? So to talk about self-driving cars: if you always use the same layouts and the same highways, the system will learn, right? The thing about AI is that AI is not a one-stop shop. AI is continuous. The advantage of AI is that if it makes a mistake, and it realizes it made a mistake, it will improve the system over time. And it's not just self-driving cars, right?
I mean, on the other hand, I mean,
the example they always use is a self-driving car,
even if it's perfect for, you know,
dealing with people, bicycles, motorcycles, and so on.
What happens if there's an airplane
making a crash landing on the highway, right?
How do you deal with that?
Well, those are of course challenges. But to say that it will ever understand everything, or that the car will never ever make a mistake? I don't think so. I think, just like us humans, it will learn from its mistakes.
In the speech recognition world it's the same thing, right? What we used to do, and is still being done today in the speech market, is that by controlling the environment you can improve the model. An example would be: let's assume you sell a speech product to a bank. The language model you go in with is, let's call it, the generic model, meaning that it's not really targeted at the customers of the bank. But if you allow this model to be improved by the people who use it, the customers of the bank,
then you can get very, very high accuracy, right?
So the thing about speech recognition is you're trying to recognize a pattern.
You're not necessarily trying to recognize, you know,
American English or British English.
You're trying to recognize a pattern. So at Nuance, we had a desktop application that was very popular
with people that had speech disabilities.
And why is that?
It's because if the person that has a speech disability always says the same thing, not the way we would understand it, but consistent in the way they're saying it, you can actually train the product to recognize that as valid, right?
So it's more about consistency
as opposed to variety.
But as long as there is variety,
AI will never be perfect,
but will always make an attempt at learning.
And it's the learning component that actually I personally feel is more important to AI than anything else, right? I mean, say you build a language model and it works out of
the box for your environment. That's great, but does it work for other people? And if it doesn't
work for other people, does it self-learn or do you have to feed it even more data and make some
changes in order to make the model
work. And I think that is what is important for enterprises in AI: not to have a checkbox saying, I'm doing AI, or have a product that is based on AI, but the fact that you can learn from it. And that learning never stops. I mean, it's true for us. We never stop learning.
There's no time where we can say we know everything.
Well, at least I don't.
Well, I do wonder, though, if this lesson is being taken to heart by some of today's
AI applications, because frankly, I think learning and feedback is something that only gets lip service.
Many of the AI applications that we're starting to see in the enterprise space are, you know, kind of a dead-end street.
In other words, the thing is trained, you know, with data in the cloud initially, and then it's just sort of rolled out and added as a feature. It may be learning from a particular application, a particular, you know, deployment
of that application.
But I think customers are resistant to sending information back to the mothership to improve
the overall model.
And, you know, even in situations, you know, of upgrades and, you know, switching from
one product to another, it is very unlikely that some of these applications are going to be applying any of those lessons
forward.
Right.
Yeah, I think my definition of AI is, I mean, a lot of people talk about data, but for me,
it's about the self-learning.
I mean, if it's not self-learning, then using data doesn't qualify to be called AI, if you ask me.
It's the self-learning and learn from your mistakes.
I mean, it's just like us, right?
So the only way we can improve is by moving forward.
And if we make mistakes, we just learn from it.
But just living by rules and never changing those rules doesn't work for us, and it doesn't work for AI applications. And you will see that. Certainly in the speech market, instead of using one language model, we started to personalize, right? Because of the self-learning and the continuous learning, for certain customers we delivered a personalized language model. So the
way you can see it is: let's take the bank example again, where the bank has about a thousand customers and they are international, so you have no idea how successful your language model is going to be. The system assumes American English, so you can expect that the accuracy level for certain non-native speakers will be relatively low.
So you can start with a base model and then start adapting that base model, or cloning it if you want, for each individual. Then you can apply a mechanism where, if you see something for a particular individual that is useful for the general population, that information is looped back to the generic model. You can keep on updating the generic model while everybody has their own personalized model. And that is a system where a non-native speaker might go into the system with only 40% accuracy and rapidly get to the same level as a native speaker.
And that's really where AI comes into play, right? As opposed to just delivering the model and then walking away and saying, hey, you know, this is it. It works for native speakers.
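A minimal sketch of that clone-and-loop-back mechanism, with invented words and counts standing in for real language-model statistics: each user gets a personal overlay on a shared base, and corrections seen across enough users are promoted back into the base.

```python
from collections import Counter

# Generic (base) word-frequency model shared by everyone; counts invented.
generic = Counter({"balance": 50, "transfer": 40, "account": 60})

def personal_model(user_corrections):
    # Clone the base and layer the user's own corrections on top;
    # the shared base itself is not modified here.
    model = generic.copy()
    model.update(user_corrections)
    return model

# Per-user corrections observed in the field (also invented).
users = {
    "alice": Counter({"overdraft": 5}),
    "bob": Counter({"overdraft": 4, "iban": 2}),
}

alice = personal_model(users["alice"])
print(alice["overdraft"])  # alice's personalized model knows her word

# Loop-back: a correction seen across enough users is promoted into the
# generic model, so everyone benefits from individual learning.
PROMOTE_AT = 2
seen_by = Counter()
for corrections in users.values():
    for word in corrections:
        seen_by[word] += 1
for word, n_users in seen_by.items():
    if n_users >= PROMOTE_AT:
        generic[word] += sum(u[word] for u in users.values())

print(generic["overdraft"])  # promoted: seen by 2 users
print(generic["iban"])       # not promoted: seen by only 1 user
```

The threshold stands in for the judgment call in the transcript: only things "useful for the general population" flow back to the generic model, while purely personal quirks stay in the individual overlays.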
One more thing that you mentioned in there as well that I'd like to key in on is pattern matching.
And I think that many people who, when they think about artificial intelligence,
they jump to, you know, Mr. Data and a system that actually truly understands
things in the way that humans understand things, or at least approaches that. But of course,
that's not at all how machine learning works. Essentially, machine learning is only doing
pattern matching. It's building these pathways, it's building these connections, these statistical associations between inputs and outputs, and it's continually refining those based on the data that it encounters in the field.
But to say that a machine learning system truly knows things in the way we do is completely inaccurate. You know, I think that maybe
applications like language processing, in a way, they work counter to our true understanding of
this technology, because it seems like the system understands things. Because, you know, I say
something, and it's putting the words on the screen and the words
come out in the right order and they kind of match what I was intending to say. So it kind of fools
me into thinking this system understands me, that it truly understands what I'm saying, but it absolutely doesn't. You know, would you agree with that?
Yes, I actually have a good example about one of our head researchers, who was responsible for developing the algorithms, right?
And if you know about audio, you're talking about sine waves, right?
And so sound is just a concatenation of a bunch of sine waves.
And mathematically, you can isolate those sine waves from each other, and that's how you try to recognize the sine wave representation for, you know, the word "the" or "cat" and so on. The example I wanted to bring up is that those individuals left the speech business and went on to develop a device that you install in your home.
It measures and identifies the appliances in your home, because electricity is also sine waves, and your refrigerator has a certain pattern in how it consumes electricity, how it speaks, and so on.
So they basically took speech pattern recognition and applied it to electricity. I actually have a device at home where I can monitor and see which devices in my house are consuming electricity. And because it needs to learn, I am the one being asked by the application: hey, I recognized that between 9:00 a.m. and 9:02 a.m. a device consumed 230 watts, can you label it? Then you label it and go back.
It's just an example of how AI and pattern recognition for speech can also be applied to things that are completely non-speech related, but all the learnings and the methodologies are applicable.
I'm a big fan of claiming that the AI market is going horizontal, right? There are a lot of people saying it's all about the verticals, but I'm a big advocate for saying that a lot of the AI today can be applied across the verticals, horizontally as opposed to vertically. And I think that's where most of the innovation will come into play.
And that also comes back to transfer learning where transfer learning can be very useful.
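That electricity example is essentially nearest-pattern matching, the same idea used to match speech patterns, and it can be sketched in a few lines. The power traces below are made up; a real device works on much richer signatures than five wattage samples.

```python
# Invented power-draw templates (watts, sampled once per second) for a
# few labeled appliances. Labels come from the user, as in the story.
templates = {
    "fridge": [230, 230, 228, 231, 229],
    "kettle": [2000, 2100, 2050, 2080, 2020],
    "tv": [120, 118, 121, 119, 120],
}

def distance(a, b):
    # Sum of squared differences between two power traces.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def identify(trace):
    # Label an unknown trace with the closest known appliance signature.
    return min(templates, key=lambda name: distance(templates[name], trace))

# An unlabeled consumption event, e.g. the one seen between 9:00 and 9:02.
observed = [229, 231, 230, 229, 230]
print(identify(observed))
```

Whether the patterns are sine-wave components of audio or wattage over time, the recognition step is the same nearest-signature comparison, which is why the speech methodology transferred so directly.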
Well, you've given us a lot to think about here, Frederic.
Honestly, I'm gonna have to chew on this for a while,
but we do have to move on to the end of our podcast here.
One thing that we've been doing here in season two
of the Utilizing AI podcast
is springing a few kind of open-ended questions on our guests.
And I wonder if you're willing to play along with our little game.
Sure.
All right.
So as a reminder to the audience, he has not been prepared at all for these questions.
So we'll see what he comes up with. Now, we did mention
self-driving cars and the challenges of creating a car that can drive anywhere, anytime. But since
your background is in voice recognition, I'm going to give you the next question that I've
been throwing at folks here instead. So Frederic, how long will it take for us to have a voice, verbal, conversational AI that can pass the Turing test and fool an average person into thinking that it's speaking with another person?
I would say four years, max.
So that's coming pretty quickly now, right?
I think, well, there are two phases to that, right? There's you talking to the system, and the system doing a good job trying to understand what you're saying, and I think we're pretty good there. Where the technology was a bit lacking was text-to-speech,
which is the machine talking to you, which is certainly for me, I mean, it was easy to
recognize when it was a machine versus a human.
But nowadays, they actually started using AI as well for text-to-speech, you know, speaking
to you.
And I must say, I heard a demo from my colleagues,
ex-colleagues a few months ago.
And it was very, very difficult
to differentiate the human from a machine.
So I would say four years, tops, for productization in the enterprise.
But it's definitely coming very,
very close in the near future.
Great.
And I appreciate having somebody
who actually knows this particular area
weighing in on that
because I've been asking other folks,
if you want to listen to some of the episodes,
and we've gotten a wide variety of answers.
So let me take you outside your field of experience.
Basically the same question,
except video. When will we have video focused ML in the home that operates the same way as
audio based assistants like Siri and Alexa? In other words, when will we have cameras watching
us that kind of know and can adapt to what we're doing without us even saying it?
Oh, that's an interesting question.
I thought in some areas this was already happening, but yeah, I don't know. I think that's because it's outside of my area.
I would probably have a more pessimistic view on this. I would say, let's say, seven years away or so. Although, yeah, maybe it's a pessimistic view, seven.
But you see it on the horizon, that's what you're saying.
Yeah, I definitely see it. I think video, audio, all of that.
I mean, we're smart creatures,
but we're easily fooled, to be honest.
So I think it's just a matter of getting close enough
for us to go along.
I mean, yes.
All right, one more question then before we wrap here.
Are there any specific jobs or job roles that you see being completely eliminated by AI
in the next five years?
Well, I would think more in retail and robotics, you know, manual labor, like where cars are being built, being replaced by robots.
So I think those are the areas that will be hit the hardest.
And obviously, you know, just to give it a positive note, too, I mean, there will be
new jobs being created as well.
But I think the ones where labor will be replaced by AI and robots, I think those will be hit the hardest.
All right.
Well, thank you so much, Frederic.
It's been a wonderful conversation.
And I really learned a lot just from talking to you now.
Where can people connect with you and learn more about your
thoughts in big data and AI?
Right. So the company's website is probably the easiest way. It's highfens.com, that's H-I-G-H-F-E-N-S.com, slash blog for the blog. I have written in the past about NLP, about transfer learning, and other technologies related to AI. Some of it pretty basic AI, nothing too fancy, but at least to get the conversation going. And on Twitter, I'm @FredericVHaren.
Well, thank you so much for joining us again, Frederic.
It's been a lot of fun having you here.
And thank you listeners for joining us as well.
If you enjoyed this discussion,
please do subscribe, rate, and review the show.
That does actually help your favorite podcasts
to get some visibility.
And please do feel free to share the show with your friends.
This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes,
please go to utilizing-ai.com
or you can find us on Twitter at utilizing underscore AI.
Thanks a lot for listening and we'll see you next week.