Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 03: How is Machine Learning Affecting IT Operations with @MLOpsCommunity

Episode Date: September 8, 2020

Stephen Foskett discusses practical aspects of enterprise AI with David Aponte and Demetrios Brinkmann. AI offers promise to help IT operations departments deal with the flood of data, since ML is so good at finding needles in haystacks. But is it true or just vendor hype? Are current vendors able to work in this space or do we need a new kind of product or vendor to develop AI models? Does the new data demand a different type of infrastructure? And how is AIOps related to DevOps? Find Stephen online at GestaltIT.com and on Twitter at @SFoskett. Find David Aponte online at LinkedIn.com/in/AponteAnalytics. Find Demetrios Brinkmann online at LinkedIn.com/in/DPBrinkm and on Twitter at @MLOpsCommunity. This episode features: Stephen Foskett, publisher of Gestalt IT and organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett. David Aponte, Machine Learning Engineer. Find David online at LinkedIn.com/in/AponteAnalytics. Demetrios Brinkmann, Community Coordinator. Find Demetrios online at LinkedIn.com/in/DPBrinkm and on Twitter at @MLOpsCommunity. Date: 09/08/2020 Tags: @SFoskett, @MLOpsCommunity, DavidAponte, DemetriosBrinkmann

Transcript
Starting point is 00:00:00 Welcome to Utilizing AI, the podcast about using AI in the enterprise. I'm your host, Stephen Foskett. Today, we're talking about using AI as part of IT operations. As I said, I'm Stephen Foskett. You can find me at gestaltit.com. I also am the organizer of the Tech Field Day event series, and you can find me on Twitter at @SFoskett. And I am David Aponte. I work as a
Starting point is 00:00:31 machine learning engineer for Benevolent AI, and I'm also a host of the coffee sessions in the MLOps community. You can find me on LinkedIn, David Aponte, A-P-O-N-T-E. Glad to be here. Hey, everybody. I'm Demetrios Brinkmann. I am the organizer of the MLOps community, and you can check us out at MLOps community on Twitter or find me on LinkedIn, Demetrios Brinkmann. So in enterprise IT, the whole operations world has forever been flooded by applications trying to basically accelerate the job of humans doing the everyday running of IT. The problem is that those jobs tend to be sort of inglorious. They deal with a lot of data. There's a lot of things that people are missing. And AI promises to really help that. And so today we're going to talk a little bit about the various ways that AI is coming, you know, getting real, coming into the enterprise through the operations perspective. And I wanted to start with Demetrios because, I mean, heck, you run the MLOps community.
Starting point is 00:01:39 I mean, this is your thing. And I know that this is your background as well. So tell us a little bit about how AI affects IT operations. Yeah so I think it could be nice to just start and explain a little bit about what MLOps is and how it differs from DevOps a little bit and David can keep me honest on this point but. you have a lot of similarities and crossover when it comes to MLOps, the operations side of machine learning. But there are quite a bit of differences because MLOps or machine learning is just a different beast. And so what you need to focus on when you're looking at MLOps are you have data, you have models that are being put into production and models, I mean, like machine learning models that are being produced, they're being trained. And then once they get put into
Starting point is 00:02:38 production, it's not just as simple as maybe pushing software into production. And when it's out in production, you have those day-two operations like monitoring, and you can find all kinds of bugs, because it's not so easy. When you're monitoring a system for, let's say, just regular infrastructure, you can tell: is it working or is it not? And when you're monitoring machine learning, you have many more factors that come into play. It might be working,
Starting point is 00:03:08 but it might be giving you the wrong recommendations if you're using a recommendation model, or it might just have drifted. We call that model drift, which means that it's gone and done something that it shouldn't be doing. And when you originally trained it, you said it was going to do this,
Starting point is 00:03:30 or you thought it would do this, and then it stopped doing that. So I'm not sure if David wants to add any other points on what makes it different. I would also just add, it's just that element of data where the data is actually what powers the machine learning. So it actually makes it useful. It learns from the data. That's essentially what a machine learning model is, something that can learn from the data. And another, just to touch upon the point of how that relates to IT operations, how AI is changing that, I think it's affecting a lot of areas, like quality assurance, service management, so chatbots, things of that nature, process automation. So it's definitely making use of all that data that maybe before had to be manually curated
Starting point is 00:04:12 and kind of looked through. But now, if we have all this data, we could actually learn from it and make it useful and teach it how to do specific tasks. So I would just say that it's actually affecting a lot of different areas, areas that may be easily automated. And some are more difficult than others because of that human component. But there's even the possibility of a collaboration between AI and humans. You know, this field called active learning, where you're not having human in the loop. And that's a very important field, too. But yeah, in general, I think it's changing a lot of things.
Starting point is 00:04:42 It seems to me like this is one of the low-hanging fruit, as they say, of applications for AI, in that, you know, one of the things specifically ML is good at is being trained to, you know, find the needle in the haystack, you know, being trained to look through a large volume of data and find the outliers, find the unusual things. And then more interestingly, I mean, you could do that with a filter, you could do that with rules, but more interestingly is that ML can not only find the unusual things, but find the unusual things
Starting point is 00:05:17 that you don't even know are unusual. Find the odd balls that you might not have trained the system to look for, which in areas like, you know, one of the big applications for this, for ML early on is security, is it's just tremendously valuable because, you know, you may not know, or in fact, you probably don't know what the bad guys are doing to your system in order to get in, But ML might be able to say, hey, that's unusual. Is that the right picture? Is that the right way to look at the easy applications for ML? I would definitely say so, yeah, because it's good at doing things that are repetitive.
Starting point is 00:06:01 It's good at doing things that are, I guess, reproducible. That's what it's really essentially doing is finding patterns in your data. And these are typically patterns that you wouldn't be able to identify, you know, just by looking at it or by even sorting through it. And so it allows you to, it augments your intelligence. You know, I think even at my company at Benevolent AI, that's what we say about how it's affecting drug discovery. It's augmenting our ability to find hypotheses. So, you know, anyone can come up with a hypothesis, right? But the search space may be so large that it's not feasible to do by hand or even by a human. And when it comes to something like that, I think that's where ML is particularly useful.
Starting point is 00:06:39 When it could actually automate something that is typically, you know, requires a lot of data, a lot of parsing is maybe, you know, hard to spot these small little nuances. And yeah, I think, yeah, that's definitely a very, like you said, a low hanging fruit of where it's actually useful when anything that can be automated pretty easily. And I think it's funny that you had mentioned the security space. I have a friend that I was just talking to Ahmed at SafeAI, and he was saying what his product does is it makes data self-aware so that in case things start going haywire or in case data starts acting like it shouldn't be acting, they look at it and they say, hey, is this a security breach? What's going on? And he's doing that with AI. So is this a security breach? What's going on? And he's doing that with AI. So is this realistic though? You know, I mean, that was always my thought when I was seeing
Starting point is 00:07:31 companies, you know, present things. So for example, we do a security field day event. And at the security field day event, we had some companies talking about how they're using ML to do just simple things. You know, we're just monitoring the logs to see if anything unusual shows up. Is this realistic? I mean, you guys know this, you know, the world of models and AI and ML and deep learning and all this better than me. Is it really true that the systems will discover this stuff
Starting point is 00:07:59 or is it just more hype from the vendors? That's a great question. Personally, I do think that there is a lot of hype around AI, around what it's capable of doing. As someone who actually builds this stuff, I know its limitations. And I will say it's only as good as the infrastructure that you have in place, the data that you're actually feeding it. So it's only, I guess, it's feasible when you have a lot of these things in place. Like I mentioned the infrastructure if you have the ability to collect data, that's actually going to be useful for the model.
Starting point is 00:08:33 If there's some versioning around that, there's a way to organize it in a way that's useful and allow other people to make even make it even more useful, which in machine learning could be feature engineering is where you take a bunch of raw data, but you turn it into features or inputs that are actually going to be useful for the model. And it's very difficult to even get to that stage if you can't, you know, you don't have your data in one place. So I think it is possible, but it's not the type of thing that you can just throw at a problem and expect it to work. You have to have some infrastructure
Starting point is 00:09:01 in place. You have to have even a cultural perspective on how to manage these things in place. And you also have to have people that understand how to, you know, regulate some of these things because, you know, AI by itself could learn bad things. It could learn bias. It could be abused, you know, and that's commonly, you know, like for example, with fake news, you know, there's some really good algorithms that are good at generating fakes and those are hard to actually detect. So I would say, yes, you know, it's definitely doable, but you have to have certain things in place. And we could talk a little bit more about that if you'd like. But in general, I would say it's only as good as the infrastructure and the people and the data that you have in place. Yeah, to that point, I think it's really interesting because if you look at, I was talking to a VC who specifically invests in
Starting point is 00:09:48 machine learning startups and his whole thing was, it is really difficult to base your business around machine learning because not only are the people that create the models and the engineers very expensive but the actual ability to get it right to be able to have a model that continuously gets things right when it is you say okay i want ai or i want machine learning to do this and then you go out and you build it and you can continuously just like consistently make that model do that, it's very difficult to do that. So there are a lot of companies that set out with great expectations, but they get a bit burnt in the end. So I would say beware. That actually brings to mind a question that I was thinking of is, is this kind of the opportunity
Starting point is 00:10:43 for a new kind of vendor? In other words, instead of companies developing their own models for security or for monitoring or management or something, is there the opportunity for an entirely new kind of product vendor that is, that consists of, and of course this is something that exists, but a new to IT kind of product vendor company or something to be doing this? Like, it would be better for them to rely on a vendor that specializes in network security or process automation or whatever to bring them a model. Is that practical? Yeah, I think there's some people in this space, like know i'm thinking about you know ml for ml uh there's already people that are trying to automate machine learning and make
Starting point is 00:11:49 it super easy where it's at the click of a button you can actually provide useful results uh you know i think of aws which has a lot of you know uh you know fully managed services google gcp google has a lot of fully managed services so there's already people that are thinking about that exactly. How can we make it even easier so that you don't have to have a whole team of engineers and data scientists and AI scientists managing this thing? Is there a way that we can use things out of the box and apply them? I think so, for sure. But I would also say that I haven't seen that in my own experience work without people actually managing that.
Starting point is 00:12:26 But I do think that over time, we will get better. You know, it's like the more that we learn how to automate certain things, you know, the more we'll learn how to make it even easier for users to consume these models or to, you know, utilize this technology and embed it into their services. But yeah, feeding on the point that Demetrio said about, you know, it's hard for some companies to kind of, you know, throw ML at their problem. And when you try to build a company around just ML, it's a bit difficult. It's usually better when you focus on your domain and the problem that you're trying to solve, and then think about how AI can actually augment that can make it better. But, you know, looking at it from the perspective of just kind of making everything easy, or it's like solving all your problems, I don't think that's really the best way to look at it. I would caution that it's something that can help you,
Starting point is 00:13:08 but you have to have certain things in place. And this even applies to those fully managed solutions. You still have to have some human in the loop managing that because again, AI on its own can learn bad things and can actually, you know, cause problems. And some of them can be hard to spot if you don't know how to really look into them. And I'm sure maybe people will come up with solutions for that as well.
Starting point is 00:13:28 But generally, I would say that, yeah, that's something that's a work in progress. Yeah, and to that vendor point, I think right now what we see in the space are companies like AutoML companies that are trying to do something like that. They're trying to make it easier for the company. That's not necessarily willing to go out and hire a huge data team and get all of this infrastructure team for machine learning. They're, they're trying to help with that. Uh, that being said, yeah, it's not there yet. Um, you, there's a lot of people that are finding a lot of good uses from it i think but when you compare that to an actual team it's gonna like the team is going to be much better in 99.9 percent of those use cases so let's talk about another thing um that you know
Starting point is 00:14:23 affects all this and that's sort of the infrastructure component. David, I know that one of the things that you've talked about a lot is this whole world of that AI and ML, it's a different kind of data than companies are used to. They're used to dealing with, I mean, they like to talk about, oh, we're data-driven, or we've got large amounts of unstructured data or something. This is an entirely different kind of data. Tell us a little bit more about that. Sure. I think, you know, maybe to clarify a little bit, it's not so much, it's like an entirely different type of data. It's just the way that you deal with it and the way that it needs to be processed for it to actually be useful for the machine learning models that requires some care and requires some science. You know, it's not just an engineering problem. And even before you get to the part where you
Starting point is 00:15:10 actually like do some cool things with it, you have to actually manage it and collect it. That's the first challenge, you know, and depending on your use case, you know, for example, if you're a website where your machine learning model is making predictions in real time, or I could think about, let's say some IT solution that's actually monitoring, you know, a real life system and needs to give you real alerts. That needs a database that can actually have low latency, that can actually pull that data in a way that's actually, you know, good for the model and it doesn't take a long time.
Starting point is 00:15:37 But that's a little bit different from, let's say, like a data lake or a data store that is going to be used for training the model, developing it, but it can store a lot more things. It doesn't need to be as fast. So considering the different types of databases that you may need is an issue. Also, collecting the right type of data, knowing how to, I guess, find things that are useful. And then the other component is once you get to the part where you have,
Starting point is 00:16:02 let's say, a bunch of raw unstructured data that you think you could turn into something more useful is actually engineering it for inputs like feature engineering and making it even more useful, more structured, more specific for the algorithm. Depending on the algorithm that you're using, there's know a mixture of different things there's not really a uh there's not really a well-defined way to do these things but the the component around knowing what to collect how to collect it that it it's it's you're considering some of those uh components of the speed right at which you're getting it the amount that you could store um and also you know that component of just making sure that you're actually processing it in a way that's useful so again it's not so much that it's not the same data because i think of let's say just you have like some sensors that
Starting point is 00:16:54 are just recording everything right that could actually be turned into a useful uh you know set of features for a model but you may have to process it normalize it standardize it uh i think about some of the things of like let's say if you have a bunch of missing values, what's a smart way to impute those missing values? And that's where there's some more thought that requires, it requires some more thought than just kind of just putting it into a model. Yeah. And I've heard a statistic that, you know, 80% of a data scientist's time is doing that with the data, like transforming the data or cleaning the data. And the other parts are spent doing everything else, but that is a majority of your time. And just one little tangent that I wanted to mention before we jump onto the next
Starting point is 00:17:40 question. I was talking to this guy Vin yesterday on the MLOps meetup and he was really like a big proponent on monetizing ML and making sure you're getting ROI from this ML. And he talked about how, you know, if you don't have the data that you need to build a model and you need to go out and gather that data, that is a cost. And you need to look at that very clearly. And he did a lot of, he made some very poignant points, poignant points on this, on this idea of really getting your model to be making you money from the very beginning. So one final thing that I'll throw out here is, especially in enterprise IT,
Starting point is 00:18:34 where some of these terms are a little bit foreign and unusual and so on, I could forgive people for hearing MLOps and thinking DevOps and thinking that these are somehow, well, if not the same thing, at least the same category of things. But that's really not true, right? Yeah, there are definitely some big differences.
Starting point is 00:18:55 We had an actually an episode about this a little while ago with Ryan Dawson, so that's Ryan Dawson from Selden. But what we talked about was, you know, there's this added, so let me just talk about a high level, some of the differences. So one is that you have data. So typical, you know, software development release, it doesn't have a huge data component. Like if you're storing your data in a GitHub repo or a GitLab repository, the data is not going to be stored there. It's going to be elsewhere. And so releasing, you know, data and thinking about that alongside the code itself is
Starting point is 00:19:24 an additional challenge that needs a whole, you know, different set of practices. There's also this component of randomness, you know, algorithms sometimes have, they learn things and it's not always reproduced, it's not deterministic. And what I mean by that is there, like, for example, a deep learning model has some weights that it initializes randomly and it may be different every time. So there's that other component where you can't always reproduce the same results exactly in the same way, especially when you have a black box model, like a deep learning model. And then there's the infrastructure in place. We were talking earlier about the need to monitor it when it's in production. You can't really monitor it in the same way as, you know, typical software because of the way that it's, what it's actually doing. It's generating predictions. You need to make sure that those
Starting point is 00:20:03 predictions make sense, that there's some way to validate that their quality is good, that they're not drifting, that they're not changing because maybe the inputs change. Like let's say, if you have some sort of seasonality, that's a part of your data set. All of these things need to be considered. And it's not that DevOps doesn't think about those things.
Starting point is 00:20:21 I think they're very much related. So I would say, even at my company, the DevOps engineers, they know a machine learning, they're very familiar with it, but their specialty is around just the software component. An MLOps person, or let's say someone that works in an ML infrastructure team, they're thinking about that same sort of software, but then there's also added element of complexity with the data, with the infrastructure, they need to monitor it in real time, or even an offline. So there's these other layers of complexity that make it a little bit different. But I would say, you know,
Starting point is 00:20:47 if you're familiar with DevOps, that's awesome. And it can give you a lot to work with. So I don't want to, you know, say that they're so different in the sense that, you know, there's no comparison, but there are some differences, you know, the randomness, the data, things of that nature. For me, it's easier to understand
Starting point is 00:21:02 some of this stuff with stories or real life examples. And when you talk about monitoring and okay, maybe something happens. And so now the model is not doing what it said it would do. in my mind, which was, let's imagine that you have a model or some kind of AI that is looking at different pictures and it's for a self-driving car, let's say, right? And so it sees stop signs, but what happens to that? So you train the model and everything is okay. It's working well, but when it snows, all of a sudden the stop signs that it's seeing look completely different and so it's not going to be able to predict that this is a stop sign so that's something that when you're monitoring that's what we would call like drift right because something happened and now boom your model is not doing what it said it would be doing yeah i would say traditional software maybe won't operate in that
Starting point is 00:22:05 same way. There's, of course, you know, applications that maybe don't have any machine learning that, you know, you can regularly redeploy them, regularly re-release them, you know, you could do a lot of things, and they're similar in the way that you would think about deploying machine learning models, but like you said, there is, if it's being used for, you know, like, let's say, self-driving car, I guess the stakes are a little bit higher around what it's doing. And there's even a lot of fields that the stakes are really high. I would say for me, in drug discovery, what it really is doing is helping us make the process of finding drugs a lot easier. Hopefully, it's cheaper and that it takes less time.
Starting point is 00:22:43 But those decisions are really important. You know, if we just decide to test these things that costs a lot of money. And so if there's real decisions being made off of the outputs of these models, I don't know. I mean, I'm sure there's apps that obviously work like that. But I guess that added element of the stakes a little bit higher, because of what it's actually doing. And the way to monitor that and the infrastructure around that is a little bit different. It's new, right? You know, this hasn't been around for too, too long.
Starting point is 00:23:10 DevOps and software has been around for a little while, but this new field of machine learning ops came up because it didn't exactly answer all the questions, right? Those old fields were an influence to it, but it just wasn't the exact same thing. And so we had to start developing new ways to think about this. But definitely a lot of overlap, of course, because it comes from that, you know, it developed from these earlier fields. Yeah, it's interesting to me that I see the analogy
Starting point is 00:23:36 being more that, you know, IT needs to be more involved in development of software, and that's where DevOps came from and similarly the developers have to be more involved in the operation of their solutions that they come up with and and then that accelerates the the pace of development and in a similar way it's the same with machine learning and AI generally that you know if this stuff is coming home, IT needs to get involved, needs to be involved, needs to continue to be involved, and the AI, ML community needs to understand the realities of IT operations. And so in that way, they are similar, but generally speaking,
Starting point is 00:24:27 they're not the same thing. That's for sure. I love that you brought up that point. We, machine learning engineers, data scientists, they should learn from these other fields, especially IT. Something that I particularly learned more recently is the need, like security. Security is a big thing now. And I don you know, don't really care about those things because I just want to build cool stuff. But it's so important for me to understand that, you know, that component and why it's important and how to, you know, how does that inform
Starting point is 00:24:55 the way that I build things? Or even maybe not necessarily, I don't have to do the security work, but at least I'm considering that things like privacy. So yeah, I think that it's very important to learn from these other fields, to be cross-disciplinary as much as possible, to learn from others that are dealing with these things on a day-to-day basis. Well, thank you guys very much. I think that that's where we're going to leave it here today.
Starting point is 00:25:17 Of course, we'll be talking about this quite a lot more. Just to wrap up, you know, on this topic, where can people find more of what you're speaking about on the topic of machine learning and its effects on IT operations? Yeah, if you want to dive into the MLOps community, you can see all kinds of good stuff. We've been recording weekly meetups for about six months now. And we've also been doing, like David said earlier, we've been doing coffee sessions, which are a bit more podcast style and everything is on our YouTube channel. It's just, if you search for ML Ops community and that's ML Ops, uh, and you'll see basically the whole catalog, or you can find us on anywhere that you listen to podcasts we're also on there yeah and i would just also add if feel free to reach out to me i love talking to new people if any of you guys are interested in reaching out to me and learning
Starting point is 00:26:18 hey how can i learn more about this stuff uh please reach out and uh you know if i can i'll get back to you and i'll point you to some good resources that, that were useful for me that were really helpful for me in understanding this field, because you know, it's, this is, I'm learning every day. And I want to remind people of that too, that this is, it's, it's a moving target. Things are changing pretty rapidly. And even practitioners who are in the day-to-day you know, trenches are still learning new things. So yeah, it's,
Starting point is 00:26:42 it's an ongoing learning process and very much open to helping people kind of get up to speed or find good places to look because there's so much out there. You know, there's a lot of content and it's not all great in my opinion, but there's, you know, reach out to us and we'll try to point you in the right direction. Yeah, that's a good point. I forgot to mention, if you really want to do a deep dive into this, you can check out our Slack community, which is MLOps Community Slack. If you just go to MLOps.community, you should be able to find the links to everything I just mentioned. Awesome. Thank you. And I have to say that, you know, it's really great to find a community of people that are just so enthusiastic and willing to help
Starting point is 00:27:23 and willing to talk and willing to welcome new people in. And, you know, I really appreciate that. I really appreciate the fact that when I reached out to the two of you, your first answer was, yeah, instead of wait. Yeah, I love collaboration. I know Demetrius does too. And I think that's an underplayed, you know, thing in this space where, you know, know like i just want a last little point like you know that i'm i'm only as smart as the the people who put out great content out there you know i'm completely unoriginal everything that i know has been from other people and i want to give that back i love when someone is is you know writes a very clear article and explains something
Starting point is 00:28:02 that's difficult or someone's easy to talk to and can answer your question. And I think we need more of that. I think we need more openness for collaboration, more people that are willing to teach what they know and learn from others and just having that, you know, sort of open source mentality. I'm definitely for that. I know Demetrius is and the MLOps community certainly is as well. All right. Well, thank you guys so much. This has been a really wonderful discussion. If you've enjoyed listening to this discussion, please do head over to your favorite podcast app and subscribe to Utilizing AI.
Starting point is 00:28:34 We are going to be recording more and more conversations like this. And I have a feeling that we're probably going to have these folks back in the future because it's been a lot of fun. I would love to. Yeah, I would love to. Yeah, this is such a pleasure. Thank you, Stephen. So thank you guys very much for being part of this. Again, this is the Utilizing AI podcast
Starting point is 00:28:53 from Gestalt IT, your home for IT content from across the enterprise. Thanks for listening. you
