Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 03: How is Machine Learning Affecting IT Operations with @MLOpsCommunity
Episode Date: September 8, 2020
Stephen Foskett discusses practical aspects of enterprise AI with David Aponte and Demetrios Brinkmann. AI offers promise to help IT operations departments deal with the flood of data, since ML is so good at finding needles in haystacks. But is it true or just vendor hype? Are current vendors able to work in this space, or do we need a new kind of product or vendor to develop AI models? Does the new data demand a different type of infrastructure? And how does AIOps relate to DevOps?
This episode features:
Stephen Foskett, publisher of Gestalt IT and organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett
David Aponte, Machine Learning Engineer. Find David online at LinkedIn.com/in/AponteAnalytics
Demetrios Brinkmann, Community Coordinator. Find Demetrios online at LinkedIn.com/in/DPBrinkm and on Twitter at @MLOpsCommunity
Date: 09/08/2020
Tags: @SFoskett, @MLOpsCommunity, DavidAponte, DemetriosBrinkmann
Transcript
Welcome to Utilizing AI,
the podcast about using AI in the enterprise.
I'm your host, Stephen Foskett.
Today, we're talking about using AI as part of IT operations.
As I said, I'm Stephen Foskett.
You can find me at gestaltit.com.
I also am the organizer of the Tech Field Day event series, and you can find me on Twitter at @SFoskett.
And I am David Aponte. I work as a machine learning engineer for Benevolent AI, and I'm also a host of the coffee sessions in the MLOps community. You can find me on LinkedIn, David Aponte, A-P-O-N-T-E. Glad to be here.
Hey, everybody. I'm Demetrios Brinkmann. I am the organizer of the MLOps community, and you can check us out at @MLOpsCommunity on Twitter or find me on LinkedIn, Demetrios Brinkmann.
So in enterprise IT, the whole operations world has forever been flooded by applications trying to basically accelerate the job of humans doing the everyday running of IT.
The problem is that those jobs tend to be sort of inglorious. They deal with a lot of data. There's a lot of things that people are missing. And AI promises to really help that.
And so today we're going to talk a little bit about the various ways that AI is coming,
you know, getting real, coming into the enterprise through the operations perspective.
And I wanted to start with Demetrios because, I mean, heck, you run the MLOps community.
I mean, this is your thing.
And I know that this is your background as well.
So tell us a little bit about how AI affects IT operations.
Yeah, so I think it could be nice to just start by explaining a little bit about what MLOps is and how it differs from DevOps, and David can keep me honest on this point. You have a lot of similarities and crossover when it comes to MLOps, the operations side of machine learning, but there are quite a few differences, because MLOps, or machine learning, is just a different beast.
And so what you need to focus on when you're looking at MLOps is that you have data, and you have models being put into production, and by models I mean machine learning models that are being produced, that are being trained. And once they get put into production, it's not just as simple as pushing software into production. When it's out in production, you have those day-two operations like monitoring, and you can find all kinds of bugs, because it's not like monitoring, let's say, regular infrastructure, where you can tell whether it's working or not. When you're monitoring machine learning, many more factors come into play. It might be working, but it might be giving you the wrong recommendations if you're using a recommendation model, or it might just have drifted. We call that model drift, which means that it's gone and done something that it shouldn't be doing. When you originally trained it, you said it was going to do this, or you thought it would do this, and then it stopped doing that.
So I'm not sure if David wants to add any other points on what makes it different.
I would also just add, it's just that
element of data where the data is actually what powers the machine learning. So it actually makes it useful. It learns from the data. That's essentially what a
machine learning model is, something that can learn from the data.
And another, just to touch upon the point of how that relates to IT operations, how AI is changing
that, I think it's affecting a lot of areas, like quality assurance, service management,
so chatbots, things of that nature, process automation.
So it's definitely making use of all that data that maybe before had to be manually curated
and kind of looked through. But now, if we have all this data, we could actually learn from it
and make it useful and teach it how to do specific tasks. So I would just say that it's actually
affecting a lot of different areas, areas that may be easily automated.
And some are more difficult than others because of that human component.
But there's even the possibility of a collaboration between AI and humans.
You know, there's this field called active learning, where you're having a human in the loop.
And that's a very important field, too.
But yeah, in general, I think it's changing a lot of things.
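To make the drift idea above a bit more concrete: a common way to catch it in production is to compare the distribution of a model input, or of the model's output scores, against what was seen at training time. Here is a minimal Python sketch with made-up feature values and an illustrative threshold; none of it comes from the episode itself.

```python
# Minimal drift check: flag when live data looks statistically different from
# the training-time data. The feature, window sizes, and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

def looks_drifted(train_values, live_values, p_threshold=0.01):
    """Two-sample Kolmogorov-Smirnov test; a tiny p-value suggests the live
    distribution no longer matches what the model was trained on."""
    result = ks_2samp(train_values, live_values)
    return result.pvalue < p_threshold

rng = np.random.default_rng(0)
train_latency = rng.normal(loc=100.0, scale=10.0, size=5000)  # training-time window
live_latency = rng.normal(loc=130.0, scale=10.0, size=1000)   # shifted production window

if looks_drifted(train_latency, live_latency):
    print("Possible drift: investigate the data or consider retraining.")
```

In practice a check like this usually runs on a schedule against recent production windows, alongside business-level metrics, since a model can drift while the service itself stays "up."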
It seems to me like this is one of the low-hanging fruit, as they say, of applications for AI, in that, you know,
one of the things specifically ML is good at is being trained to, you know, find the needle in
the haystack, you know, being trained to look through a large volume of data and find the
outliers, find the unusual things. And then more interestingly, I mean,
you could do that with a filter,
you could do that with rules,
but more interestingly is that ML can not only find
the unusual things, but find the unusual things
that you don't even know are unusual.
Find the oddballs that you might not have trained the system to look for. In areas like security, which is one of the big early applications for ML, that's just tremendously valuable, because you may not know, or in fact you probably don't know, what the bad guys are doing to your system in order to get in. But ML might be able to say, hey, that's unusual. Is that the right picture? Is that the right way to look at the easy applications for
ML? I would definitely say so, yeah, because it's good at doing things that are repetitive.
It's good at doing things that are, I guess, reproducible. What it's really essentially doing is finding patterns in your data. And these are typically patterns that you wouldn't be able to identify, you know, just by looking at it or even by sorting through it. And so it augments your intelligence. You know, I think even at my
company at Benevolent AI, that's what we say about how it's affecting drug discovery. It's augmenting our ability to find hypotheses.
So, you know, anyone can come up with a hypothesis, right?
But the search space may be so large that it's not feasible to do by hand or even by a human.
And when it comes to something like that, I think that's where ML is particularly useful.
When it can actually automate something that typically requires a lot of data and a lot of parsing, where it's maybe hard to spot these small little nuances. And yeah, I think, like you said, that's definitely a low-hanging fruit of where it's actually useful: anything that can be automated pretty easily.
And I think it's funny that you mentioned the security space. I have a friend, Ahmed at SafeAI, that I was just talking to, and he was saying that what his product does is make data self-aware, so that in case things start going haywire, or data starts acting like it shouldn't be acting, they look at it and they say, hey, is this a security breach?
What's going on?
And he's doing that with AI.
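For the "data acting like it shouldn't" case, one common concrete approach, and only one of several, is unsupervised outlier detection over features derived from logs or telemetry. A hedged sketch using scikit-learn's IsolationForest follows; the feature names and numbers are invented for illustration and have nothing to do with SafeAI's actual product.

```python
# Sketch: learn what "normal" operational windows look like, then score new
# windows; a -1 from predict() means the window looks anomalous.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical per-minute features: request count, error rate, bytes sent out.
normal_windows = rng.normal(loc=[1000.0, 0.01, 5e6],
                            scale=[100.0, 0.005, 5e5],
                            size=(500, 3))

detector = IsolationForest(contamination=0.01, random_state=42)
detector.fit(normal_windows)

# A new window with a suspicious spike in outbound bytes.
suspect = np.array([[1050.0, 0.012, 5e7]])
print(detector.predict(suspect))  # [-1] flags it as unusual
```

The appeal for security is exactly what Stephen described above: the detector is never told what "bad" looks like, only what normal has looked like.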
So is this realistic, though? You know, I mean, that was always my thought when I was seeing
companies, you know, present things. So for example, we do a Security Field Day event. And at the Security Field Day event, we had some companies talking about how they're using ML
to do just simple things. You know, we're just monitoring the logs to see if anything unusual shows up.
Is this realistic?
I mean, you guys know this, you know,
the world of models and AI and ML and deep learning
and all this better than me.
Is it really true that the systems will discover this stuff
or is it just more hype from the vendors?
That's a great question.
Personally, I do think that
there is a lot of hype around AI, around what it's capable of doing. As someone who actually builds this stuff, I know its limitations. And I will say it's only as good as the infrastructure
that you have in place and the data that you're actually feeding it. So it's only feasible, I guess, when you have a lot of these things in place: like I mentioned, the infrastructure, the ability to collect data that's actually going to be useful for the model, some versioning around that, a way to organize it that's useful and that allows other people to make it even more useful, which in machine learning could be feature engineering, where you take a bunch of raw data and turn it into features, or inputs, that are actually going to be useful for the model. And it's very difficult to even get to that stage if you don't have your data in one place. So I think it is possible, but it's not the type of thing
that you can just throw at a problem and expect it to work. You have to have some infrastructure
in place. You have to have even a cultural perspective on how to manage these things in place. And you also have to have people that
understand how to, you know, regulate some of these things because, you know, AI by itself
could learn bad things. It could learn bias. It could be abused, you know. For example, with fake news, there are some really good algorithms that are good at generating fakes, and those are hard to actually detect. So I would say, yes, you know, it's definitely doable, but you have to have certain things in place. And
we could talk a little bit more about that if you'd like. But in general, I would say it's
only as good as the infrastructure and the people and the data that you have in place.
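As a small illustration of the "raw data to features" step David mentions, here is a hedged pandas sketch that rolls hypothetical raw request events up into per-user, per-hour features a model could consume. The column names and aggregations are assumptions made for the example, not anything specific from the discussion.

```python
# Feature engineering sketch: aggregate raw event rows into model inputs.
import pandas as pd

raw = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-09-08 10:00", "2020-09-08 10:01",
                                 "2020-09-08 10:02", "2020-09-08 11:30"]),
    "user_id": ["a", "a", "b", "a"],
    "latency_ms": [120, 95, 300, 110],
    "status": [200, 200, 500, 200],
})

features = (
    raw.assign(hour=raw["timestamp"].dt.floor("H"),
               is_error=(raw["status"] >= 500).astype(int))
       .groupby(["user_id", "hour"])
       .agg(requests=("status", "size"),
            error_rate=("is_error", "mean"),
            max_latency_ms=("latency_ms", "max"))
       .reset_index()
)
print(features)
```

Versioning both the raw table and the code that produces the features is what makes the result reproducible later, which is part of the infrastructure point made above.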
Yeah, to that point, I think it's really interesting. I was talking to a VC who specifically invests in machine learning startups, and his whole thing was, it is really difficult to base your business around machine learning. Not only are the people that create the models, the engineers, very expensive, but the actual ability to get it right is hard: to say, okay, I want AI, or I want machine learning, to do this, then go out and build it, and then consistently make that model keep doing that. It's very difficult. So there are a lot of companies that set out with great expectations, but they get a bit burnt in the end. So I would say beware.
That actually brings to mind a question I was thinking of: is this kind of the opportunity for a new kind of vendor? In other words, instead of companies developing their own models for security or for monitoring or management or something, is there the opportunity for an entirely new kind of product vendor, and of course this is something that exists, but a new-to-IT kind of vendor company, to be doing this? Like, would it be better for them to rely on a vendor that specializes in network security or process automation or whatever to bring them a model? Is that practical?
Yeah, I think there are some
people in this space. I'm thinking about, you know, ML for ML. There are already people that are trying to automate machine learning and make it super easy, where at the click of a button it can actually provide useful results. I think of AWS, which has a lot of fully managed services, and Google, GCP, which has a lot of fully managed services too. So there are already people that are thinking about exactly that.
How can we make it even easier so that you don't have to have a whole team of engineers
and data scientists and AI scientists managing this thing?
Is there a way that we can use things out of the box and apply them?
I think so, for sure.
But I would also say that, in my own experience, I haven't seen that work without people actually managing it.
But I do think that over time, we will get better.
You know, it's like the more that we learn how to automate certain things, you know, the more we'll learn how to make it even easier for users to consume these models or to, you know, utilize this technology and embed it into their services.
But yeah, feeding off the point that Demetrios made, you know, it's hard for some companies to kind of just throw ML at their problem. And when you try to build a company around just ML, it's a bit difficult. It's usually better when you focus on your domain and the problem that you're trying to solve, and then think about how AI can actually augment that and make it better. But, you know, looking at it from the perspective of it just making everything easy, or solving all your problems, I don't think that's really the best way to look at it.
I would caution that it's something that can help you,
but you have to have certain things in place.
And this even applies to those fully managed solutions.
You still have to have some human in the loop managing that because again,
AI on its own can learn bad things and can actually, you know,
cause problems.
And some of them can be hard to spot if you don't know how to really look
into them.
And I'm sure maybe people will come up with solutions for that as well.
But generally, I would say that, yeah, that's something that's a work in progress.
Yeah, and to that vendor point, I think right now what we see in the space are companies like AutoML companies that are trying to do something like that. They're trying to make it easier for the company that's not necessarily willing to go out and hire a huge data team and build all of this infrastructure for machine learning. They're trying to help with that. That being said, it's not there yet. There are a lot of people finding a lot of good uses for it, I think, but when you compare that to an actual team, the team is going to be much better in 99.9 percent of those use cases.
So let's talk about another thing that affects all this, and that's sort of the infrastructure component.
David, I know that one of the things that you've talked about a lot is this whole world of AI and ML data, that it's a different kind of data than companies are used to.
They're used to dealing with, I mean, they like to talk about, oh, we're data-driven, or we've got large amounts of unstructured data or something.
This is an entirely different kind of data. Tell us a little bit more about that.
Sure. I think, you know, maybe to clarify a little bit, it's not so much that it's an entirely different type of data. It's just that the way you deal with it, and the way it needs to be processed for it to actually be useful for the machine learning models, requires some care and requires some science. You know, it's not just an engineering problem. And even before you get to the part where you
actually like do some cool things with it, you have to actually manage it and collect it. That's
the first challenge, you know, and depending on your use case, you know, for example, if you're
a website where your machine learning model is making predictions in real time, or I could think
about, let's say some IT solution that's actually monitoring,
you know, a real life system and needs to give you real alerts.
That needs a database that can actually have low latency,
that can actually pull that data in a way that's actually, you know,
good for the model and it doesn't take a long time.
But that's a little bit different from, let's say, like a data lake or a data store
that is going to be used for training the model, developing it,
but it can store a lot more things.
It doesn't need to be as fast.
So considering the different types of databases that you may need is an issue.
Also, collecting the right type of data, knowing how to, I guess, find things that are useful. And then the other component, once you get to the part where you have, let's say, a bunch of raw unstructured data that you think you could turn into something more useful, is actually engineering it into inputs, like feature engineering, and making it even more useful, more structured, more specific for the algorithm.
Depending on the algorithm that you're using, there's a mixture of different things. There's not really a well-defined way to do these things, but with the component around knowing what to collect and how to collect it, you're considering some of those factors: the speed at which you're getting it, the amount that you can store, and also that component of just making sure that you're actually processing it in a way that's useful. So again, it's not so much that it's not the same data, because I think of, let's say, some sensors that are just recording everything, right? That could actually be turned into a useful set of features for a model, but you may have to process it, normalize it, standardize it. I think about some of the things like, let's say, if you have a bunch of missing values, what's a smart way to impute those missing values? And that's where it requires some more thought than just putting it into a model.
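A minimal sketch of the kind of preprocessing David is describing, using scikit-learn: impute missing sensor readings, then standardize the columns before they reach a model. The median strategy and the toy numbers are illustrative choices, not recommendations from the episode.

```python
# Preprocessing sketch: fill in missing values, then normalize each column.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

raw_sensor_readings = np.array([
    [0.50, 101.0, np.nan],
    [0.47, np.nan, 7.2],
    [0.52, 99.5, 7.0],
    [np.nan, 100.2, 7.4],
])

preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # a smarter, model-based imputer could go here
    ("scale", StandardScaler()),                   # zero mean, unit variance per column
])

clean = preprocess.fit_transform(raw_sensor_readings)
print(clean.shape)  # same shape, no missing values, standardized columns
```

Fitting the pipeline on training data and reusing the exact same fitted transform at serving time is one of the places where ML work needs more care than a plain software release.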
Yeah. And I've heard a statistic that, you know,
80% of a data scientist's time is doing that with the data, like transforming the data or
cleaning the data. And the other parts are spent doing everything else, but that is a majority of
your time. And just one little tangent that I wanted to mention before we jump onto the next
question. I was talking to this guy Vin yesterday on the MLOps meetup, and he was really a big proponent of monetizing ML and making sure you're getting ROI from this ML. And he talked about how, you know, if you don't have the data that you need to build a model and you need to go out and gather that data, that is a cost, and you need to look at that very clearly. He made some very poignant points on this idea of really getting your model to be making you money from the very beginning.
So one final thing that I'll throw out here is,
especially in enterprise IT,
where some of these terms are a little bit foreign and unusual and so on,
I could forgive people for hearing MLOps
and thinking DevOps
and thinking that these are somehow,
well, if not the same thing,
at least the same category of things.
But that's really not true, right?
Yeah, there are definitely some big differences.
We actually had an episode about this a little while ago with Ryan Dawson, that's Ryan Dawson from Seldon.
But what we talked about was, you know, let me just talk at a high level about some of the differences. So one is that you have data. A typical software development release doesn't have a huge data component. Like if you're storing your code in a GitHub repo or a GitLab repository, the data is not going to be stored there. It's going to be elsewhere. And so releasing, you know, data and thinking about that alongside the code itself is
an additional challenge that needs a whole, you know, different set of practices.
There's also this component of randomness. You know, algorithms learn things, and it's not always reproducible, it's not deterministic. And what I mean by that is, for example, a deep learning model has some weights that it initializes randomly, and they may be different every time. So there's that other component where you can't always reproduce the same results exactly in the same way, especially when you
have a black box model, like a deep learning model. And then there's the infrastructure in
place. We were talking earlier about the need to monitor it when it's in production. You can't
really monitor it in the same way as, you know, typical software, because of what it's actually doing. It's generating predictions. You need to make sure that those
predictions make sense,
that there's some way to validate
that their quality is good, that they're not drifting,
that they're not changing because maybe the inputs change.
Like let's say, if you have some sort of seasonality,
that's a part of your data set.
All of these things need to be considered.
And it's not that DevOps doesn't think about those things.
I think they're very much related.
So I would say, even at my company, the DevOps engineers, they know machine learning, they're very familiar with it, but their specialty is around just the software component. An MLOps person, or let's say someone that works on an ML infrastructure team, they're thinking about that same sort of software, but then there's also an added element of complexity with the data, with the infrastructure, with needing to monitor it in real time or even offline. So there are these other layers of complexity that make it a little bit different.
But I would say, you know,
if you're familiar with DevOps, that's awesome.
And it can give you a lot to work with.
So I don't want to, you know,
say that they're so different in the sense that,
you know, there's no comparison,
but there are some differences, you know,
the randomness, the data, things of that nature.
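The randomness David mentions is easy to see in miniature: a "weight initialization" differs run to run unless the random seed is pinned. Below is a NumPy-only sketch; real frameworks such as PyTorch or TensorFlow expose their own seed controls, which are not shown here.

```python
# Reproducibility sketch: unseeded weight initializations differ; seeded ones match.
import numpy as np

def init_weights(seed=None):
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=0.1, size=(4, 4))  # a tiny stand-in weight matrix

print(np.allclose(init_weights(), init_weights()))              # False (almost surely)
print(np.allclose(init_weights(seed=7), init_weights(seed=7)))  # True
```

Pinning seeds only addresses one source of non-determinism; data versions, library versions, and hardware can still make two training runs differ, which is part of why MLOps treats reproducibility as its own problem.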
For me, it's easier to understand
some of this stuff with stories or real-life examples. When you talk about monitoring, okay, maybe something happens, and so now the model is not doing what it said it would do. An example comes to mind: let's imagine that you have a model, or some kind of AI, that is looking at different pictures, and it's for a self-driving car, let's say, right? And so it sees stop signs, but then what happens? You train the model and everything is okay, it's working well, but when it snows, all of a sudden the stop signs that it's seeing look completely different, and so it's not going to be able to predict that this is a stop sign. That's something that, when you're monitoring, is what we would call drift, right? Because something happened, and now, boom, your model is not doing what it said it would be doing.
Yeah, I would say traditional software maybe won't operate in that same way. There are, of course, you know, applications that maybe don't have any machine learning, that you can regularly redeploy, regularly re-release. You can do a lot of things that are similar to the way you would think about deploying machine learning models. But like you said, if it's being used for, let's say, a self-driving car, I guess the stakes are a little bit higher around what it's doing. And there are even a lot of fields where the stakes are really high.
I would say for me, in drug discovery, what it really is doing is helping us make the process of finding drugs a lot easier.
Hopefully, it's cheaper and it takes less time. But those decisions are
But those decisions are
really important. You know, if we just decide to test these things, that costs a lot of money. And so there are real decisions being made off of the outputs of these models. I don't know, I mean, I'm sure there are apps that obviously work like that. But I guess there's that added element of the stakes being a little bit higher because of what it's actually doing. And the way to monitor that and the infrastructure around that is a little bit different.
It's new, right?
You know, this hasn't been around for too, too long.
DevOps and software have been around for a little while, but this new field of machine learning ops came up because they didn't exactly answer all the questions, right? Those older fields were an influence on it, but it just wasn't the exact same thing.
And so we had to start developing new ways to think about this.
But definitely a lot of overlap, of course,
because it comes from that, you know,
it developed from these earlier fields.
Yeah, it's interesting to me. I see the analogy being more that, you know, IT needs to be more involved in the development of software, and that's where DevOps came from, and similarly the developers have to be more involved in the operation of the solutions that they come up with, and then that accelerates the pace of development. In a similar way, it's the same with machine learning and AI generally: if this stuff is coming home, IT needs to get involved, needs to be involved, needs to continue to be involved, and the AI and ML community needs to understand the realities of IT operations. And so in that way, they are similar, but generally speaking,
they're not the same thing. That's for sure.
I love that you brought up that point. We machine learning engineers and data scientists should learn from these other fields, especially IT. Something that I particularly learned more recently is the need for things like security. Security is a big thing now. And I, you know, didn't really care about those things before, because I just wanted to build cool stuff.
But it's so important for me to understand that component, why it's important, and how it informs the way that I build things. Or even if I don't necessarily have to do the security work myself, at least I'm considering things like privacy.
So yeah, I think that it's very
important to learn from these other fields, to be cross-disciplinary as much as possible,
to learn from others that are dealing with these things on a day-to-day basis.
Well, thank you guys very much. I think that that's where we're going to leave it here today.
Of course, we'll be talking about this quite a lot more. Just to wrap up, where can people find more of what you're speaking about on the topic of machine learning and its effects on IT operations?
Yeah, if you want to dive into the MLOps community, you can see all kinds of good stuff.
We've been recording weekly meetups for about six months now. And we've also been doing, like David said earlier, coffee sessions, which are a bit more podcast style, and everything is on our YouTube channel. If you search for MLOps community, and that's MLOps, you'll see basically the whole catalog. Or you can find us anywhere that you listen to podcasts; we're also on there.
Yeah, and I would just also add, feel free to reach out to me. I love
talking to new people. If any of you guys are interested in reaching out to me and learning, hey, how can I learn more about this stuff, please reach out, and, you know, if I can, I'll get back to you and point you to some good resources that were really helpful for me in understanding this field. Because, you know, I'm learning every day, and I want to remind people of that too: this is a moving target. Things are changing pretty rapidly, and even practitioners who are in the day-to-day trenches are still learning new things. So yeah, it's an ongoing learning process, and I'm very much open to helping people get up to speed or find good places to look, because there's so much out there. You know, there's a lot of content and it's not all great, in my opinion. But reach out to us and we'll try to point you in the right direction.
Yeah, that's a good point. I
forgot to mention, if you really want to do a deep dive into this, you can check out our Slack community, which is
MLOps Community Slack. If you just go to MLOps.community, you should be able to find the
links to everything I just mentioned. Awesome. Thank you. And I have to say that, you know,
it's really great to find a community of people that are just so enthusiastic and willing to help
and willing to talk and willing to welcome new people in. And, you know, I really appreciate that. I really appreciate the
fact that when I reached out to the two of you, your first answer was, yeah, instead of wait.
Yeah, I love collaboration. I know Demetrios does too. And I think that's an underplayed, you know, thing in this space. I just want to make a last little point: I'm only as smart as the people who put out great content out there. You know, I'm completely unoriginal. Everything that I know has been from other people, and I want to give that back. I love when someone writes a very clear article and explains something that's difficult, or when someone's easy to talk to and can answer your question. And I think we need more of that. I think we need more openness for collaboration, more people that are willing to teach what they know and learn from others, and just having that, you know, sort of open source mentality. I'm definitely for that. I know Demetrios is, and the MLOps community certainly is as well.
All right. Well, thank
you guys so much. This has been a really wonderful discussion.
If you've enjoyed listening to this discussion,
please do head over to your favorite podcast app
and subscribe to Utilizing AI.
We are going to be recording
more and more conversations like this.
And I have a feeling that we're probably going to have
these folks back in the future
because it's been a lot of fun.
I would love to.
Yeah, I would love to. Yeah, this is such a pleasure. Thank you, Stephen.
So thank you guys very much for being part of this. Again, this is the Utilizing AI podcast
from Gestalt IT, your home for IT content from across the enterprise. Thanks for listening.