Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 09: Deploying AI Models in the Enterprise with @DataCereal
Episode Date: October 20, 2020. Stephen Foskett discusses the practicalities involved in packaging, deploying, and operating AI models with Manasi Vartak of Verta. Deploying an AI model in production is a challenge, just like it was... in the past with software. Once a company has an AI model to deploy, they must validate its results, create scaffolding code to make it consumable, optimize the data pipelines, instrument it, and assign operators. This is what Manasi and Verta have developed, and the world of MLOps parallels that of DevOps but with some unique twists. The data component of AI models presents a unique challenge not found in some other enterprise applications, and it is important to continually test the model to ensure that it hasn't drifted off target as data changes. Previously, training models was the main challenge for AI, but now it's all about getting things into production. That's why we started this podcast and why we created AI Field Day! This episode features: Stephen Foskett, publisher of Gestalt IT and organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett Manasi Vartak, CEO and Founder of Verta (@VertaAI). Find Manasi on Twitter at @DataCereal Date: 10/20/2020 Tags: @SFoskett, @DataCereal, @VertaAI
Transcript
Welcome to Utilizing AI, the podcast about enterprise applications for machine learning,
deep learning, and other artificial intelligence topics.
Each episode brings experts in enterprise infrastructure together to discuss applications
of AI in today's data center.
Today we're discussing packaging and deployment of models in the enterprise.
First, let's meet our guest.
Hey there.
My name is Manasi Vartak.
Thanks so much for having me.
I am the founder and CEO of Verta.
We are an enterprise infrastructure company helping teams ship ML models in a way that we're accustomed to shipping,
monitoring, and operating regular software. So super excited to be on this podcast. And
I'm sure we'll talk a lot more. But you can find Verta at verta.ai on the web and on Twitter at @VertaAI.
Great. Thanks a lot. And I'm Stephen Foskett, organizer of Tech Field Day and publisher
of Gestalt IT. You can find me on Twitter at @SFoskett. Now, I've been working in enterprise
IT for a long time, long enough to see the emergence of cloud and the emergence of AI.
And that's exactly why I was excited enough about AI to start this AI podcast, because I see so much promise in this. And I see
so many companies starting to think about how are we going to utilize AI? How are we going to bring
models into our space? And how are we going to deploy them? Now, Verta is doing some interesting
things that are analogous to sort of enterprise package management, application performance management, and operations
tasks, except focused on AI models. Maybe you could start, Manasi, by just telling us a little
bit more about sort of what the product is. Absolutely. So a quick high-level view on the model lifecycle and then going into what we do.
So when you're building an ML model and you want to get it into production, there are several steps
involved. First, you get your data together, you munge it, you get it into the right format,
then you train the model. That's where TensorFlow, PyTorch, and all of those great libraries come in.
And once you have a model, there's a host of steps that need to happen before you can
run it in production. So this is very similar to what we used to do with software. We want to
validate the model results, so testing. We want to write scaffolding code that's going to transform
the model into something that can be consumed by a host of other services.
We want to optimize the model, build pipelines to get the data to the model, instrument them to monitor how they're performing,
and also assign engineering headcount to make sure that the model is actually working as expected. So what we do at Verta is we take
a trained model and we remove all of the heavy lift involved in packaging, deployment, operations,
and monitoring, so that these models can be shipped to Verta, and then Verta takes care of running them
in the Verta inference engine and making sure they're working as expected.
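To make that scaffolding step concrete, here is a minimal sketch of wrapping a trained model behind an HTTP prediction endpoint. This is illustrative only, not Verta's inference engine; the model file name, input format, and port are all hypothetical.

```python
# Minimal serving scaffold (illustrative sketch, not Verta's inference engine).
# Assumes a scikit-learn-style model was saved to "model.pkl" -- a hypothetical path.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[1.0, 2.0, 3.0]]}.
    payload = request.get_json(force=True)
    predictions = model.predict(payload["features"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```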
Yeah. And that's the interesting thing because indeed, as you said, you know, it used to be that companies wrote their own software and then eventually they kind of moved to what I guess was
sort of called the shrink wrap software model. But even there, companies, you know, it wasn't
like really ready to go. Even if, you know, you bought a product for deployment, you know, you had to do a lot of
tuning. But also, you had to think about packaging, you had to think about deploying and infrastructure,
and then you had to think about monitoring and management and so on. It really sounds like
exactly the same thing, except for AI. That's such a great observation.
And that's why this space is getting termed as MLOps,
very similar to how we saw DevOps.
And the goal with DevOps is really to ship products faster,
more reliably.
We want to do that with ML.
What makes MLOps different from DevOps is a few things. One is the users of the system are data
scientists who are actually very different from software engineers. These
folks, and I used to be one, I used to write code, write models for Twitter
actually, so the data scientists we're talking about are really statisticians.
They're great at math. They're not necessarily thinking about,
how do I scale a system?
Or how do I make it reliable?
And so while dev and ops were still fairly, I would say,
similar, data scientists have a very different skill set.
And that requires a different view
on how the product is designed.
And our big thesis with Verta is let data scientists be data scientists.
We don't need them to know Kubernetes or to know Docker. They're great at building models,
let them do that, and let the software take care of the plumbing. So that's one. A couple others
that are unique: there's a ton of heterogeneity in data science workflows as well as frameworks. So this could be,
you can use Python or Scala to build your models. You might want to run them in a batch processing
fashion or a real-time fashion. You might need to run it in a serverless way or a containerized way. So
there's a lot of heterogeneity, which is great
for innovation, not so great for production, that you need to account for. And finally,
one thing that I think software teams also face is there are silos of different data science teams,
and bringing all of the models under one roof, so that they can be governed by the same IT processes and
use the same basic infrastructure, is another unique challenge we see with MLOps.
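One way to picture that heterogeneity: the same scoring logic often has to run both as a nightly batch job and behind a real-time endpoint. A rough sketch, with hypothetical file names, of keeping a single predict function and reusing it in both modes:

```python
# One scoring function, two execution modes (rough sketch; paths and names are hypothetical).
import csv
import pickle
from typing import List

with open("model.pkl", "rb") as f:  # hypothetical trained-model artifact
    model = pickle.load(f)

def score(rows: List[List[float]]) -> List[float]:
    """Shared prediction logic used by both batch and real-time paths."""
    return model.predict(rows).tolist()

def run_batch(input_csv: str, output_csv: str) -> None:
    """Batch mode: score an entire file at once, e.g. from a nightly job."""
    with open(input_csv) as f:
        rows = [[float(x) for x in row] for row in csv.reader(f)]
    with open(output_csv, "w", newline="") as f:
        csv.writer(f).writerows([[p] for p in score(rows)])

# Real-time mode would wrap score() behind an HTTP endpoint,
# as in the serving sketch earlier in this transcript.
```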
Yeah, it really does sound like my experience when I was working in large enterprise IT
organizations, because you would have multiple development teams, you know, you would have,
you know, maybe lines of business with their own, you know, IT developers. And everybody wanted everything to be able to run in enterprise IT
operations. But now, as you say, we've got the added wrinkle of sort of these new, you know,
next generation infrastructure. So many companies are deploying their own, you know, on premises
cloud or their own infrastructure to support next generation applications.
And many are also using cloud service providers as well for these, which this is something,
frankly, that the conventional developers are still coming to terms with.
How do we package and deliver products
in this sort of hybrid cloud mode.
And AI is born into that.
I mean, ML models, they exist in this space.
And so that seems to me like a really big focus
for you as well, and for the people deploying these models?
100%. With AI and ML, libraries that are getting used to build these models get updated so
frequently. It's every few months. And if you are an IT organization, you want to keep
a very close eye on what libraries are being used, and they need to be patched, they need to be secure. So a lot of the use cases we see with
deploying ML or AI models in enterprises
really have to do with: are the libraries that are being used
for these models the ones that are vetted by IT, or do we need them to apply a patch? And how do we make that seamless for
data scientists who might not even be aware of the fact that, hey, there was a security
vulnerability with this particular library you're using?
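A sketch of what that kind of IT-side library vetting can look like in practice: compare installed package versions against an approved list and flag anything that needs a patch. The allowlisted versions below are made up for illustration.

```python
# Dependency audit sketch: flag installed packages that differ from an approved list.
from importlib.metadata import PackageNotFoundError, version

APPROVED = {
    "numpy": "1.26.4",        # hypothetical vetted versions
    "scikit-learn": "1.4.2",
}

def audit(approved: dict) -> list:
    findings = []
    for package, wanted in approved.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            findings.append(f"{package}: not installed")
            continue
        if installed != wanted:
            findings.append(f"{package}: installed {installed}, approved {wanted}")
    return findings

if __name__ == "__main__":
    for finding in audit(APPROVED):
        print("REVIEW:", finding)
```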
Yeah.
And it's just like deja vu talking about these things.
I mean, frankly, I think that you could, you know, basically read almost any book from sort of the emergence of DevOps and just substitute in MLOps,
and a lot of the same issues would come up. Yeah. Although I would say that I think the
monitoring piece is turning out to be different for deploying ML models. And that really has to do with the data component,
which I think is new with models because regular software didn't have to do as much with,
is the data changing and how do we respond to it? Absolutely. And so let's talk a little bit
more about that because I think that that's one of the things that's going to be surprising.
So a lot of my, like I said, my enterprise audience might be listening and saying, oh, okay, I got this. It's the same as it used to be, except this is some new kind of application.
Maybe it needs some new hardware, whatever.
But it's not like that, right?
I mean, because there's, you know, you have to think about the data as well as the model.
And I think that a lot of people aren't familiar with that. So maybe, I don't know, you know, summarize to caveman IT guy, what does an ML application
look like in this context? What does it contain? That's a great question. Okay. So if we think
about an ML application, I would say it has three odd components.
The first one is the model itself, which you can think of as a set of numbers and a basic function: it takes in a set of numbers, applies a function to them, and then produces a set of numbers.
So that's our model.
Then there is the scaffolding code around it, which might be a web server that is serving the
model, or it could be a pip install of a library. And then the third really
important component is the data. When we train the model, we've actually fit
the model to the particular nature of our training data. So, in
the simplest case, suppose that your model was a
linear model, like ax plus by plus c, and suppose you fit that model to your
data. However, once you start seeing the real data, you realize that the relationship is not linear.
It might be a quadratic relationship or something
much more funky. In that case, that model is not a good description of the data anymore.
And so as your underlying data changes, the efficacy of a model or how well the model is
going to predict the future also changes. So in addition to measuring regular things like CPU usage and
throughput and latency, for an ML model you need to pay really close attention to,
does my data have more zeros than it did before? Does it have more null values? Or is the
distribution of a particular feature or a column that's being fed into this
model, is it the same over time or has it changed, which might indicate that I need to
retrain the model or even remove a model from production because it's not going to work for
this new kind of data.
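Here is a minimal sketch of the feature-level checks described above: null rate, zero rate, and a distribution comparison between the training data and recent production data. The thresholds and the two-sample Kolmogorov-Smirnov test are illustrative choices, not Verta's actual monitoring logic.

```python
# Feature drift checks (illustrative sketch; thresholds are hypothetical, not Verta's defaults).
import numpy as np
from scipy.stats import ks_2samp

def drift_report(train_col: np.ndarray, live_col: np.ndarray,
                 null_rate_threshold: float = 0.05,
                 p_value_threshold: float = 0.01) -> dict:
    report = {
        "live_null_rate": float(np.mean(np.isnan(live_col))),
        "live_zero_rate": float(np.mean(live_col == 0)),
    }
    # Compare the training and live distributions, ignoring missing values.
    result = ks_2samp(train_col[~np.isnan(train_col)], live_col[~np.isnan(live_col)])
    report["ks_statistic"] = float(result.statistic)
    report["distribution_shifted"] = bool(result.pvalue < p_value_threshold)
    report["too_many_nulls"] = report["live_null_rate"] > null_rate_threshold
    return report

# Example: a column whose distribution has drifted between training and production.
rng = np.random.default_rng(0)
print(drift_report(rng.normal(0.0, 1.0, 5000), rng.normal(0.8, 1.0, 5000)))
```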
Yeah, I think that's one of those kind of, I don't know, spooky things in a
way, because, you know, I mean, although these things are, it wouldn't be fair to say that
they're not deterministic.
They are, but, you know, they can vary quite a lot based on the inputs you get.
We've been talking about that the last few weeks, actually, in terms of understanding
that, you know, it's only as good as what you feed into it.
And basically, if you create a model that's intending to, you know, churn through this haystack to find a needle,
and then you give it a different kind of haystack to find a different kind of needle,
not only might it not find the needle, it might find something else.
And, you know, we really have to be very, very careful about how we're using these things.
In a way, there's a political issue, sort of, I don't mean politics, you know, capital P,
but I just sort of mean like an internal business political issue here too. And that's that, you know, IT has to understand how to use these things
and has to basically open this dialogue between the lines of business and the ML folks
in order to have the conversation about,
is this the proper model
and are we using it in the right way?
100%, oh, so many great points there.
I completely agree.
One of the really exciting things
about building an ML infrasystem in general
has to do with the stakeholders that you mentioned.
So it is IT that is really running these models and making sure they're working as expected,
but it is the data scientists and the lines of business that are the domain experts.
And so as we talk about model monitoring, it's actually the data scientists who are in the best
position to make sense
of whether a model needs to get retrained or it needs to be taken out of commission.
And they need to do that in association with the domain experts, the lines of business
owners.
So that's where we've seen a big need from the partners that we work with to have different
views of the system, something that's
really tailored towards data scientists that helps them get a pulse on whether their models
are behaving as expected.
Our IT folks really care about the SLAs and whether they're getting met and the environment
that we talked about, how the libraries change and the lines of business owners really care
about, all right, how is this affecting my key KPIs?
And is this model still valid?
And so we're seeing that more and more,
like for instance, with COVID,
some of the key sort of credit scoring features,
maybe it's your FICO score or something,
are not as relevant anymore
because the sort of repayment patterns
of credit cards have changed a lot. And so the
features that we used to trust in these models are no longer valid. And so they have had to be
updated very, very fast. And without a monitoring system, it would have even been hard to figure out
that the reason why the credit models weren't doing so well was because of, say, a FICO
score or some proxy for that. Yeah, man, that, you know, it's funny because, you know, the last
episode we were speaking about something very similar to this and sort of like the social
justice implications almost of many of these things. And then you, you know, I expect to be
talking about IT operations and, you know, here we are talking about the pandemic and how, you know, how things can change. I mean, it's, it
really is pretty remarkable what comes out of these systems and how, how this affects more than
just the business, more than just, you know, is this thing working or is it not
working? Is it performing well, or is it not performing well? But actually it affects, you know, real people doing real things. I wonder what, you know,
what kinds of applications like this, I mean, you mentioned, you know, credit scores,
what kind of applications do you think people are putting these things to right now?
Oh, it's across the board and that's what makes it so interesting, both from an IT perspective,
just in terms of what are going to be different ways
in which the model gets deployed and the end applications.
Like the most common ones are still recommender systems,
search systems and spam, fraud, things like that.
More and more, we see things that are,
you're doing NLP on comments, you're doing NLP on text. And if folks have been following the GPT-3 conversations online,
these models are so dependent on the kind of data that they were trained on. And so bias creeps in
very, very quickly without the model developer, or even more importantly, the consumer realizing.
And that's where models are unique in that they were built to be used in this particular environment.
And then we just pick them up and throw them into a different environment.
So an analogy might be an ecosystem. You pick up an organism from one ecosystem and then you put it
into a very different ecosystem and you expect it to work and it's not going to, or it's going to
have disastrous consequences. And so it's, we're just coming to terms with how do we put in the
processes, particularly governance processes to make sure these models are used in a responsible way.
Yeah, absolutely. And I'm glad that you mentioned GPT-3 because that's, again, one of the things we were talking about last week. And, you know, it is shocking how good it is and how seductive it is.
You know, you look at these things and you say, wow, this thing is just like a person. It is not a person. It is not just like a person.
And, you know, I mean, stepping back into the world of IT, even something as seemingly
straightforward, as you mentioned, as like a credit scoring algorithm, you know, it may have
no idea how to deal with data that isn't normal, that isn't the sort of thing that it would
have expected.
Now, how do we deal with that though, I guess from a software functional perspective?
Is that the sort of thing that you're trying to attack?
Absolutely.
That's a great question.
So I think with DevOps, and I'm going to draw heavily on that, it's people, processes, and
tools or technology.
So what we can do with a system like Verta is that we can provide the tools that can help
the operators or data scientists make the right decisions. So we're the people who can tell you,
hey, your model used to be producing, say, loan approvals that were
70% yeses, so 70% of the predictions were yes, and now we're seeing that the
number of predictions that are yeses has gone to 30%. And so we can provide
statistical warnings and alarms around that. And we have this unique position
where we can see the best practices on how to monitor different kinds of models and make them
available to a large set of customers. And that's the advantage of being a vendor, because you can see how do you monitor an NLP model versus how
do you monitor a recommender systems model,
and then out of the box provide those types of libraries.
The domain expertise is still going
to reside with the people who know the data the best.
Or if you're an economist, you're
going to know a ton more about the FICO
scores and how to adjust for models using FICO scores. And so what we can do for an economist
is to tell them, hey, this model is behaving in a way that's unexpected. Here is the distribution,
here is maybe even what we think are the reasons, really pointing to the features that might be involved or combinations of features.
It might be that this model has stopped behaving as expected for people with FICO scores in this bracket and maybe in this age range. So we can do a lot of the analytic heavy lifting, but leave the ultimate interpretation
and the decision on mitigation up to the domain expert.
I think that's something that we feel strongly about
is a system shouldn't be making those automated decisions.
We just don't have enough of the domain knowledge quite yet.
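As a concrete example of the kind of alarm described above, here is a small sketch that flags a large swing in a model's approval rate, such as the 70% to 30% shift Manasi mentions, and routes it to a person rather than acting automatically. The baseline and threshold numbers are made up.

```python
# Output-level monitoring sketch: alert on large shifts in the "yes" rate (numbers are illustrative).
from typing import Optional, Sequence

def approval_rate(predictions: Sequence[str]) -> float:
    return sum(1 for p in predictions if p == "yes") / max(len(predictions), 1)

def rate_shift_alert(reference_rate: float, recent: Sequence[str],
                     max_shift: float = 0.15) -> Optional[str]:
    recent_rate = approval_rate(recent)
    if abs(recent_rate - reference_rate) > max_shift:
        return (f"Approval rate moved from {reference_rate:.0%} to {recent_rate:.0%}; "
                f"route to a data scientist or domain expert for review.")
    return None

# Example: a model that used to say "yes" about 70% of the time now says it 30% of the time.
alert = rate_shift_alert(0.70, ["yes"] * 30 + ["no"] * 70)
if alert:
    print(alert)
```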
Yeah, I would think so because I mean, in a way,
the answers are only gonna be as good as the questions and the questions are only as good as understanding the domain.
And so I guess that would be my question for you then is, how do we know what questions
to ask?
How do we know if we should see if our credit approval system is failing people from certain
zip codes or certain ages or something?
Is there, I guess, in best practices, it's not the right
word, but like, how do we know what questions to ask in order to profile the performance of
these systems? Oh, I love that. And I love that for a very particular reason. So my master's thesis
was actually on this thing called automated data visualization. So you ask a question and it's
going to find you the graphs that answer the question for you. And what we realized very, very quickly there was that
the realm of questions you can answer is combinatorial. So you can look at whether
it's column one and column two together that are special or unique, or is it columns one through 10 that are really causing this challenge.
And you run into a lot of very mathematical issues that have to do with multiple hypothesis
testing. I'm getting too much into theory here. But like the more tests you do, you're more likely
to find spurious correlations. So it actually becomes a fairly challenging problem where the
best way that we found was to do a limited number of tests and go by heuristics that we had
developed over time on what were the most obvious or most frequent reasons for a model to fail or a visualization to produce a particular result,
and then use that, or provide that, to the domain expert in order to inform their exploration.
So you can provide your domain expert a set of, let's call it, 20 explanations that are fairly
high level. They can pick what explanation seems more aligned with their intuition, and then you
can go down further into that, let's call it, branch of the tree, and keep exploring. But I think that
that's a question that has serious statistical implications downstream, and so
automating it completely comes with a host of challenges.
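To illustrate the multiple-hypothesis-testing point: if you test many columns for drift at the usual significance level, some will look "drifted" purely by chance. A small sketch, with synthetic data, of applying a Bonferroni correction so the number of tests is accounted for:

```python
# Multiple-hypothesis-testing sketch: Bonferroni-correct the significance level
# when checking many columns for drift (data below is synthetic).
import numpy as np
from scipy.stats import ks_2samp

def drifted_columns(train: dict, live: dict, alpha: float = 0.05) -> list:
    corrected_alpha = alpha / max(len(train), 1)  # Bonferroni correction
    return [name for name in train
            if ks_2samp(train[name], live[name]).pvalue < corrected_alpha]

rng = np.random.default_rng(0)
train = {f"col_{i}": rng.normal(0.0, 1.0, 2000) for i in range(20)}
live = {f"col_{i}": rng.normal(0.0, 1.0, 2000) for i in range(20)}
live["col_3"] = rng.normal(1.0, 1.0, 2000)  # only this column actually shifted
print(drifted_columns(train, live))         # expected to flag just col_3
```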
In a way, one of my feelings is that ML is really good at finding needles in haystacks.
I know I said that before.
ML is good at finding things that are outliers.
That's why it's so good with log management and security, for example, because you don't
have to tell it what's unusual.
Because you can just say, find me the unusual stuff, and it starts to try to find things that don't match its preconceived notions.
Now, that doesn't mean that it's going to not have a false negative rate, and it doesn't mean even that the false positives are going to be useful, but at least it's good at finding outliers.
Are you saying that you are basically
sort of meta using ML here? Are you using it to assess itself, to assess the output?
Basically, are you an ML user? Oh, using ML to probe ML. Let's have the robots write the robots. Exactly. I think it's a mixture of heuristics
plus doing a bit of meta-learning, as you were just saying. And the funny thing, like outliers
is one category that we can detect, I think, fairly well at this point. But a lot of the
major ML algorithms require a lot of training data. And so if you think about a recommender
system, it requires tens of thousands of examples to make a reasonable prediction. And if you are looking for unexpected patterns or patterns that are
only seen a handful of times, you actually can't use terribly sophisticated ML algorithms.
So that's where I think heuristics have a very large role to play. And that's where a library
really is a great way to go. Because if you can just collect
the different kinds of outliers that you have seen over time, or different failure modes for
a particular model, then you can start sharing this information between different use cases,
and learn in kind of an ensemble manner. And that's what we're seeing to be fairly
effective so far. So in a way, you're saying that you would sort of try to
codify the knowledge of these domain experts in order to use that to assess
the output of the model, so that you're sort of magnifying their efforts without,
you know, replacing them. Exactly. So the best in class way to monitor
your recommender models, or the best in class way to monitor your log processing models.
I think that's a very unique way to scale knowledge across a lot of data scientists.
Well, I would love to spend hours talking about this. Honestly, this has been really,
really interesting. And like I said, I'm thrilled that we went
from a pretty simple operations question
of how do we deploy AI models
to start talking about the really interesting stuff,
which is sort of who watches the watchers and so on.
So before we go, maybe you can sort of sum up,
what's the learning that you've had that led you to create Verta? And what advice do you have for enterprise folks out there who are trying to deploy ML models now? I would say it this way. If this was five years ago, training models was the hardest part of getting AI and ML into production.
Now it's become fairly easy to train models, whether it's pre-trained models or auto ML.
The big heavy lift even now is I have a model and let me get it into production.
And we saw that a lot with the ModelDB system I developed at MIT.
And we saw that companies could bring models, could build really good models, but then it used
to take them months to actually use them in a product or a business process. And that's what
we're seeing as the high pole in the tent. And that's why I started Verta in order to solve that problem.
And while there are a lot of parallels to DevOps and we are really big on let's pick up what's
working and apply it to a particular problem, while there are a lot of parallels, the unique
angles that we found have been related to data scientists being a new kind of user,
this model monitoring business being very new and we're just figuring out how to do it correctly.
And also not making data scientists turn into IT professionals, because IT professionals are great
at what they do, and data scientists are great at what they do. So really bridging the gap.
And I think we're just getting started in this space.
And at Verta, we're super, super thrilled
to be at the forefront of this new category.
And honestly, I'm always happy
to have a conversation about this.
So I'd be more than thrilled
if someone wants to reach out to @VertaAI
or @DataCereal, which is my handle, to chat more.
Yeah, thank you very much.
I really appreciate that.
And honestly, I do recommend reaching out.
This has been just a wonderful conversation.
You know, I've learned so much speaking with you just for a short period of time.
Thank you.
So thank you, everyone, for listening to the Utilizing AI podcast.
If you've enjoyed this show,
please do rate, subscribe and review the show on iTunes
since that really does help our visibility.
And please share the show with your friends.
This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes,
go to utilizing-ai.com
or find us on Twitter at utilizing underscore AI.
Thanks, and we'll see you next time.