Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 09: Deploying AI Models in the Enterprise with @DataCereal
Episode Date: October 20, 2020. Stephen Foskett discusses the practicalities involved in packaging, deploying, and operating AI models with Manasi Vartak of Verta. Deploying an AI model in production is a challenge, just like it was... in the past with software. Once a company has an AI model to deploy, they must validate its results, create scaffolding code to make it consumable, optimize the data pipelines, instrument it, and assign operators. This is what Manasi and Verta have developed, and the world of MLOps parallels that of DevOps but with some unique twists. The data component of AI models presents a unique challenge not found in some other enterprise applications, and it is important to continually test the model to ensure that it hasn't drifted off target as data changes. Previously, training models was the main challenge for AI, but now it's all about getting things into production. That's why we started this podcast and why we created AI Field Day! This episode features: Stephen Foskett, publisher of Gestalt IT and organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett Manasi Vartak, CEO and Founder of Verta (@VertaAI). Find Manasi on Twitter at @DataCereal Date: 10/20/2020 Tags: @SFoskett, @DataCereal, @VertaAI
Transcript
Welcome to Utilizing AI, the podcast about enterprise applications for machine learning,
deep learning, and other artificial intelligence topics.
Each episode brings experts in enterprise infrastructure together to discuss applications
of AI in today's data center.
Today we're discussing packaging and deployment of models in the enterprise.
First, let's meet our guest.
Hey there.
My name is Manasi Vartak.
Thanks so much for having me.
I am the founder and CEO of Verta.
We are an enterprise infrastructure company helping teams ship ML models in a way that we're accustomed to shipping,
monitoring, and operating regular software. So super excited to be on this podcast. And
I'm sure we'll talk a lot more. But you can find Verta at verta.ai on the web and on Twitter at @VertaAI.
Great. Thanks a lot. And I'm Stephen Foskett, organizer of Tech Field Day and publisher
of Gestalt IT. You can find me on Twitter at @SFoskett. Now, I've been working in enterprise
IT for a long time, long enough to see the emergence of cloud and the emergence of AI.
And that's exactly why I was excited enough about AI to start this AI podcast, because I see so much promise in this. And I see
so many companies starting to think about how are we going to utilize AI? How are we going to bring
models into our space? And how are we going to deploy them? Now, Verta is doing some interesting
things that are analogous to sort of enterprise package management, application performance management, and operations
tasks, except focused on AI models. Maybe you could start, Manasi, by just telling us a little
bit more about sort of what the product is. Absolutely. So a quick high-level view on the model lifecycle and then going into what we do.
So when you're building an ML model and you want to get it into production, there are several steps
involved. First, you get your data together, you munge it, you get it into the right format,
then you train the model. That's where TensorFlow, PyTorch, and all of those great libraries come in.
And once you have a model, there's a host of steps that need to happen before you can
run it in production. So this is very similar to what we used to do with software. We want to
validate the model results, so testing. We want to write scaffolding code that's going to transform
the model into something that can be consumed by a host of other services.
We want to optimize the model, build pipelines to get the data to the model, instrument them to monitor how they're performing,
and also assign engineering headcount to make sure that the model is actually working as expected. So what we do at Verta is we take
a trained model and we remove all of the heavy lift involved in packaging, deployment, operations,
and monitoring, so that these models can be shipped to Verta, and then Verta takes care of running them
in the Verta inference engine and making sure they're working as expected.
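To make that scaffolding step concrete, here is a minimal sketch of wrapping a trained model behind an HTTP prediction endpoint. This is illustrative only, not Verta's inference engine; the model file name, input format, and port are all hypothetical.

```python
# Minimal serving scaffold (illustrative sketch, not Verta's inference engine).
# Assumes a scikit-learn-style model was saved to "model.pkl" -- a hypothetical path.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[1.0, 2.0, 3.0]]}.
    payload = request.get_json(force=True)
    predictions = model.predict(payload["features"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```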
Yeah. And that's the interesting thing because indeed, as you said, you know, it used to be that companies wrote their own software and then eventually they kind of moved to what I guess was
sort of called the shrink wrap software model. But even there, companies, you know, it wasn't
like really ready to go. Even if, you know, you bought a product for deployment, you know, you had to do a lot of
tuning. But also, you had to think about packaging, you had to think about deploying and infrastructure,
and then you had to think about monitoring and management and so on. It really sounds like
exactly the same thing, except for AI. That's such a great observation.
And that's why this space is getting termed as MLOps,
very similar to how we saw DevOps.
And the goal with DevOps is really to ship products faster,
more reliably.
We want to do that with ML.
What makes MLOps different from DevOps is a few things. One is the users of the system are data
scientists who are actually very different from software engineers. These
folks, and I used to be one, I used to write code, write models for Twitter
actually, so the data scientists we're talking about are really statisticians.
They're great at math. They're not necessarily thinking about,
how do I scale a system?
Or how do I make it reliable?
And so while dev and ops were still fairly, I would say,
similar, data scientists have a very different skill set.
And that requires a different view
on how the product is designed.
And our big thesis with Verta is let data scientists be data scientists.
We don't need them to know Kubernetes or to know Docker. They're great at building models,
let them do that, and let the software take care of the plumbing. So that's one. A couple others
that are unique: there's a ton of heterogeneity in data science workflows as well as frameworks. So this could be,
you can use Python or Scala to build your models. You might want to run them in a batch processing
fashion or a real-time fashion. You might need to run it in a serverless way or a containerized way. So
there's a lot of heterogeneity, which is great
for innovation, not so great for production, that you need to account for. And finally,
one thing that I think software teams also face is there are silos of different data science teams,
and bringing all of the models under one roof, so that they can be governed by the same IT processes and
use the same basic infrastructure, is another unique challenge we see with MLOps.
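One way to picture that heterogeneity: the same scoring logic often has to run both as a nightly batch job and behind a real-time endpoint. A rough sketch, with hypothetical file names, of keeping a single predict function and reusing it in both modes:

```python
# One scoring function, two execution modes (rough sketch; paths and names are hypothetical).
import csv
import pickle
from typing import List

with open("model.pkl", "rb") as f:  # hypothetical trained-model artifact
    model = pickle.load(f)

def score(rows: List[List[float]]) -> List[float]:
    """Shared prediction logic used by both batch and real-time paths."""
    return model.predict(rows).tolist()

def run_batch(input_csv: str, output_csv: str) -> None:
    """Batch mode: score an entire file at once, e.g. from a nightly job."""
    with open(input_csv) as f:
        rows = [[float(x) for x in row] for row in csv.reader(f)]
    with open(output_csv, "w", newline="") as f:
        csv.writer(f).writerows([[p] for p in score(rows)])

# Real-time mode would wrap score() behind an HTTP endpoint,
# as in the serving sketch earlier in this transcript.
```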
Yeah, it really does sound like my experience when I was working in large enterprise IT
organizations, because you would have multiple development teams, you know, you would have,
you know, maybe lines of business with their own, you know, IT developers. And everybody wanted everything to be able to run in enterprise IT
operations. But now, as you say, we've got the added wrinkle of sort of these new, you know,
next generation infrastructure. So many companies are deploying their own, you know, on premises
cloud or their own infrastructure to support next generation applications.
And many are also using cloud service providers as well for these, which this is something,
frankly, that the conventional developers are still coming to terms with.
How do we package and deliver products
in this sort of hybrid cloud mode.
And AI is born into that.
I mean, ML models, they exist in this space.
And so that seems to me like a really big focus
for you as well, and for the people deploying these models?
100%. With AI and ML, libraries that are getting used to build these models get updated so
frequently. It's every few months. And if you are an IT organization, you want to keep
a very close eye on what libraries are being used, and they need to be patched, they need to be secure. So a lot of the use cases we see with
deploying ML or AI models in enterprises
really have to do with: are the libraries that are being used
for these models the ones that are vetted by IT, or do we need them to apply a patch? And how do we make that seamless for
data scientists who might not even be aware of the fact that, hey, there was a security
vulnerability with this particular library you're using?
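A sketch of what that kind of IT-side library vetting can look like in practice: compare installed package versions against an approved list and flag anything that needs a patch. The allowlisted versions below are made up for illustration.

```python
# Dependency audit sketch: flag installed packages that differ from an approved list.
from importlib.metadata import PackageNotFoundError, version

APPROVED = {
    "numpy": "1.26.4",        # hypothetical vetted versions
    "scikit-learn": "1.4.2",
}

def audit(approved: dict) -> list:
    findings = []
    for package, wanted in approved.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            findings.append(f"{package}: not installed")
            continue
        if installed != wanted:
            findings.append(f"{package}: installed {installed}, approved {wanted}")
    return findings

if __name__ == "__main__":
    for finding in audit(APPROVED):
        print("REVIEW:", finding)
```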
Yeah.
And it's just like deja vu talking about these things.
I mean, frankly, I think that you could, you know, basically read almost any book from sort of the emergence of DevOps and just substitute in MLOps,
and a lot of the same issues would come up. Yeah. Although I would say that I think the
monitoring piece is turning out to be different for deploying ML models. And that really has to do with the data component,
which I think is new with models because regular software didn't have to do as much with,
is the data changing and how do we respond to it? Absolutely. And so let's talk a little bit
more about that because I think that that's one of the things that's going to be surprising.
So a lot of my, like I said, my enterprise audience might be listening and saying, oh, okay, I got this. It's the same as it used to be, except this is some new kind of application.
Maybe it needs some new hardware, whatever.
But it's not like that, right?
I mean, because there's, you know, you have to think about the data as well as the model.
And I think that a lot of people aren't familiar with that. So maybe, I don't know, you know, summarize to caveman IT guy, what does an ML application
look like in this context? What does it contain? That's a great question. Okay. So if we think
about an ML application, I would say it has three odd components.
The first one is the model itself, which you can think of as a set of numbers and a basic function: it takes in a set of numbers, applies a function to them, and then produces a set of numbers.
So that's our model.
Then there is the scaffolding code around it, which might be a web server that is serving the
model, or it could be a pip install of a library. And then the third really
important component is the data. When we train the model, we've actually fit
the model to the particular nature of our training data. So, in
the simplest case, suppose that your model was a
linear model, like ax plus by plus c, and suppose you fit that model to your
data. However, once you start seeing the real data, you realize that the relationship is not linear.
It might be a quadratic relationship or something
much more funky. In that case, that model is not a good description of the data anymore.
And so as your underlying data changes, the efficacy of a model or how well the model is
going to predict the future also changes. So in addition to measuring regular things like CPU usage and
throughput and latency, for an ML model you need to pay really close attention to,
does my data have more zeros than it did before? Does it have more null values? Or is the
distribution of a particular feature or a column that's being fed into this
model, is it the same over time or has it changed, which might indicate that I need to
retrain the model or even remove a model from production because it's not going to work for
this new kind of data.
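Here is a minimal sketch of the feature-level checks described above: null rate, zero rate, and a distribution comparison between the training data and recent production data. The thresholds and the two-sample Kolmogorov-Smirnov test are illustrative choices, not Verta's actual monitoring logic.

```python
# Feature drift checks (illustrative sketch; thresholds are hypothetical, not Verta's defaults).
import numpy as np
from scipy.stats import ks_2samp

def drift_report(train_col: np.ndarray, live_col: np.ndarray,
                 null_rate_threshold: float = 0.05,
                 p_value_threshold: float = 0.01) -> dict:
    report = {
        "live_null_rate": float(np.mean(np.isnan(live_col))),
        "live_zero_rate": float(np.mean(live_col == 0)),
    }
    # Compare the training and live distributions, ignoring missing values.
    result = ks_2samp(train_col[~np.isnan(train_col)], live_col[~np.isnan(live_col)])
    report["ks_statistic"] = float(result.statistic)
    report["distribution_shifted"] = bool(result.pvalue < p_value_threshold)
    report["too_many_nulls"] = report["live_null_rate"] > null_rate_threshold
    return report

# Example: a column whose distribution has drifted between training and production.
rng = np.random.default_rng(0)
print(drift_report(rng.normal(0.0, 1.0, 5000), rng.normal(0.8, 1.0, 5000)))
```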
Yeah, I think that's one of those kind of, I don't know, spooky things in a
way, because, you know, I mean, although these things are, it wouldn't be fair to say that
they're not deterministic.
They are, but, you know, they can vary quite a lot based on the inputs you get.
We've been talking about that the last few weeks, actually, in terms of understanding
that, you know, it's only as good as what you feed into it.
And basically, if you create a model that's intending to, you know, churn through this haystack to find a needle,
and then you give it a different kind of haystack to find a different kind of needle,
not only might it not find the needle, it might find something else.
And, you know, we really have to be very, very careful about how we're using these things.
In a way, there's a political issue, sort of, I don't mean politics, you know, capital P,
but I just sort of mean like an internal business political issue here too. And that's that, you know, IT has to understand how to use these things
and has to basically open this dialogue between the lines of business and the ML folks
in order to have the conversation about,
is this the proper model
and are we using it in the right way?
100%, oh, so many great points there.
I completely agree.
One of the really exciting things
about building an ML infrasystem in general
has to do with the stakeholders that you mentioned.
So it is IT that is really running these models and making sure they're working as expected,
but it is the data scientists and the lines of business that are the domain experts.
And so as we talk about model monitoring, it's actually the data scientists who are in the best
position to make sense
of whether a model needs to get retrained or it needs to be taken out of commission.
And they need to do that in association with the domain experts, the lines of business
owners.
So that's where we've seen a big need from the partners that we work with to have different
views of the system, something that's
really tailored towards data scientists that helps them get a pulse on whether their models
are behaving as expected.
Our IT folks really care about the SLAs and whether they're getting met and the environment
that we talked about, how the libraries change and the lines of business owners really care
about, all right, how is this affecting my key KPIs?
And is this model still valid?
And so we're seeing that more and more,
like for instance, with COVID,
some of the key sort of credit scoring features,
maybe it's your FICO score or something,
are not as relevant anymore
because the sort of repayment patterns
of credit cards have changed a lot. And so the
features that we used to trust in these models are no longer valid. And so they have had to be
updated very, very fast. And without a monitoring system, it would have even been hard to figure out
that the reason why the credit models weren't doing so well was because of, say, a FICO
score or some proxy for that. Yeah, man, that, you know, it's funny because, you know, the last
episode we were speaking about something very similar to this and sort of like the social
justice implications almost of many of these things. And then you, you know, I expect to be
talking about IT operations and, you know, here we are talking about the pandemic and how, you know, how things can change. I mean, it's, it
really is pretty remarkable what comes out of these systems and how, how this affects more than
just the business, more than just, you know, is this thing working or is it not
working? Is it performing well, or is it not performing well? But actually it affects, you know, real people doing real things. I wonder what, you know,
what kinds of applications like this, I mean, you mentioned, you know, credit scores,
what kind of applications do you think people are putting these things to right now?
Oh, it's across the board and that's what makes it so interesting, both from an IT perspective,
just in terms of what are going to be different ways
in which the model gets deployed and the end applications.
Like the most common ones are still recommender systems,
search systems and spam, fraud, things like that.
More and more, we see things that are,
you're doing NLP on comments, you're doing NLP on text. And if folks have been following the GPT-3 conversations online,
these models are so dependent on the kind of data that they were trained on. And so bias creeps in
very, very quickly without the model developer, or even more importantly, the consumer realizing.
And that's where models are unique in that they were built to be used in this particular environment.
And then we just pick them up and throw them into a different environment.
So an analogy might be an ecosystem. You pick up an organism from one ecosystem and then you put it
into a very different ecosystem and you expect it to work and it's not going to, or it's going to
have disastrous consequences. And so it's, we're just coming to terms with how do we put in the
processes, particularly governance processes to make sure these models are used in a responsible way.
Yeah, absolutely. And I'm glad that you mentioned GPT-3 because that's, again, one of the things we were talking about last week. And, you know, it is shocking how good it is and how seductive it is.
You know, you look at these things and you say, wow, this thing is just like a person. It is not a person. It is not just like a person.
And, you know, I mean, stepping back into the world of IT, even something as seemingly
straightforward, as you mentioned, as like a credit scoring algorithm, you know, it may have
no idea how to deal with data that isn't normal, that isn't the sort of thing that it would
have expected.
Now, how do we deal with that though, I guess from a software functional perspective?
Is that the sort of thing that you're trying to attack?
Absolutely.
That's a great question.
So I think with DevOps, and I'm going to draw heavily on that, it's people, processes, and
tools or technology.
So what we can do with a system like Verta is that we can provide the tools that can help
the operators or data scientists make the right decisions. So we're the people who can tell you,
hey, your model used to be producing, say, loan approvals that were
70% yeses, so 70% of the predictions were yes, and now we're seeing that the
number of predictions that are yeses has gone to 30%. And so we can provide
statistical warnings and alarms around that. And we have this unique position
where we can see the best practices on how to monitor different kinds of models and make them
available to a large set of customers. And that's the advantage of being a vendor, because you can see how do you monitor an NLP model versus how
do you monitor a recommender systems model,
and then out of the box provide those types of libraries.
The domain expertise is still going
to reside with the people who know the data the best.
Or if you're an economist, you're
going to know a ton more about the FICO
scores and how to adjust for models using FICO scores. And so what we can do for an economist
is to tell them, hey, this model is behaving in a way that's unexpected. Here is the distribution,
here is maybe even what we think are the reasons, really pointing to the features that might be involved or combinations of features.
It might be that this model has stopped behaving as expected for people with FICO scores in this bracket and maybe in this age range. So we can do a lot of the analytic heavy lifting, but leave the ultimate interpretation
and the decision on mitigation up to the domain expert.
I think that's something that we feel strongly about
is a system shouldn't be making those automated decisions.
We just don't have enough of the domain knowledge quite yet.
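As a concrete example of the kind of alarm described above, here is a small sketch that flags a large swing in a model's approval rate, such as the 70% to 30% shift Manasi mentions, and routes it to a person rather than acting automatically. The baseline and threshold numbers are made up.

```python
# Output-level monitoring sketch: alert on large shifts in the "yes" rate (numbers are illustrative).
from typing import Optional, Sequence

def approval_rate(predictions: Sequence[str]) -> float:
    return sum(1 for p in predictions if p == "yes") / max(len(predictions), 1)

def rate_shift_alert(reference_rate: float, recent: Sequence[str],
                     max_shift: float = 0.15) -> Optional[str]:
    recent_rate = approval_rate(recent)
    if abs(recent_rate - reference_rate) > max_shift:
        return (f"Approval rate moved from {reference_rate:.0%} to {recent_rate:.0%}; "
                f"route to a data scientist or domain expert for review.")
    return None

# Example: a model that used to say "yes" about 70% of the time now says it 30% of the time.
alert = rate_shift_alert(0.70, ["yes"] * 30 + ["no"] * 70)
if alert:
    print(alert)
```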
Yeah, I would think so because I mean, in a way,
the answers are only gonna be as good as the questions and the questions are only as good as understanding the domain.
And so I guess that would be my question for you then is, how do we know what questions
to ask?
How do we know if we should see if our credit approval system is failing people from certain
zip codes or certain ages or something?
Is there, I guess, in best practices, it's not the right
word, but like, how do we know what questions to ask in order to profile the performance of
these systems? Oh, I love that. And I love that for a very particular reason. So my master's thesis
was actually on this thing called automated data visualization. So you ask a question and it's
going to find you the graphs that answer the question for you. And what we realized very, very quickly there was that
the realm of questions you can answer is combinatorial. So you can look at whether
it's column one and column two together that are special or unique, or is it columns one through 10 that are really causing this challenge.
And you run into a lot of very mathematical issues that have to do with multiple hypothesis
testing. I'm getting too much into theory here. But like the more tests you do, you're more likely
to find spurious correlations. So it actually becomes a fairly challenging problem where the
best way that we found was to do a limited number of tests and go by heuristics that we had
developed over time on what were the most obvious or most frequent reasons for a model to fail or a visualization to produce a particular result,
and then use that, or provide that, to the domain expert in order to inform their exploration.
So you can provide your domain expert a set of, let's call it, 20 explanations that are fairly
high level. They can pick what explanation seems more aligned with their intuition, and then you
can go down further into that, let's call it, branch of the tree, and keep exploring. But I think that
that's a question that has serious statistical implications downstream, and so
automating it completely comes with a host of challenges.
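To illustrate the multiple-hypothesis-testing point: if you test many columns for drift at the usual significance level, some will look "drifted" purely by chance. A small sketch, with synthetic data, of applying a Bonferroni correction so the number of tests is accounted for:

```python
# Multiple-hypothesis-testing sketch: Bonferroni-correct the significance level
# when checking many columns for drift (data below is synthetic).
import numpy as np
from scipy.stats import ks_2samp

def drifted_columns(train: dict, live: dict, alpha: float = 0.05) -> list:
    corrected_alpha = alpha / max(len(train), 1)  # Bonferroni correction
    return [name for name in train
            if ks_2samp(train[name], live[name]).pvalue < corrected_alpha]

rng = np.random.default_rng(0)
train = {f"col_{i}": rng.normal(0.0, 1.0, 2000) for i in range(20)}
live = {f"col_{i}": rng.normal(0.0, 1.0, 2000) for i in range(20)}
live["col_3"] = rng.normal(1.0, 1.0, 2000)  # only this column actually shifted
print(drifted_columns(train, live))         # expected to flag just col_3
```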
In a way, one of my feelings is that ML is really good at finding needles in haystacks.
I know I said that before.
ML is good at finding things that are outliers.
That's why it's so good with log management and security, for example, because you don't
have to tell it what's unusual.
Because you can just say, find me the unusual stuff, and it starts to try to find things that don't match its preconceived notions.
Now, that doesn't mean that it's going to not have a false negative rate, and it doesn't mean even that the false positives are going to be useful, but at least it's good at finding outliers.
Are you saying that you are basically
sort of meta using ML here? Are you using it to assess itself, to assess the output?
Basically, are you an ML user? Oh, using ML to probe ML. Let's have the robots write the robots. Exactly. I think it's a mixture of heuristics
plus doing a bit of meta-learning, as you were just saying. And the funny thing, like outliers
is one category that we can detect, I think, fairly well at this point. But a lot of the
major ML algorithms require a lot of training data. And so if you think about a recommender
system, it requires tens of thousands of examples to make a reasonable prediction. And if you are looking for unexpected patterns or patterns that are
only seen a handful of times, you actually can't use terribly sophisticated ML algorithms.
So that's where I think heuristics have a very large role to play. And that's where a library
really is a great way to go. Because if you can just collect
the different kinds of outliers that you have seen over time, or different failure modes for
a particular model, then you can start sharing this information between different use cases,
and learn in kind of an ensemble manner. And that's what we're seeing to be fairly
effective so far. So in a way, you're saying that you would sort of try to
codify the knowledge of these domain experts in order to use that to assess
the output of the model, so that you're sort of magnifying their efforts without,
you know, replacing them. Exactly. So the best in class way to monitor
your recommender models, or the best in class way to monitor your log processing models.
I think that's a very unique way to scale knowledge across a lot of data scientists.
Well, I would love to spend hours talking about this. Honestly, this has been really,
really interesting. And like I said, I'm thrilled that we went
from a pretty simple operations question
of how do we deploy AI models
to start talking about the really interesting stuff,
which is sort of who watches the watchers and so on.
So before we go, maybe you can sort of sum up,
what's the learning that you've had that led you to create Verta? And what advice do you have for enterprise folks out there who are trying to deploy ML models now? I would say it this way. If this was five years ago, training models was the hardest part of getting AI and ML into production.
Now it's become fairly easy to train models, whether it's pre-trained models or auto ML.
The big heavy lift even now is I have a model and let me get it into production.
And we saw that a lot with the ModelDB system I developed at MIT.
And we saw that companies could bring models, could build really good models, but then it used
to take them months to actually use them in a product or a business process. And that's what
we're seeing as the high pole in the tent. And that's why I started Verta in order to solve that problem.
And while there are a lot of parallels to DevOps and we are really big on let's pick up what's
working and apply it to a particular problem, while there are a lot of parallels, the unique
angles that we found have been related to data scientists being a new kind of user,
this model monitoring business being very new and we're just figuring out how to do it correctly.
And also not making data scientists turn into IT professionals, because IT professionals are great
at what they do, and data scientists are great at what they do. So really bridging the gap.
And I think we're just getting started in this space.
And at Verta, we're super, super thrilled
to be at the forefront of this new category.
And honestly, I'm always happy
to have a conversation about this.
So I'd be more than thrilled
if someone wants to reach out to @VertaAI
or @DataCereal, which is my handle, to chat more.
Yeah, thank you very much.
I really appreciate that.
And honestly, I do recommend reaching out.
This has been just a wonderful conversation.
You know, I've learned so much speaking with you just for a short period of time.
Thank you.
So thank you, everyone, for listening to the Utilizing AI podcast.
If you've enjoyed this show,
please do rate, subscribe and review the show on iTunes
since that really does help our visibility.
And please share the show with your friends.
This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes,
go to utilizing-ai.com
or find us on Twitter at utilizing underscore AI.
Thanks, and we'll see you next time.