Orchestrate all the Things - The State of MLOps in 2021. Featuring New Relic Lead Researcher Ori Cohen and Monte Carlo Co-Founder Lior Gavish

Episode Date: September 2, 2021

MLOps is the art and science of bringing machine learning to production, and it means many things to many people. The State of MLOps is an effort to define and monitor this market. Article published on ZDNet.

Transcript
Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. MLOps is the art and science of bringing machine learning to production, and it means many things to many people. The State of MLOps is an effort to define and monitor this market. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and
Facebook. So, thanks again everyone for making the time to discuss today. The occasion is the State of MLOps, which is work that Ori put out just a few days ago, I think a week or so. And we also have Lior from Monte Carlo with us, who is quite knowledgeable on the topic. So I think it's going to be a pretty interesting discussion. But, well, my job here is basically to facilitate and get out of the way as much as possible. So without further ado, I'll hand it over to you for introductions and just a few words about yourselves and your background. So whoever wants to
Starting point is 00:01:16 go first um you know you can go first so I go okay everyone my name is Leora I'm one of the co-founders of monte carlo we're the data observability company so um helping out uh folks both uh with with uh managing data reliability as far as concerns uh their ml models and analytic dashboards and analytic data stores um before monte carlo, I used to run engineering at a company called Barracuda. It's a cybersecurity firm. I worked on machine learning to help prevent some of the more advanced fraud use cases. That's where I got a lot of my passion and excitement about MLOps in the first place. I'm very excited to be here today.
Great, thank you. And, Ori? Oh, hi. So my name is Ori Cohen. I'm from Israel. Currently working for New Relic, doing machine learning and data science. I have a background in machine learning.
I did my PhD in computer science, brain-computer interfaces with machine learning, and I've been doing machine learning ever since. Okay. Great, thanks. And actually that's a good segue for kicking off the discussion. I mean, a number of the things that you mentioned, Ori: first, that you've been into machine learning for quite a while; and second, that your day job, let's say, is with New Relic. And I was wondering, when I realized that, where did you ever find the time for such an extensive project
as the one you just put out, which is called the State of MLOps? And yeah, I was wondering if you would like to share a few words about your motivation, basically, and a little bit about the background of the project. Like, besides motivation, what made you choose those specific data points that you collected? How did you collect them? What's the technical infrastructure you used for the project, and this kind of thing? Okay, many questions. You'll have to remind me of some of them. Sure, if you miss something I'll remind you, no worries. So I usually write on Medium. I wrote a big article about monitoring and observability for data scientists maybe two
years ago, based on experience at my previous company. It was kind of a call to action for data scientists to start monitoring their models, or really everything related to their models, not just the models themselves. It's available on Towards Data Science, which is a publication on Medium. So I came back to that subject again recently, and I started to think about data science teams and developer teams, and found out that there's a gap between the two teams: one is responsible up to a certain point, and from there the other team takes over. So I started to think, can I write something about it, like an opinion article, another
call to action that we should not just monitor data, we should also monitor data pipelines. And kind of going that route of: it's not just data and data pipelines separately, it's a correlation of both, which sits well with the things that we can do in New Relic. But it's not related to New Relic, that's just an afterthought. So I started writing that article, and then I said, okay, I did a market review of a lot of monitoring companies in the previous article, so let's do another one. And then I started doing another one and found out that there are more than 30 new companies in that space, which is crazy, it's only been two years. So I said, okay, let's get some more information. Let's look at what kind of companies they are: are they data centric or data pipeline centric?
Are they doing both, correlating between those things? Which personas are they catering to? Is it just data scientists, machine learning engineers, data science leads, business personas, executives? And I tried to figure out how to map that whole space. And then, okay, so how much money did they raise? How many people do they have in their company? And based on how many salespeople they have on their LinkedIn page, can I speculate how many clients they have? So it became like a huge project. And I thought, okay, I'm not going to keep this to myself.
It's probably best to share it. And so I built a website and I got help from a friend, Liran Hasson from Aporia, which is one of the companies. And he said, you should put it on Airtable. It will look nice. I said, okay, cool.
And let's do that. So Airtable became the product that I'm using and also the product that I'm giving to everybody who's reading that table. And it has really nice features. It's really easy to use.
I'm not selling Airtable, but it's a great product. So then I said, okay, let's do a small marketing campaign. We had a week off at New Relic. We were given a week off just because we worked really hard for the last year, year and a half. And I said, okay, so we can't go abroad. We can't, because we have kids and we can't take them with us, and nobody's vaccinated abroad, or something. So let's see if I can push some big projects in that week. So this was one of the projects that I finished, and I marketed it, and it made a really good impact.
Everybody wants to be on the list. Everyone wants to do demos with me now, even though I've done demos with them for the last two years. Everybody wants to change something in the description in that table: "you missed this feature." And I said, okay, if you want to give me feedback, I'll fix it, but I will validate it first. And yeah, I hope that answers most of the questions. It does, it does. And I have to say that, well, I totally sympathize, especially with the last part of what you were describing, since, among other things, I'm also an analyst. And yes, it's
precisely as you described. When you start an effort like this and it gets some traction, then obviously everybody wants to be included, everybody wants to add this new little feature or that, everybody wants to brief you, and, long story short, it gets pretty much out of control quite fast. So I hope you're able to keep up and you're not regretting having created the monster. No, no, I hope it helps people, because it shouldn't just sit on my computer. And you can analyze the space pretty well with it, I don't know if it's a good analysis, but you can see that almost everybody is following the same persona, mostly around data and not data pipelines.
So you can see where the market is going, which is pretty nice to see once you have that mapping. Okay, great. Thanks. So then, Lior, the company that you co-founded, Monte Carlo, is among the ones included in Ori's analysis. So I was wondering if you'd like to share with us, first of all, whether you have an opinion on the work that he's done, and then how you see, both yourself personally but I guess mostly speaking about the organization as a whole, the work that you do in relation to the overall, let's say, MLOps landscape? Yeah, absolutely.
First of all, I want to thank Ori for spending his vacation time building out this list. I think it's incredibly useful. I think the DataOps and MLOps spaces are, at the end of the day, very early in their life cycle. And there are so many different solutions; the market hasn't really formed very clearly yet. I'm sure our customers, and everyone in general, feel confused about all the different capabilities
that exist in all the different vendors. And so I think what Ori has done is critical in helping create clarity around what different solutions do, what they don't do, and how the people that are building data products and data pipelines should consider the different pieces of their stack. So I want to commend Ori for actually doing that
and publishing it and circulating it. As far as Monte Carlo goes, we call ourselves the data observability company. And when we say data observability, we're kind of borrowing a term that was originally used in the context of New Relic, right, Ori's employer. And it's the idea that if you're building a software product, there's a set of methodologies and tools, New Relic being one of them, that you adopt in order to make sure you're delivering quality and reliability in your software products, right? It's a whole practice called DevOps, or site reliability engineering. And over the years, people have developed those methodologies and those tools. Monte Carlo was
founded to help people that specifically build data products. And as we know, data products are composed of software, but also of data. And so it poses quite a different challenge from what some of the software observability vendors have created over the years. And Monte Carlo was founded to address just that. The way to think about Monte Carlo is kind of like a New Relic for your data pipelines and data products. So we're trying to connect to as much of the infrastructure as possible, the infrastructure that our customers use, and create end-to-end visibility, really,
by collecting metadata and logs and metrics, creating end-to-end visibility around the health of the system and the health of the data that's flowing through the system. And so we help our customers, you know, monitor the data that goes into training or inference of their machine learning models. We help customers track the health of the dashboards that their companies rely on to make decisions.
We help customers track the data that goes into, you know, product features that they're building, right? Modern products heavily use data, in dashboards and analytics, in automating marketing and services and operations. And so in order to guarantee, you know, the reliability and the SLAs of these things, Monte Carlo is, by now, kind of the established way to do that.
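To make the "metadata, logs and metrics" idea a bit more concrete, here is a minimal sketch of the kind of freshness and volume checks a data observability setup might run. It is an illustration only, not Monte Carlo's product or API: sqlite3 stands in for a real warehouse, and the orders table, the updated_at column and the thresholds are invented assumptions.

```python
# Hypothetical freshness and volume checks on a warehouse table.
# sqlite3 stands in for a real warehouse; the table name, column names and
# thresholds are illustrative assumptions, not any vendor's API.
import sqlite3
from datetime import datetime, timedelta, timezone

def check_table_health(conn, table="orders", max_staleness_hours=6, min_daily_rows=1000):
    """Return a list of alert strings for basic data health issues."""
    cur = conn.cursor()
    now = datetime.now(timezone.utc)

    # Freshness: how long since the table last received new data?
    # Assumes updated_at is stored as an ISO-8601 string.
    cur.execute(f"SELECT MAX(updated_at) FROM {table}")
    last_update = datetime.fromisoformat(cur.fetchone()[0])
    if last_update.tzinfo is None:  # treat naive timestamps as UTC
        last_update = last_update.replace(tzinfo=timezone.utc)
    staleness = now - last_update

    # Volume: did the last day's load produce roughly the expected number of rows?
    cutoff = (now - timedelta(days=1)).isoformat()
    cur.execute(f"SELECT COUNT(*) FROM {table} WHERE updated_at >= ?", (cutoff,))
    recent_rows = cur.fetchone()[0]

    alerts = []
    if staleness > timedelta(hours=max_staleness_hours):
        alerts.append(f"{table} looks stale: last update was {staleness} ago")
    if recent_rows < min_daily_rows:
        alerts.append(f"{table} volume anomaly: only {recent_rows} rows in the last day")
    return alerts
```

In a product, checks like these would run automatically across every table, with thresholds learned from history rather than hard-coded, and alerts routed to whoever owns the pipeline; but the underlying signals are roughly this simple.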
And we're excited to work with some of the best data teams in the world. And yeah, does that answer your question, George? I guess yes. To be honest with you, as you probably know, recently I had a kind of related, let's say, conversation with some people from another company in this space called Soda. And I guess the core of our conversation was precisely around terminology
and what each term precisely means and how it's defined and so on. So there are many terms flying around. And after that conversation, at least I felt a little bit better about myself, given that I'm not the only one who's not entirely comfortable with all of them. It seems that, you know, even people like them, or like yourselves, who are very, very much into the core of it sometimes have a hard time clearly differentiating one from the other. So with that in mind, I guess it's a good time in the conversation
and a good idea to actually try to define a little bit what people mean when they say MLOps. I mean, in broad strokes, the way I understand it at least, it's, I guess, what you referred to earlier. So basically the idea of, let's say, applying the DevOps notion in the machine learning space, which is a bit more complicated than your good old DevOps, in a way, because it's not just code that you have to keep track of, it's also datasets and models, and just adding all of those together gets pretty
complex quite soon, I guess. So, again, whoever wants to go first: trying to define, let's say, what MLOps means for you personally, and what do you think falls under this umbrella? Ori, do you want to go? You're the writer of the State of MLOps. Yeah, sure. So do you guys know that figure from Google, where there's a little square in the middle, which is the ML code, and everything related to that space is around that small box of machine learning: infrastructure, systems, experiment management, which was like the previous state of MLOps two, three years ago. A lot of companies that came about started to do experiment management.
And of course now it also means monitoring and observability, for data, for data pipelines, etc. So all of that, for me, is MLOps. I hope you agree. Well, it's not about agreeing or disagreeing actually, no. I was just curious how each of you defines it, because, there's like a joke, if you ask two doctors you probably get four opinions, and I'm wondering if the same goes for machine learning practitioners, as far as I know. Maybe, since we both did PhDs, you can ask the two doctors here and we can give you an answer. I agree with Ori 100%.
The way I kind of articulated it to myself, it includes everything that Ori mentioned. MLOps for me is broadly the practices and the tools that help you deliver machine learning within certain constraints that you're interested in. So, for example, you know, we all want to ship faster, right? We want to build faster. We want to increase our velocity. We all want reliability in our products, right? We want to release something and have high confidence that it's going to work. You know, we want to manage SLAs, right? We want to make sure it's going to meet certain requirements at runtime. We want to make sure it's secure, right? Like, we might want to address certain compliance requirements that we have, right? So MLOps for me is the set of tools and practices
that help you accomplish those goals as a company. And every company might have different goals depending on its own kind of priorities and use case, but it's this stack that really helps you do that effectively and quickly and securely and compliantly and reliably. So yeah, that's how I view it, if that's of any help. That's a great answer, by the way. Great. Thanks both. And so then again, having kind of defined, let's say, at least in your views, what MLOps applies to: it's very, very interesting, the work that you've done, Ori, in trying to distill, let's say, the different facets that apply to solutions in this space. So you mentioned briefly earlier in your introduction that you identified different aspects that sort of characterize solutions in this space, and you used a kind of classic, let's say, tabular format for that. And you
mentioned you also used Airtable to serve that, basically, on your website. And going through that, well, first of all I have to agree with you that it's, you know, pretty easy to use and therefore quite approachable, I would say, by everyone. And so I tried to identify the facets that, well, I found the most important, at least. You have collected many different data points for each solution that you included. Some of them are, well, kind of trivial, let's say. Some of them I think are more important, in the sense that I feel they sort of define the category.
So I would like to go into a little bit more depth on those. One of those, to start with, was product focus, and you have identified some different values for that. Would you like to elaborate on your rationale for doing that? Yes, of course. So product focus, as I already mentioned, is whether the company is more focused around data, or data pipelines, or data and data pipelines separately, or both. So some companies will just monitor and observe data: inputs, model internals, outputs, things like drift, loss, precision/recall, accuracy, drift for data, drift for labels, etc. So basically looking at the world of data around models. And other companies will do a kind of similar
but different thing around data pipelines. So, for example, you could monitor queries, or data frames, or steps within DAGs, directed acyclic graphs. So it's a different focus. And there are only a few companies that are actually doing both. Some of them are doing it like one next to the other, and some of them, I think, are trying to correlate between events. So if you have a problem with your data, it could mean that maybe a machine or some host or service has died, or the CPU is at 100%.
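As a rough, hypothetical illustration of that correlation idea (not any vendor's actual implementation): given the timestamp of a data-quality alert, you can look for infrastructure symptoms such as a dead host or a saturated CPU in the same time window. The metric records, field names and thresholds below are invented for the example.

```python
# Hypothetical correlation of a data-quality alert with host metrics.
# The metric records, field names and thresholds are invented for illustration.
from datetime import datetime, timedelta

def correlate_with_infra(alert_time, host_metrics, window_minutes=30, cpu_threshold=95.0):
    """Return possible infrastructure causes observed near the time of a data alert."""
    window = timedelta(minutes=window_minutes)
    suspects = []
    for m in host_metrics:
        if abs(m["ts"] - alert_time) > window:
            continue  # outside the correlation window
        if not m["alive"]:
            suspects.append(f'{m["host"]} was down at {m["ts"]}')
        elif m["cpu_pct"] >= cpu_threshold:
            suspects.append(f'{m["host"]} CPU at {m["cpu_pct"]}% at {m["ts"]}')
    return suspects

# Example: a null-rate spike at 10:05 lines up with a saturated ETL worker.
alert = datetime(2021, 9, 1, 10, 5)
metrics = [
    {"host": "etl-worker-3", "ts": datetime(2021, 9, 1, 10, 0), "cpu_pct": 99.0, "alive": True},
    {"host": "etl-worker-7", "ts": datetime(2021, 9, 1, 8, 0), "cpu_pct": 40.0, "alive": True},
]
print(correlate_with_infra(alert, metrics))
```

The point is not the few lines of Python, but having the data-side signal and the infrastructure-side signal in one place so that the join is possible at all.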
And if you correlate that, you will allow the developer, the DevOps person, whatever persona you're catering to, to find the problem quicker. And that's actually kind of what we're trying to do in New Relic right now with the new MLOps offering. Okay, then the next one that kind of piqued my interest was the personas facet. So you identified a few different personas that the solutions address.
Starting point is 00:23:58 you are probably catering to data scientists, data science leads, maybe machine learning engineers, could be data analysis, data analysts, sorry. And if you're a data pipeline company, it could be data ops, data engineering, but it could be a mix. So it's, there's kind of a stereotype for data centric and for data pipeline centric, but you can see a mix of all those personas. And there's also maybe a business executive or someone which, so imagine that there's a data scientist and he's working on a GPU machine and the GPU machine costs a lot of money. So maybe there's a dashboard somewhere for some data executive cares about using all that use case and looking at how many spent they have on GPU machines or how many how does the business KPI is impacted by
Starting point is 00:25:17 models that are not that are broken going down, having issues, etc. So data observability is not necessarily just for the tech people, it's also for the executives. And I also have an article about that, which I wrote with our previous PM, Dol. Great. And yeah, I think that's a very good point, actually. And it's interesting to see how its solution deals with it and which of these personas they're able to address and in what way. Another interesting aspect is data types. So I think that's more straightforward, probably, what kind of data you you're able to monitor, right? Yeah, so I think most companies start with tabular data, because it's the easiest use case that to do. Some companies
Starting point is 00:26:19 are now there's quite a few actually, but they're going into images, so computer vision and audio and it started spreading towards that area because in some ways, tabular data is kind of mostly solved. You get that 80% so they can move on to those other things and kind of, at the moment, differentiate themselves from the other companies in that space. And yeah, that's actually another good point. I was recently also doing a little bit of analysis work and related but different area in data labeling tools. And yeah, that was, again, again in this area a very important aspect so what kind of data its tool is
Starting point is 00:27:11 able to cater to and let's say last but not least out of the what picked my interest at least as the most important set of facets is the features. So would you like to say a few words about that as well? Um, yes, of course. So let me just look at the features because there are many. So let's, let's just take the data centric companies, they are mostly so the features that they have is mostly around drift. So it could be data drift or concept drift for labels, data quality, data integrity, which
Starting point is 00:27:54 could be the same but kind of different. It depends, you need to look at the company's website and ask them like exactly what they are doing behind the scenes. Um, monitoring bias and fairness, which is, uh, probably getting more important now with the, um, EU regulations that were just released like a few months ago, um, obviously anomaly detection, segmentation, tracking, and explainability in general. So those are the basics and probably the most important thing to start right out of the box and to help you catch issues and companies are building on that and expanding and
Starting point is 00:28:36 updating and upgrading their offering. Okay great Thanks for the mini analysis, very condensed and to the point. So, Lior, I was also wondering, you may have your internal analysis and you definitely have your product management team and your spr sprints and features you're pursuing and also you're probably also keeping an eye on the competition so in that lens of someone who's you know who has a different point of view let's say on the kind of analysis that the Torii did does that resonate with you does it make sense is it a sensible lens to use to look at this landscape? Would you have something to comment or add or subtract to the kind of analysis that you did? And I agree with pretty much everything Ori mentioned here. I think the persona element of it is critical.
Starting point is 00:29:54 Again, there's a lot of confusion in industry, but the ML engineer, the analysts, the data scientists, and the data engineer, the analysts, the data scientists, and the data engineer, they all have a very different set of concerns and products that meet those concerns and needs. And I think it's really critical to understand that. You know, just looking at kind of my neck of the woods if you look at observability right giving observability into machine learning models that are running in production is very very different from providing observability on data pipelines on the data pipelines that are feeding those models right and and and and while we both use the word
Starting point is 00:30:43 observability we're actually servicing different personas, right? Like, you know, something like Monte Carlo would speak more to the data engineers and analysts, whereas kind of AI observability would speak more to machine learning engineers that are accountable for specifically for models running in production. So I think that's a kind of very, very important distinction. And the feature set, you know, I agree with Ori. There's a good amount of overlap there, but there's also differences. There's also differences in the stack that you'd work with, right?
Starting point is 00:31:21 As, you know, as a pipeline observability company, if you will're focusing a lot on the you know on the data lake on the data warehouse on the analytic dashboards that are they're consuming from there whereas an AI observability you might you might focus more on on the stack that people used to deploy to train and then deploy machine learning models and the frameworks and libraries that are used in that context. So I agree with Ori's analysis and I think it adds a lot of clarity
Starting point is 00:31:59 to what folks that are building stuff should be thinking about. Okay, great. We still have like a few minutes left. So let's wrap it up with future plans, I guess. And I'll follow up initially with you, Leo, since you were the last to speak. And I'm going to ask you, having sort of defined, let's say, the space in general
Starting point is 00:32:24 and some very important facets that you can use to understand and classify the solutions in that space. Where do you see Monte Carlo going forward, basically? So do you plan on adding more features or more personas or how are you going to grow your offering? Yeah, I think Monte Carlo, you know, our objectives, if you will, are to, you know, to help mostly data engineers, data analysts, data scientists know about data health issues as soon as they happen.
Starting point is 00:33:04 So basically reduce the time to detection. We want to help them resolve problems quicker. So as soon as you know that you have an issue in your data infrastructure, how quickly can you get to resolution and fix it? And then we also want to help folks prevent those things from happening in the first place. And that's actually possible and something that we're very excited about. And we're building across the stack to address all of those. So we're building new capabilities
Starting point is 00:33:36 to help detect things earlier, right? So we're tackling more and more data health issues and we're accelerating time to detection. So I think over the last two years, we've gotten from what would have been weeks or months, we shortened it to hours. And then I think going forward, we'll get there closer to the minutes mark, right? I think on the resolution side,
Starting point is 00:34:03 there's so much we can do to help. You know, data issues are complex in the sense that they can, like Ori said, it can come from an operational issue in the infrastructure, but it could also come from a change or a drift in the data. It can come from a code change that had unintended consequences. And so kind of putting a full picture of that together and helping teams really get a full sense of what's going on and how to fix issues is another area where we're investing a lot and making a lot of progress. And again, time to resolution can, you know, some teams will measure it in weeks or months right now.
Starting point is 00:35:08 And we're shooting to get to minutes and operated. That's something we're very excited about as well and are putting a lot of investment there to help address that. And of course, you know, we're trying we're trying to create, to do all of that end to end, right? So we're constantly adding new integrations and new capabilities. We've, we've, we've focused on historically we've built data warehouse and BI integrations. Now this year, we've been working a lot on data lake integrations and there's a lot of
Starting point is 00:35:41 exciting stuff around orchestration and streaming and those kinds of things that we're going to release later this year and, and, and then going into 2022. Great. Thanks. I guess for you, Ori, things are, if not simpler, at least a bit different. So you're, you don't have a product to, to cater to, but you still have your pet project let's say and now that you've made the big push using that extra week and then you managed to get it out how do you plan to keep it up to date and what's your ambition with that? So the two things also there
is the point that companies don't change so often. So maybe the next update will be in six months, or maybe a year. But there are also two forms inside the webpage through which companies can submit their details or ask for changes. So if companies work with me, this could be a lot easier. And there is a new list which I'm hoping will be a more exhaustive list of MLOps companies, not just MLOps monitoring. So if companies want to submit their details, feel free. Keep in mind that I try to validate everything.

That is, for me personally, again, having been where you are, the hardest part and the most time-consuming. And yeah, you know, everyone who's ever done that really appreciates the kind of work that you put into it. So, well, thanks for doing that. I think you probably hear that a lot, because I think it's really useful work that helps many people. And thanks for making the time for the conversation. Thank you both.

And I hope it's been a nice experience and people will find it interesting. I hope you enjoyed the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook.
