Orchestrate all the Things - MLGUI: Building user interfaces for machine learning applications. Featuring KPMG Germany Senior Data Engineer Philip Vollet

Episode Date: July 19, 2021

Machine learning is eating the world, and spilling over to established disciplines in software, too. After MLOps, is the world ready to welcome MLGUI? Philip Vollet is somewhat of a celebrity, all things considered. Miley Cyrus or Lebron James he is not, at least not yet, but if data science lives up to the hype, who knows. As the senior data engineer with KPMG Germany, Vollet leads a small team of machine learning and data engineers building the integration layer for internal company data, with access standardization for internal and external stakeholders. Outside of KPMG, Vollet has built a tool chain to find, process, and share content on data science, machine learning, natural language processing, and open source using exactly those technologies, which makes for a case of meta, if nothing else. There is a flood of social media influencers sharing perspectives on data science and machine learning. While most influencers direct their attention solely toward issues of model building and infrastructure scaling, Vollet also looks at the user view, or frameworks for building user interfaces for applications utilizing machine learning. We were intrigued to discuss with him how building these user interfaces is necessary to unlock AI's true potential. Article published on VentureBeat. Photo by Kelly Sikkema on Unsplash

Transcript
Starting point is 00:00:00 Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. Machine learning is eating the world and spilling over to established disciplines in software too. After MLOps, is the world ready to welcome MLGUI? Philip Vollet is somewhat of a celebrity, all things considered. Miley Cyrus or LeBron James, he is not, at least not yet. But if data science lives up to the hype, who knows? As the senior data engineer with KPMG Germany, Vollet leads a small team of machine learning and data engineers, building the
Starting point is 00:00:36 integration layer for internal company data, with access standardization for internal and external stakeholders. Outside of KPMG, Vollet has built a toolchain to find, process and share content on data science, machine learning, natural language processing and open source using exactly these technologies, which makes for a case of meta, if nothing else. There is a flood of social media influencers sharing perspectives on data science and machine learning. Vollet actually knows what he's talking about. While most influencers direct their attention solely toward issues of model building and infrastructure scaling, Vollet also looks at the user view, or frameworks for building user interfaces
Starting point is 00:01:17 for applications utilizing machine learning. We were intrigued to discuss with him how building these user interfaces is necessary to unlock AI's true potential. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook. So I have a long track record in technology and IT projects. I have done consulting in this area, and I have always specialized in data and databases, whether graph databases or SQL databases,
Starting point is 00:01:53 and I have my own company in this area and did some freelance work, then started at KPMG as a freelancer and then got a full-time job there. And at KPMG, we are an internal unit, so we are crunching the internal KPMG Germany data: doing the internal reporting for our management and also building some data and machine learning pipelines to analyze the internal data.
Starting point is 00:02:19 And we are responsible for parts of the internal KPMG data. So if there are projects which want to access the data, we are the layer in between, so that they are able to access the data, but we are looking into the data streams and know explicitly what's going on and what's going in. Yeah, I'm mostly interested in data, also in graph databases. So on LinkedIn, I'm advocating open source and open science projects. And that's where my heart belongs.
Starting point is 00:02:57 So advocating open source projects and spreading the love and also the word of open source. Yeah, and today we want to discuss frontends, or how to produce a frontend for a machine learning project, and in general what it is like to develop a machine learning project. Thank you for the introduction. And before we actually get to the frontend part, let's start by mentioning briefly, because, well, it may be obvious to you and me, but not to everyone, how applications that are powered by machine learning are different from traditional applications,
Starting point is 00:03:41 because I think that will set the scene for the UI part. For sure. That depends on your point of view. Well, most of the time for me, there is not really a difference. The reason here is I have the same steps for a regular development project or software development project,
Starting point is 00:04:00 which I also have for a machine learning project. There are some special aspects I will talk about later on. But first of all, for me, it's the same. I have to do the people allocation, seeing which engineers are on the project; I have to do the budget parts; I have to bring it into our Azure DevOps environment. Microsoft changed the name; before it was
Starting point is 00:04:22 Team Foundation Server, now it's Microsoft Azure DevOps. And we are doing our projects there. We do our scrum planning there, our scrum sprints. And for me, it's bringing all the PBIs into Azure DevOps, planning the sprints, and talking to the stakeholders of the company about who is responsible for which part of the project, and then having the project in the
Starting point is 00:04:53 pipe and doing the planning. What's special for machine learning projects is that we have to keep track of the data, and we also have to keep track of the models, because model drift is real. It's possible that today our model fits our needs perfectly, but in six months we have to evaluate, or look into the model, to see if it's still hitting what we want it to hit. And there are also tools for doing the continuous integration part, because only bringing the project from dev to staging and then into production is half of the work.
Starting point is 00:05:36 The real work begins when you're hitting production and having it run as a continuous integration project. For us, it's developing the project and then handing it over to operations, because we have special operations units which keep track of, or take care of, the final projects while they are running. So we have an operations part there. And we also try to keep it as DevOps.
Starting point is 00:06:02 So it's in production, but it's not fixed there. We have new scrum iterations, or scrum sprints, to readjust what's there and also to bring new client feature requests into the project, so that there's no need to run a full new project once it's been in production for some days. So we try to have a DevOps idea of the project, doing a continuous cycle of integration and also a continuous cycle of development for the project. Yeah, thank you. That was quite an elaborate answer. And you also touched upon what I wanted to ask next, actually, which was going to be about the different stages in creating an application powered by machine learning. It sounds like they are pretty much the same, and yes, you very much correctly highlighted the main difference, which is model development basically. So, well, in classical DevOps, let's say, you also have
Starting point is 00:07:13 code development, and that may change, but I guess typically you may have more iterations on machine learning models than you possibly have on code after it reaches some stable state. So I was wondering, where in those stages that you mentioned, and in what form actually, does a user interface come into play, and what is the importance of having one? And whether this is meant to serve, initially at least, as a quick way for the people who develop the machine learning models to get a feeling and be able to check what they're developing, or whether this can also serve at a later stage as a basis for developing the actual end user interface.
Starting point is 00:08:02 So, yeah, it depends, but I try to have them on board in the first sprint. If we have stakeholders, it's a handy trick for project management to have your stakeholders in on your first iteration;
Starting point is 00:08:18 it heightens commitment. So if they're on board and able to interrupt, or to bring their features into the project, their commitment when we are going to the production deployment is higher, because they are in the loop, they are responsible for their ideas. Yeah, I always try it,
Starting point is 00:08:35 but there are also situations to break with this view room for sure. Sometimes it's not good to have too many people on board if you are deploying or developing an application. So when your team is knowing what a good application in this case is, sometimes it's better not to have too many people in the people management and stakeholder management to also handle the believing of what is a good product or end product. So sometimes you can break with it, but most of the time I try to have them in loop on the first iteration or the first, or from the first, from the starting point. So and then it's high in the commitment. And also then having a good UI
Starting point is 00:09:27 or a good user interface is needed because if we are only showing them code snippet, it's too abstract. So we can formulate an idea of what's happening there and we can also guide it. But having an interface is changing everything because it's easier for people to understand what's happening there and most of the time machine
Starting point is 00:09:49 learning is really abstract so we have an input and something is there's workflow and then we have the end result so if but if you have in front-end or in user interface you can show direct what's the impact of what I'm doing here so something is going in then there is a modulation and then we have to find a product and having a user interface and doing this with your stakeholders and going on in it guide them in a dashboard and going to your workflow and see what's happening in your machine learning model this is very useful because most stakeholders are not related to doing machine learning engineering stuff.
Starting point is 00:10:28 They're business analysts or managers or directors. So it's very useful to have a good UI or a good frontend from starting the project. So they are able to understand what's happening and can adjust or create PPIs for new requests or changes. Thank you. And I guess a typical way for creating at least some mock user interfaces for people to be able to see is what's called wireframing. And I wonder, well, you're probably aware of the technique, of the practice, and I guess that's also applied in applications powered by machine learning. And so I wonder if you can name, well, I guess first you can confirm whether you use it or not. And then if you can name some key requirements that you have for building this kind of user interface.
Starting point is 00:11:34 So the regular ones are easy to use. So the engineers need no too big boilerplate to deploy or to develop there. And it has to be easy to deploy and so developing and deployment should be easy and Then one of our key aspects is for for us and To have it on-prem because we are not allowed to use and not not allowed but it's It's sometimes a problem to use cloud service so it's we
Starting point is 00:12:07 have confidential data or we have special contracts and we're doing a lot of effort to keep our data safe so for us it's important to have a service which we can host in our cloud because we we have and yeah we try to keep all data in our cloud and because we we have some high effort to keep it safe there and all the data safe so far it's important to have it as a prime solution to host it in our cloud and then also important for us is to have a good charting ability so you can be sure we are using, also obscure charts and like network diagrams. So we are in a need to have a charting ability
Starting point is 00:12:54 to use many charts. Yeah, as I said, easy to develop. And it has to fit in our technology stack so if it's possible to use the framework which is also able to use one of our standard charting libraries which I pointing out later and this is super handy because we can use our we are used to that charting libraries we are using them in our projects so there is no need to train people in a new library so that's one of the main goals to to have a fixed technology stack because then
Starting point is 00:13:33 we have our operations in place which is taking care of the updates and everything is in place and sometimes it's hard to bring a new application or a new technology into our services because then we have to go the full circle of all processes and if we are able to use all technology stack which are we used to, it's handy. Okay. Recently, that was actually the occasion for which picked my attention and drove to this, led to this conversation. Recently you shared the thread with a number of tools that people can use for this purpose and I guess
Starting point is 00:14:16 you and your team may have used as well. And I was wondering if you'd like to say a few words about those tools, if you recall, or if a few words about those tools if you recall or if not I can remind you what those tools were and just the context in which you use those. Yeah for sure so there are many tools we are using and also we are testing because it's every time it's an evolution so and there's many many stuff going on in the ecosystem and because the need is we are not we are not the only company which is in the need for for having a good and UI or a good front and forth for machine
Starting point is 00:14:54 learning or for you for their data science products so and first of all the our easy or short shortcut is always using Streamlit because it's super easy. You have features like a date picker or also you are able to have a frontend with a file upload, which so a business analyst is able to use it as frontend and then to upload their exercise or their CSV and then do some adjustment with a slider or a date picker, what's in the need for the project. So then more advanced, and I really, really love it. It's Gradio because it's more focused for machine learning, and it's super cool.
Starting point is 00:15:38 There are so many features built into it in a short time. So the drive is there, and they're doing an amazing job to have it in your stack as front end for machine learning. I highly recommend it. You can run it out of the Jupyter Notebook. You can run it in a Google Colab. It's super integrable and it's cool. So then switching over to the enterprise stack
Starting point is 00:16:02 because here you're in the need of the server and it's more for big companies because you have to throw in some effort to host the servers and have the infrastructure, but it's the enterprise tech. So it's Plotty with Bash and the advantage here are you can use Plotty, which is super cool. We are using it all the time as charting library. You can use it in JavaScript ecosystem. And you can use it in the Python ecosystem
Starting point is 00:16:35 because there are frameworks for both ecosystems. And you are, yeah, you are able to switch the direction of the project if you're using it because you can develop some web applications with Node.js and you're also able to host your application in your intranet over Python using the Python stack and yeah, you're having a unique API to build your dashboards with Klotten. There's also Drive. There are
Starting point is 00:17:11 many features built into this library and their development. There are also new kits on the block, I will call them like this. It's Panel. Panel is also super awesome. It's panel, panel is also super awesome. It's one of the newer one, and you're able to use chart libraries from bookie, matplotlib and so on. Then there are also cloud services like DeepNode, which is super cool for collaboration. At the moment, it's only in the cloud, but they're working on a trend solution. And it's like Google cool up on steroids and
Starting point is 00:17:47 it's super cool to to collaborate there you are super fast because it's you spin it on and it's running in a cloud and your team is able to do their data science project and collaborate yeah there are also some other tools but at the moment we are focusing on these tools. Yeah, yeah, I think that's already enough to choose from and just to clarify, I guess all of those tools are open source or at least have an open source version and I guess many of them may also offer enterprise editions. Yes. Okay, so yeah what I was going to ask you actually is to if you can think of certain characteristics let's say along which to evaluate such tools and yeah to classify them. So we already kind of mentioned one,
Starting point is 00:18:46 whether they're open source or closed source and licensing and there are different versions and so on. Previously, you also mentioned deployment models, so whether tools can be deployed on premise or they have
Starting point is 00:19:02 a software as a service version. I guess another one would be their APIs and the kind of languages that they support. So we already have three I guess. Do you think there are other ones that are and those that I've mentioned so far are kind of generic so they could apply to any tool category really. Do you think there are others that are more specific for those kind of tool? Depends on the project and what you need. There are also tools like Observable HQ,
Starting point is 00:19:35 which is also running in the cloud and have many, many charting libraries with the click and you can use it, but it's running in the cloud and you have to put your data into their cloud um there and so one criterion for us to to choose one of the tools is depending if we are in a need to host it in our secure environment so then we cannot use at the moment tools like deep node um because we have to run it in our cloud.
Starting point is 00:20:07 So for that, also for Plotly-Enterprise you need a license so you have to pay for the service but it's worse. It's what it costs. And Streamlit and Grad, you can use free and host them in your intranet. If you are in a need for fast UI or fast front end, then choose them because it's super easy. You need nothing to, you install the Python libraries and then you're up to running and you're able to
Starting point is 00:20:40 have your interactive machine learning front end. What about the actual visual capabilities, let's say? Do you think there is something, well, let's put it that way. When you have some specific kind of visual in mind, like, I don't know, I would say pie charts or bar charts, but these are obviously kind of table stakes because I'm guessing... They're included in every tool we... Exactly, exactly, yeah. But if you want something more elaborate,
Starting point is 00:21:17 is there... If you have like a specific diagram requirement for a specific visual, for example, is it a parameter that you take into account and how you classify tools in that parameter for sure in this case most of the time we are using plotly because and then with the dash stack it's it's major but now we have to talk about the new kids on the block it's panel you're able to use in panel bouquet and matplotlib so and holo view so that are also matured charting libraries um so yeah you have to make this decision on other
Starting point is 00:21:58 factors um i'm in love with plotty so i so maybe I'm a little bit biased. That's okay. I mean, as long as you admit it and you just did. Yeah. So most of the time I try to use them because my team is also used to the API. But Panel is there. They're on the move. Keep an eye on them okay okay I will yeah I think I also saw you shared another one recently which has a kind of funny name so it's
Starting point is 00:22:36 called gooey which is kind of wordplay on GUI thereI as we sometimes call these tools. Yeah, these tools are more used for having a GUI for a Python application, not having a charting library in place or doing a business logic or a business intelligent application or a machine learning frontend. It's possible to build it with them. It's also possible to use it and having a front-end for your machine learning application, but regular use for it is to, if you're having a Python script and you're needing a front-end for it, then use one of these libraries. But it's also possible to use Dash for it or to use audio or panel. Yeah that's a good point and a
Starting point is 00:23:28 good distinction actually because I think that both scenarios are quite valid so if you want if you're like developing something I don't know a script or something and you want like a quick and dirty way to either interact with it yourself or I don't know, show it to a colleague. This is a very valid use case, I would say, for developing a kind of rudimentary user interface. Of course, as you say, it's a different case if you want to actually at least build
Starting point is 00:23:57 on what you initially developed so that you can have like an actual end user interface in the end. Yeah, for sure. So it's depending on your use case. But most of the time I will use one of the applications we have talked about also for frontend for Python scripts without the need for machine learning or the need for for a data science project. Okay,. One thing that struck me about all of these tools that you mentioned is that unless I'm missing something,
Starting point is 00:24:34 they all seem to be standalone, at least the ones I know better from this list like Streamlio. So I was wondering if you see offerings in this area from more like, let's say, integrated frameworks, like, I don't know, if people, let's say, if you use the Cloudera platform or, I don't know, the Databricks platform, is there something in there to cover that kind of need? And if there is, is is it I don't know on part let's say with the tools that you prefer to use? Depends on the project so you're able to if you're using DeepNode you can use also PlottyCharts in in the service and in Google CodeApp you're also able to use Gradle. So they're integrated in some kind of meaning. So and then you have to make the decision
Starting point is 00:25:32 if I want the full service or the full product and the full circle, then I'm going with Dash. And in other case, what depends on the project and the need. So if it's possible to have some kind of notebook in the cloud and using Google Colab, then you can also use PlottyShark in what way, if at all, of those tools can be integrated in an MLOps platform? So if you keep having new versions of your model and potentially also new versions of your data sets how are those tools able to keep up so that's one point you can you can have it also in the tools again then you have to most of the time you have to build it by yourself it's possible to to have it in the tools but for a real machine learning DevOps I highly highly recommend that you use the tools
Starting point is 00:26:45 which are built for it. And then having one of the other tools for the front end or the last part of it. But keeping track of your machine learning models, then use machine learning ops tools like MLOps or MLZen. Or I'm a big fan of DexHub because it's like GitHub for data science projects. You can store your data there, you can integrate them in one of the tools which are using for
Starting point is 00:27:14 machine learning ops and use this tool because they are built for it and it's easier to keep track in the tools because you have not to have some boilerplate for doing it your own in one of the other tools yeah yeah absolutely i that makes perfect sense i was more wondering about the the touch points let's say between those two so let me let me give an example so you have from between let's say one version and the next version of your model something changed so you have from between, let's say, one version and the next version of your model, something changed. So you have new features, for example. If you want to show those new features in your user interface, is there a way for these
Starting point is 00:27:56 tools to sort of introspect, let's say, and pick up the fact that, oh, the model now has a new feature or is this something that you have to manually do? Depends if it's a feature which is integrated in some of the of the existent so for Gradio so you can see direct impact and you can adjust what data is going on so it depends on the feature and what your frontend is. So if not you have to rebuild it and bring some features into the new frontend. Okay, I see. So it sounds like at least for some tools and in some cases there is some sort of capability to pick up those new features, for example. Otherwise, it's manual work. Yes, for sure. Okay.
Starting point is 00:28:49 Okay, interesting. Yeah, I think you've highlighted some interesting areas of those tools. And so, yeah, and you also mentioned the fact that for, in your case at least, being able to deploy the tool on-premises is a key factor because of confidentiality reasons basically for your data. That apart, if you could choose among the other criteria for picking a tool which one or which ones would you say are the most important and so not keeping into account that we are used to have it in our own cloud if we exclude this
Starting point is 00:29:33 requirement right did i get a question right yes then it's for me if i can choose it for for a project which is not related for working environment, then having the ability to collaborate without the hustle in hosting an environment and having all the administration stuff running. So DeepNode is really cool in this case. Also, if you're having your own PlotlyDash instance spinning already in your environments, then it's super easy to use it and yeah okay okay
Starting point is 00:30:09 thank you yeah i think uh i think people will find your experience and your uh insights in on using those tools are useful uh when they also want to choose something from this category. So yeah, I think that's it. I think we covered lots of ground and thanks for sharing. So do you have any closing comments or ideas and we can wrap up here I guess. We can wrap up. It was fun to talk to you and to share some knowledge about our development process. Thanks for having me. I hope you enjoyed the podcast. If you like my work, you can follow Link Data Orchestration on Twitter, LinkedIn, and Facebook.
