Orchestrate all the Things - MLGUI: Building user interfaces for machine learning applications. Featuring KPMG Germany Senior Data Engineer Philip Vollet
Episode Date: July 19, 2021

Machine learning is eating the world, and spilling over to established disciplines in software, too. After MLOps, is the world ready to welcome MLGUI? Philip Vollet is somewhat of a celebrity, all things considered. Miley Cyrus or LeBron James he is not, at least not yet, but if data science lives up to the hype, who knows. As the senior data engineer with KPMG Germany, Vollet leads a small team of machine learning and data engineers building the integration layer for internal company data, with access standardization for internal and external stakeholders. Outside of KPMG, Vollet has built a toolchain to find, process, and share content on data science, machine learning, natural language processing, and open source using exactly those technologies, which makes for a case of meta, if nothing else. There is a flood of social media influencers sharing perspectives on data science and machine learning. While most influencers direct their attention solely toward issues of model building and infrastructure scaling, Vollet also looks at the user view, or frameworks for building user interfaces for applications utilizing machine learning. We were intrigued to discuss with him how building these user interfaces is necessary to unlock AI's true potential. Article published on VentureBeat. Photo by Kelly Sikkema on Unsplash
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Machine learning is eating the world and spilling over to established disciplines in software too.
After MLOps, is the world ready to welcome MLGUI?
Philip Vollet is somewhat of a celebrity, all things considered.
Miley Cyrus or LeBron James, he is not, at least not yet.
But if data science lives up to the hype, who knows? As the senior data engineer with KPMG
Germany, Vollet leads a small team of machine learning and data engineers, building the
integration layer for internal company data, with access standardization for internal and
external stakeholders. Outside of KPMG, Vollet has built a toolchain to find, process and share content on data science,
machine learning, natural language processing and open source using exactly
these technologies, which makes for a case of meta, if nothing else. There is a
flood of social media influencers sharing perspectives on data science and
machine learning. Vollet actually knows what he's talking about. While most influencers direct their
attention solely toward issues of model building and infrastructure scaling,
Vollet also looks at the user view, or frameworks for building user interfaces
for applications utilizing machine learning. We were intrigued to discuss
with him how building these user interfaces is
necessary to unlock AI's true potential.
I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration
on Twitter, LinkedIn and Facebook.
So I have a long, long track record in technology and IT projects. I have done consulting in this area,
and have always specialized in data or databases,
whether it's graph databases or SQL databases.
I have my own company in this area
and did some freelancing work,
then started at KPMG as a freelancer
and then got a full-time job there.
And at KPMG, we are an internal unit, so we are crunching the internal KPMG Germany data.
So doing the internal reporting for our management and also building some data and machine learning
pipelines to analyze the internal data.
And we are responsible for parts of the internal KPMG data. So if there are projects which want to access the data,
we are the layer in between,
so that they are able to access the data,
but we are looking into the data streams
and know explicitly what's going on and what's going in.
Yeah, I'm mostly interested in data, also in graph databases.
So on LinkedIn, I'm advocating open source and open science projects.
And that's where my heart belongs.
So advocating open source projects and spreading the love and also the word of open source.

Yeah, and today we want to discuss front ends, or how to produce a front end for a machine learning project, and in general how it is to develop a machine learning project.
Thank you for the introduction and yeah before we actually get to the front end part,
let's start by mentioning briefly,
because, well, it may be obvious to you and me,
but not to everyone,
how applications that are powered by machine learning
are different than traditional applications,
because I think that will set the scene for the UI part.
For sure.
That depends on your point of view.
Well, most of the time for me,
there is not really a difference.
The reason here is I have the same steps
for a regular development project
or software development project,
which I also have for a machine learning project.
There are some special aspects I will talk about later on.
But first of all, for me, it's the same.
I have to do the people allocation.
So seeing which engineers are on the project,
I have to do the budget parts.
I have to bring it into our Azure DevOps environment.
Microsoft changed the name; before, it was
Team Foundation Server.
Now it's Microsoft Azure DevOps.
And we are doing our project there.
We do our scrum planning there or scrum sprints.
And for me, it's bringing all the PBIs, the product backlog items, into Azure DevOps,
planning the sprints, and talking to the stakeholders of the company
about who is responsible for which part of the project, and then having the project in the
pipe and doing the planning.
What's special for machine learning projects, the special part here, is we have to keep track of the data, and we also have to keep
track of the models, because model drift is real. It's possible that
today our model fits our needs perfectly, but in six months we have
to evaluate it, or look into the model, to see if it's still hitting what we want it to hit. And there are
also tools for doing the continuous integration part, because only bringing
the project from dev to staging and then into production is half of the work.
The real work begins when you're hitting production and having it running as a
continuous integration project. So for us, it's developing the project
and then handing it over to operations
because we have special operation units
which keep track of, or take care of, the final projects
when they are running.
So we have an operations part there.
And we try to keep it as DevOps.
So it's in production,
but it's not fixed there.
We have new scrum iterations, or scrum sprints, to readjust what's there and
also to bring new client feature requests into the project, so that there's
no need to run a full new project once it has been in production for some days.
So we try to have a DevOps idea of the project, doing a continuous circle of integration and also a continuous circle of development for the project.
Yeah, thank you. That was quite an elaborate answer. And you also touched upon what I wanted to ask next, actually, which was going to be the different stages in creating an application powered by machine learning. It sounds like they are pretty much the same, and yes, you very much correctly highlighted the main difference,
which is model development, basically. Well, in classical DevOps, let's say, you also have
code development, and that may change, but I guess typically you may have more iterations
in machine learning models than you possibly have in code after it reaches some stable state.
So I was wondering where in those stages that you mentioned and in what form actually does a user interface
come into play and
what is the importance of having one and whether this is meant to serve
initially at least as a quick way for the people who develop the machine learning models
to have a feeling and be able to check what they're developing
or whether this can also serve at a later stage as a basis for developing the actual end user interface.
So, yeah, it depends,
but I try to have them on board
in the first sprint. So if we have stakeholders,
it's a handy
trick for project
management to have your stakeholders
on your first iteration,
it heightens their commitment. So if
they're on board and able to
interrupt or to bring their features into
the project, their commitment when we are going to the production
deployment is higher,
because they are in the loop,
they are responsible for their ideas
and, yeah, I always try it,
but there are also situations
where you break with this rule, for sure.
Sometimes it's not good to have too many people on board if you are deploying or developing an application.
So when your team knows what a good application in this case is, sometimes it's better not to have too many people, so that along with the people management and stakeholder management you don't also have to handle
the different beliefs about what a good product or end product is.
So sometimes you can break with it, but most of the time I try to have them in the loop
from the first iteration, from the starting point.
So then the commitment is high. And also, having a good UI
or a good user interface is needed,
because if we are only showing them code snippets,
it's too abstract.
So we can formulate an idea of what's happening there
and we can also guide it.
But having an interface is changing everything
because it's easier for people to understand what's happening there, and most of the time machine
learning is really abstract. We have an input, there's some
workflow, and then we have the end result. But if you have a front end, or a
user interface, you can show directly what the impact is of what I'm doing here.
Something is going in, then there is a model working on it, and then we have the final product. And having a user
interface, doing this with your stakeholders, guiding
them through a dashboard, going through your workflow and seeing what's happening in
your machine learning model, this is very useful, because most stakeholders are not doing machine learning
engineering stuff.
They're business analysts or managers or directors.
So it's very useful to have a good UI or a good frontend from the start of the project, so they are able to understand what's happening and can adjust or create PBIs for new requests or changes.
Thank you. And I guess a typical way for creating at least some mock
user interfaces for people to be able to see is what's called wireframing.
And I wonder, well, you're probably aware of the technique, of the practice, and I guess
that's also applied in applications powered by machine learning.
And so I wonder if you can name, well, I guess first you can confirm whether you use it or not.
And then if you can name some key requirements that you have for building this kind of user interface.
So the regular ones: it has to be easy to use,
so the engineers don't need too much boilerplate to develop or deploy there.
Development and deployment should be easy.
Then one of our key aspects, for us, is to have it on-prem, because we are not allowed to use,
well, not not-allowed, but it's
sometimes a problem to use cloud services. We
have confidential data, or we have special contracts, and we put a lot
of effort into keeping our data safe. So for us it's important to have a service
which we can host in our own cloud, because we try to keep all data in our cloud, and we put high
effort into keeping it, and all the data, safe there. So it's important to have it as
an on-prem solution, to host it in our cloud. And then also important for us is to have good
charting ability. You can be sure we are using
also more obscure charts, like network diagrams.
So we need the charting ability
to use many chart types.
Yeah, as I said, easy to develop.
And it has to fit in our technology stack. If it's
possible to use a framework which is also able to use one of our standard
charting libraries, which I'll point out later, this is super handy, because we
are used to those charting libraries, we are using them in our
projects, so there is no need to train people
in a new library. That's one of the main goals, to have a fixed technology stack, because then
we have our operations in place, taking care of the updates, and everything is in place.
Sometimes it's hard to bring a new application or a new technology into our services, because
then we have to go through the full circle of all the processes, and if we are able to use the technology
stack which we are used to, it's handy.
Okay.
Recently, and that was actually what caught my attention and led to this
conversation, you shared a thread with a number
of tools that people can use for this purpose, and which I guess
you and your team may have used as well. And I was wondering
if you'd like to say a few words about those tools, if you recall them, or if not, I can remind
you what those tools were, and just the context in which you use those.

Yeah, for sure. So there are
many tools we are using and also testing, because it's an evolution every time, and
there's so much going on in the ecosystem,
because we are not the only company which is in
need of a good UI or a good front end for machine
learning, or for their data science products. So first of all,
our easy shortcut is always using Streamlit, because it's super easy.
You have features like a date picker, and you are also able to have a frontend with a file
upload, so a business analyst is able to use it as a frontend, upload their
Excel file or their CSV, and then do some adjustments with a slider or a date picker,
whatever the project needs.
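(To make this concrete, here's a minimal Streamlit sketch of the kind of front end described above, with a file upload, a date picker and a slider; the "date" and "amount" column names are hypothetical, not from an actual project.)

```python
# Minimal Streamlit sketch (assumes a CSV with 'date' and 'amount' columns):
# an analyst uploads a file, then filters it with a date picker and a slider.
import pandas as pd
import streamlit as st

st.title("CSV explorer")

uploaded = st.file_uploader("Upload a CSV file", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded, parse_dates=["date"])
    start = st.date_input("Start date", df["date"].min())
    minimum = st.slider("Minimum amount", 0, 1000, 100)
    filtered = df[(df["date"] >= pd.Timestamp(start)) & (df["amount"] >= minimum)]
    st.line_chart(filtered.set_index("date")["amount"])
```

(Saved as app.py, this would be served with `streamlit run app.py`.)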
Then, more advanced, and I really, really love it,
there's Gradio, because it's more focused on machine learning, and it's super cool.
There are so many features built into it in a short time.
So the drive is there, and they're doing an amazing job
to have it in your stack as front end for machine learning.
I highly recommend it.
You can run it out of the Jupyter Notebook.
You can run it in a Google Colab.
It's super integrable and it's cool.
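(As a sketch of how little code Gradio needs, with a placeholder function standing in for a real model:)

```python
# Minimal Gradio sketch: wrap a predict function in a web UI.
# predict_sentiment is a stand-in; in practice you would call your trained model.
import gradio as gr

def predict_sentiment(text):
    # Toy placeholder logic, only for demonstrating the interface.
    return "positive" if "good" in text.lower() else "negative"

demo = gr.Interface(fn=predict_sentiment, inputs="text", outputs="label")
demo.launch()  # works from a script, a Jupyter Notebook, or a Google Colab cell
```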
So then switching over to the enterprise stack
because here you need a server,
and it's more for big companies because you have to throw in some effort to host the servers
and have the infrastructure, but it's the enterprise tech.
So it's Plotly with Dash, and the advantage here is you can use Plotly, which is super
cool.
We are using it all the time as charting library.
You can use it in JavaScript ecosystem.
And you can use it in the Python ecosystem
because there are frameworks for both ecosystems.
And yeah, you are able to switch the direction
of the project if you're using it,
because you can develop some web
applications with Node.js, and you're also able to host your application in your
intranet over Python, using the Python stack. And
you have a unique API to
build your dashboards with Plotly. The drive is also there; there are
many features built into this library, and it's in active development.
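(A minimal Dash sketch of the pattern described here, using the Gapminder sample data that ships with Plotly Express; the layout and callback wiring are illustrative, not from the projects discussed:)

```python
# Minimal Plotly Dash sketch: a dropdown that redraws a Plotly figure via a callback.
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.express as px

df = px.data.gapminder()  # sample dataset bundled with Plotly Express
app = dash.Dash(__name__)

app.layout = html.Div([
    dcc.Dropdown(
        id="year",
        options=[{"label": str(y), "value": y} for y in df.year.unique()],
        value=2007,
    ),
    dcc.Graph(id="chart"),
])

@app.callback(Output("chart", "figure"), Input("year", "value"))
def update_chart(year):
    # Filter to the selected year and redraw the scatter plot.
    subset = df[df.year == year]
    return px.scatter(subset, x="gdpPercap", y="lifeExp", size="pop", log_x=True)

if __name__ == "__main__":
    app.run_server(debug=True)
```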
Then there are also new kids on the block, as I will call them. It's Panel. Panel is also super awesome. It's one of the newer ones, and you're able to use charting libraries
from Bokeh, Matplotlib and so on.
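(And a minimal Panel sketch, binding a slider widget to a Matplotlib figure; the sine-wave plot is just a stand-in for a real chart:)

```python
# Minimal Panel sketch: a slider widget bound to a Matplotlib figure.
import numpy as np
import panel as pn
from matplotlib.figure import Figure

pn.extension()

freq = pn.widgets.FloatSlider(name="Frequency", start=0.1, end=5.0, value=1.0)

def sine_plot(freq):
    fig = Figure(figsize=(6, 3))
    ax = fig.subplots()
    x = np.linspace(0, 10, 500)
    ax.plot(x, np.sin(freq * x))
    return fig  # Panel renders the returned Figure as a Matplotlib pane

dashboard = pn.Column(freq, pn.bind(sine_plot, freq))
dashboard.servable()  # serve with: panel serve app.py
```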
Then there are also cloud services like Deepnote,
which is super cool for collaboration.
At the moment it's only in the cloud,
but they're working on an on-prem solution.
And it's like Google Colab on steroids, and
it's super cool to collaborate there. You are super fast, because you spin
it up and it's running in the cloud, and your team is able to do their data
science project and collaborate. Yeah, there are also some other tools, but at the moment we are focusing on
these tools.
Yeah, I think that's already enough to choose from. And just to clarify, I guess
all of those tools are open source or at least have an open source version and I guess many
of them may also offer enterprise editions.
Yes.

Okay, so what I was going to ask you actually is if you can think of certain characteristics, let's say, along which to evaluate such tools, and to classify them. So we already kind of mentioned one,
whether they're open source
or closed source and licensing
and there are different versions and
so on. Previously, you
also mentioned deployment
models, so whether
tools can be deployed on
premise or they have
a software as a service
version. I guess another one would be their
APIs and the kind of languages that they support. So we already have three I guess. Do you think
there are other ones? And those that I've mentioned so far are kind of generic, so they could
apply to any tool category, really. Do you think there are others that are more specific
for this kind of tool?
Depends on the project and what you need.
There are also tools like Observable HQ,
which is also running in the cloud
and has many, many charting libraries available with a click,
and you can use it, but it's running in the cloud
and you have to put your data into their cloud. So
one criterion for us to choose one of the tools is whether we are in
need of hosting it in our secure environment; in that case we
cannot, at the moment, use tools like Deepnote, because we have
to run it in our cloud.
Also, for Plotly Enterprise you need a license, so you have to pay for the
service, but it's worth what it costs.
And Streamlit and Gradio you can use for free and
host them in your intranet. If you are in need of a fast UI or a fast front end,
then choose them, because it's super easy.
You need nothing; you install the Python libraries
and then you're up and running, and you're able to
have your interactive machine learning front end.
What about the actual visual capabilities, let's say?
Do you think there is something, well, let's put it that way.
When you have some specific kind of visual in mind,
like, I don't know, I would say pie charts or bar charts,
but these are obviously kind of table stakes because I'm guessing... They're included in every tool we...
Exactly, exactly, yeah.
But if you want something more elaborate,
is there...
If you have like a specific diagram requirement,
a specific visual, for example,
is it a parameter that you
take into account, and how do you classify tools on that parameter?

For sure. In this case, most of the
time we are using Plotly, and then with the Dash stack it's major. But now we have
to talk about the new kids on the block again; it's Panel. In Panel you're able to use Bokeh, Matplotlib and HoloViews,
so those are also mature charting libraries. So yeah, you have to make this decision on other
factors. I'm in love with Plotly, so maybe I'm a little bit biased.
That's okay.
I mean, as long as you admit it and you just did.
Yeah.
So most of the time I try to use them because my team is also used to the API.
But Panel is there.
They're on the move.
Keep an eye on them.

Okay, I will. Yeah, I think I also saw you shared another one recently, which has a kind of funny name. It's
called Gooey, which is a kind of wordplay on GUI, as we sometimes call these tools.
Yeah, these tools are more for having a GUI for a Python application,
not for having a charting library in place, or doing business logic, a business
intelligence application, or a machine learning frontend.
It's possible to build that with them, and it's also possible to use them to have a front end for your machine learning application, but
the regular use for them is this: if you have a Python script and you need a front end for it, then use
one of these libraries. But it's also possible to use Dash for it, or to use Gradio or Panel.
Yeah, that's a good point and a
good distinction actually, because I think that both scenarios are quite
valid. So if you're developing something, I don't know, a
script or something, and you want a quick and dirty way to either interact
with it yourself or, I don't know, show it to a colleague,
this is a very valid use case, I would say,
for developing a kind of rudimentary user interface.
Of course, as you say, it's a different case
if you want to actually at least build
on what you initially developed
so that you can have like an actual end user interface
in the end.
Yeah, for sure. So it depends on your use case.
But most of the time I would use one of the applications we have talked about
also as a frontend for Python scripts,
without the need for machine learning or for a data science project.

Okay. One thing that struck me about all of these tools that you mentioned is that, unless I'm missing something,
they all seem to be standalone, at least the ones I know better from this list, like Streamlit. So I was wondering if you see offerings in this area from more, let's say,
integrated frameworks. Like, I don't know, if you use the Cloudera
platform or the Databricks platform, is there something in there to cover that kind of
need? And if there is, is it, I don't know, on par, let's say, with the tools that you prefer to use?

Depends on the project. So
if you're using Deepnote, you can also use Plotly charts
in the service, and in Google Colab you're also able to use Gradio. So they're integrated, in some kind of meaning.
And then you have to make the decision:
if I want the full service, the full product
and the full circle, then I'm going with Dash.
Otherwise, it depends on the project and the need.
So if it's possible to have some kind of notebook in the cloud, using Google Colab, then you can also use Plotly charts there.

In what way, if at all, can those tools be integrated in an MLOps platform? So if you keep having new versions of your model, and potentially also new versions of your datasets, how are those tools able to keep up?

So that's one point.
You can have it in the tools as well, but then most of the time you
have to build it by yourself. It's possible to have it in the tools, but
for real machine learning DevOps I highly, highly recommend that you use the tools
which are built for it,
and then have one of the other tools
for the front end, the last part of it.
For keeping track of your machine learning models,
use dedicated MLOps tools, like ZenML.
And I'm a big fan of DagsHub,
because it's like GitHub for data science projects.
You can store your data there, and you can integrate it with the tools which are used for
machine learning ops. Use those tools, because they are built for it, and it's easier to keep
track there, because you don't have to write boilerplate to do it on your own in
one of the other tools.

Yeah, absolutely, that makes perfect sense. I was more wondering about
the touch points, let's say, between those two. So let me give an example. So you have, between, let's say, one version and the next version of your model, something
changed.
So you have new features, for example.
If you want to show those new features in your user interface, is there a way for these
tools to sort of introspect, let's say, and pick up the fact that, oh, the model now has
a new feature, or is this something that you have to do manually?

It depends if it's a feature which is integrated in one of the existing tools. For Gradio, you can see the direct impact and you can
adjust what data is going in. So it depends on the feature and what your frontend is. If not, you have to rebuild it and bring the
features into the new frontend.

Okay, I see. So it sounds like at least for some
tools and in some cases there is some sort of capability to pick up those new features, for example. Otherwise, it's manual work.
Yes, for sure.
Okay.
Okay, interesting.
Yeah, I think you've highlighted some interesting areas of those tools.
And so, yeah, you also mentioned the fact that, in your case at least,
being able to deploy the
tool on-premises is a key factor, because of confidentiality reasons, basically,
for your data. That apart, if you could choose among the other criteria
for picking a tool, which one or which ones would you say are the most important?

So, not taking into account that we need to have it in our own cloud? If we exclude this
requirement, right? Did I get the question right?

Yes.

Then for me, if I can choose it
for a project which is not related to the work environment, then it's having the ability to collaborate
without the hassle of hosting an environment
and having all the administration stuff running.
So Deepnote is really cool in this case.
Also, if you already have your own Plotly Dash instance
spinning in your environment,
then it's super easy to use it.

Okay, okay, thank you. Yeah, I think people will find your experience and your insights on
using those tools useful when they also want to choose something from this category. So yeah, I think that's
it. I think we covered lots of ground, and thanks for sharing. So do
you have any closing comments or ideas? And we can wrap up here, I guess.

We can
wrap up. It was fun to talk to you and to share some knowledge about our development process.
Thanks for having me.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.