The Good Tech Companies - Let's Build an MLOps Pipeline With Databricks and Spark - Part 2
Episode Date: December 29, 2024. This story was originally published on HackerNoon at: https://hackernoon.com/lets-build-an-mlops-pipeline-with-databricks-and-spark-part-2. Deploy the model for batch inference and model serving using Databricks Unity Catalog. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #mlops, #databricks, #pyspark, #model-monitoring, #feature-store, #databricks-asset-bundles, #good-company, #hackernoon-top-story, and more. This story was written by: @neshom. Learn more about this writer by checking @neshom's about page, and for more stories, please visit hackernoon.com. In the second part of this blog we see how Databricks enables us to do batch deployment and online serving. We also spend some time on how to set up data and model monitoring dashboards.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Let's build an MLOps pipeline with Databricks and Spark, Part 2.
By Mohsin Jadidi. In the first part of this tutorial series, we took the first steps for building an end-to-end MLOps pipeline using Databricks and Spark, guided by Databricks' reference architecture.
Here's a recap of the key steps we covered:
Setting up the Unity Catalog for the medallion architecture: we organized our data into bronze, silver, and gold layers within the Unity Catalog, establishing a structured and efficient data management system.
Ingesting data into Unity Catalog: we demonstrated how to import raw data into the system, ensuring consistency and quality for subsequent processing stages.
Training the model: utilizing Databricks, we trained a machine learning model tailored to our dataset, following best practices for scalable and effective model development.
Hyperparameter tuning with Hyperopt: to enhance model performance, we employed Hyperopt to automate the search for optimal hyperparameters, improving accuracy and efficiency.
Experiment tracking with Databricks MLflow: we utilized MLflow to log and monitor our experiments, maintaining a comprehensive record of model versions, metrics, and parameters for easy comparison and reproducibility.
With these foundational steps completed, your model is now primed for deployment.
In this second part, we'll focus on integrating three critical components into our system:
1. Batch inference: implementing batch processing to generate predictions on large datasets, suitable for applications like bulk scoring and periodic reporting.
2. Online inference (model serving): setting up real-time model serving to provide immediate predictions, essential for interactive applications and services.
3. Model monitoring: ensuring your deployed models maintain optimal performance and reliability over time.
Let's get into it. Model deployment. The departure point of the last blog was model evaluation. Now imagine we did the comparison and found that our model shows higher performance than the current production model. Since we want to use the model in production, we want to take advantage of all the data that we have. The next step is to train and test the model using the full dataset, then persist our model for later use by deploying it as our Champion model.
Since this is the final model that we want to use for inference, we use the Feature Engineering client to train the model. This way we not only track the model lineage more easily, but also offload the schema validation and feature transformation (if any) to the client. We can also use the Feature Store or Feature Engineering APIs to train and log the models. When we use the Feature Engineering API, we can view the model's lineage in Catalog Explorer.
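As a rough illustration (not the author's exact code), training and logging through the Feature Engineering client could look like the sketch below. The catalog, schema, table, and column names are hypothetical stand-ins, and ALS stands in for whichever estimator and hyperparameters you settled on in part one.

```python
import mlflow
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup
from pyspark.ml.recommendation import ALS

# `spark` is the ambient SparkSession inside a Databricks notebook.
fe = FeatureEngineeringClient()
mlflow.set_registry_uri("databricks-uc")

# Hypothetical Unity Catalog names; replace with your own catalog/schema/tables.
ratings_df = spark.table("mlops.recsys.gold_ratings").select("user_id", "movie_id", "rating")

# Build the training set through the client so feature lineage and schema are tracked.
training_set = fe.create_training_set(
    df=ratings_df,
    feature_lookups=[
        FeatureLookup(table_name="mlops.recsys.user_features", lookup_key="user_id"),
    ],
    label="rating",
)

# ALS stands in here for the tuned estimator from part one.
als = ALS(userCol="user_id", itemCol="movie_id", ratingCol="rating", coldStartStrategy="drop")
model = als.fit(training_set.load_df())

# Logging via the client registers the model in UC together with its feature dependencies.
fe.log_model(
    model=model,
    artifact_path="model",
    flavor=mlflow.spark,
    training_set=training_set,
    registered_model_name="mlops.recsys.als_recommender",
)
```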
Now let's update the model description and assign a Champion alias to it. Then go ahead and check the schema where you registered the model; you should see all your updates as follows.
Tip: model stages. If you use the Workspace model registry, you should use stages to manage your models; managing them with aliases won't work. Check out the documentation to see how it works.
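Here is a minimal sketch of those two steps with the MLflow client, assuming the Unity Catalog registry and the hypothetical model name from the previous snippet; the version number is just an example.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")
model_name = "mlops.recsys.als_recommender"  # hypothetical UC model name from the previous sketch

# Add a human-readable description to the registered model.
client.update_registered_model(
    name=model_name,
    description="ALS movie recommender trained on the full gold-layer dataset.",
)

# Point the 'champion' alias at the version we just registered (version 1 is only an example).
client.set_registered_model_alias(name=model_name, alias="champion", version=1)
```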
Model inference: batch scoring. Now imagine we want to use our model in production for inference. In this step we load the Champion model and use it to generate 20 movie recommendations for each user. You can see we used the same training data for batch scoring; although in the case of recommender systems this makes sense, in most applications we want to use the model to score some unseen data. For example, imagine you're at Netflix and want to update the user recommendations at the end of the day based on their new watch list. We can schedule a job that runs the batch scoring at a specific time at the end of the day. Now we can go ahead and generate the recommendations for each user. For this we find the top 20 items per user; here is how the result looks. Finally, we can store the predictions as a Delta table on our UC, or publish them to downstream systems such as MongoDB or Azure Cosmos DB. We go with the first option.
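As a rough sketch (reusing the hypothetical Unity Catalog names from the earlier snippets and assuming a scoring dataframe of user/movie pairs), batch scoring with the Feature Engineering client and the Champion alias could look like this:

```python
from databricks.feature_engineering import FeatureEngineeringClient
from pyspark.sql import functions as F
from pyspark.sql.window import Window

fe = FeatureEngineeringClient()

# Hypothetical names; the '@champion' alias resolves to the version we promoted earlier.
model_uri = "models:/mlops.recsys.als_recommender@champion"
scoring_df = spark.table("mlops.recsys.gold_user_item_pairs")  # user/movie pairs to score

# score_batch looks up the features and applies the Champion model.
preds = fe.score_batch(model_uri=model_uri, df=scoring_df)

# Keep the top 20 recommended movies per user.
w = Window.partitionBy("user_id").orderBy(F.col("prediction").desc())
top20 = preds.withColumn("rank", F.row_number().over(w)).filter("rank <= 20")

# Persist the recommendations as a Delta table in Unity Catalog (the first option above).
top20.write.mode("overwrite").saveAsTable("mlops.recsys.movie_recommendations")
```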
Streaming/online inference. Now imagine a case in which we want to update our recommendations based on real-time user interactions. For this case we can use model serving. When someone wants to use your model, they can send data to the server. The server then feeds that data to your deployed model, which goes into action, analyzes the data, and generates a prediction. This can be used in web applications, mobile apps, or even embedded systems. One application of this approach is to enable traffic routing for A/B testing.
The ALS algorithm can't be used directly for online inference, since it requires retraining the model on the whole data (old plus new) to update the recommendations. Gradient-descent learning algorithms are examples of models that can be used for online updates; we might look at some of these algorithms in a future post. However, just to illustrate how such a model would work, we are creating a (useless) model serving endpoint that predicts a movie rating whenever a user rates a movie. This will create and launch a model serving cluster for us, so it takes some time. Now, if you open the Serving page, you should see your endpoint.
Tip: we can use one endpoint to serve multiple models. We can then use traffic routing for scenarios such as A/B testing, or compare the performance of different models in production.
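A minimal sketch of creating such an endpoint with the Databricks Python SDK; the endpoint name, model name, and version are hypothetical, and scale-to-zero is enabled to keep this intentionally toy endpoint cheap.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

w = WorkspaceClient()

# Hypothetical endpoint and model names; version "1" is just an example.
w.serving_endpoints.create(
    name="movie-rating-endpoint",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="mlops.recsys.movie_rating_model",
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,  # keeps the toy endpoint cheap when idle
            )
        ]
    ),
)
```

The call returns once provisioning starts; the serving cluster itself still takes a few minutes to come up, which matches the waiting time mentioned above.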
Inference tables. Inference tables in Databricks Model Serving act as an automatic log for our deployed models. When enabled, they capture incoming requests (the data sent for prediction), the corresponding model outputs (predictions), and some other metadata as a Delta table within Unity Catalog. We can use the inference table for monitoring and debugging, lineage tracking, and as a data collection procedure for training or fine-tuning our models. We can enable the inference table on our serving endpoint to monitor the model. We can do it by specifying the auto-capture properties in the payload when we first create the endpoint, or we can update our endpoint afterwards using the update command and the endpoint URL, as sketched below.
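For illustration, here is a hedged sketch of that update call via the REST API; the host, token handling, endpoint name, and Unity Catalog locations are all hypothetical, and your workspace may require slightly different fields in the config body.

```python
import os
import requests

# Hypothetical values; in practice pull these from your environment or a secret scope.
DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]  # e.g. https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]
ENDPOINT_NAME = "movie-rating-endpoint"

payload = {
    # The served model block usually has to be repeated when updating the config.
    "served_entities": [{
        "entity_name": "mlops.recsys.movie_rating_model",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": True,
    }],
    # Where the request/response logs should land in Unity Catalog.
    "auto_capture_config": {
        "catalog_name": "mlops",
        "schema_name": "recsys",
        "table_name_prefix": "movie_rating_endpoint",
        "enabled": True,
    },
}

resp = requests.put(
    f"{DATABRICKS_HOST}/api/2.0/serving-endpoints/{ENDPOINT_NAME}/config",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
```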
Now let's feed the endpoint with some dummy user interaction data. We can then check the endpoint logs in the payload table; it takes around 10 minutes until the data shows up there.
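A small sketch of sending dummy interactions to the endpoint's invocations URL, reusing the hypothetical host, token, and endpoint name from above; the input columns must match your model's signature.

```python
import os
import requests

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]
ENDPOINT_NAME = "movie-rating-endpoint"

# Dummy user interactions; column names must match the model's input schema.
body = {
    "dataframe_records": [
        {"user_id": 42, "movie_id": 1203},
        {"user_id": 42, "movie_id": 587},
    ]
}

resp = requests.post(
    f"{DATABRICKS_HOST}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    json=body,
)
print(resp.json())  # e.g. {"predictions": [...]}
```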
You should see the logged requests appear in your payload table. To understand the schema of this inference table, check the Unity Catalog inference table schema documentation.
Model monitoring.
Model and data monitoring is a complex topic that requires a lot of time to master. Databricks Lakehouse Monitoring (DLM) reduces the overhead of building a proper monitoring system by providing standard and customizable templates for the most common cases. However, mastering DLM and model monitoring in general requires a lot of experimentation. I don't want to give you an extensive overview of model monitoring here, but rather a starting point; I might dedicate a blog to this topic in the future.
A short summary of DLM functionalities and features: now that we have our model up and running, we can use the inference table generated by our serving endpoint to monitor key metrics, such as model performance and drift, to detect any deviations or anomalies in our data or model over time. This proactive approach helps us take timely corrective actions, such as retraining the model or updating its features, to maintain optimal performance and alignment with business objectives. DLM provides three types of analysis: time series, snapshot, and inference. Since we are interested in analyzing our inference table, we focus on the latter.
To use a table for monitoring (our primary table), we should make sure it has the right structure. For the inference analysis, each row should correspond to a request, with the following columns: model features, model prediction, model ID, timestamp (the timestamp of the inference request), and ground truth (optional).
The model ID is important for cases where we serve multiple models and want to track the performance of each model in one monitoring dashboard. If more than one model ID is available, DLM uses it to slice the data and compute metrics and statistics for each slice separately. DLM computes each statistic and metric for a specified time interval; for inference analysis, it uses the timestamp column plus a user-defined window size to identify the time windows (more below). DLM supports two problem types for inference tables, classification or regression, and computes the relevant metrics and statistics based on this specification.
To use DLM, we should create a monitor and attach it to a table. When we do this, DLM creates two metric tables:
Profile metric table: contains summary statistics such as min, max, and the percentage of nulls and zeros. It also contains additional metrics based on the problem type defined by the user, for example precision, recall, and f1_score for classification models, and mean_squared_error and mean_average_error for regression models.
Drift metric table: contains statistics that measure how the distribution of the data has changed over time, or relative to a baseline value if provided. It computes measures such as the chi-square test and the KS test.
To see the complete list of metrics for each table, check the monitor metric table documentation page. It is also possible to create custom metrics.
An important aspect of building a monitoring system is to make sure that our monitoring dashboard has access to the latest inference data as it arrives. For that, we can use Delta table streaming to keep track of processed rows in the inference table. We use the model serving's inference table as our source table and the monitoring table as the sink table. We also make sure that change data capture (CDC) is enabled on both tables; it is enabled by default on the inference table. This way we process only the changes (insert, update, delete) in the source table, rather than reprocessing the entire table on every refresh.
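For example, the change data feed can be turned on for our processed (sink) table with a table property; the table name below is a hypothetical placeholder.

```python
# The inference (payload) table has the change data feed on by default; make sure our
# processed sink table does too.
spark.sql("""
    ALTER TABLE mlops.recsys.model_monitoring_sink
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")
```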
Hands-on. To enable monitoring over our inference table, we take the following steps:
1. Read the inference table as a streaming table.
2. Create a new Delta table with the right schema by unpacking the inference table generated by our model serving endpoint.
3. Prepare the baseline table, if any.
4. Create a monitor over the resulting table and refresh the metrics.
5. Schedule a workflow to unpack the inference table into the right structure and refresh the metrics.
First we need to install the Lakehouse Monitoring API. It should already be installed if you use Databricks Runtime 15.3 LTS or above. Let's read the inference table as a streaming table.
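A minimal sketch, assuming the auto-captured payload table follows the usual <prefix>_payload naming in our hypothetical catalog and schema:

```python
# Read the auto-captured payload table incrementally as a stream.
requests_raw = (
    spark.readStream
         .format("delta")
         .table("mlops.recsys.movie_rating_endpoint_payload")  # hypothetical payload table name
)
```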
Next, we have to put the table in the right format, as described above: this table should have one row for each prediction, with the relevant features and the prediction value. The inference table that we get from the model serving endpoint stores the endpoint requests and responses in a nested JSON format. Here is an example of the JSON payload for the request and response columns. To unpack this table into the right schema, we can use code adapted from the Databricks documentation (the inference table Lakehouse Monitoring starter notebook); the resulting table would look like this. Next, we should initialize our sink table and write the results.
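The exact unpacking depends on your model's request/response shape, so the following is only a simplified sketch of the idea (flatten the JSON, keep one row per prediction, and stream the result into the sink table), continuing from the streaming read above; the schemas, column names, and checkpoint path are assumptions.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType

# Assumed request/response shapes; adjust them to your model's actual payload.
request_schema = ArrayType(StructType([
    StructField("user_id", IntegerType()),
    StructField("movie_id", IntegerType()),
]))
response_schema = StructType([StructField("predictions", ArrayType(DoubleType()))])

unpacked = (
    requests_raw  # the streaming read from the payload table above
    .withColumn("records", F.from_json(F.get_json_object("request", "$.dataframe_records"), request_schema))
    .withColumn("preds", F.from_json("response", response_schema)["predictions"])
    .withColumn("zipped", F.explode(F.arrays_zip("records", "preds")))
    .select(
        F.col("zipped.records.user_id").alias("user_id"),
        F.col("zipped.records.movie_id").alias("movie_id"),
        F.col("zipped.preds").alias("prediction"),
        (F.col("timestamp_ms") / 1000).cast("timestamp").alias("timestamp"),
        F.lit("als_recommender@champion").alias("model_id"),  # or derive it from the request metadata
    )
)

# Stream the flattened rows into the sink table that the monitor will watch.
(unpacked.writeStream
    .option("checkpointLocation", "/Volumes/mlops/recsys/checkpoints/monitoring")  # hypothetical path
    .trigger(availableNow=True)
    .toTable("mlops.recsys.model_monitoring_sink"))
```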
Finally, we create our baseline table. DLM uses this table to compute drift by comparing the distribution of similar columns in the baseline and primary tables. The baseline table should have the same feature columns as the primary table, as well as the same model identification column. For the baseline table, we use the prediction table of our validation dataset that we stored earlier, after we trained our model with the best hyperparameters. To compute the drift metrics, Databricks computes the profile metrics for both the primary and the baseline table. You can read about the primary table and the baseline table in the documentation.
Now we are ready to create our monitoring dashboard. We can do it either using the UI or the Lakehouse Monitoring API; here we use the second option.
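A rough sketch with the databricks-lakehouse-monitoring client, reusing the hypothetical sink and baseline table names from above and treating the predicted rating as a regression problem:

```python
from databricks import lakehouse_monitoring as lm

# Hypothetical table names, reused from the earlier sketches.
monitor = lm.create_monitor(
    table_name="mlops.recsys.model_monitoring_sink",
    profile_type=lm.InferenceLog(
        problem_type="regression",       # the predicted rating is a continuous value
        prediction_col="prediction",
        timestamp_col="timestamp",
        model_id_col="model_id",
        granularities=["1 day"],         # the window size used to slice metrics over time
    ),
    baseline_table_name="mlops.recsys.validation_predictions",
    output_schema_name="mlops.recsys",
)

# Trigger a metrics refresh; the dashboard is generated automatically alongside the metric tables.
lm.run_refresh(table_name="mlops.recsys.model_monitoring_sink")
```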
After we run the code, it takes some time until Databricks calculates all the metrics. To see the dashboard, go to the Quality tab of your sink table; you should see a page as follows. If you open the refreshes view, you can see your running, pending, and past refreshes; click the dashboard link to open your dashboard. So, to recap: we start with the inference table, process it, save the result to the sink table, and pass that table along with our baseline table to our monitoring API. DLM computes the profile metrics for each table and uses them to compute the drift metrics. There you go: you have everything you need to serve and monitor your model. In the next part, I'll show you how to automate this process using Databricks Asset Bundles and GitLab.
Thank you for listening to this HackerNoon story, read by Artificial Intelligence.
Visit HackerNoon.com to read, write, learn and publish.