The Good Tech Companies - Let's Build an MLOps Pipeline With Databricks and Spark - Part 2

Episode Date: December 29, 2024

This story was originally published on HackerNoon at: https://hackernoon.com/lets-build-an-mlops-pipeline-with-databricks-and-spark-part-2. Deploy the model for batch inference and model serving using Databricks Unity Catalog. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #mlops, #databricks, #pyspark, #model-monitoring, #feature-store, #databricks-asset-bundles, #good-company, #hackernoon-top-story, and more. This story was written by: @neshom. Learn more about this writer by checking @neshom's about page, and for more stories, please visit hackernoon.com. In the second part of this blog, we see how Databricks enables batch deployment and online model serving. We also spend some time on how to set up data and model monitoring dashboards.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Let's Build an MLOps Pipeline With Databricks and Spark, Part 2, by Mohsin Jadidi. In the first part of this tutorial series, we took the first steps for building an end-to-end MLOps pipeline using Databricks and Spark, guided by Databricks' reference architecture. Here's a recap of the key steps we covered. Setting up the Unity Catalog for the Medallion architecture: we organized our data into bronze, silver, and gold layers within the Unity Catalog, establishing a structured and efficient
Starting point is 00:00:35 data management system. Ingesting data into Unity Catalog: we demonstrated how to import raw data into the system, ensuring consistency and quality for subsequent processing stages. Training the model: utilizing Databricks, we trained a machine learning model tailored to our dataset, following best practices for scalable and effective model development. Hyperparameter tuning with Hyperopt: to enhance model performance, we employed Hyperopt to automate the search for optimal hyperparameters, improving accuracy and efficiency. Experiment tracking with Databricks MLflow: we utilized MLflow to log and monitor our experiments, maintaining a comprehensive record of model versions, metrics, and parameters for easy comparison and reproducibility. With these foundational steps completed, your model is now primed for deployment.
Starting point is 00:01:26 In this second part, we'll focus on integrating the following critical components into our system. 1. Batch inference: implementing batch processing to generate predictions on large datasets, suitable for applications like bulk scoring and periodic reporting. 2. Online inference (model serving): setting up real-time model serving to provide immediate predictions, essential for interactive applications and services. 3. Model monitoring: ensuring your deployed models maintain optimal performance and reliability over time. Let's get into it. Model deployment. The departure point of the last blog was model evaluation. Now imagine we did the comparison and found that our model shows higher performance
Starting point is 00:02:07 compared to the current production model. As we want to use the model in production, we want to take advantage of all the data that we have, so the next step is to train and test the model using the full dataset, and then persist the model for later use by deploying it as our champion model. Since this is the final model that we want to use for inference, we use the Feature Engineering client to train it. This way, we not only track the model lineage more easily, but also offload the schema validation and feature transformation, if any, to the client.
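As a rough illustration of what this training step can look like, here is a minimal sketch using the Feature Engineering client with a Spark ALS model. The catalog, table, column, and model names are placeholders assumed for this example, not the exact ones from the original notebooks.

```python
import mlflow
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup
from pyspark.ml.recommendation import ALS

fe = FeatureEngineeringClient()

# Label dataframe with user_id, movie_id, and the rating label (hypothetical table name).
ratings_df = spark.table("mlops.silver.ratings")

# Join the label data with features looked up from a feature table at training time.
training_set = fe.create_training_set(
    df=ratings_df,
    feature_lookups=[
        FeatureLookup(
            table_name="mlops.silver.user_features",  # hypothetical feature table
            lookup_key="user_id",
        )
    ],
    label="rating",
)

als = ALS(
    userCol="user_id",
    itemCol="movie_id",
    ratingCol="rating",
    coldStartStrategy="drop",
)
model = als.fit(training_set.load_df())

# Logging through the client records feature lineage and packages the lookups with the model,
# so schema validation and feature retrieval are handled for us at scoring time.
fe.log_model(
    model=model,
    artifact_path="als_model",
    flavor=mlflow.spark,
    training_set=training_set,
    registered_model_name="mlops.gold.movie_recommender",  # registers the model in Unity Catalog
)
```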
Starting point is 00:02:35 and feature transformation, if any, to the client. We can also use the equals equals feature store or feature engineering API as equals equals to train and log the models when we use the feature engineering api we can view the models lineage in catalog explorer now let's update the model description and assign a champion label to it now go ahead and check the schema that you registered the model you should see all your updates as follows tip model stages if you use workspace for model registry you should stage so manage your models using aliases won't work check out equals equals here equals equals to see how it works model inference batch scoring now imagine we want to use our model in production for inference in this stepway load the champion model and use it to generate 20 movie recommendations for each users and you can see we used the same training data
Starting point is 00:03:25 for batch scoring. Although in the case of recommender systems it makes sense, in most application we want use the model to score some unseen data. For example, imaging you're on Netflix and want to update the user recommendations at the end of day based on their new watch list. We can schedule job that run the batch scoring at specific time at the end of the day. Now we can go ahead and generate the recommendations for each user. For this we find the top 20 items per users here is how the result look like finally we can store the prediction as a delta label on our UC or publish them to a downstream systems MongoDB or Azure Cosmos DB. We go with the FERS option streaming streaming, online inference. Now imagine a case
Starting point is 00:04:06 in which we want to update our recommendations based on real-time user interactions. For this case we can use model serving. When someone wants to use your model, they can send data to the server. The server then feeds that data to your deployed model, which goes into action, analyzes the data, and generates a prediction. They can be used in web applications, mobile apps, or even embedded systems. One of the application of this approach is to enable traffic routing for A-B testing. ALS algorithm can't be used directly for online inference since it requires their training the model using the whole data, old plus new, to update the recommendations. Gradient descent learning
Starting point is 00:04:45 algorithms are examples of model that can be used for online updates. We might look at some of these algorithms in future post. However, just to illustrate how such a model would work, we are creating a useless model serving endpoint that predict movie rating based whenever you say rate of movies. This will create an lunch model serving cluster for us so it takes some time now if you open the window you should see your endpoint tip we can use one endpoint to serve multiple model then we can use traffic routing for scenarios such as a b testing or compare the performance of difference models in the production inference table inference tables in databricks model serving act as an automatic log for our deployed models. When enabled, they capture incoming requests, data sent for prediction, the corresponding
Starting point is 00:05:30 model outputs, predictions, and some other metadata as a delta table within Unity catalog. We can use inference table for monitoring and debugging, lineage tracking, and a data collection procedure for training or fine-tune our models. We can enable the on-r serving endpoint to monitor the model. We can do it by specifying the properties in the payload when we first create the endpoint. Or we update our endpoint afterwards using the command and the endpoint URL as follows, more equals equals here, equals equals. Now let's feed the endpoint with some dummy user interaction data. We can check the endpoint logs in the table.
Starting point is 00:06:06 It takes around 10 minutes until you can see the data in the table. You should see something like this your payload table to understand the schema of this inference table. Check unity catalog inference table schema equals equals here. Equals equals model monitoring. Model and data monitoring a complex topic that requires a lot of time to master. Databricks Lakehouse Monitoring, DLM, reduces the overhead of building a proper monitoring system by providing standard and customizable templates for commonest cases.
Starting point is 00:06:36 However, mastering DLM and model monitoring in general requires a lot of experimentations. I don't want to give you an extensive overview of model monitoring here but rather give you a starting point. I might dedicate a blog taught this topic in future. A short summary of DLM functionalities and features now that we have our model up and running. We can use inference table generated be our serving endpoint to monitor key metrics such a model performance and drift to detect any deviations or anomalies in our data or model over time. This proactive approach help us to take timely corrective actions, such as retraining the model or updating its features, to maintain optimal performance and alignment with business objectives. DLM provides three type of analysis or time series, snapshot and inference. Since we are interested in analyzing our inference table, we focus on the
Starting point is 00:07:25 latter one. To use a table for monitoring, our primary table, we should make sure that the table have the right structure. For the equals equals inference table, equals equals each row should correspond to requests with following columns, model features, model prediction, model id, timestamp, timestamp of the inference request, ground truth, optional. The model ID is important for cases when we serve multiple models and we want to track the performance of each model in one monitoring dashboard. If there are more than one model ID available, DLM uses it to slice the data and compute metrics and statics for each slice separately. DLM computes each statistics and metrics for a
Starting point is 00:08:05 specified time interval. For inference analysis, it used the timestamp column plus a user-defined window size to identify the time windows. More below, DLM supports two for inference tables, classification or regression. It computes some of the relevant metrics and statistics based on the this specification. To use DLM, we should create a monitor and attach it to a table. When we do this DLM create to profile metric table, this table contains summary statistics such as min, max, percentage of null and zeros. It also contains additional metrics based on the problem type defined by the user. For example, precision, recall and F1 underscore score for the classification models, and mean underscore squared underscore error and mean underscore average underscore error for regression models. Drift metric table. It contains statistic that measure how the
Starting point is 00:08:59 distribution of data has changed over time or relative to a baseline value, if provided. It compute measures such as chi-square test, ks-test. To see the list of complete metrics for each table check equals equals monitor metric table equals equals documentation page. It is also possible to create equals equals custom metrics. Equals equals an important aspect of building a monitoring system is to make sure that our monitoring dashboard has access to the latest inference data as they arrive. For that we can use delta table streaming to keep track of processed rows in the inference table. We use the model servings inference table as our source table and the monitoring table as the sync table. We also make sure the change data change data capture equals equals CDC is enabled on both
Starting point is 00:09:47 tables. It is enabled by default on the inference table. This way we process only changes insert, update, delete in the source table rather than reprocessing the entire table every refresh. Hands-on to enable the monitoring over our inference table we take the following steps. 1. Read the inference table as a streaming table 2. create a new delta table with the right schema by unpacking the inference table that is generated by our model serving endpoint 3. prepare the baseline table if any 4. create a monitor over the resulting table and refresh the metric 5. schedule a workflow to unpack the inference table to the right structure and refresh the metrics.
Starting point is 00:10:27 First we need to install the Lakehouse Monitoring API. It should be already installed if you use Databricks RUM Time 15. 3 LTS and above let's read the inference table as a streaming table Next we have to put the table in right format as described above. This table should have one row for each prediction with relevant the features and prediction value. in write format as described above. This table should have one row for each prediction with relevant the features and prediction value. The inference table that we get from the model serving endpoint store the endpoint requests and responses as a nested JSON format. Here is an
Starting point is 00:10:55 example of the JSON payload for the request and response column. To unpack this table to the right schema we can use the following code that is adapted from databricks documentation equals equals inference table lakehouse monitoring starter notebook equals equals the resulting table would look like this next we should initialize our sync table and write the results finally we create our baseline table dlm uses this table to compute the drift speed comparing the distribution of similar columns of baseline and primary models the baseline table should compute the drift speed comparing the distribution of similar columns of baseline and primary models. The baseline table should have the same feature column as the primary column as well as the same model identification column. For baseline table we use the prediction table of our validation data set that we store earlier after we trained our model using eBest hyperparameter.
Starting point is 00:11:40 To compute the drift metric, Databricks compute the profile metrics for both primary and the baseline table. Here you can read about the equals equals primary table and baseline table. Equals equals, now we are read to create our monitoring dashboard. We can do it either using the equals equals UI equals equals or the lakehouse monitoring API. Here we use the second option. After we run the code it takes some time until Databricks calculate all the metric. To see the dashboard go to the tab of your sync table, i.e. you should see a page as follow. If you click on the view you see you're running, pending and past refreshes.
Starting point is 00:12:18 Click on the to open your dashboard. So we start with the inference table, process it and save the result to and past this table along with our baseline table to our monitoring API. The DLM compute the profile metrics for each table and use the them to compute the drift metrics, there you go. You have everything you need to serve and monitor your model, in the next part I'll show you how to automate this process using Databricks Assets Bundle and GitLab. In the equals equals first part of this tutorial equals equals series, we took the first steps for building an end-to-end MLOPs pipeline using Databricks and Spark, guided by Databricks reference architecture. Here's a recap of the key steps we covered setting up the Unity Catalog for Medallion Architecture. We organized our data
Starting point is 00:13:00 into bronze, silver, and gold layers within the Unity Catalog, establishing a structured and efficient data management system. Ingesting data into Unity Catalog. We demonstrated how to import raw data into the system, ensuring consistency and quality for subsequent processing stages. Training the model. Utilizing Databricks, we trained a machine learning model tailored to our dataset, following best practices for scalable and effective model development. Hyperparameter Tuning with Hyperopt To enhance model performance, we employed Hyperopt to automate the search for optimal hyperparameters, improving accuracy and efficiency. Experiment Tracking with Databricks MLflow We utilized MLflow to log and monitor our
Starting point is 00:13:44 experiments, maintaining a comprehensive record of model versions, metrics, and parameters for easy comparison and reproducibility. Backslash dot. With these foundational steps completed, your model is now primed for deployment. In this second part, we'll focus on integrating two critical components into our system 1. Batch inference. Implementing batch processing to generate predictions on large datasets, suitable for applications like bulk scoring and periodic reporting. 2. Online inference, model serving, setting up real-time model serving to provide immediate predictions, essential for interactive applications and services. 3. Model monitoring.
Starting point is 00:14:23 To ensure your deployed models maintain optimal performance and reliability over time. Let's get into it. Model deployment. The departure point of the last blog was model evaluation. Now imagine we did the comparison and found that our model shows a higher performance compared to this production model. As we want, assume, to use the model in production, we want to take advantage of all the data that we have. The next step is to train and test the model using the full dataset. Then persist our model for later use be deploying it as our champion model. Since this is the final model that we want to use for inference, we use the feature engineering
Starting point is 00:14:59 client to train the model. This way we are not only track the model lineage easier, but also offload the schema validation and feature transformation, if any, to the model. This way we are not only track the model lineage easier, but also offload the schema validation and feature transformation, if any, to the client. We can also use the equals equals feature store or feature engineering API as equals equals to train and log the models. When we use the feature engineering API we can view the model's lineage in catalog explorer. Now let's update the model description and assign a champion label to it. Now go ahead and check the schema that you registered the model you should see all your updates as follows tip model stages if you use workspace for model registry you should stage though manage your models using aliases won't work check out equals equals here equals equals to see
Starting point is 00:15:41 how it works model inference batch scoring now imagine we want to use our model in production for inference. In this step we load the champion model and use it to generate 20 movie recommendations for each users. And you can see we use the same training data for batch scoring. Although in the case of recommender systems it makes sense, in most application we want use the model to score some unseen data. For example, imaging you are on Netflix and want to update the user recommendations at the end of day based on their new watched list. We can schedule job that run the batch scoring at specific time at the end of the day. Now we can go ahead and generate the recommendations for each user. For this we find the top 20 items per users here is how the result look like finally we
Starting point is 00:16:25 can store the prediction as a delta label on our UC or publish them to a downstream systems MongoDB or Azure Cosmos DB. We go with the FERS option streaming online inference now imagine a case in which we want to update our recommendations based on real-time user interactions. For this case we can use model serving. When someone wants to use your model, they can send data to the server. The server then feeds that data to your deployed model, which goes into action, analyzes the data, and generates a prediction. They can be used in web applications, mobile apps, or even embedded systems. One of the application of this approach is to enable traffic routing for A-B testing.
Starting point is 00:17:09 ALS algorithm can't be used directly for online inference since it requires their training the model using the whole data, old plus new, to update the recommendations. Gradient descent learning algorithms are examples of model that can be used for online updates. We might look at some of these algorithms in future post. However, just to illustrate how such a model would work, we are creating a useless model serving endpoint that predict movie rating based whenever a user rate a movies. This will create an lunch model serving cluster for us so it takes some time. Now if you open the window you should see your endpoint. Tip we can use one endpoint to serve multiple model. Then we can use traffic routing for scenarios such as A, B testing or compare the performance of difference models in the production. we can use one endpoint to serve multiple model then we can use traffic routing for scenarios
Starting point is 00:17:45 such as a b testing or compare the performance of difference models in the production inference table inference tables in databricks model serving act as an automatic log for our deployed models when enabled they capture incoming requests data sent for prediction the corresponding model outputs predictions and some other metadata as a delta table within Unity Catalog. We can use inference table for monitoring and debugging, lineage tracking, and a data collection procedure for training or fine-tune our models. We can enable the on-r serving endpoint to monitor the model. We can do it by specifying the properties in the payload when we first create the endpoint.
Starting point is 00:18:23 Or we update our endpoint afterwards using the command and the payload when we first create the endpoint. Or we update our endpoint afterwards using the command and the endpoint URL as follows, more equals equals here, equals equals, now let's feed the endpoint with some dummy user interaction data we can check the endpoint logs in the table. It takes around 10 minutes until you can see the data in the table. You should see something like this your payload table to understand the schema of this inference table. Check unity catalog inference table schema equals equals here. Equals equals model monitoring. Model and data monitoring a complex topic that requires a lot of time to master. Databricks lakehouse monitoring, DLM, reduces the overhead of building a proper monitoring system by providing standard and customizable templates for commonest cases. However, mastering DLM and model monitoring
Starting point is 00:19:10 in general requires a lot of experimentations. I don't want to give you an extensive overview of model monitoring here but rather give you a starting point. I might dedicate a blog taught this topic in future. A short summary of DLM functionalities and features now that we have our model up and running. We can use inference table generated be our serving endpoint to monitor key metrics such a model performance and drift to detect any deviations or anomalies in our data or model over time. This proactive approach help us to take timely corrective actions such as retraining the model or updating its features, to maintain optimal performance and alignment with business objectives. DLM provides three type of analysis are, time series, snapshot and inference.
Starting point is 00:19:52 Since we are interested in analyzing our inference table, we focus on the latter one. To use a table for monitoring, our primary table, we should make sure that the table have the right structure. For the equals equals inference table, equals equals each row should correspond to requests with following columns, model features, model prediction, model id, timestamp, timestamp of the inference request, ground truth, optional. The model id is important for cases when we serve multiple models and we want to track the performance of each model in one monitoring dashboard. If there are more than one model it available, DLM uses it to slice the data and compute metrics and statics for each slice separately. DLM computes each statistics and metrics for a specified time interval. For inference analysis, it used the timestamp column
Starting point is 00:20:41 plus a user-defined window size to identify the time windows. More below, DLM supports two for inference tables, classification, or regression. It computes some of the relevant metrics and statistics based on the this specification. To use DLM, we should create a monitor and attach it to a table. When we do this DLM create2 profile metric table, this table contains summary statistics such as min, max, percentage of null and zeros. It also contains additional metrics based on the problem type defined by the user. For example, precision, recall an F1 underscore score for the classification models, and mean underscore squared underscore error and mean underscore average underscore error for regression models. Drift metric table. It contains statistic that measure how the
Starting point is 00:21:30 distribution of data has changed over time or relative to a baseline value if provided. It compute measures such as chi-square test, ks test. To see the list of complete metrics for each table check equals equals monitor metric table equals equals documentation page. It is also possible to create equals equals custom metrics. Equals equals an important aspect of building a monitoring system is to make sure that our monitoring dashboard has access to the latest inference data as they arrive. For that we can use equals equals delta table streaming equals equals to keep track of processed rows in the inference table. We use the model servings inference table as our source table and the monitoring table as the sync table. We also make sure the equals equals change data capture equals equals CDC is enabled on both tables. It is enabled by default on the inference table.
Starting point is 00:22:22 This way we process only changes insert, update, delete, in the source table rather than reprocessing the entire table every refresh. Hands-on to enable the monitoring over our inference table we take the following steps. 1. Read the inference table as a streaming table. 2. Create a new delta table with the right schema by unpacking the inference table that is generated by our model serving endpoint. 3. Prepare the baseline table, if any. 4. Create a monitor over the resulting table and refresh the metric. 5. Schedule a workflow to unpack the inference table to the right structure and refresh the metrics. First we need to install the lakehouse monitoring API. It should be already installed if you use databricks rum time 15.
Starting point is 00:23:06 3 lts and above let's read the inference table as a streaming table next we have to put the table in right format as described above this table should have one row for each prediction with relevant the features and prediction value the inference table that we get from the model serving endpoint store the endpoint requests and responses as a nested JSON format. Here's an example of the JSON payload for the request and response column. To unpack this table to the right schema, we can use the following code
Starting point is 00:23:34 that IS adapted from Databricks documentation equals equals inference table lakehouse monitoring starter notebook. Equals equals, the resulting table would look like this. Next, we should initialize our sync table and write the results. Finally, we, the resulting table would look like this. Next we should initialize our sync table and write the results. Finally, we create our baseline table. DLM uses this table to compute the drift speed comparing the distribution of similar columns of baseline and primary models. The baseline table should have the same feature column as the primary column as well as the same
Starting point is 00:24:01 model identification column. For baseline table we use the prediction table of our validation dataset that we store earlier after we trained our model using eBest hyperparameter. To compute the drift metric, Databricks compute the profile metrics for both primary and the baseline table. Here you can read about the equals equals primary table and baseline table. Equals equals, now we are read to create our monitoring dashboard. We can do it either using the equals equals UI equals equals or the lakehouse monitoring API. Here we use the second option. After we run the code it takes some time until Databricks calculate all the metric. To see the dashboard go to the tab of your sync table, IE. You should
Starting point is 00:24:43 see a page as follow. If you click on the view you see you're running, pending and past refreshes. Click on the to open your dashboard, so we start with the inference table, process it and save the result to and past this table along with our baseline table, to our monitoring API. The DLM compute the profile metrics for each table, and use the them to compute the drift metrics, there you go. You have everything you need to serve and monitor your model. In the next part I'll show you how to automate this process using Databricks Assets Bundle and GitLab. Thank you for listening to this HackerNoon story, read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.
