Configuring model evaluations with automatic setup

Last updated: Oct 25, 2024

The automatic setup option for machine learning model evaluations sets up a machine learning environment, a database, and a sample model for you. Follow the steps in the guided tour to learn how to evaluate the sample model. After the setup is complete, you can add your own model to the dashboard.

Sample model

The automatic setup uses the sample data set German Credit Risk to demonstrate key features of model evaluations.

Overview of the sample data

The German Credit Risk sample data is a collection of bank customer records that were used to train the sample model. It contains 20 attributes for each loan applicant. The sample models provisioned as part of the automatic setup are trained to predict the level of credit risk for new customers. Two of the attributes considered for the prediction, sex and age, can be tested for bias to make sure that outcomes are consistent regardless of the gender or age of customers.

To evaluate the outcomes, results are divided into groups. The Reference groups are the groups that are considered most likely to have positive outcomes. In this case, the Reference groups are male customers and customers over the age of 25. The Monitored groups are the groups that you want to review to ensure that their results do not differ greatly from the results for the Reference groups. In this case, the Monitored groups are female customers and customers aged 19 to 25.
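One common way to quantify how far a monitored group's results differ from a reference group's is the disparate impact ratio: the rate of favorable outcomes in the monitored group divided by the rate in the reference group. The following Python sketch is illustrative only. The record layout, the helper names favorable_rate and disparate_impact, the "No Risk" label, and the sample values are all assumptions made for this example; they are not the product's implementation or the actual sample data.

```python
from typing import Iterable, Mapping


def favorable_rate(records: Iterable[Mapping[str, str]],
                   favorable: str = "No Risk") -> float:
    """Fraction of records whose prediction equals the favorable label."""
    rows = list(records)
    if not rows:
        raise ValueError("group has no records")
    hits = sum(1 for row in rows if row["prediction"] == favorable)
    return hits / len(rows)


def disparate_impact(records: Iterable[Mapping[str, str]],
                     attribute: str,
                     monitored: str,
                     reference: str,
                     favorable: str = "No Risk") -> float:
    """Favorable rate of the monitored group divided by that of the reference group."""
    rows = list(records)
    monitored_rows = [r for r in rows if r[attribute] == monitored]
    reference_rows = [r for r in rows if r[attribute] == reference]
    reference_rate = favorable_rate(reference_rows, favorable)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(monitored_rows, favorable) / reference_rate


# Hypothetical scored records; values are made up for illustration.
sample = [
    {"Sex": "male", "prediction": "No Risk"},
    {"Sex": "male", "prediction": "No Risk"},
    {"Sex": "male", "prediction": "No Risk"},
    {"Sex": "male", "prediction": "Risk"},
    {"Sex": "female", "prediction": "No Risk"},
    {"Sex": "female", "prediction": "No Risk"},
    {"Sex": "female", "prediction": "Risk"},
    {"Sex": "female", "prediction": "Risk"},
]

ratio = disparate_impact(sample, "Sex", monitored="female", reference="male")
print(f"Disparate impact (female vs. male): {ratio:.2f}")
```

A ratio close to 1 means that the monitored group receives favorable outcomes at about the same rate as the reference group. A ratio well below 1 suggests that the monitored group is treated less favorably.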

Running the automatic setup

Follow these steps to run the automatic setup:

Launch Watson OpenScale.
Choose the Auto setup option.

The process takes about 10 minutes to complete. Three deployments are configured during the setup:

Model	Binding	Description
GermanCreditRiskModelPreProd	Pre-production, approved	This deployment represents the current approved model that is being evaluated in the internal environment.
GermanCreditRiskModelChallenger	Pre-production	The challenger model is deployed to compare performance and other attributes against the approved pre-production model deployment.
GermanCreditRiskModel	Production	Between the approved pre-production model and the challenger model, the model that delivers more favorable results is selected for production and deployed from the production space.

After the setup is complete, follow the guided tour to learn the features of model evaluations.

Guided tour highlights

The guided tour demonstrates these features:

Introduction to the user interface (UI): The four main areas of the UI are Insights, Explanations, Configuration, and Support.
Monitoring and viewing results for the German credit risk model: Use predefined monitors to evaluate your model for fairness, quality, and drift. You can also use custom monitors for model evaluation.
Exploring the Fairness monitor: Use the Fairness monitor to look for biased outcomes from your model. If a fairness issue is found, an alert is triggered based on configurable thresholds.
Exploring data sets: Toggle between balanced, payload, training, and debiased data sets to see how they affect the fairness score of your model.
Introduction to transactions: Review transactions from the payload data set for group bias and individual bias.
Explaining model outcomes: Understand the features that led to the model prediction to build trust in the model. Additionally, learn how to change feature values to receive more favorable model outcomes.
Exploring the Drift monitor: Use the Drift monitor to determine if the processing of data in the model is causing a drop in accuracy.
Reviewing transactions: Review the transactions list to investigate the drop in accuracy.
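
The idea of flagging a drop in accuracy against a configurable threshold can be pictured with a short Python sketch. This is not how the Drift monitor is implemented, which this page does not describe. The helper names accuracy and accuracy_drop_alert, the threshold of 0.1, and the sample predictions and labels are assumptions made for illustration.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the known labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def accuracy_drop_alert(baseline_accuracy: float,
                        current_accuracy: float,
                        threshold: float = 0.1) -> bool:
    """Return True when accuracy fell by more than the threshold."""
    return (baseline_accuracy - current_accuracy) > threshold


# Hypothetical values: accuracy measured at training time versus
# accuracy on a recent batch of labeled transactions.
baseline = accuracy(
    ["Risk", "No Risk", "No Risk", "Risk"],
    ["Risk", "No Risk", "No Risk", "Risk"],
)
current = accuracy(
    ["Risk", "Risk", "No Risk", "Risk"],
    ["Risk", "No Risk", "Risk", "Risk"],
)

alert = accuracy_drop_alert(baseline, current, threshold=0.1)
print(f"baseline={baseline:.2f} current={current:.2f} alert={alert}")
```

When a check like this raises an alert, reviewing the individual transactions that were scored incorrectly is the natural next step.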