Creating custom evaluations and metrics
To create custom evaluations, select a set of custom metrics to quantitatively track your model deployment and business application. You can define these custom metrics and use them alongside metrics that are generated by other types of evaluations.
You can use one of the following methods to manage custom evaluations and metrics:
Managing custom metrics with the Python SDK
To manage custom metrics with the Python SDK, you must perform the following tasks:
The following advanced tutorial shows how to do this:
You can disable and re-enable custom monitoring at any time, and you can remove a custom monitor if you no longer need it.
For more information, see the Python SDK documentation.
Step 1: Register custom monitor with metrics definition.
Before you can start using custom metrics, you must register the custom monitor, which is the processor that tracks the metrics. You must also define the metrics themselves.
- Use the get_definition(monitor_name) helper method to check whether a monitor definition with that name already exists.
- Use the metrics parameter to define the metrics, which require name, thresholds, and type values.
- Use the tags parameter to define metadata.
The following code is from the working sample notebook that was previously mentioned:
def get_definition(monitor_name):
    monitor_definitions = wos_client.monitor_definitions.list().result.monitor_definitions
    for definition in monitor_definitions:
        if monitor_name == definition.entity.name:
            return definition
    return None
monitor_name = 'my model performance'
metrics = [MonitorMetricRequest(name='sensitivity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.8)]),
           MonitorMetricRequest(name='specificity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.75)])]
tags = [MonitorTagRequest(name='region', description='customer geographical region')]

existing_definition = get_definition(monitor_name)
if existing_definition is None:
    custom_monitor_details = wos_client.monitor_definitions.add(name=monitor_name, metrics=metrics, tags=tags, background_mode=False).result
else:
    custom_monitor_details = existing_definition
To check how you're doing, run the wos_client.monitor_definitions.list() command to see whether your newly created monitor and metrics are configured properly.
You can also get the monitor ID by running the following command:
custom_monitor_id = custom_monitor_details.metadata.id
print(custom_monitor_id)
For a more detailed look, run the following command:
custom_monitor_details = wos_client.monitor_definitions.get(monitor_definition_id=custom_monitor_id).result
print('Monitor definition details:', custom_monitor_details)
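The two example metrics registered above, sensitivity and specificity, are the standard binary-classification rates. As a reminder, they derive from confusion-matrix counts; these helper functions are illustrative only, not part of the SDK:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# A model that finds 80 of 100 actual positives meets the 0.8 lower limit exactly.
print(sensitivity(tp=80, fn=20))  # 0.8
print(specificity(tn=75, fp=25))  # 0.75
```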
Step 2: Enable custom monitor.
Next, you must enable the custom monitor for the subscription. This step activates the monitor and sets the thresholds.
- Use the Target object to specify the subscription that the custom monitor applies to.
- Use the thresholds parameter to override the metric's default lower_limit value. Supply the metric_id value as one of the parameters. If you don't remember the ID, you can always inspect custom_monitor_details to get it, as shown in the previous example.
The following code is from the working sample notebook that was previously mentioned:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
thresholds = [MetricThresholdOverride(metric_id='sensitivity', type=MetricThresholdTypes.LOWER_LIMIT, value=0.9)]

custom_monitor_instance_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=custom_monitor_id,
    target=target,
    thresholds=thresholds
).result
To check your configuration details, use the wos_client.monitor_instances.get(monitor_instance_id=custom_monitor_instance_details.metadata.id) command.
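For intuition, a lower-limit threshold simply flags any metric value that falls below it. A minimal, SDK-free sketch of that check (the override in this step raises the sensitivity lower limit from the 0.8 default to 0.9):

```python
def violates_lower_limit(value: float, lower_limit: float) -> bool:
    """Return True when a measured metric value breaches its lower-limit threshold."""
    return value < lower_limit

# With the 0.9 override, a measured sensitivity of 0.85 now raises an alert,
# even though it would have passed the 0.8 default.
print(violates_lower_limit(0.85, 0.9))  # True
print(violates_lower_limit(0.85, 0.8))  # False
```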
Step 3: Store metric values.
You must store, or save, your custom metrics in the region where your service instance exists.
- Use the metrics parameter of the MonitorMeasurementRequest object to set which metric values you are storing.
- Use the wos_client.monitor_instances.measurements.add method to commit the metrics.
The following code is from the working sample notebook that was previously mentioned:
from datetime import datetime, timezone
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import MonitorMeasurementRequest

# ID of the monitor instance that was created in step 2
custom_monitor_instance_id = custom_monitor_instance_details.metadata.id

custom_monitoring_run_id = "11122223333111abc"
measurement_request = [MonitorMeasurementRequest(timestamp=datetime.now(timezone.utc),
                                                 metrics=[{"specificity": 0.78, "sensitivity": 0.67, "region": "us-south"}],
                                                 run_id=custom_monitoring_run_id)]
print(measurement_request[0])

published_measurement_response = wos_client.monitor_instances.measurements.add(
    monitor_instance_id=custom_monitor_instance_id,
    monitor_measurement_request=measurement_request).result
published_measurement_id = published_measurement_response[0]["measurement_id"]
print(published_measurement_response)
To retrieve the stored measurement, run the following command:
published_measurement = wos_client.monitor_instances.measurements.get(monitor_instance_id=custom_monitor_instance_id, measurement_id=published_measurement_id).result
print(published_measurement)
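Note that metric values and tag values travel together in a single dictionary per measurement. The following SDK-free sketch illustrates that payload shape; it is an illustration only, not the SDK's own API:

```python
from datetime import datetime, timezone

def build_measurement(metric_values: dict, tag_values: dict, run_id: str) -> dict:
    """Assemble a measurement payload: metric values and tag values
    share one dictionary, mirroring the MonitorMeasurementRequest above."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "metrics": [{**metric_values, **tag_values}],
    }

payload = build_measurement({"specificity": 0.78, "sensitivity": 0.67},
                            {"region": "us-south"},
                            "11122223333111abc")
print(payload["metrics"][0])
```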
Managing custom metrics with watsonx.governance
Step 1: Add metric groups
- On the Configure tab, click Add metric group.
- If you want to configure a metric group manually, click Configure new group.
  a. Specify a name and a description for the metric group. The name must be 48 characters or fewer.
  b. Click the Edit icon on the Input parameters tile and then specify the details for your input parameters. The parameter name that you specify must match the parameter name that is specified in the metric API.
  c. If the parameter is required to configure your custom monitor, select the Required parameter checkbox.
  d. Click Add. After you add the input parameters, click Next.
  e. Select the model types that your evaluation supports and click Next.
  f. If you don't want to specify an evaluation schedule, click Save.
  g. If you want to specify an evaluation schedule, click the toggle, specify the interval for the schedule, and click Save.
  h. Click Add metric, specify the metric details, and click Save.
- If you want to configure a metric group by using a JSON file, click Import from file, upload a JSON file, and click Import.
Step 2: Add metric endpoints
- In the Metric endpoints section, click Add metric endpoint.
- Specify a name and a description for the metric endpoint.
- Click the Edit icon on the Connection tile, specify the connection details, and click Next.
- Select the metric groups that you want to associate with the metric endpoint and click Save.
Step 3: Configure custom monitors
- On the Insights Dashboard page, click Configure monitors on a model deployment tile.
- In the Evaluations section, select the name of the metric group that you added.
- Click the Edit icon on the Metric endpoint tile.
- Select a metric endpoint and click Next. If you don't want to use a metric endpoint, select None.
- Use the toggles to specify the metrics that you want to use to evaluate the model, provide threshold values, and click Next.
- Specify values for the input parameters. If you selected JSON as the data type for the metric group, add the JSON data. Click Next.
You can now evaluate models with a custom monitor.
Accessing and visualizing custom metrics
To access and visualize custom metrics, you can use a programmatic interface. The following advanced tutorial shows how to do this:
- Working with IBM Watson Machine Learning
For more information, see the Python SDK documentation.
Visualization of your custom metrics appears on the Insights Dashboard.
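If you prefer to inspect measurements outside the dashboard, you can tabulate them yourself once they are fetched. A minimal sketch, assuming the measurements have already been retrieved (for example, with wos_client.monitor_instances.measurements.get as shown earlier) and reshaped into (timestamp, metrics) pairs; the data below is hypothetical:

```python
# Hypothetical measurements standing in for values fetched from the service.
measurements = [
    ("2024-05-01T10:00:00Z", {"sensitivity": 0.67, "specificity": 0.78}),
    ("2024-05-02T10:00:00Z", {"sensitivity": 0.91, "specificity": 0.80}),
]

def tabulate(rows):
    """Render measurements as a plain-text table for quick inspection."""
    lines = [f"{'timestamp':<22}{'sensitivity':>12}{'specificity':>12}"]
    for ts, m in rows:
        lines.append(f"{ts:<22}{m['sensitivity']:>12.2f}{m['specificity']:>12.2f}")
    return "\n".join(lines)

print(tabulate(measurements))
```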
Learn more
Parent topic: Configuring model evaluations