Configuring fairness evaluations for indirect bias in Watson OpenScale

Under certain conditions, you can configure Watson OpenScale fairness evaluations to consider indirect bias as well as direct bias for a model deployment.

Indirect bias occurs when one feature in a data set can be used to stand in for another. For example, in a data set where race is not a known feature, a feature such as postal-code can often track closely to race. Evaluating the postal-code feature for bias is a way of detecting indirect bias. In another example, customer purchasing history might correspond closely with sex. Thus, even a predictive model that does not contain any of the typical protected attributes such as race, age, or sex might indicate biased results.

Watson OpenScale analyzes indirect bias when the following conditions are met:

  • To find correlations, the data set must be sufficiently large (more than 4000 records).
  • The training data must include the meta fields. Train the model on only a subset of the data fields; the remaining fields, the meta fields, are reserved for determining indirect bias. (Include the meta fields in the data, but don't use them in model training.)
  • Payload logging must contain meta fields and be run before the fairness monitor is configured. You must use this method to upload the meta fields to the Watson OpenScale service. Payload logging for indirect bias requires two types of input: 1) training features with values and 2) meta fields with values.
  • When you configure the fairness monitor, select the additional fields to monitor.
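The data-related conditions above can be checked locally before anything is sent to the service. The following is a minimal sketch, not part of Watson OpenScale: the function name `check_indirect_bias_inputs` and the list-of-dicts record layout are assumptions made for illustration. It checks only the record count, the separation of meta fields from training features, and the presence of every field in every record.

```python
MIN_RECORDS = 4000  # the data set must contain more than this many records


def check_indirect_bias_inputs(records, training_features, meta_fields):
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical helper: `records` is a list of dicts keyed by column name.
    """
    problems = []

    # Condition 1: the data set must be large enough to find correlations.
    if len(records) <= MIN_RECORDS:
        problems.append(
            f"need more than {MIN_RECORDS} records, got {len(records)}"
        )

    # Condition 2: meta fields are present in the data but not used for training.
    overlap = set(training_features) & set(meta_fields)
    if overlap:
        problems.append(f"meta fields also used for training: {sorted(overlap)}")

    # Every record needs values for both the training features and the meta fields.
    required = set(training_features) | set(meta_fields)
    for index, record in enumerate(records):
        missing = required - set(record)
        if missing:
            problems.append(f"record {index} is missing {sorted(missing)}")
            break  # one example is enough to flag the problem

    return problems


# Illustrative data only; the column names are placeholders.
rows = [{"AGE": 28, "BP": "LOW", "race": "Black"}] * 4001
problems = check_indirect_bias_inputs(rows, ["AGE", "BP"], ["race"])
print(problems or "inputs look ready for indirect bias analysis")
```

The remaining conditions (payload logging order and monitor configuration) depend on the service and are not covered by this check.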

Typical workflow for indirect bias

Although you can determine indirect bias for both preproduction and production models, the two require different columns. The test data that is used to evaluate preproduction models and the feedback data that is used to evaluate either preproduction or production models differ in their use of meta columns: meta columns are required in the test data for preproduction and cannot be included in the feedback data that is used for preproduction or production models. A typical workflow might include the following steps:

  1. Create training data that contains both feature columns and meta columns. The meta columns contain data that is not used to train the model.
  2. In Watson OpenScale, configure the fairness monitor with the meta columns.
  3. During preproduction, upload test data that contains both the feature columns and the meta columns. This test data must be uploaded by using the Import test data CSV option.
  4. During preproduction, you might iterate on different versions of the model while using the indirect bias measures to ensure that your final model is free of bias.
  5. After you send the model to production, the feedback data should not have any of the meta columns, only the feature columns that were used to train the model.
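The column difference between test data and feedback data in the workflow above can be sketched as follows. This is an illustration only: the helper `to_csv` and the column names are assumptions, and the sketch produces CSV text locally without calling any Watson OpenScale API. Test data keeps the feature and meta columns; feedback data keeps only the feature columns.

```python
import csv
import io

# Placeholder column names for illustration.
FEATURE_COLUMNS = ["AGE", "BP", "CHOLESTEROL"]
META_COLUMNS = ["race"]


def to_csv(rows, columns):
    """Serialize only the named columns of each row dict to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop columns that are not listed
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"AGE": 28, "BP": "LOW", "CHOLESTEROL": "HIGH", "race": "Black"},
    {"AGE": 45, "BP": "HIGH", "CHOLESTEROL": "NORMAL", "race": "White"},
]

# Preproduction test data: feature columns plus meta columns.
test_csv = to_csv(rows, FEATURE_COLUMNS + META_COLUMNS)

# Feedback data: feature columns only, with the meta columns stripped.
feedback_csv = to_csv(rows, FEATURE_COLUMNS)

print(test_csv)
print(feedback_csv)
```

Deriving both files from the same source rows keeps the feature columns identical between the two, so only the presence of the meta columns differs.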

Sample JSON payload file with meta fields

The following sample file shows a JSON payload with the fields and values that are used to train the model. The meta fields and values that are used for the indirect bias analysis are also included. The meta fields are not used to train the model. Instead, they are reserved for a separate analysis that attempts to correlate them with bias in the model. Although the meta fields can be any type of data, they are usually protected attributes, such as sex, race, or age.

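# Import path as used by the ibm_ai_openscale Python client; adjust it if your
# client version differs.
from ibm_ai_openscale.supporting_classes import PayloadRecord

# Sample request: the fields and values that are sent to the model for scoring.
request_data = {
    "fields": ["AGE", "SEX", "BP", "CHOLESTEROL", "NA", "K"],
    "values": [[28, "F", "LOW", "HIGH", 0.61, 0.026]]
}

# Sample response: the same fields plus the model output.
response_data = {
    "fields": ["AGE", "SEX", "BP", "CHOLESTEROL", "NA", "K", "probability", "prediction", "DRUG"],
    "values": [[28, "F", "LOW", "HIGH", 0.61, 0.026, [0.82, 0.07, 0.0, 0.05, 0.03], 0.0, "drugY"]]
}

# Replace the sample request_data and response_data above with your own data.

records = [
    PayloadRecord(request=request_data, response=response_data, response_time=18),
    PayloadRecord(request=request_data, response=response_data, response_time=12),
]

# `subscription` is an existing Watson OpenScale subscription object.
subscription.payload_logging.store(records=records)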

Meta values must be in the format of an array of arrays:

"meta": {
    "fields": ["age", "race", "sex"],
    "values": [
        [32, "Black", "Male"]
    ]
}
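If the meta values start out as one dictionary per record, they have to be reshaped into the array-of-arrays layout shown above. The following is a minimal sketch under that assumption. The helper name `build_meta` is hypothetical and is not part of the Watson OpenScale client.

```python
import json


def build_meta(records, meta_fields):
    """Build a meta object with one inner array of values per record.

    Hypothetical helper: `records` is a list of dicts keyed by meta field name.
    The values in each inner array follow the order of `meta_fields`.
    """
    return {
        "fields": list(meta_fields),
        "values": [[record[field] for field in meta_fields] for record in records],
    }


meta = build_meta(
    [{"age": 32, "race": "Black", "sex": "Male"}],
    ["age", "race", "sex"],
)
print(json.dumps({"meta": meta}, indent=4))
```

Even a single record produces a nested array, which matches the required format.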

Configuring the Watson OpenScale service for indirect bias

When you set up the fairness monitor, select the fields to monitor. Include both training features and fields that are excluded from model training. If you select a field that is excluded from model training, Watson OpenScale finds correlations between values in that field and values in the training features. The correlated features are used as proxies for the fields that were excluded from model training.
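To make the idea of a proxy feature concrete, the sketch below ranks training features by how strongly they are associated with one excluded field. It uses Cramér's V, a standard association measure for categorical data. This is an assumption chosen for illustration: the source does not state how Watson OpenScale computes its correlations, and the function names and sample rows here are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences (0 = no association, 1 = perfect)."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    k = min(len(x_counts), len(y_counts)) - 1
    if n == 0 or k < 1:
        return 0.0  # undefined when either variable has a single category
    pair_counts = Counter(zip(xs, ys))
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def rank_proxies(rows, training_features, meta_field):
    """Return (feature, score) pairs, strongest association with meta_field first."""
    meta_values = [row[meta_field] for row in rows]
    scores = [
        (feature, cramers_v([row[feature] for row in rows], meta_values))
        for feature in training_features
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Illustrative rows: postal_code tracks race exactly, plan does not track it at all.
rows = [
    {"postal_code": "A", "plan": "x", "race": "r1"},
    {"postal_code": "A", "plan": "y", "race": "r1"},
    {"postal_code": "B", "plan": "x", "race": "r2"},
    {"postal_code": "B", "plan": "y", "race": "r2"},
]
ranking = rank_proxies(rows, ["plan", "postal_code"], "race")
print(ranking)
```

In this toy data, `postal_code` ranks first as the candidate proxy for `race`. Numeric features would need to be binned before this measure applies.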

Indirect bias displays

Some fields are training features. Other fields that are not training features are identified as meta fields. For the selected meta fields, Watson OpenScale checks for indirect bias.
