Configuring fairness evaluations for indirect bias
Under certain conditions, you can configure fairness evaluations to consider indirect bias as well as direct bias for a model deployment.
Indirect bias occurs when one feature in a data set can stand in for another. For example, in a data set where race is not a known feature, a feature such as postal code often tracks closely with race. Evaluating the postal code feature for bias is a way of detecting indirect bias. In another example, customer purchasing history might correspond closely with sex. Thus, even a predictive model that does not contain any of the typical protected attributes, such as race, age, or sex, might produce biased results.
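To make the idea of a proxy feature concrete, the following sketch (not part of the evaluation service) uses pandas and Cramér's V to estimate how strongly a candidate feature such as postal code is associated with a protected attribute such as race. The file name and column names are hypothetical.
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V association between two categorical columns (0 = none, 1 = perfect)."""
    table = pd.crosstab(x, y).to_numpy()
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    r, k = table.shape
    return float(np.sqrt((chi2 / n) / (min(r, k) - 1)))

# Hypothetical training data that contains both a candidate proxy feature and a protected attribute
df = pd.read_csv("training_data.csv")
score = cramers_v(df["POSTAL_CODE"], df["RACE"])
print(f"Association between postal code and race: {score:.2f}")
A value close to 1 suggests that the candidate feature could act as a proxy for the protected attribute.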
Indirect bias is analyzed when the following conditions are met:
- To find correlations, the data set must be sufficiently large (more than 4000 records).
- The training data must include the meta fields. You train the model on only a subset of the data fields; the additional fields, the meta fields, are included in the training data but are not used in model training. They are reserved for determining indirect bias (see the sketch after this list).
- Payload logging must include the meta fields and must run before the fairness monitor is configured; payload logging is the method for uploading the meta fields. Payload logging for indirect bias requires two types of input: 1) the training features with their values and 2) the meta fields with their values.
- When you configure the fairness monitor, select the additional fields to monitor.
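As a minimal sketch of the training data condition, the following snippet keeps the meta columns in the data set but out of model training. The file name, column names, and choice of scikit-learn model are assumptions for illustration only.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: feature columns plus meta columns
df = pd.read_csv("training_data.csv")

meta_columns = ["RACE", "POSTAL_CODE"]   # kept in the data set, excluded from training
feature_columns = [c for c in df.columns if c not in meta_columns + ["LABEL"]]

# Train the model on the feature columns only; the meta columns are
# reserved for the indirect bias analysis
X = pd.get_dummies(df[feature_columns])  # one-hot encode any categorical features
model = LogisticRegression(max_iter=1000)
model.fit(X, df["LABEL"])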
Typical workflow for indirect bias
You can determine indirect bias for both preproduction and production models, but the models require different columns. The test data that is used to evaluate preproduction models and the feedback data that is used to evaluate either preproduction or production models differ in their use of meta columns: meta columns are required in the test data for preproduction models and cannot be included in the feedback data that is used for preproduction or production models. A typical workflow might include the following steps:
- Create training data that contains both feature columns and meta columns. The meta columns contain data that is not used to train the model.
- Configure the fairness monitor with the meta columns.
- During preproduction, upload test data that contains both the feature columns and the meta columns. This test data must be uploaded by using the Import test data CSV option.
- During preproduction, you might iterate on different versions of the model, using the indirect bias measures to ensure that your final model is free of bias.
- After you send the model to production, the feedback data must contain only the feature columns that were used to train the model; it cannot include the meta columns. The sketch after this list shows one way to prepare both data sets.
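The following sketch assumes the same hypothetical training_data.csv and meta column names as the earlier sketch. It writes a test data CSV that keeps the meta columns for the preproduction evaluation and a feedback data CSV that drops them for production.
import pandas as pd

df = pd.read_csv("training_data.csv")    # hypothetical source data
meta_columns = ["RACE", "POSTAL_CODE"]

# Preproduction: test data keeps both the feature columns and the meta columns
df.to_csv("test_data_with_meta.csv", index=False)

# Production: feedback data contains only the columns the model was trained on
df.drop(columns=meta_columns).to_csv("feedback_data.csv", index=False)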
Sample JSON payload file with meta fields
The following sample file shows a JSON payload with the fields and values that are used to train the model. The meta fields and values that are used for the indirect bias analysis are also included. The meta fields are not used to train the model; instead, they are reserved for a different kind of analysis that attempts to correlate them with bias in the model. Although the meta fields can be any type of data, they are usually protected attributes, such as sex, race, or age.
# Sample scoring request: the fields and values that the model was trained on
request_data = {
    "fields": ["AGE", "SEX", "BP", "CHOLESTEROL", "NA", "K"],
    "values": [[28, "F", "LOW", "HIGH", 0.61, 0.026]]
}

# Sample scoring response: the same fields plus the model output
response_data = {
    "fields": ["AGE", "SEX", "BP", "CHOLESTEROL", "NA", "K", "probability", "prediction", "DRUG"],
    "values": [[28, "F", "LOW", "HIGH", 0.61, 0.026, [0.82, 0.07, 0.0, 0.05, 0.03], 0.0, "drugY"]]
}

# Replace the sample data with your own before storing the records
request_data = <put your data here>
response_data = <put your data here>

# Store the payload records against the subscription
records = [PayloadRecord(request=request_data, response=response_data, response_time=18),
           PayloadRecord(request=request_data, response=response_data, response_time=12)]
subscription.payload_logging.store(records=records)
Meta values must be in the format of an array of arrays:
"meta": {
"fields": ["age", "race", "sex"],
"values": [
[32, "Black", "Male"]
]
}
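Putting the two samples together, a request_data payload that carries both the training fields and the meta fields might look like the following sketch. It assumes that the meta block is supplied alongside the fields and values in the same request; the field names follow the samples above.
# Sketch: scoring request that includes a meta block for indirect bias analysis
request_data = {
    "fields": ["AGE", "SEX", "BP", "CHOLESTEROL", "NA", "K"],
    "values": [[28, "F", "LOW", "HIGH", 0.61, 0.026]],
    "meta": {
        "fields": ["age", "race", "sex"],
        "values": [
            [32, "Black", "Male"]
        ]
    }
}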
Learn more
- Debiasing options
- For a sample notebook that detects indirect bias, see Indirect Bias detection
Parent topic: Configuring model evaluations