As software is increasingly used to produce metrics and insights in critical areas of our societies, such as healthcare, crime recidivism risk assessment, job application review or loan approval, the question of algorithmic fairness is becoming more important than ever. Because algorithms learn from human-generated data, they often magnify human bias in decision making, making them prone to unfair judgments. For example, Amazon's CV review program was found to be unfair to women. Because the program learned from previously reviewed resumes (with unbalanced genders), it learned to penalize the resumes of women.

In US courts, COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a tool that gives a crime recidivism risk score. The score is considered by judges when giving their verdicts. ProPublica analyzed more than 10,000 records of people arrested in Florida's Broward County and discovered that the algorithm was biased: Black defendants were misclassified as high risk of recidivism more often than white defendants. The algorithm's error rate was not the same for all races. The company behind COMPAS replied that the scores were equally accurate for black and white people. But can an algorithm be neutral with regard to a characteristic (race, in this case) in both its predictive power and its error rate at the same time?

There have been quite a few attempts at quantifying fairness. Turning such a vague concept into a number is powerful because it allows one to compare models according to their respective fairness, and it can therefore be used during model optimization and model selection. How do we formalize the concept of fairness and build it into our software? To understand fairness, let's first introduce some vocabulary:

  • Positive class: In a binary classification (that is, a classification with two possible outcomes), this represents a positive output. For example, it corresponds to classes such as Credit approved, Message is spam or Job applicant is noteworthy.

  • Negative class: You guessed it, right?

  • True positive (TP): When the predicted class is positive and the actual class is positive

  • True negative (TN): When the predicted class is negative and the actual class is negative

  • False positive (FP): When the predicted class is positive and the actual class is negative

  • False negative (FN): When the predicted class is negative and the actual class is positive

Wikipedia has a neat table summarizing the metrics you can draw from those definitions. Also, let's define the protected attribute: this is a sensitive attribute we wish to protect, for example ethnicity, gender or age. Now let's go through a few criteria for algorithm fairness. Depending on the criterion used, an algorithm may be considered fair or not, and some criteria are easier to satisfy than others.

Fairness criteria #

Unawareness #

One may state that an algorithm is fair if it doesn't include the protected attribute in its input data. While convenient, this approach assumes that the predictors are independent, which is almost never true in practice. For example, if the protected attribute “ethnicity” is excluded from the training data but attributes such as “place of birth”, “surname” or “mother tongue” are still included, then the input can still carry information about ethnicity, making the algorithm possibly unfair with regard to this protected attribute. Therefore this criterion must be used carefully. Don't worry though, counterfactual fairness comes to the rescue!
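
To see why, here is a minimal sketch with made-up toy data and a hypothetical `place_of_birth` feature: even after dropping the protected attribute, a correlated proxy column left in the data can still reveal it.

```python
# Minimal sketch with made-up toy data: "ethnicity" has been dropped from the
# training set, but the remaining "place_of_birth" proxy correlates with it.
import numpy as np

ethnicity      = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # protected attribute, dropped
place_of_birth = np.array([0, 0, 1, 0, 1, 1, 1, 0])  # "innocent" feature, kept

# A high correlation means a model can partly reconstruct the protected
# attribute from the proxy, defeating unawareness.
print(np.corrcoef(ethnicity, place_of_birth)[0, 1])  # 0.5 on this toy data
```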

Statistical parity #

The algorithm can be considered fair if all possible values of the protected attribute have the same probability of being assigned the positive class. This translates to computing the proportion of predicted positives, $\frac {TP + FP} {TP + TN + FP + FN}$, for each value of the protected attribute and comparing them.
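
As a minimal sketch, assuming 0/1 prediction arrays and a hypothetical `group` array holding the protected attribute, the comparison could look like this:

```python
# Minimal sketch: statistical parity compares the positive prediction rate
# across all values of the protected attribute.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])                 # model predictions
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])  # protected attribute

for g in np.unique(group):
    rate = y_pred[group == g].mean()  # (TP + FP) / (TP + TN + FP + FN) for group g
    print(f"group {g}: positive prediction rate = {rate:.2f}")
```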

Predictive parity #

The algorithm is fair if it has the same precision for all values of the protected attribute, which means the same ratio between the true positives and all predicted positives. This is called the Positive Predictive Value (PPV):

$$PPV = \frac {TP} {TP + FP}$$
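
A minimal sketch of this comparison, assuming 0/1 arrays `y_true` and `y_pred` and a hypothetical `group` array for the protected attribute:

```python
# Minimal sketch: predictive parity compares precision (PPV) across groups.
import numpy as np

def ppv(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fp)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

for g in np.unique(group):
    mask = group == g
    print(f"group {g}: PPV = {ppv(y_true[mask], y_pred[mask]):.2f}")
```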

Conditional use accuracy equality #

This condition requires the same Positive Predictive Value (PPV) and Negative Predictive Value (NPV) for all possible values of the protected attribute. It is an extension of predictive parity that also requires the proportion of predicted negatives that are actually negative (the NPV) to be the same across those values.

$$NPV = \frac {TN} {TN + FN}$$
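
Sketched the same way as above (assumed toy arrays, hypothetical `group` column), conditional use accuracy equality adds an NPV comparison on top of the PPV one:

```python
# Minimal sketch: compare NPV across the values of the protected attribute.
import numpy as np

def npv(y_true, y_pred):
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tn / (tn + fn)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

for g in np.unique(group):
    mask = group == g
    print(f"group {g}: NPV = {npv(y_true[mask], y_pred[mask]):.2f}")
```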

Equalized odds #

The algorithm is fair if it has the same False Positive and True Positive Rates (FPR and TPR), meaning that people whose actual class is positive and people whose actual class is negative should get the same classification performance, regardless of the protected attribute.

$$FPR = \frac {FP} {FP + TN}$$ $$TPR = \frac {TP} {TP + FN}$$
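
A minimal sketch of the check, again with assumed toy arrays and a hypothetical `group` column:

```python
# Minimal sketch: equalized odds compares TPR and FPR across groups.
import numpy as np

def tpr_fpr(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

for g in np.unique(group):
    mask = group == g
    tpr, fpr = tpr_fpr(y_true[mask], y_pred[mask])
    print(f"group {g}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```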

Treatment equality #

An algorithm is fair if the ratio $\frac {FN} {FP}$ is the same for all values of the protected attribute.
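
Under the same assumptions (toy 0/1 arrays, hypothetical `group` column), a sketch of the check:

```python
# Minimal sketch: treatment equality compares the FN / FP ratio across groups.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0, 1, 1, 0, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

for g in np.unique(group):
    mask = group == g
    fn = np.sum((y_pred[mask] == 0) & (y_true[mask] == 1))
    fp = np.sum((y_pred[mask] == 1) & (y_true[mask] == 0))
    ratio = fn / fp if fp else float("inf")  # guard against empty FP
    print(f"group {g}: FN/FP ratio = {ratio:.2f}")
```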

Overall accuracy #

The algorithm may be considered fair if it has the same overall accuracy for all possible values of the protected attribute.

$$accuracy = \frac {correct\;predictions} {all\;predictions} = \frac {TP + TN} {TP + TN + FP + FN}$$
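
And a last minimal sketch for this family of criteria, with the same assumed toy arrays:

```python
# Minimal sketch: compare overall accuracy across the values of the protected attribute.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

for g in np.unique(group):
    mask = group == g
    print(f"group {g}: accuracy = {(y_pred[mask] == y_true[mask]).mean():.2f}")
```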

Next steps #

Those fairness definitions are compelling because they can be built into the objective the algorithm is trying to achieve. For example, for a Deep Learning model, those metrics may be incorporated into the loss function. The model can then minimize its loss and a fairness criterion at the same time, thereby penalizing unfair solutions more heavily than fair ones.

Formally, this can be represented as:

$$loss = loss + \lambda \frac{\sum_{i=0}^{k-1} w_i f_i(y_{pred}, y_{true})}{\min\limits_{i \in [0,k[} f_i(y_{pred}, y_{true})}$$

With $f_i$ the fairness function evaluated for the $i$-th value of the protected attribute, $w_i$ the weight given to that value, and $k$ the number of values of the protected attribute.
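
As a rough illustration only, here is a minimal sketch of that penalty term in plain numpy, using the per-group PPV as a stand-in for the fairness function $f_i$; in a real Deep Learning setting the metric would need a differentiable surrogate.

```python
# Minimal sketch: a fairness penalty built from per-group PPV (hypothetical choice of f_i).
import numpy as np

def ppv(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / max(tp + fp, 1)

def fairness_penalty(y_true, y_pred, group, weights, lam=1.0):
    groups = np.unique(group)
    f = np.array([ppv(y_true[group == g], y_pred[group == g]) for g in groups])
    w = np.array([weights[g] for g in groups])
    # Weighted sum of the per-group metric divided by the worst (smallest) one:
    # the penalty grows when one group is treated much worse than the others.
    return lam * np.sum(w * f) / max(f.min(), 1e-8)

# total_loss = base_loss + fairness_penalty(y_true, y_pred, group, weights)
```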

Fairness can also be used during the evaluation on a test set, after the training. If used that way, one might look for models which give the most consistent results when computing the fairness criteria over all possible values of the protected attributes.

Other fairness criteria #

Instead of considering fairness as an aggregated metric, one can consider it at the individual level. This makes the criteria harder to use during the model training, but can give another perspective.

Individual fairness #

One may also consider an algorithm fair if it treats similar individuals similarly, according to a domain-specific similarity measure. A disadvantage of this method is that it requires deep knowledge of the data used.
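
One possible (and simplified) way to check this is sketched below: it counts pairs of individuals that are close according to a hypothetical domain-specific distance function but still receive very different scores.

```python
# Minimal sketch: flag pairs of similar individuals that get dissimilar scores.
import numpy as np

def individual_fairness_violations(X, scores, dist, epsilon=0.1, delta=0.1):
    """Count pairs that are close in feature space (dist < epsilon)
    but receive scores further apart than delta."""
    violations = 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if dist(X[i], X[j]) < epsilon and abs(scores[i] - scores[j]) > delta:
                violations += 1
    return violations

X = np.array([[0.10, 0.20], [0.11, 0.19], [0.90, 0.80]])  # toy individuals
scores = np.array([0.7, 0.2, 0.9])                        # toy model scores
print(individual_fairness_violations(X, scores, dist=lambda a, b: np.linalg.norm(a - b)))
```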

Counterfactual fairness #

Counterfactual fairness states that flipping the protected attribute (e.g. changing “male” to “female”), as well as its correlated attributes, must not impact the predicted class. In other words, the predicted outcome must not depend on the protected attribute or any of its correlated attributes.
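
A naive sketch of such a check is given below; `model`, `X` and the "gender" column are hypothetical, and a full counterfactual test would also adjust the attributes correlated with the protected one, which this sketch deliberately does not do.

```python
# Minimal sketch: flip the protected attribute and measure how often the
# predicted class changes. Note: correlated attributes are NOT adjusted here,
# so this is only a rough first approximation of counterfactual fairness.
import pandas as pd

def naive_counterfactual_check(model, X: pd.DataFrame, protected="gender"):
    X_flipped = X.copy()
    X_flipped[protected] = X[protected].map({"male": "female", "female": "male"})
    original = model.predict(X)
    flipped = model.predict(X_flipped)
    # Fraction of individuals whose prediction changes when the attribute is
    # flipped; 0.0 would satisfy this naive check.
    return (original != flipped).mean()
```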