
Metrics

Most knowledge distillation research focuses on a single metric: accuracy. We therefore selected four leading knowledge distillation approaches from prior research and evaluated them on a broader set of metrics: accuracy, F1 score, precision, and recall. Additionally, we introduce a new metric for bias, which we reduce through adversarial training.

Base Metrics

The majority of existing knowledge distillation studies prioritize accuracy as the primary measure of performance. To offer a broader perspective, this research evaluates a variety of additional metrics. The base metrics are top-1 accuracy, precision, recall, and F1 score.

base_metrics_.png
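As a rough illustration, the snippet below shows one way these base metrics could be computed with scikit-learn; the label arrays and the macro-averaging choice are hypothetical and not drawn from the evaluated models.

```python
# Minimal sketch: computing top-1 accuracy, precision, recall, and F1 score
# for a set of predictions. The labels below are hypothetical placeholders.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 2, 1, 1, 0, 2]   # hypothetical ground-truth class labels
y_pred = [0, 2, 1, 0, 0, 1]   # hypothetical top-1 predictions from a student model

top1_acc = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")

print(f"top-1 acc={top1_acc:.3f}  precision={precision:.3f}  recall={recall:.3f}  F1={f1:.3f}")
```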

Disparity

Consider a knowledge distillation framework that maximizes the accuracy of the student model while minimizing bias, as defined by disparity. The problem can be framed as follows:


Let the disparity metric, $D$, be defined as the absolute value of the difference in recall, a proxy for bias:

$$D = \lvert R(C, l_1) - R(C, l_2) \rvert$$

In this study, we investigate debiasing models, $f$, using gender as a protected attribute, $l_i$. Recall, $R(C, l_i)$, measures the model's performance in predicting a given class, $C$, over the set of images in that class with a specific gender attribute $l_i$.
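As a sketch, the disparity $D$ for a single class could be computed as below, assuming per-image predictions, ground-truth labels, and gender attributes are available as arrays; the function names and the 0/1 gender encoding are illustrative only.

```python
# Minimal sketch of the disparity metric D: the absolute difference in
# per-gender recall for a given class C. Array names and the 0/1 gender
# encoding are assumptions for illustration.
import numpy as np

def recall_for_group(y_true, y_pred, target_class, gender, gender_labels):
    """Recall for `target_class`, restricted to images with the given gender."""
    y_true, y_pred, gender_labels = map(np.asarray, (y_true, y_pred, gender_labels))
    mask = (gender_labels == gender) & (y_true == target_class)
    if mask.sum() == 0:
        return 0.0
    return float((y_pred[mask] == target_class).mean())

def disparity(y_true, y_pred, target_class, gender_labels):
    """D = |R(C, l1) - R(C, l2)|, with the two genders encoded as 0 and 1."""
    r1 = recall_for_group(y_true, y_pred, target_class, 0, gender_labels)
    r2 = recall_for_group(y_true, y_pred, target_class, 1, gender_labels)
    return abs(r1 - r2)
```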

Example of the Current Problem Without Disparity Measurements

Consider two images from a hypothetical dataset, each showing a person in medical clothing: one male ($l_1$) and one female ($l_2$). The image class is surgeon for both, but female medical personnel in the dataset are more frequently labeled as nurses. The teacher model may therefore be prone to predict that women dressed in medical clothing are nurses. The student model is apt to learn this behavior from the teacher and, furthermore, to amplify this bias. KD with adversarial debiasing penalizes this behavior while maximizing accuracy.

disparity_ex.png
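To make the effect concrete with purely hypothetical numbers: if the student recalls the surgeon class for 90% of male images but only 60% of female images, the disparity for that class is

$$D = \lvert R(\text{surgeon}, l_1) - R(\text{surgeon}, l_2) \rvert = \lvert 0.90 - 0.60 \rvert = 0.30.$$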

Debiasing The Network Using Adversarial Feature Attacks

To Reduce Bias, We Incorporate An Adversarial Attack
 

The traditional knowledge distillation (KD) framework, which consists of a teacher-student pair, is augmented with an adversary model to minimize bias. The student model is focused on the image classification task. In parallel, the adversary model is designed to detect bias by predicting a sensitive attribute, gender in this case, from the student model's outputs. Training refines the student until the adversary can no longer accurately predict the sensitive attribute, thereby reducing bias.
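A rough PyTorch sketch of the three components is shown below, assuming the adversary reads the student's class logits; the architectures, layer sizes, and input shape are placeholders rather than the models used in this work.

```python
# Minimal sketch of the teacher-student-adversary setup. All architectures
# and sizes are illustrative assumptions, not the actual models.
import torch.nn as nn

num_classes = 10      # hypothetical number of image classes
num_attributes = 2    # protected attribute: gender (binary here)

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512),
                        nn.ReLU(), nn.Linear(512, num_classes))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128),
                        nn.ReLU(), nn.Linear(128, num_classes))

# The adversary tries to recover gender from the student's output logits;
# if it cannot, the student's predictions carry little gender information.
adversary = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(),
                          nn.Linear(64, num_attributes))
```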


To regulate the influence of this adversarial method, a hyperparameter known as lambda (λ) is introduced. λ acts as a debiasing weight, determining the strength of the adversarial influence on the student model. During training, the student and adversary models are updated simultaneously. The student model is refined to minimize the distillation divergence and cross-entropy (CE) losses while maximizing the adversarial loss, thereby minimizing the adversary's ability to predict the sensitive attribute from the student model's output. As a result, this methodology optimizes the student model for accuracy while also promoting fairness, leading to a more balanced and less biased model.
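Written out, one formulation consistent with this description (the exact weighting used here may differ) is

$$\mathcal{L}_{\text{student}} = \mathcal{L}_{\text{CE}} + \mathcal{L}_{\text{KD}} - \lambda \, \mathcal{L}_{\text{adv}},$$

where $\mathcal{L}_{\text{KD}}$ is the divergence between the teacher's and student's outputs and $\mathcal{L}_{\text{adv}}$ is the adversary's loss in predicting gender; subtracting $\lambda \, \mathcal{L}_{\text{adv}}$ rewards the student when the adversary fails, while the adversary itself is trained separately to minimize $\mathcal{L}_{\text{adv}}$.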

adversarial_attack.png
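The sketch below shows one possible joint training step consistent with this description, reusing the hypothetical teacher, student, and adversary defined earlier; the losses, λ value, temperature, and optimizers are illustrative assumptions.

```python
# Minimal sketch of one joint training step: the student minimizes
# CE + distillation divergence while maximizing the adversary's loss
# (weighted by lambda); the adversary then minimizes its own loss.
import torch
import torch.nn.functional as F

opt_student = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_adversary = torch.optim.Adam(adversary.parameters(), lr=1e-3)
lam = 0.5          # lambda: debiasing weight (strength of adversarial influence)
temperature = 4.0  # softening temperature for the distillation targets

def train_step(images, labels, gender):
    student_logits = student(images)
    with torch.no_grad():
        teacher_logits = teacher(images)

    # Distillation (divergence) loss and standard cross-entropy loss.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    ce_loss = F.cross_entropy(student_logits, labels)

    # The adversary predicts gender from the student's outputs; the student
    # is rewarded (loss reduced) when the adversary performs poorly.
    adv_loss = F.cross_entropy(adversary(student_logits), gender)
    student_loss = ce_loss + kd_loss - lam * adv_loss

    opt_student.zero_grad()
    student_loss.backward()
    opt_student.step()

    # Update the adversary on detached student outputs to minimize its loss.
    adv_loss_detached = F.cross_entropy(adversary(student_logits.detach()), gender)
    opt_adversary.zero_grad()
    adv_loss_detached.backward()
    opt_adversary.step()
    return student_loss.item(), adv_loss_detached.item()

# Example usage with a hypothetical batch of 3x32x32 images:
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, num_classes, (16,))
gender = torch.randint(0, 2, (16,))
print(train_step(images, labels, gender))
```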