Research Results
The following results illustrate how our methodology mitigated bias without a corresponding sacrifice in accuracy.
Without Debiasing, the Student Learns to Stereotype

With no student-level debiasing, the student retains a high average bias of 0.1226, a 46% increase over the teacher's bias of 0.0840.
With an Adversarial Attack, the Model Becomes Less Biased

With λ = 0.5, we achieve a mean absolute value disparity of 0.0748. This is a 39% reduction in bias relative to the student model with no debiasing, at the cost of only a 0.36% penalty to accuracy.
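The text does not spell out how the mean absolute value disparity is computed; the following is a minimal sketch of one plausible reading, assuming hard predictions, integer class labels, and a binary protected-group label (the array names here are hypothetical, not taken from the study).

```python
import numpy as np

def mean_abs_disparity(preds, labels, groups):
    """One plausible reading of 'mean absolute value disparity':
    for each class, take the absolute gap between the per-class
    accuracies of group 0 and group 1, then average over classes.
    Assumes both groups are represented in every class."""
    gaps = []
    for c in np.unique(labels):
        in_class = labels == c
        acc_g0 = (preds[in_class & (groups == 0)] == c).mean()
        acc_g1 = (preds[in_class & (groups == 1)] == c).mean()
        gaps.append(abs(acc_g0 - acc_g1))
    return float(np.mean(gaps))
```

Under a definition of this shape, the teacher's 0.0840, the undebiased student's 0.1226, and the debiased student's 0.0748 are directly comparable, since each averages over the same set of classes.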
Knowledge Distillation Student Model with Adversarial Attack Results
Adversary Results - Accuracy and Disparity
We observed that as λ increased from 1 to 20, accuracy dropped significantly across all models, reflecting the heavier weighting of the adversary loss relative to the cross-entropy and knowledge distillation losses. However, this did not always produce a proportional decrease in disparity. Each model reached its minimum disparity at a different λ: RKD at 5, CTKD at 10, CKD at 15, and KD++ at 20. At the class level, the adversarial approach effectively reduced overall disparity at lower λ values, but it was not precise enough to address disparities in specific classes.
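For concreteness, here is a minimal PyTorch sketch of a combined objective of the kind described above: a cross-entropy term plus a Hinton-style distillation term, with the adversary's loss subtracted at weight λ so that larger λ values trade accuracy for reduced disparity. The temperature T, the function names, and the use of loss subtraction (rather than, say, an explicit gradient-reversal layer) are assumptions for illustration, not details taken from the study.

```python
import torch.nn.functional as F

def student_objective(student_logits, teacher_logits, adv_logits,
                      labels, group_labels, lam, T=4.0):
    """Hypothetical combined loss for adversarial debiasing of a
    distilled student. The adversary head (producing adv_logits from
    the student's representation) is assumed to be trained separately
    to predict the protected attribute; subtracting its loss here
    pushes the student toward representations it cannot exploit."""
    ce = F.cross_entropy(student_logits, labels)       # task loss
    kd = F.kl_div(                                     # distillation loss
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    adv = F.cross_entropy(adv_logits, group_labels)    # adversary loss
    return ce + kd - lam * adv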
CTKD showed the weakest disparity reduction, while KD++ was sensitive to higher λ values yet achieved the lowest disparity at those levels. RKD held up best under adversarial debiasing, reducing disparity at lower λ values with little loss of accuracy.
These results demonstrate that knowledge distillation models can be debiased effectively as adversarial strength increases, revealing a clear relationship among adversarial strength, accuracy, and bias. However, because adversarial debiasing is sensitive to model architecture and hyperparameters, careful tuning is required for optimal results.
3D Plot View