
Research Results

The following results illustrate how our methodology mitigated bias without sacrificing accuracy.

Without Debiasing, the Student Learns to Stereotype

[Figure: biased_results.png]

With no student-level debiasing, the student maintains a high average bias of 0.1226, a 46% increase over the teacher's bias of 0.0840 (0.1226 / 0.0840 ≈ 1.46).

With an Adversarial Attack, the Model Becomes Less Biased

[Figure: debiased.png]

With λ = 0.5, we achieve a mean absolute disparity of 0.0748. This represents a 39% reduction in bias over a student model with no debiasing, at a cost of only 0.36% in accuracy.
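For reference, below is a minimal sketch of how a disparity score of this kind might be computed. The two-group setup, the per-class prediction-rate gap, and the function name are assumptions; the study's exact metric definition may differ.

```python
import numpy as np

def mean_abs_disparity(preds: np.ndarray, groups: np.ndarray) -> float:
    """Mean absolute disparity between two demographic groups.

    Assumed definition: for each predicted class c, take the gap
    |P(pred == c | group 0) - P(pred == c | group 1)|, then average
    the gaps over all classes.
    """
    classes = np.unique(preds)
    g0, g1 = preds[groups == 0], preds[groups == 1]
    gaps = [abs((g0 == c).mean() - (g1 == c).mean()) for c in classes]
    return float(np.mean(gaps))

# Toy example: predictions that skew by group produce a nonzero score.
preds = np.array([0, 0, 1, 1, 0, 1, 1, 1])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(mean_abs_disparity(preds, groups))  # 0.25
```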

Knowledge Distillation Student Model with Adversarial Attack Results

Adversary Results: Accuracy and Disparity

The study observed that as λ increased from 1 to 20, accuracy dropped significantly across all models, since the adversary loss was weighted more heavily than the cross-entropy and knowledge distillation losses. However, this did not always lead to a proportional decrease in disparity. The minimum disparity was reached at a different λ value for each model: RKD at 5, CTKD at 10, CKD at 15, and KD++ at 20. At the class level, the adversarial approach effectively reduced overall disparity at lower λ values, but it was not precise enough to address disparities in specific classes.
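To make that weighting concrete, here is a minimal sketch of a combined objective of this form. The gradient-reversal adversary, the temperature T, the softmax-KL form of the distillation term, and all names here are assumptions drawn from standard adversarial-debiasing and KD practice, not the study's exact implementation.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the
    backward pass, so minimizing the adversary loss trains the student
    to hide the protected attribute (a standard adversarial trick)."""
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def combined_loss(student_logits, teacher_logits, labels,
                  features, adversary, protected, lam, T=4.0):
    """Cross-entropy + knowledge distillation + lambda-weighted adversary loss.

    Raising `lam` weights the adversary term more heavily than the
    cross-entropy and KD terms, trading accuracy for lower disparity.
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    adv_logits = adversary(GradReverse.apply(features))
    adv = F.cross_entropy(adv_logits, protected)
    return ce + kd + lam * adv
```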

 

CTKD showed the weakest disparity reduction, while KD++ was sensitive to higher λ values but achieved the lowest disparity at those levels. RKD held up best under adversarial debiasing, reducing disparity at lower λ values with little impact on accuracy.

The study demonstrated that knowledge distillation models can be debiased effectively by increasing adversarial strength, revealing a clear relationship between adversarial strength, accuracy, and bias. However, the sensitivity of adversarial debiasing to model architecture and hyperparameters means careful tuning is needed for optimal results.

3D Plot View
