Data | MIDS Capstone

Dataset

To reduce bias, we deploy the four aforementioned bleeding-edge knowledge distillation frameworks on the WIDER Attribute dataset.

WIDER Attribute is a human-centric dataset of 13,789 images spread over 30 classes. For each bounding box, 14 distinct human attributes are labelled. There are 805,336 labels in total.
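As a quick sanity check on these figures, the label total can be divided by the number of attributes per box to see how many bounding boxes it implies. This is a minimal sketch that assumes every bounding box carries all 14 labels; the function name is illustrative only.

```python
# Figures quoted in the dataset description.
TOTAL_LABELS = 805_336
ATTRIBUTES_PER_BOX = 14


def implied_box_count(total_labels: int, attributes_per_box: int) -> int:
    """Return the number of bounding boxes implied by the label total.

    Assumes every box is annotated with the same number of attributes.
    """
    boxes, remainder = divmod(total_labels, attributes_per_box)
    if remainder:
        raise ValueError("label total is not a multiple of attributes per box")
    return boxes


print(implied_box_count(TOTAL_LABELS, ATTRIBUTES_PER_BOX))  # 57524
```

Under that assumption, the total corresponds to 57,524 annotated bounding boxes, or roughly four per image.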

For this project, the data is condensed and clustered into 16 classes, with a focus on debiasing images based on gender as a protected class.

Source

Because several classes in the WIDER Attribute dataset are similar, the authors clustered them into 16 larger classes, grouped by common theme. Gender is the only protected characteristic in the WIDER Attribute dataset, so all other attributes were ignored.
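The preprocessing described here might look roughly like the sketch below. The class names, the groupings, and the `gender` key are hypothetical placeholders, not the actual clusters or annotation fields used in this project.

```python
# Hypothetical mapping from original classes to clustered classes.
# These names and groupings are placeholders, not the authors' clusters.
CLUSTER_OF = {
    "parade": "procession",
    "marching": "procession",
    "soccer": "sport",
    "basketball": "sport",
    "meeting": "meeting",
}


def to_training_example(original_class: str, attributes: dict) -> tuple:
    """Map one annotation to (clustered_class, gender).

    Every attribute other than the (hypothetical) "gender" key is dropped,
    since gender is the only protected characteristic considered.
    """
    return CLUSTER_OF[original_class], attributes["gender"]


example = to_training_example("soccer", {"gender": "female", "long_hair": 1})
print(example)  # ('sport', 'female')
```

A plain dictionary keeps the many-to-one relabelling explicit and easy to audit, which matters when the grouping itself can shift the gender balance of a class.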

Several of the clustered classes are severely imbalanced between subjects perceived as male and subjects perceived as female. Such attribute imbalances in a dataset can create model bias, which this research seeks to mitigate.
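One simple way to quantify this kind of imbalance is the share of perceived-male examples within each class. The sketch below assumes the data is available as `(class, gender)` pairs with gender values `"male"` and `"female"`; the sample records are made up for illustration and are not drawn from the dataset.

```python
from collections import Counter, defaultdict


def male_share_by_class(examples):
    """Return {class: fraction of examples labelled "male"} for each class."""
    counts = defaultdict(Counter)
    for cls, gender in examples:
        counts[cls][gender] += 1
    return {cls: c["male"] / sum(c.values()) for cls, c in counts.items()}


# Made-up records, for illustration only.
sample = (
    [("sport", "male")] * 3
    + [("sport", "female")]
    + [("meeting", "male"), ("meeting", "female")]
)

for cls, share in male_share_by_class(sample).items():
    print(f"{cls}: {share:.2f} perceived male")
```

A share near 0.5 indicates a balanced class, while values close to 0 or 1 flag the classes where a model is most likely to pick up gender as a shortcut.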