Project 18 · Advanced Deep Learning
Saliency-based Analysis of Shortcut Learning in CNNs
We train a CNN on the deliberately biased Waterbirds dataset and use Grad-CAM, a foreground/background attention-bias score, and four inference-time interventions to test whether the model is actually looking at the bird — or just at the background.
The shortcut, in one screen
The model looks strong on average, but the 24.4-point gap between overall and worst-group accuracy, together with the large accuracy swings when we intervene on the image, exposes the shortcut.
Overall test accuracy
83.9%
ResNet18, balanced test split
Worst-group accuracy
59.5%
Waterbird on land · the conflict case
Accuracy after foreground mask
53.4%
31.8% of predictions flip
Accuracy after background mask
86.0%
Removing the background helps — bias confirmed
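The headline numbers above rest on two simple quantities: per-subgroup accuracy (whose minimum is the worst-group accuracy) and the fraction of predictions that flip after an intervention. The following is a minimal sketch of how they could be computed, not the project's actual evaluation code. The function names, the toy arrays, and the group encoding `2 * label + background` are all assumptions made for illustration.

```python
import numpy as np

def group_accuracies(y_true, y_pred, group):
    """Return {group_id: accuracy} for every group present in `group`."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    correct = y_true == y_pred
    return {int(g): float(correct[group == g].mean()) for g in np.unique(group)}

def worst_group_accuracy(y_true, y_pred, group):
    """Lowest accuracy over all subgroups."""
    return min(group_accuracies(y_true, y_pred, group).values())

def flip_rate(pred_before, pred_after):
    """Fraction of samples whose predicted class changes after an intervention."""
    return float(np.mean(np.asarray(pred_before) != np.asarray(pred_after)))

# Toy data (hypothetical): label 0 = landbird, 1 = waterbird;
# background 0 = land, 1 = water.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
background = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Assumed encoding of the four subgroups: 2 * label + background.
group = 2 * y_true + background

overall = float(np.mean(y_true == y_pred))
per_group = group_accuracies(y_true, y_pred, group)
worst = worst_group_accuracy(y_true, y_pred, group)

# Hypothetical predictions on the same images after an intervention.
y_pred_after = np.array([0, 0, 1, 1, 0, 0, 1, 1])
flips = flip_rate(y_pred, y_pred_after)
```

In this toy case the overall accuracy hides the two conflict groups (landbird on water, waterbird on land), which is the same pattern the cards above report at full scale.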
The pipeline
Following the project brief: train, evaluate by subgroup, run Grad-CAM, score foreground vs. background attention, intervene at inference time, then compare classification and saliency-based metrics.
01
Load Waterbirds
HF grodino/waterbirds · 4 subgroups
→
02
Train ResNet18
Save best by worst-group acc
→
03
Subgroup eval
Acc · P · R · F1 · CM · WG
→
04
Grad-CAM
Class-conditional saliency on layer4[-1]
→
05
Bias score
BG saliency / total · 60% center crop
→
06
Interventions
Blur · mask · shuffle · FG mask
→
07
Compare
Δ accuracy · Δ flips · Δ saliency
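Step 05 describes the bias score as background saliency divided by total saliency, with a 60% center crop standing in for the foreground. A minimal sketch of that ratio follows, assuming the Grad-CAM map is a 2-D non-negative array and that "60%" means 60% of each side rather than 60% of the area. The helper names `center_box` and `background_bias_score` are hypothetical and not taken from the project code.

```python
import numpy as np

def center_box(h, w, frac=0.6):
    """Boolean mask, True inside a centered box spanning `frac` of each side."""
    bh, bw = int(round(h * frac)), int(round(w * frac))
    top, left = (h - bh) // 2, (w - bw) // 2
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + bh, left:left + bw] = True
    return mask

def background_bias_score(cam, frac=0.6):
    """Share of saliency mass that falls outside the center box (0 to 1)."""
    cam = np.clip(np.asarray(cam, dtype=np.float64), 0.0, None)
    total = cam.sum()
    if total == 0.0:
        return float("nan")  # no saliency at all: the ratio is undefined
    foreground = center_box(cam.shape[0], cam.shape[1], frac)
    return float(cam[~foreground].sum() / total)

# Uniform saliency: the score equals the background's share of the area.
uniform_score = background_bias_score(np.ones((10, 10)))

# All saliency on one central pixel: nothing falls on the background.
center_cam = np.zeros((10, 10))
center_cam[5, 5] = 1.0
center_score = background_bias_score(center_cam)

# All saliency on one corner pixel: everything falls on the background.
corner_cam = np.zeros((10, 10))
corner_cam[0, 0] = 1.0
corner_score = background_bias_score(corner_cam)
```

Under these assumptions a uniform map scores 0.64, because the center box covers 36% of the area, so scores well above that level would indicate attention skewed toward the background.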
Browse the project
Ten sections walk through the project end-to-end, plus a code walkthrough, a live demo, and references.
01 Problem
Problem & motivation
Why shortcut learning matters and how it manifests in CNNs.
02 Dataset
Waterbirds dataset bias
How the 95/5 train vs. 50/50 test split engineers a known shortcut.
03 Methodology
Methodology pipeline
End-to-end view of training, evaluation, Grad-CAM, interventions.
04 Training
CNN training
ResNet18 fine-tune with worst-group accuracy as model selector.
05 Evaluation
Subgroup evaluation
Overall vs. worst-group metrics, confusion matrix, four subgroups.
06 Grad-CAM
Grad-CAM saliency analysis
Class-conditional saliency maps over a balanced test sample.
07 Interventions
Foreground / background score & interventions
Center-crop attention bias score and four inference-time interventions.
08 Results
Results dashboard
All numbers in one place.
09 Conclusion
Conclusion
What the experiments collectively show.
10 Limitations
Limitations & future work
What this study doesn't claim and what would strengthen it.
Code
Code walkthrough
Each step mapped to the responsible source file.
Demo
Live demo
Upload an image and watch the model + Grad-CAM in real time.
References
Papers cited and prior work.