What is shortcut learning, and why does it matter?
Deep neural networks frequently solve a task using superficial cues that happen to correlate with the label in the training set, instead of the features humans would use. These shortcuts can hide behind very high average accuracy and only fail when the spurious correlation breaks.
A working definition
Geirhos et al. (2020) describe a shortcut as a decision rule that performs well on standard benchmarks but transfers poorly because it relies on features that are predictive in the training distribution but not causally related to the target.
On Waterbirds the most obvious candidate shortcut is background → bird type: in training, water is almost always paired with waterbirds and land with landbirds. A model that learns this rule looks great on the i.i.d. validation split but breaks when the test set deliberately reverses the correlation.
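The gap between average and subgroup accuracy can be made concrete with a toy calculation. This is a minimal numpy sketch, not project code: the arrays, the `group_accuracies` helper, and the 10-example toy split are all illustrative, standing in for real predictions and (bird type, background) group labels.

```python
import numpy as np

def group_accuracies(preds, labels, backgrounds):
    """Accuracy per (label, background) group; averages can hide the worst group."""
    accs = {}
    for y in np.unique(labels):
        for b in np.unique(backgrounds):
            mask = (labels == y) & (backgrounds == b)
            if mask.any():
                accs[(int(y), int(b))] = float((preds[mask] == labels[mask]).mean())
    return accs

# Toy split: waterbird = 1, water background = 1; the correlation is strong
# but imperfect, as in Waterbirds.
labels      = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])  # bird type
backgrounds = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0])  # background

# A pure "background classifier": the shortcut model in its simplest form.
preds = backgrounds.copy()

avg   = float((preds == labels).mean())          # 0.8: looks fine on average
accs  = group_accuracies(preds, labels, backgrounds)
worst = min(accs.values())                       # 0.0 on the minority groups
```

The shortcut model scores 80% on average yet 0% on exactly the groups where background and bird type disagree, which is why worst-group accuracy, not average accuracy, is the metric of interest here.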
Why this is more than a curiosity
- Reliability: spurious-feature reliance produces silent failures on minority subgroups (Sagawa et al. 2019).
- Fairness: when subgroups correlate with protected attributes, average metrics mask discriminatory behaviour.
- Generalisation: high average accuracy on an i.i.d. test split is no longer evidence that the model is solving the intended task.
What this project investigates
Following the project brief, we ask three concrete questions:
- Does standard (average) accuracy hide subgroup failures for a CNN trained on the biased Waterbirds split?
- When the model gets the right answer, is it actually looking at the bird, or at the background? We measure this with Grad-CAM and a simple foreground/background attention-bias score.
- If we intervene on the image (blurring or masking the background, masking the foreground, shuffling background patches), how do the classification and the saliency maps change? These interventions turn correlational saliency evidence into causal evidence of shortcut reliance.
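The foreground/background attention-bias score mentioned above can be defined in one line once a saliency map and a bird segmentation mask are available. The sketch below is one plausible definition, assuming a non-negative Grad-CAM map and a binary foreground mask at the same resolution; the function name and the exact normalisation are our choices, not a standard.

```python
import numpy as np

def attention_bias(saliency, fg_mask, eps=1e-8):
    """Fraction of total saliency mass that falls on foreground (bird) pixels.

    saliency : (H, W) non-negative map, e.g. an upsampled Grad-CAM heatmap.
    fg_mask  : (H, W) binary mask, 1 on the bird, 0 on the background.
    Returns a value in [0, 1]: near 1 means the model attends to the bird,
    near 0 means it attends to the background.
    """
    saliency = np.clip(saliency, 0, None)          # Grad-CAM is ReLU'd anyway
    fg = saliency[fg_mask.astype(bool)].sum()
    return float(fg / (saliency.sum() + eps))
```

A uniform saliency map scores exactly the foreground's area fraction, which makes that fraction a natural chance-level baseline to report alongside the score.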
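The background interventions can likewise be sketched with plain array operations. This is an illustrative numpy version, assuming an (H, W, C) image and a binary foreground mask; replacing the background with the mean colour or a pixel permutation are two simple stand-ins for the blur/mask/shuffle variants described above.

```python
import numpy as np

def mask_background(image, fg_mask, mode="mean"):
    """Return a copy of `image` (H, W, C) with background pixels replaced.

    A model relying on the bird should be robust to this; a shortcut model's
    prediction should change far more under background edits than under the
    analogous foreground edits.
    """
    out = image.copy()
    bg = ~fg_mask.astype(bool)
    if mode == "mean":
        out[bg] = image.mean(axis=(0, 1))       # flat per-channel mean colour
    elif mode == "shuffle":
        rng = np.random.default_rng(0)
        vals = out[bg]                          # fancy indexing copies
        rng.shuffle(vals)                       # permute background pixels
        out[bg] = vals
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out
```

Comparing the model's prediction and its attention-bias score before and after each intervention gives the paired evidence the third question asks for.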
References: Sagawa et al. 2019 · Geirhos et al. 2020 · Selvaraju et al. 2017