06 · Grad-CAM

What is the model looking at?

Grad-CAM (Selvaraju et al. 2017) projects gradient-weighted activations from the last convolutional layer back onto the image. We compute it for the predicted class on a balanced sample of the test set — 30 images per subgroup.
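The combination step of Grad-CAM can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in src/gradcam_analysis.py: it assumes the last-layer activations and the gradients of the class score with respect to them have already been extracted (for example with framework hooks) as arrays of shape (K, H, W). The function name grad_cam_map is hypothetical.

```python
import numpy as np

def grad_cam_map(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine feature maps into a coarse Grad-CAM heatmap.

    ASSUMPTION: `activations` and `gradients` are precomputed arrays of
    shape (K, H, W) for one image at the last convolutional layer.
    """
    # Channel weights: spatial average of the gradients (global average pooling).
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of the feature maps over channels, then ReLU.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for overlaying; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map has the spatial resolution of the feature maps, so it would normally be upsampled to the image size before being drawn as an overlay.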

Source: src/gradcam_analysis.py · outputs/gradcam/gradcam_results.csv

Foreground vs. background saliency by subgroup

Fraction of Grad-CAM heat that falls inside the 60% center crop (foreground heuristic) vs. outside (background).
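A sketch of how such a foreground fraction could be computed for one heatmap follows. It rests on an assumption the text does not settle: that "60% center crop" means a centered box covering 60% of the height and 60% of the width (rather than 60% of the area). The function name foreground_fraction and its crop parameter are hypothetical.

```python
import numpy as np

def foreground_fraction(cam: np.ndarray, crop: float = 0.6) -> float:
    """Share of the total heat that falls inside a centered crop box.

    ASSUMPTION: `crop` is the fraction of each side covered by the box
    (0.6 means the central 60% of the height and of the width), and
    `cam` is a non-negative 2-D heatmap.
    """
    h, w = cam.shape
    box_h, box_w = round(h * crop), round(w * crop)
    top, left = (h - box_h) // 2, (w - box_w) // 2
    total = cam.sum()
    if total == 0:
        return float("nan")  # no heat at all: the fraction is undefined
    inside = cam[top:top + box_h, left:left + box_w].sum()
    return float(inside / total)
```

The background share is then one minus this value. Under the side-length assumption, a perfectly uniform heatmap would put 36% of its heat inside the box, which is a useful reference point when reading the numbers.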

Per-subgroup attention bias

Subgroup	Foreground	Background (bias)	Accuracy
Landbird on land (majority)	51.6%	48.4%	93.3%
Landbird on water (conflict)	59.3%	40.7%	60.0%
Waterbird on land (conflict)	59.3%	40.7%	70.0%
Waterbird on water (majority)	65.2%	34.8%	96.7%

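Across the four subgroups, roughly 35–48% of the Grad-CAM heat falls outside the center crop, even though both majority groups exceed 93% accuracy. The background share is highest for landbird on land (48.4%) and lowest for waterbird on water (34.8%), with both conflict groups at 40.7%. Heavy reliance on the background is exactly the failure mode the brief asks us to detect.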
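The per-subgroup figures are aggregates over the 30 sampled images in each group. A rough sketch of such an aggregation is shown here. The row keys ('subgroup', 'foreground', 'true', 'pred') are hypothetical, since the actual columns of gradcam_results.csv are not listed in this section, and averaging per-image fractions is an assumption about how the shares are pooled.

```python
from collections import defaultdict

def summarize(rows):
    """Per-subgroup mean foreground share, background share, and accuracy.

    ASSUMPTION: each row is a dict with hypothetical keys 'subgroup',
    'foreground' (a fraction in [0, 1]), 'true', and 'pred'.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["subgroup"]].append(row)

    summary = {}
    for name, items in groups.items():
        n = len(items)
        # Mean of the per-image foreground fractions (assumed pooling rule).
        fg = sum(r["foreground"] for r in items) / n
        # Fraction of images whose prediction matches the true label.
        acc = sum(r["true"] == r["pred"] for r in items) / n
        summary[name] = {"n": n, "foreground": fg,
                         "background": 1.0 - fg, "accuracy": acc}
    return summary
```

With 30 images per subgroup, each misclassified image moves the accuracy by about 3.3 percentage points, so the accuracy column is best read as a coarse estimate.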

Visual gallery

Each image is a three-panel plate produced by src/gradcam_analysis.py: the original image, the Grad-CAM overlay for the predicted class, and the white box showing the 60% center crop used as the foreground heuristic. Shortcut failures are highlighted in red.

Correct · waterbird-water · true: waterbird · pred: waterbird
Majority case: waterbird on water. Saliency overlaps the bird, and the background agrees with the label, so foreground and background cues are consistent.

Correct · waterbird-water · true: waterbird · pred: waterbird
Majority case — model is confident and attention spans bird + water.

Correct · waterbird-water · true: waterbird · pred: waterbird
Another majority example, same pattern.

Correct · waterbird-land · true: waterbird · pred: waterbird
Conflict group success: a waterbird on land where the model still found the bird.

SHORTCUT FAILURE · waterbird-land · true: waterbird · pred: landbird
A waterbird placed on land is misclassified as a landbird, and saliency leaks onto the land background.

SHORTCUT FAILURE · waterbird-land · true: waterbird · pred: landbird
Another waterbird-on-land failure — strong evidence the background is steering the prediction.

Correct · landbird-land · true: landbird · pred: landbird
Easy majority case for landbirds.

Correct · landbird-land · true: landbird · pred: landbird
Landbird on land — accurate prediction.

Correct · landbird-land · true: landbird · pred: landbird
Landbird majority case.

Correct · landbird-water · true: landbird · pred: landbird
Conflict success: landbird placed on water but still classified correctly.

SHORTCUT FAILURE · landbird-water · true: landbird · pred: waterbird
A landbird on water is predicted as a waterbird, and saliency drifts toward the water background.

SHORTCUT FAILURE · landbird-water · true: landbird · pred: waterbird
Another landbird-on-water failure — the background pattern dominates.

Caveat: the heuristic is coarse

Birds are not always centered, so the 60% center crop is only a coarse foreground proxy. We discuss this in Limitations. What matters here is the relative change in background attention across subgroups and across interventions.
