Shortcut Learning · Waterbirds
08 · Results dashboard

All numbers in one place

The full project results assembled into one scrollable view, suitable for live presentation.

Headline metrics

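Overall test accuracy: 83.9%
Worst-group accuracy: 59.5% (waterbird-land)
Foreground-mask accuracy: 53.4%, with 31.8% of predictions flipped
Background-mask accuracy: 86.0%, up from 80.2% on the same images unmodified
The first two metrics use the full 5,794-image test split; the two mask metrics use the 1,000-image intervention subset.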

Subgroup metrics

Subgroup | Key | Count | Accuracy | Avg confidence | Notes
Waterbird on land (conflict) | waterbird-land | 642 | 59.5% | 90.1% | worst group · shortcut conflict
Landbird on water (conflict) | landbird-water | 2255 | 73.6% | 87.6% | shortcut conflict
Waterbird on water (majority) | waterbird-water | 642 | 93.3% | 96.9% | majority group
Landbird on land (majority) | landbird-land | 2255 | 98.6% | 98.4% | majority group
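Worst-group accuracy is the minimum over the per-subgroup accuracies, while overall accuracy pools every sample, so a large, easy majority group can hide a weak conflict group. A minimal NumPy sketch of both metrics from per-sample predictions; the function name, the argument layout, and the 0 = landbird / 1 = waterbird encoding are assumptions for illustration, and the toy inputs are made up:

```python
import numpy as np

def group_accuracies(y_true, y_pred, groups):
    """Return per-group accuracy, the worst group, its accuracy, and overall accuracy.

    y_true, y_pred: class labels (assumed encoding: 0 = landbird, 1 = waterbird).
    groups: one subgroup key per sample, e.g. "waterbird-land".
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    correct = y_true == y_pred
    per_group = {str(g): float(correct[groups == g].mean()) for g in np.unique(groups)}
    worst = min(per_group, key=per_group.get)  # subgroup with the lowest accuracy
    return per_group, worst, per_group[worst], float(correct.mean())

# Toy example with made-up predictions (not project data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
groups = ["waterbird-land", "waterbird-land", "waterbird-water", "waterbird-water",
          "landbird-land", "landbird-land", "landbird-water", "landbird-water"]
per_group, worst, worst_acc, overall = group_accuracies(y_true, y_pred, groups)
print(worst, worst_acc, overall)  # prints: waterbird-land 0.5 0.875
```

With the dashboard's own numbers, the same comparison is 83.9% overall versus 59.5% for waterbird-land.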

Confusion matrix

Confusion matrix · test split
Rows = true class, columns = predicted class; percentages are shares of each true-class row.

True \ Pred | landbird | waterbird
landbird | 3882 (86.1% of row) | 628 (13.9% of row)
waterbird | 303 (23.6% of row) | 981 (76.4% of row)
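Each "% of true row" entry is a cell count divided by its row total, and overall test accuracy is the diagonal divided by the grand total. A quick NumPy check using the counts from this matrix (variable names are illustrative):

```python
import numpy as np

# Counts from the confusion matrix (rows = true class, columns = predicted class).
cm = np.array([[3882, 628],    # true landbird
               [303,  981]])   # true waterbird

row_totals = cm.sum(axis=1)                      # images per true class
row_share = cm / cm.sum(axis=1, keepdims=True)   # the "% of row" values
overall_acc = np.trace(cm) / cm.sum()            # correct predictions / all predictions

print(row_totals)
print(np.round(row_share, 3))
print(round(float(overall_acc), 3))
```

The row totals, 4,510 landbirds and 1,284 waterbirds, equal the subgroup counts (2,255 + 2,255 and 642 + 642), and 4,863 / 5,794 rounds to the 83.9% headline accuracy.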

Training dynamics

Training history
Validation accuracy by subgroup, per epoch
Overall validation accuracy stays high (about 83%) throughout training, while worst-group accuracy oscillates between 16% and 56%. This gap between average and worst-group accuracy is the shortcut signal the project investigates.

Grad-CAM saliency

Foreground vs. background saliency by subgroup
Fraction of Grad-CAM heat that falls inside the 60% center crop (foreground heuristic) vs. outside (background).
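A minimal sketch of this foreground/background split, assuming the Grad-CAM map has already been upsampled to an H×W grid and that "60% center crop" means a centered box whose height and width are each 60% of the image's (an assumption; the dashboard does not say whether 60% refers to side length or area). The function name is hypothetical:

```python
import numpy as np

def center_box_saliency(cam, frac=0.6):
    """Split a Grad-CAM map into (foreground_share, background_share).

    cam: 2-D saliency map (H x W); negative values are clipped to 0.
    frac: box side as a fraction of H and W (assumed reading of "60% center crop").
    The two shares sum to 1 whenever the map has any positive mass.
    """
    cam = np.clip(np.asarray(cam, dtype=float), 0.0, None)
    h, w = cam.shape
    top = int(round(h * (1 - frac) / 2))    # rows trimmed from top and bottom
    left = int(round(w * (1 - frac) / 2))   # columns trimmed from left and right
    box = np.zeros_like(cam, dtype=bool)
    box[top:h - top, left:w - left] = True
    total = cam.sum()
    if total == 0:
        return 0.0, 0.0                     # no saliency mass to split
    fg = float(cam[box].sum() / total)
    return fg, 1.0 - fg

print(center_box_saliency(np.ones((10, 10))))  # roughly (0.36, 0.64) for a uniform map
```

The uniform-map case gives a chance-level baseline: under this box definition, a map with no spatial preference already places about 64% of its heat in the background.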

Interventions

Accuracy by intervention
What happens to test accuracy when we modify the image at inference time?
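The dashboard does not say how the foreground region is obtained for these interventions, what value masked pixels receive, or which blur kernel and patch size are used. The sketch below assumes a boolean foreground mask is available (for example a segmentation mask or the center-box heuristic), zero fill for masking, a box blur, and 16-pixel patches; the function name and the `kind` strings are hypothetical labels for the four conditions in the intervention × subgroup table:

```python
import numpy as np

def apply_intervention(img, fg_mask, kind, rng=None, patch=16, blur_radius=4):
    """Return a modified copy of img (H x W x C array) for one intervention.

    fg_mask: boolean H x W array, True on the bird (assumed to be given).
    kind: "foreground_mask", "background_mask", "background_blur",
          or "background_patch_shuffle" (illustrative names).
    Masked pixels are set to 0; this fill value is an assumption.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    img = np.asarray(img, dtype=float)
    fg = np.asarray(fg_mask, dtype=bool)[..., None]  # broadcast over channels
    h, w = img.shape[:2]
    if kind == "foreground_mask":
        return np.where(fg, 0.0, img)
    if kind == "background_mask":
        return np.where(fg, img, 0.0)
    if kind == "background_blur":
        # Box blur: average over a (2r+1) x (2r+1) window with edge padding.
        r = blur_radius
        padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
        blurred = np.zeros_like(img)
        for dy in range(2 * r + 1):
            for dx in range(2 * r + 1):
                blurred += padded[dy:dy + h, dx:dx + w]
        blurred /= (2 * r + 1) ** 2
        return np.where(fg, img, blurred)
    if kind == "background_patch_shuffle":
        # Permute non-overlapping patch x patch tiles, then paste the bird back.
        if h % patch or w % patch:
            raise ValueError("image size must be a multiple of patch")
        gh, gw = h // patch, w // patch
        tiles = img.reshape(gh, patch, gw, patch, -1).swapaxes(1, 2)
        tiles = tiles.reshape(gh * gw, patch, patch, -1)[rng.permutation(gh * gw)]
        shuffled = tiles.reshape(gh, gw, patch, patch, -1).swapaxes(1, 2).reshape(h, w, -1)
        return np.where(fg, img, shuffled)
    raise ValueError(f"unknown intervention: {kind}")
```

Each variant keeps one region intact and disturbs the other, so an accuracy drop points at the region the model was relying on.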
Prediction flip rate vs. original
Fraction of samples where the predicted class changes after the intervention.
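The flip rate compares predictions on each modified image with predictions on its unmodified version, and the Conf. drop column of the intervention × subgroup table compares confidences the same way. A minimal sketch, assuming "confidence" means the maximum softmax probability per sample (the dashboard does not define it); the function names are illustrative:

```python
import numpy as np

def flip_rate(pred_orig, pred_new):
    """Fraction of samples whose predicted class changes after the intervention."""
    pred_orig, pred_new = np.asarray(pred_orig), np.asarray(pred_new)
    return float(np.mean(pred_orig != pred_new))

def confidence_drop(probs_orig, probs_new):
    """Mean confidence before minus mean confidence after the intervention.

    probs_*: (N, num_classes) softmax outputs. Confidence is assumed to be the
    max probability per row. A negative result means confidence went up.
    """
    conf_orig = np.max(np.asarray(probs_orig), axis=1)
    conf_new = np.max(np.asarray(probs_new), axis=1)
    return float(conf_orig.mean() - conf_new.mean())

# Toy inputs (made up): two of four predictions change class.
print(flip_rate([0, 0, 1, 1], [0, 1, 1, 0]))
print(confidence_drop([[0.9, 0.1], [0.8, 0.2]], [[0.7, 0.3], [0.4, 0.6]]))
```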
Average background saliency ratio
How much of the Grad-CAM heat sits outside the center foreground box?

Intervention × subgroup table

N = test images per subgroup (1,000 per condition). Flip rate and Conf. drop are measured relative to the Original condition; a negative Conf. drop means confidence increased. FG/BG saliency is the share of Grad-CAM heat inside/outside the center crop.

Condition | Subgroup | N | Accuracy | Flip rate | Conf. drop | FG saliency | BG saliency
Original | Landbird on land (majority) | 352 | 98.3% | 0.0% | 0.0% | 54.6% | 45.4%
Original | Landbird on water (conflict) | 383 | 70.0% | 0.0% | 0.0% | 63.6% | 36.4%
Original | Waterbird on land (conflict) | 139 | 50.4% | 0.0% | 0.0% | 55.3% | 44.7%
Original | Waterbird on water (majority) | 126 | 93.7% | 0.0% | 0.0% | 61.7% | 38.3%
Background blur | Landbird on land (majority) | 352 | 97.7% | 1.1% | 1.9% | 61.9% | 38.1%
Background blur | Landbird on water (conflict) | 383 | 79.1% | 12.8% | -1.2% | 71.3% | 28.7%
Background blur | Waterbird on land (conflict) | 139 | 53.2% | 11.5% | 0.6% | 64.9% | 35.1%
Background blur | Waterbird on water (majority) | 126 | 90.5% | 6.3% | 2.0% | 70.2% | 29.8%
Background mask | Landbird on land (majority) | 352 | 96.9% | 1.4% | 3.0% | 71.2% | 28.8%
Background mask | Landbird on water (conflict) | 383 | 86.2% | 17.8% | -4.4% | 72.3% | 27.7%
Background mask | Waterbird on land (conflict) | 139 | 59.7% | 12.2% | 0.0% | 73.2% | 26.8%
Background mask | Waterbird on water (majority) | 126 | 84.1% | 11.1% | 4.1% | 73.4% | 26.6%
Background patch shuffle | Landbird on land (majority) | 352 | 98.3% | 1.7% | 1.0% | 57.1% | 42.9%
Background patch shuffle | Landbird on water (conflict) | 383 | 83.6% | 19.8% | -3.3% | 61.2% | 38.8%
Background patch shuffle | Waterbird on land (conflict) | 139 | 48.9% | 15.8% | 1.1% | 58.8% | 41.2%
Background patch shuffle | Waterbird on water (majority) | 126 | 81.0% | 12.7% | 7.8% | 61.2% | 38.8%
Foreground mask | Landbird on land (majority) | 352 | 97.2% | 4.0% | 4.0% | 34.9% | 65.1%
Foreground mask | Landbird on water (conflict) | 383 | 20.6% | 55.6% | 6.0% | 29.8% | 70.2%
Foreground mask | Waterbird on land (conflict) | 139 | 7.2% | 47.5% | -1.5% | 33.8% | 66.2%
Foreground mask | Waterbird on water (majority) | 126 | 81.7% | 19.8% | 11.0% | 33.0% | 67.0%
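The headline mask metrics are N-weighted averages of the four subgroup rows for each condition. Every condition uses the same subgroup sizes (352 + 383 + 139 + 126 = 1,000), so pooling the rows above with weights N recovers 53.4% accuracy and 31.8% flips for the foreground mask, 86.0% for the background mask, and 80.2% for the unmodified images. Because this subset has its own subgroup mix and per-group accuracies, intervention results are best compared against the Original rows rather than the 83.9% full-test figure. A short sketch of the pooling, with the rows transcribed from the table (the helper name is illustrative):

```python
# (condition, N, accuracy, flip rate), transcribed from the intervention x subgroup table.
rows = [
    ("Original", 352, 0.983, 0.000), ("Original", 383, 0.700, 0.000),
    ("Original", 139, 0.504, 0.000), ("Original", 126, 0.937, 0.000),
    ("Background blur", 352, 0.977, 0.011), ("Background blur", 383, 0.791, 0.128),
    ("Background blur", 139, 0.532, 0.115), ("Background blur", 126, 0.905, 0.063),
    ("Background mask", 352, 0.969, 0.014), ("Background mask", 383, 0.862, 0.178),
    ("Background mask", 139, 0.597, 0.122), ("Background mask", 126, 0.841, 0.111),
    ("Background patch shuffle", 352, 0.983, 0.017), ("Background patch shuffle", 383, 0.836, 0.198),
    ("Background patch shuffle", 139, 0.489, 0.158), ("Background patch shuffle", 126, 0.810, 0.127),
    ("Foreground mask", 352, 0.972, 0.040), ("Foreground mask", 383, 0.206, 0.556),
    ("Foreground mask", 139, 0.072, 0.475), ("Foreground mask", 126, 0.817, 0.198),
]

def pooled(rows):
    """Return {condition: (total N, N-weighted accuracy, N-weighted flip rate)}."""
    totals = {}
    for cond, n, acc, flip in rows:
        t = totals.setdefault(cond, [0, 0.0, 0.0])
        t[0] += n
        t[1] += n * acc   # expected number of correct predictions
        t[2] += n * flip  # expected number of flipped predictions
    return {c: (n, a / n, f / n) for c, (n, a, f) in totals.items()}

for cond, (n, acc, flip) in pooled(rows).items():
    print(f"{cond:25s} N={n}  acc={acc:.1%}  flip={flip:.1%}")
```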