07 · Interventions
Causal evidence: edit the image, watch the prediction
The brief asks for at least two interventions. We implement four: three that degrade or remove the background in different ways, plus one that erases the foreground. We re-run inference and Grad-CAM under each condition.
Background blur
Apply a Gaussian blur (kernel size 31, with σ derived from the kernel size) to everything outside the center crop. This removes high-frequency background detail while leaving the coarse colour layout largely intact.
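As a rough illustration, the blur can be sketched with NumPy alone. This is not the project's code. The box fraction `frac=0.5`, the helper names, and the σ formula are assumptions; the formula is the convention OpenCV and torchvision use when σ is not given explicitly, and it gives σ of about 5.0 for a kernel of 31.

```python
import numpy as np

def center_box(h, w, frac=0.5):
    """(top, bottom, left, right) of a centred box; `frac` is an assumed crop size."""
    bh, bw = int(round(h * frac)), int(round(w * frac))
    top, left = (h - bh) // 2, (w - bw) // 2
    return top, top + bh, left, left + bw

def sigma_from_kernel(ksize):
    # Assumed convention for deriving sigma from the kernel size.
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8

def gaussian_kernel_1d(ksize=31):
    sigma = sigma_from_kernel(ksize)
    x = np.arange(ksize) - (ksize - 1) / 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()  # normalise so overall brightness is preserved

def blur_background(img, ksize=31, frac=0.5):
    """img: float array (H, W, C) in [0, 1]. Blur everything, then restore the sharp center crop."""
    k = gaussian_kernel_1d(ksize)
    pad = ksize // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # Separable Gaussian: filter along rows, then along columns.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, padded)
    blurred = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, tmp)
    t, b, l, r = center_box(img.shape[0], img.shape[1], frac)
    out = blurred.copy()
    out[t:b, l:r] = img[t:b, l:r]
    return out
```

Blurring the whole image and pasting the crop back keeps the code short, at the cost of a hard edge at the box boundary.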
Background mask
Replace the background with a flat 0.5 grey. Removes essentially all background information.
Background patch shuffle
Permute 16×16 background patches. Keeps the colour distribution but destroys structure.
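One possible way to implement the shuffle is sketched below. It assumes the background consists of the patch-aligned 16×16 tiles that lie entirely outside the center box; the function name, the `box` tuple layout, and the fixed seed are illustrative choices, not taken from the project.

```python
import numpy as np

def shuffle_background_patches(img, box, patch=16, seed=0):
    """Permute full tiles that lie outside `box` = (top, bottom, left, right)."""
    h, w = img.shape[:2]
    t, b, l, r = box
    rng = np.random.default_rng(seed)
    # Top-left corners of all full tiles that do not overlap the box.
    corners = [
        (y, x)
        for y in range(0, h - patch + 1, patch)
        for x in range(0, w - patch + 1, patch)
        if y + patch <= t or y >= b or x + patch <= l or x >= r
    ]
    order = rng.permutation(len(corners))
    out = img.copy()
    # Destination tile i receives the content of source tile order[i].
    for (y, x), j in zip(corners, order):
        sy, sx = corners[j]
        out[y:y + patch, x:x + patch] = img[sy:sy + patch, sx:sx + patch]
    return out
```

Because tiles are only moved, never altered, the per-channel pixel histogram of the image is unchanged, which is the property the description relies on.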
Foreground mask
Replace the center crop with grey, i.e. erase the bird while keeping the background untouched. This is the most direct test of shortcut reliance: with the bird gone, any remaining signal must come from the background.
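The two masking interventions are mirror images of each other, which a short sketch makes explicit. It assumes float images in [0, 1] and a `box` tuple of (top, bottom, left, right) for the center crop; the function names are hypothetical.

```python
import numpy as np

GREY = 0.5  # flat grey fill used by the background mask; assumed for the foreground mask too

def mask_background(img, box, value=GREY):
    """Keep only the center crop; fill everything else with flat grey."""
    t, b, l, r = box
    out = np.full_like(img, value)
    out[t:b, l:r] = img[t:b, l:r]
    return out

def mask_foreground(img, box, value=GREY):
    """Fill the center crop with flat grey; keep the background untouched."""
    t, b, l, r = box
    out = img.copy()
    out[t:b, l:r] = value
    return out
```

Note that with integer image arrays (for example `uint8`), a fill value of 0.5 would be truncated, so the images should be converted to float first.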
Overall effect
Accuracy by intervention
What happens to test accuracy when we modify the image at inference time?
Prediction flip rate vs. original
Fraction of samples where the predicted class changes after the intervention.
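Accuracy and flip rate are both simple means over per-sample comparisons. A minimal sketch, with illustrative function names and toy arrays:

```python
import numpy as np

def accuracy(pred, labels):
    """Fraction of samples whose predicted class equals the label."""
    return float(np.mean(np.asarray(pred) == np.asarray(labels)))

def flip_rate(pred_orig, pred_new):
    """Fraction of samples whose predicted class changed after the intervention."""
    return float(np.mean(np.asarray(pred_orig) != np.asarray(pred_new)))

labels    = [0, 0, 1, 1]
pred_orig = [0, 0, 1, 0]   # one error on the original images
pred_new  = [0, 1, 1, 1]   # one prediction fixed, one broken
print(accuracy(pred_orig, labels), accuracy(pred_new, labels), flip_rate(pred_orig, pred_new))
```

In the toy example, half of the predictions flip while accuracy stays the same, because a flip counts in either direction. This is why an intervention can have a non-zero flip rate and still raise accuracy.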
Average background saliency ratio
How much of the Grad-CAM heat sits outside the center foreground box?
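One plausible definition of this ratio is the share of total Grad-CAM heat that falls outside the foreground box. The sketch below assumes that definition and a non-negative heatmap already resized to the image resolution; the exact normalisation used in the project is not stated here.

```python
import numpy as np

def background_saliency_ratio(cam, box):
    """Share of total heat outside `box` = (top, bottom, left, right)."""
    cam = np.clip(np.asarray(cam, dtype=float), 0.0, None)  # ignore negative activations
    total = cam.sum()
    if total == 0:
        return float("nan")  # undefined for an all-zero map
    t, b, l, r = box
    foreground = cam[t:b, l:r].sum()
    return float(1.0 - foreground / total)
```

Under this definition the foreground and background shares sum to 1 for every sample, so the two saliency columns in a per-subgroup table are complements of each other.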
Reading the bars
- Foreground mask drops overall accuracy from 80.2% to 53.4% with a 31.8% flip rate, so the bird matters a great deal. The model is not a pure background classifier.
- Background mask actually increases overall accuracy (from 80.2% to 86.0%). Deleting the background improves predictions on both conflict groups (landbird on water: 70.0% to 86.2%; waterbird on land: 50.4% to 59.7%), which is direct evidence that the background was misleading the model on those subgroups.
- Background patch shuffle is the most disruptive of the three background ablations for prediction flips, even though the colour distribution is preserved. This suggests that structural background features (water vs. land texture) were carrying signal.
Side-by-side Grad-CAM comparison
For each conflict-group sample, we render the original image and its four intervened versions, each with the Grad-CAM overlay for the model's predicted class. Generated by src/comparison_plates.py.






Subgroup breakdown
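Drilling into the 4 × 5 grid (four subgroups × five conditions) is where the story really lands. Foreground masking devastates the conflict groups (landbird on water falls from 70.0% to 20.6%, waterbird on land from 50.4% to 7.2%) while leaving the landbird-on-land majority group nearly untouched (98.3% to 97.2%). The waterbird-on-water majority group loses more, dropping from 93.7% to 81.7%. This pattern is consistent with the model falling back on the background once the bird is gone.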
| Condition | Subgroup | N | Accuracy | Flip rate | Conf. drop | FG saliency | BG saliency |
|---|---|---|---|---|---|---|---|
| Original | Landbird on land (majority) | 352 | 98.3% | 0.0% | 0.0% | 54.6% | 45.4% |
| Original | Landbird on water (conflict) | 383 | 70.0% | 0.0% | 0.0% | 63.6% | 36.4% |
| Original | Waterbird on land (conflict) | 139 | 50.4% | 0.0% | 0.0% | 55.3% | 44.7% |
| Original | Waterbird on water (majority) | 126 | 93.7% | 0.0% | 0.0% | 61.7% | 38.3% |
| Background blur | Landbird on land (majority) | 352 | 97.7% | 1.1% | 1.9% | 61.9% | 38.1% |
| Background blur | Landbird on water (conflict) | 383 | 79.1% | 12.8% | -1.2% | 71.3% | 28.7% |
| Background blur | Waterbird on land (conflict) | 139 | 53.2% | 11.5% | 0.6% | 64.9% | 35.1% |
| Background blur | Waterbird on water (majority) | 126 | 90.5% | 6.3% | 2.0% | 70.2% | 29.8% |
| Background mask | Landbird on land (majority) | 352 | 96.9% | 1.4% | 3.0% | 71.2% | 28.8% |
| Background mask | Landbird on water (conflict) | 383 | 86.2% | 17.8% | -4.4% | 72.3% | 27.7% |
| Background mask | Waterbird on land (conflict) | 139 | 59.7% | 12.2% | 0.0% | 73.2% | 26.8% |
| Background mask | Waterbird on water (majority) | 126 | 84.1% | 11.1% | 4.1% | 73.4% | 26.6% |
| Background patch shuffle | Landbird on land (majority) | 352 | 98.3% | 1.7% | 1.0% | 57.1% | 42.9% |
| Background patch shuffle | Landbird on water (conflict) | 383 | 83.6% | 19.8% | -3.3% | 61.2% | 38.8% |
| Background patch shuffle | Waterbird on land (conflict) | 139 | 48.9% | 15.8% | 1.1% | 58.8% | 41.2% |
| Background patch shuffle | Waterbird on water (majority) | 126 | 81.0% | 12.7% | 7.8% | 61.2% | 38.8% |
| Foreground mask | Landbird on land (majority) | 352 | 97.2% | 4.0% | 4.0% | 34.9% | 65.1% |
| Foreground mask | Landbird on water (conflict) | 383 | 20.6% | 55.6% | 6.0% | 29.8% | 70.2% |
| Foreground mask | Waterbird on land (conflict) | 139 | 7.2% | 47.5% | -1.5% | 33.8% | 66.2% |
| Foreground mask | Waterbird on water (majority) | 126 | 81.7% | 19.8% | 11.0% | 33.0% | 67.0% |
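The overall figures quoted earlier can be recovered from this table by weighting each subgroup by its size N. The sketch below copies the N, accuracy, and flip-rate columns from the table; the variable and function names are illustrative.

```python
# (N, accuracy %, flip rate %) per subgroup, in table order:
# landbird/land, landbird/water, waterbird/land, waterbird/water
ROWS = {
    "Original":                 [(352, 98.3, 0.0), (383, 70.0, 0.0), (139, 50.4, 0.0), (126, 93.7, 0.0)],
    "Background blur":          [(352, 97.7, 1.1), (383, 79.1, 12.8), (139, 53.2, 11.5), (126, 90.5, 6.3)],
    "Background mask":          [(352, 96.9, 1.4), (383, 86.2, 17.8), (139, 59.7, 12.2), (126, 84.1, 11.1)],
    "Background patch shuffle": [(352, 98.3, 1.7), (383, 83.6, 19.8), (139, 48.9, 15.8), (126, 81.0, 12.7)],
    "Foreground mask":          [(352, 97.2, 4.0), (383, 20.6, 55.6), (139, 7.2, 47.5), (126, 81.7, 19.8)],
}

def pooled(rows, col):
    """Size-weighted mean of column `col` (1 = accuracy, 2 = flip rate)."""
    total = sum(r[0] for r in rows)
    return sum(r[0] * r[col] for r in rows) / total

for name, rows in ROWS.items():
    print(f"{name:26s} acc={pooled(rows, 1):5.1f}%  flip={pooled(rows, 2):5.1f}%")
```

Up to rounding of the table entries, the pooled values reproduce the 80.2% (original), 86.0% (background mask), and 53.4% (foreground mask) accuracies and the 31.8% foreground-mask flip rate. Among the three background ablations, patch shuffle has the highest pooled flip rate.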

