Shortcut Learning · Waterbirds
Code walkthrough

Each methodology step → the file that implements it

The full pipeline runs end-to-end with bash run_all.sh.

01
Load Waterbirds

Wraps a split of the Hugging Face grodino/waterbirds dataset as a torch Dataset. Each item returns image, label, place, group, and idx.
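As a rough illustration of the record each item carries, the sketch below follows the map-style protocol (`__len__` / `__getitem__`) that `torch.utils.data.Dataset` expects. The class name, the row field names, the dict return type, and the `group = 2 * label + place` encoding are assumptions made for illustration and are not taken from the repository. A real wrapper would subclass `Dataset` and apply image transforms.

```python
from typing import Any, Dict, Sequence


class WaterbirdsRecords:
    """Hypothetical map-style wrapper; a real one would subclass torch.utils.data.Dataset."""

    def __init__(self, rows: Sequence[Dict[str, Any]]):
        # Each row is assumed to hold "image", "label" (0 = landbird, 1 = waterbird)
        # and "place" (0 = land background, 1 = water background).
        self.rows = list(rows)

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        row = self.rows[idx]
        label, place = int(row["label"]), int(row["place"])
        return {
            "image": row["image"],
            "label": label,
            "place": place,
            "group": 2 * label + place,  # assumed encoding: four subgroups, 0..3
            "idx": idx,
        }


# Toy rows standing in for the real split (strings instead of images).
rows = [
    {"image": "img0", "label": 0, "place": 0},
    {"image": "img1", "label": 1, "place": 0},
    {"image": "img2", "label": 1, "place": 1},
]
ds = WaterbirdsRecords(rows)
print(len(ds), ds[1]["group"], ds[2]["group"])
```

Under this assumed encoding, groups 1 and 2 are the ones where bird type and background disagree, which is where a background shortcut would be expected to hurt most.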

02
Build the model

ResNet-18 (or ResNet-50) with ImageNet-pretrained weights and a 2-class head. The architecture is set in config.yaml.

03
Train with worst-group selection

Trains for 15 epochs with Adam (lr 1e-4). The best checkpoint is the one with the highest validation worst-group accuracy.

python -m src.train --config config.yaml
Outputs
  • outputs/checkpoints/best_model.pt
  • outputs/metrics/train_history.csv
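The selection rule can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, ties are broken in favor of the earliest epoch, and the accuracy values are invented toy numbers, not results from this project.

```python
from typing import Dict, List


def worst_group_accuracy(group_acc: Dict[int, float]) -> float:
    """Accuracy of the subgroup the model handles worst."""
    return min(group_acc.values())


def select_best_epoch(history: List[Dict[int, float]]) -> int:
    """Return the 0-based epoch with the highest validation worst-group accuracy.

    Ties keep the earliest epoch (an assumption, not confirmed by the source).
    """
    best_epoch, best_wga = -1, float("-inf")
    for epoch, group_acc in enumerate(history):
        wga = worst_group_accuracy(group_acc)
        if wga > best_wga:
            best_epoch, best_wga = epoch, wga
    return best_epoch


# Toy per-group validation accuracies for three epochs (illustrative only).
history = [
    {0: 0.99, 1: 0.60, 2: 0.55, 3: 0.95},  # worst group: 0.55
    {0: 0.97, 1: 0.72, 2: 0.70, 3: 0.94},  # worst group: 0.70
    {0: 0.99, 1: 0.70, 2: 0.66, 3: 0.99},  # worst group: 0.66
]
print(select_best_epoch(history))  # 1
```

In a training loop, the checkpoint would be overwritten only when the current epoch's worst-group accuracy beats the best seen so far.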
04
Subgroup evaluation

Computes overall accuracy, macro precision/recall/F1, the confusion matrix, per-subgroup accuracy, and worst-group accuracy.

python -m src.evaluate --config config.yaml --checkpoint outputs/checkpoints/best_model.pt --split test
Outputs
  • outputs/metrics/test_metrics.json
  • outputs/metrics/test_predictions.csv
  • outputs/figures/test_confusion_matrix.png
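To make the metric definitions concrete, here is a dependency-free sketch. The function names and the toy labels are assumptions, and the repository may well use a library such as scikit-learn instead.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int = 2) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_prf(cm: List[List[int]]) -> Dict[str, float]:
    """Unweighted mean of per-class precision, recall and F1."""
    n = len(cm)
    sums = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))
        actual = sum(cm[c])
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        sums["precision"] += precision
        sums["recall"] += recall
        sums["f1"] += f1
    return {name: total / n for name, total in sums.items()}


def group_accuracies(y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[int]) -> Dict[int, float]:
    """Accuracy computed separately for each subgroup id."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in sorted(total)}


# Toy predictions (illustrative only).
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1, 0, 0]
groups = [0, 0, 1, 3, 2, 3, 0, 2]

cm = confusion_matrix(y_true, y_pred)
per_group = group_accuracies(y_true, y_pred, groups)
print(cm, macro_prf(cm), per_group, min(per_group.values()))
```

In this toy example overall accuracy is 5/8 while worst-group accuracy is 0, which is the kind of gap the subgroup breakdown is meant to expose.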
05
Grad-CAM saliency analysis

Grad-CAM (Selvaraju et al.) computed on layer4[-1]. 30 representative test samples are drawn per subgroup, and each generates a 3-panel plate.

python -m src.gradcam_analysis --config config.yaml --checkpoint outputs/checkpoints/best_model.pt
Outputs
  • outputs/gradcam/gradcam_results.csv
  • outputs/gradcam/gradcam_group_summary.csv
  • outputs/gradcam/*.png (122 plates)
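The core Grad-CAM computation can be sketched in NumPy, assuming the target layer's activations and the gradients of the class score with respect to them have already been captured (for example with forward and backward hooks). The function name is hypothetical, dividing by the peak is a common visualization convention rather than part of the method, and upsampling to image resolution is omitted.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from one layer's activations and gradients.

    Both inputs have shape (C, H, W); the result has shape (H, W), scaled to [0, 1].
    """
    # One weight per channel: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the activation maps over channels -> (H, W).
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only evidence that increases the class score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 supports the class, channel 1 opposes it.
activations = np.array([
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
])
gradients = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
cam = grad_cam(activations, gradients)
print(cam)
```

The location activated only by the negatively weighted channel is zeroed by the ReLU, so the map highlights the top-left cell alone.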
06
Foreground / background score

A 60% center crop serves as the foreground proxy. attention_bias_score = (saliency summed outside the crop) / (total saliency). Implemented as saliency_ratios() inside gradcam_analysis.py.
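A sketch of the score, under the assumption that "60%" refers to 60% of each side length (so the box covers 36% of the area). The names `center_box` and `center_crop_saliency_ratios` are hypothetical stand-ins and do not reproduce the repository's `saliency_ratios()`.

```python
from typing import Dict, Tuple

import numpy as np


def center_box(height: int, width: int, frac: float = 0.6) -> Tuple[int, int, int, int]:
    """Bounds (top, bottom, left, right) of a centered box spanning frac of each side."""
    box_h, box_w = int(round(height * frac)), int(round(width * frac))
    top, left = (height - box_h) // 2, (width - box_w) // 2
    return top, top + box_h, left, left + box_w


def center_crop_saliency_ratios(saliency: np.ndarray, frac: float = 0.6) -> Dict[str, float]:
    """Share of total saliency inside (foreground proxy) and outside (background) the box."""
    saliency = np.asarray(saliency, dtype=float)
    total = saliency.sum()
    if total <= 0:
        # No saliency at all: report zeros rather than dividing by zero.
        return {"foreground_ratio": 0.0, "attention_bias_score": 0.0}
    top, bottom, left, right = center_box(saliency.shape[0], saliency.shape[1], frac)
    inside = saliency[top:bottom, left:right].sum()
    return {
        "foreground_ratio": float(inside / total),
        "attention_bias_score": float((total - inside) / total),
    }


uniform = np.ones((10, 10))
print(center_crop_saliency_ratios(uniform))
```

For a uniform 10x10 map the box is 6x6, so the score is 64/100 = 0.64. Under this side-length reading, that is the baseline a map with no spatial preference would produce, and values well above it would suggest background-heavy attention.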

07
Interventions

Four image edits: background_blur, background_mask, background_patch_shuffle, and foreground_mask. Inference and Grad-CAM are re-run under each.

python -m src.interventions --config config.yaml --checkpoint outputs/checkpoints/best_model.pt --max-samples 1000
Outputs
  • outputs/interventions/intervention_predictions.csv
  • outputs/interventions/intervention_metrics.csv
  • outputs/interventions/intervention_overall_metrics.csv
  • outputs/figures/intervention_accuracy.png
  • outputs/figures/intervention_background_saliency.png
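Three of the four edits can be sketched in NumPy as below. This assumes the foreground is approximated by a centered box spanning 60% of each side; the fill value, patch size, and function bodies are illustrative guesses, not the repository's implementation. background_blur is left out because it needs an image-filtering library.

```python
import numpy as np


def center_mask(height: int, width: int, frac: float = 0.6) -> np.ndarray:
    """Boolean (H, W) mask that is True inside the centered foreground box."""
    box_h, box_w = int(round(height * frac)), int(round(width * frac))
    top, left = (height - box_h) // 2, (width - box_w) // 2
    mask = np.zeros((height, width), dtype=bool)
    mask[top:top + box_h, left:left + box_w] = True
    return mask


def background_mask(image: np.ndarray, fill: float = 0.0, frac: float = 0.6) -> np.ndarray:
    """Replace every pixel outside the foreground box with a constant."""
    out = image.copy()
    out[~center_mask(image.shape[0], image.shape[1], frac)] = fill
    return out


def foreground_mask(image: np.ndarray, fill: float = 0.0, frac: float = 0.6) -> np.ndarray:
    """Replace every pixel inside the foreground box with a constant."""
    out = image.copy()
    out[center_mask(image.shape[0], image.shape[1], frac)] = fill
    return out


def background_patch_shuffle(image: np.ndarray, patch: int = 4, frac: float = 0.6, seed: int = 0) -> np.ndarray:
    """Permute patch-sized tiles lying entirely in the background; the rest is untouched."""
    h, w = image.shape[:2]
    fg = center_mask(h, w, frac)
    tiles = [
        (r, c)
        for r in range(0, h - patch + 1, patch)
        for c in range(0, w - patch + 1, patch)
        if not fg[r:r + patch, c:c + patch].any()
    ]
    order = np.random.default_rng(seed).permutation(len(tiles))
    out = image.copy()
    for (r, c), j in zip(tiles, order):
        sr, sc = tiles[j]
        out[r:r + patch, c:c + patch] = image[sr:sr + patch, sc:sc + patch]
    return out


# Toy 20x20 RGB image with distinct, non-zero pixel values.
image = np.arange(20 * 20 * 3, dtype=float).reshape(20, 20, 3) + 1.0
print(background_mask(image)[0, 0], foreground_mask(image)[10, 10])
```

If accuracy drops more under the background edits than under foreground_mask, that points toward reliance on the background rather than the bird.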
08
Generate the project summary

Aggregates the CSV/JSON outputs of the previous steps into a single Markdown file. The website reads the same CSV/JSON files at build time.

python -m src.report_summary
Outputs
  • outputs/PROJECT_RESULTS_SUMMARY.md
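As a loose illustration of the aggregation idea, the sketch below renders one metrics dictionary as a Markdown table. The function name, the metric keys, and the values are placeholders; the actual layout of PROJECT_RESULTS_SUMMARY.md and the keys in test_metrics.json are not known from this page.

```python
from typing import Dict


def metrics_to_markdown(title: str, metrics: Dict[str, float]) -> str:
    """Render a flat name -> value mapping as a titled Markdown table."""
    lines = [f"## {title}", "", "| Metric | Value |", "| --- | --- |"]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value:.3f} |")
    return "\n".join(lines) + "\n"


# Placeholder values, not project results.
section = metrics_to_markdown("Test metrics", {"accuracy": 0.5, "worst_group_accuracy": 0.25})
print(section)
```

A full summary would concatenate one such section per output file, loading each with `json.load` or `csv.DictReader`.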

One-shot reproducibility

pip install -r requirements.txt
bash run_all.sh
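For readers who prefer not to use bash, the stages listed above can be chained from Python. This is a sketch that assumes run_all.sh simply runs the five documented commands in order, which this page does not confirm. The helper names are hypothetical; the commands and flags are copied from the steps above.

```python
import shlex
import subprocess
import sys
from typing import List

CHECKPOINT = "outputs/checkpoints/best_model.pt"


def pipeline_commands(python: str = sys.executable) -> List[List[str]]:
    """The documented stage commands, in the order the walkthrough lists them."""
    config = ["--config", "config.yaml"]
    ckpt = ["--checkpoint", CHECKPOINT]
    return [
        [python, "-m", "src.train", *config],
        [python, "-m", "src.evaluate", *config, *ckpt, "--split", "test"],
        [python, "-m", "src.gradcam_analysis", *config, *ckpt],
        [python, "-m", "src.interventions", *config, *ckpt, "--max-samples", "1000"],
        [python, "-m", "src.report_summary"],
    ]


def run_pipeline(dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is set."""
    for cmd in pipeline_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(cmd, check=True)


run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` from the repository root would execute the stages for real.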