Each methodology step → the file that implements it
The full pipeline runs end-to-end with bash run_all.sh.
Wraps each split of the Hugging Face grodino/waterbirds dataset as a torch Dataset. Each item returns image, label, place, group and idx.
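The dataset contract above can be sketched as follows. This sketch assumes the split is available as a sequence of dict rows with hypothetical "image", "label" and "place" keys (the real grodino/waterbirds column names should be checked), and it assumes the common Waterbirds encoding group = 2 * label + place, which this project does not confirm. torch's DataLoader accepts any map-style object with __len__ and __getitem__, so the sketch avoids importing torch; the real module presumably subclasses torch.utils.data.Dataset.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def group_id(label: int, place: int) -> int:
    # Assumed convention: label 0/1 = landbird/waterbird, place 0/1 = land/water,
    # giving subgroups 0..3. Not confirmed by the project.
    return 2 * label + place


class WaterbirdsSketch:
    """Hypothetical map-style dataset returning (image, label, place, group, idx)."""

    def __init__(self, records: List[Dict[str, Any]],
                 transform: Optional[Callable[[Any], Any]] = None):
        # `records` stands in for one split; the key names are assumptions.
        self.records = records
        self.transform = transform

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Tuple[Any, int, int, int, int]:
        row = self.records[idx]
        image = row["image"]
        if self.transform is not None:
            image = self.transform(image)  # e.g. resize + normalize in practice
        label, place = int(row["label"]), int(row["place"])
        return image, label, place, group_id(label, place), idx
```

With the datasets library, the records would presumably come from datasets.load_dataset("grodino/waterbirds", split=<split name>), and transform would be a torchvision preprocessing pipeline.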
ResNet18 (or ResNet50) initialized with ImageNet weights, with a 2-class classification head. The architecture is selected in config.yaml.
Trains for 15 epochs with Adam (lr 1e-4). The checkpoint saved as best_model.pt is the one with the highest validation worst-group accuracy.
python -m src.train --config config.yaml
- outputs/checkpoints/best_model.pt
- outputs/metrics/train_history.csv
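The checkpoint-selection rule can be sketched as below. The callables run_epoch, evaluate_wga and save_checkpoint are hypothetical stand-ins for the real training step, validation pass and torch.save call, and keeping the earliest epoch on ties is an assumption.

```python
from typing import Callable, Dict, List, Tuple


def worst_group_accuracy(correct: Dict[int, int], total: Dict[int, int]) -> float:
    """Lowest per-subgroup accuracy over the groups that have samples."""
    return min(correct.get(g, 0) / n for g, n in total.items() if n > 0)


def train_with_wga_selection(
    num_epochs: int,
    run_epoch: Callable[[int], None],
    evaluate_wga: Callable[[], float],
    save_checkpoint: Callable[[int], None],
) -> Tuple[float, List[Dict[str, float]]]:
    """Train for num_epochs; save whenever validation worst-group accuracy improves."""
    best_wga = float("-inf")
    history: List[Dict[str, float]] = []
    for epoch in range(1, num_epochs + 1):
        run_epoch(epoch)            # one epoch of Adam updates (hypothetical callable)
        wga = evaluate_wga()        # worst-group accuracy on the validation split
        history.append({"epoch": epoch, "val_worst_group_acc": wga})
        if wga > best_wga:          # strict '>' keeps the earliest epoch on ties
            best_wga = wga
            save_checkpoint(epoch)  # e.g. torch.save(model.state_dict(), path)
    return best_wga, history
```

Selecting on worst-group rather than average accuracy means a later epoch with higher overall accuracy is not kept if its weakest subgroup got worse.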
Computes overall accuracy, macro-averaged precision/recall/F1, the confusion matrix, per-subgroup accuracy and worst-group accuracy.
python -m src.evaluate --config config.yaml --checkpoint outputs/checkpoints/best_model.pt --split test
- outputs/metrics/test_metrics.json
- outputs/metrics/test_predictions.csv
- outputs/figures/test_confusion_matrix.png
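A plain-Python sketch of the metrics listed above, given integer labels, predictions and subgroup ids for every test image. The function name and return keys are hypothetical (the project may well use scikit-learn instead), and reporting 0.0 when a precision or recall denominator is zero is an assumed convention.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def evaluate_predictions(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[int],
    num_classes: int = 2,
) -> Dict[str, object]:
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm: List[List[int]] = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1

    # Per-class precision/recall/F1, then an unweighted (macro) mean.
    precision, recall, f1 = [], [], []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[row][c] for row in range(num_classes)) - tp
        fn = sum(cm[c]) - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)

    # Per-subgroup accuracy; worst-group accuracy is its minimum.
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    group_acc = {g: correct[g] / total[g] for g in sorted(total)}

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precision) / num_classes,
        "macro_recall": sum(recall) / num_classes,
        "macro_f1": sum(f1) / num_classes,
        "confusion_matrix": cm,
        "group_accuracy": group_acc,
        "worst_group_accuracy": min(group_acc.values()),
    }
```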
Grad-CAM (Selvaraju et al.) on layer4[-1]. For each subgroup, 30 representative test samples are selected, and each one produces a 3-panel plate.
python -m src.gradcam_analysis --config config.yaml --checkpoint outputs/checkpoints/best_model.pt
- outputs/gradcam/gradcam_results.csv
- outputs/gradcam/gradcam_group_summary.csv
- outputs/gradcam/*.png (122 plates)
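The core Grad-CAM computation from Selvaraju et al. reduces to a few array operations once the layer4[-1] activations and the gradient of the target class score with respect to them have been captured (in PyTorch, typically with forward and backward hooks). This NumPy sketch assumes both arrays have shape (K, H, W) for a single image, e.g. 512 × 7 × 7 for ResNet18 at 224 × 224 input; the function name is hypothetical.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map of shape (H, W) from (K, H, W) activations and gradients."""
    # alpha_k: global-average-pool the gradients over the spatial dimensions.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the K feature maps, then ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # shape (H, W)
    # Normalize to [0, 1] for plotting; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting low-resolution map is usually upsampled to the input size before being overlaid on the image.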
A 60% center crop serves as the foreground proxy. attention_bias_score = (saliency mass outside the crop) / (total saliency mass). Implemented as saliency_ratios() inside gradcam_analysis.py.
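A NumPy sketch of the attention bias score, assuming "60% center crop" means a box spanning the central 60% of both the height and the width (rather than 60% of the area). saliency_ratios_sketch and center_box are hypothetical stand-ins, not the project's saliency_ratios().

```python
from typing import Dict, Tuple

import numpy as np


def center_box(h: int, w: int, frac: float = 0.6) -> Tuple[int, int, int, int]:
    """(top, bottom, left, right) of a centered box covering frac of each side."""
    bh, bw = int(round(h * frac)), int(round(w * frac))
    top, left = (h - bh) // 2, (w - bw) // 2
    return top, top + bh, left, left + bw


def saliency_ratios_sketch(cam: np.ndarray, frac: float = 0.6) -> Dict[str, float]:
    """Split a non-negative (H, W) saliency map into inside/outside-box mass."""
    top, bottom, left, right = center_box(*cam.shape, frac=frac)
    total = float(cam.sum())
    if total == 0.0:
        # No saliency at all: the ratio is undefined.
        nan = float("nan")
        return {"inside_ratio": nan, "attention_bias_score": nan}
    inside = float(cam[top:bottom, left:right].sum())
    return {
        "inside_ratio": inside / total,
        "attention_bias_score": (total - inside) / total,
    }
```

A score near 1 means almost all Grad-CAM mass falls outside the assumed foreground box, i.e. on the background.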
Four image edits (background_blur, background_mask, background_patch_shuffle, foreground_mask); inference and Grad-CAM are re-run under each edit.
python -m src.interventions --config config.yaml --checkpoint outputs/checkpoints/best_model.pt --max-samples 1000
- outputs/interventions/intervention_predictions.csv
- outputs/interventions/intervention_metrics.csv
- outputs/interventions/intervention_overall_metrics.csv
- outputs/figures/intervention_accuracy.png
- outputs/figures/intervention_background_saliency.png
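The four edits can be sketched with NumPy as below. The sketch assumes, without confirmation from the source, that the edits reuse the same 60% center box as the foreground region, that masking sets pixels to zero, and that blurring is a simple box filter; the kernel size, patch size and helper names are hypothetical. Each edit is applied to the whole (H, W, C) image and the original pixels are then restored inside the box (or, for foreground_mask, outside it).

```python
import numpy as np


def center_mask(h: int, w: int, frac: float = 0.6) -> np.ndarray:
    """Boolean (H, W) mask, True inside the central box (assumed foreground proxy)."""
    bh, bw = int(round(h * frac)), int(round(w * frac))
    top, left = (h - bh) // 2, (w - bw) // 2
    m = np.zeros((h, w), dtype=bool)
    m[top:top + bh, left:left + bw] = True
    return m


def box_blur(img: np.ndarray, k: int = 7) -> np.ndarray:
    """Mean filter: average of k x k shifted copies of an edge-padded image."""
    pad = k // 2
    p = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)


def patch_shuffle(img: np.ndarray, patch: int = 16, rng=None) -> np.ndarray:
    """Randomly permute non-overlapping patch x patch tiles of the image."""
    rng = np.random.default_rng() if rng is None else rng
    gh, gw = img.shape[0] // patch, img.shape[1] // patch
    tiles = [img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
             for i in range(gh) for j in range(gw)]
    out = img.copy()  # pixels not covered by whole tiles stay in place
    for n, src in enumerate(rng.permutation(len(tiles))):
        i, j = divmod(n, gw)
        out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = tiles[src]
    return out


def apply_intervention(img: np.ndarray, kind: str, frac: float = 0.6, rng=None) -> np.ndarray:
    fg = center_mask(img.shape[0], img.shape[1], frac)[..., None]  # broadcasts over channels
    if kind == "background_blur":
        return np.where(fg, img, box_blur(img))
    if kind == "background_mask":
        return np.where(fg, img, 0.0)
    if kind == "background_patch_shuffle":
        return np.where(fg, img, patch_shuffle(img, rng=rng))
    if kind == "foreground_mask":
        return np.where(fg, 0.0, img)
    raise ValueError(f"unknown intervention: {kind}")
```

Restoring the box after a whole-image edit keeps the code short, at the cost that shuffled or blurred background pixels may contain fragments of the original center region.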
Aggregates the outputs of the previous steps into a single Markdown file. The website reads the same CSV/JSON files at build time.
python -m src.report_summary
- outputs/PROJECT_RESULTS_SUMMARY.md
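A sketch of the aggregation step, assuming test_metrics.json is a flat JSON object and using intervention_overall_metrics.csv for the intervention table. The section titles, function names and column handling are hypothetical, and the real report may combine more of the files listed above.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def metrics_to_markdown(metrics: Dict[str, object], intervention_rows: List[Dict[str, str]]) -> str:
    lines = ["# Project results summary", "", "## Test metrics", "",
             "| metric | value |", "|---|---|"]
    # Only scalar numbers go in the table; nested entries (e.g. a confusion matrix) are skipped.
    for key, value in metrics.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            lines.append(f"| {key} | {value:.4f} |")
    if intervention_rows:
        cols = list(intervention_rows[0].keys())
        lines += ["", "## Interventions", "",
                  "| " + " | ".join(cols) + " |",
                  "|" + "---|" * len(cols)]
        for row in intervention_rows:
            lines.append("| " + " | ".join(str(row[c]) for c in cols) + " |")
    return "\n".join(lines) + "\n"


def build_summary(out_dir: str = "outputs") -> Path:
    out = Path(out_dir)
    metrics = json.loads((out / "metrics" / "test_metrics.json").read_text())
    with open(out / "interventions" / "intervention_overall_metrics.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    target = out / "PROJECT_RESULTS_SUMMARY.md"
    target.write_text(metrics_to_markdown(metrics, rows))
    return target
```

Keeping the Markdown rendering separate from file I/O lets the same function be reused wherever the CSV/JSON files are read.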
One-shot reproducibility
pip install -r requirements.txt
bash run_all.sh