olmo_tap.final_evals.robustness_sweep

Evaluate accuracy across robustness weight checkpoints with PoE.

Empirically, robustness finetuning made the model more robust (it flipped its answer less often when adversarial tokens were appended to the prompt), but accuracy suffered. This module sweeps through checkpoints of the robustness LoRA weights to identify the checkpoint that combines the best PoE accuracy with the lowest answer flip rate.
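The two metrics being compared can be sketched as follows. This is a minimal illustration, not the module's code: `CheckpointResult`, `flip_rate`, and `pareto_front` are hypothetical names, and the selection rule (keep every checkpoint that no other checkpoint beats on both metrics) is an assumption, since a single checkpoint need not be best on accuracy and flip rate at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointResult:
    """Hypothetical per-checkpoint summary; not a type from olmo_tap."""
    checkpoint: int
    accuracy: float   # fraction of prompts answered correctly
    flip_rate: float  # fraction of answers that changed under attack


def flip_rate(clean_answers: list[str], attacked_answers: list[str]) -> float:
    """Fraction of prompts whose answer changed after adversarial tokens were appended."""
    if len(clean_answers) != len(attacked_answers):
        raise ValueError("answer lists must have the same length")
    if not clean_answers:
        raise ValueError("need at least one answer pair")
    flips = sum(c != a for c, a in zip(clean_answers, attacked_answers))
    return flips / len(clean_answers)


def pareto_front(results: list[CheckpointResult]) -> list[CheckpointResult]:
    """Keep checkpoints that no other checkpoint beats on both metrics (assumed rule)."""
    def dominates(a: CheckpointResult, b: CheckpointResult) -> bool:
        no_worse = a.accuracy >= b.accuracy and a.flip_rate <= b.flip_rate
        better = a.accuracy > b.accuracy or a.flip_rate < b.flip_rate
        return no_worse and better

    return [r for r in results if not any(dominates(o, r) for o in results)]
```

If the sweep leaves more than one checkpoint on the front, the final choice depends on how much accuracy one is willing to trade for a lower flip rate.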

Functions

`load_custom_poe`(rob_dir, checkpoint)
`main`()
`precompute_attacks`()

olmo_tap.final_evals.robustness_sweep.load_custom_poe(rob_dir: Path, checkpoint: int | None) → tuple[HydraTransformer, int]

olmo_tap.final_evals.robustness_sweep.main()

olmo_tap.final_evals.robustness_sweep.precompute_attacks()