olmo_tap.final_evals

Modules

elo

Tournament 1 Elo evaluation harness.

robustness_sweep

Evaluate accuracy across robustness weight checkpoints with PoE.

uncertainty_sweep

Evaluate the calibration of the uncertainty head using PoE.