olmo_tap.final_evals.elo.scripts

One-off scripts for the Elo evaluation harness.

Modules

smoke_test_generate

Smoke test the four-entrant generation pipeline on a 3-prompt slice.

validate_judge

End-to-end validation of the judge pipeline against a hand-crafted pair.