olmo_tap.final_evals.elo.scripts.smoke_test_generate

Smoke test the four-entrant generation pipeline on a 3-prompt slice.

The test confirms five things: all four entrants load and produce non-empty responses; per-prompt seeding aligns the draft head across the three Hydra entrants; bypass_jury entrants record zero resampled positions; the full PoE entrant resamples on at least some prompts; and the vanilla-HF entrant is deterministic across identical-seed runs.
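The five checks above can be sketched as plain assertions over per-prompt records. This is an illustrative sketch only, not the module's actual code: the `Record` fields, the entrant labels, and the `check_smoke` helper are all hypothetical, and "aligned draft head" is assumed here to mean that the three Hydra entrants see identical draft tokens for the same prompt.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One generation result (hypothetical schema, for illustration only)."""
    entrant: str
    prompt_id: int
    response: str
    draft_tokens: tuple  # tokens proposed by the draft head; () for vanilla HF
    n_resampled: int     # number of positions the jury resampled


# Hypothetical entrant labels; the real names are not given in this page.
BYPASS = ("hydra_bypass_a", "hydra_bypass_b")
POE = "hydra_poe"
HYDRA = BYPASS + (POE,)
VANILLA = "vanilla_hf"


def check_smoke(records, vanilla_rerun):
    """Run the five smoke checks; raise AssertionError on the first failure.

    `vanilla_rerun` maps prompt_id to the vanilla-HF response from a second
    run with the same seed.
    """
    by_key = {(r.entrant, r.prompt_id): r for r in records}
    prompt_ids = sorted({r.prompt_id for r in records})

    for p in prompt_ids:
        # 1. Every entrant produced a non-empty response.
        for e in HYDRA + (VANILLA,):
            assert by_key[(e, p)].response.strip(), f"empty response: {e}/{p}"
        # 2. Same seed, same draft: all Hydra entrants share the draft tokens.
        drafts = {by_key[(e, p)].draft_tokens for e in HYDRA}
        assert len(drafts) == 1, f"draft mismatch on prompt {p}"
        # 3. Entrants that bypass the jury never resample.
        for e in BYPASS:
            assert by_key[(e, p)].n_resampled == 0, f"{e} resampled on {p}"
        # 5. Vanilla HF reproduces its output under an identical seed.
        assert by_key[(VANILLA, p)].response == vanilla_rerun[p]

    # 4. The full PoE entrant resamples on at least one prompt.
    assert any(by_key[(POE, p)].n_resampled > 0 for p in prompt_ids)
    return True


# Tiny made-up dataset covering a 3-prompt slice.
records = []
for p in range(3):
    draft = (p, p + 1, p + 2)
    for e in BYPASS:
        records.append(Record(e, p, f"{e} answer {p}", draft, 0))
    records.append(Record(POE, p, f"poe answer {p}", draft, p))  # 0, 1, 2
    records.append(Record(VANILLA, p, f"hf answer {p}", (), 0))
rerun = {p: f"hf answer {p}" for p in range(3)}
```

Calling `check_smoke(records, rerun)` on this toy data passes; changing any bypass record's `n_resampled` to a non-zero value makes check 3 fail.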

The cache directory is forced to a smoke-test-only location so the real response cache is untouched. Re-runs of the smoke test reuse that scratch cache; delete it to force fresh generation.
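The reuse-then-delete behaviour described above can be illustrated with a generic on-disk cache. This is a minimal sketch under stated assumptions, not the pipeline's real cache: `cached_generate`, the one-JSON-file-per-key layout, and the scratch directory are all hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path


def cached_generate(cache_dir, key, generate):
    """Return the cached response for `key`, generating it only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{key}.json"
    if path.exists():
        # Cache hit: a re-run reuses the stored response.
        return json.loads(path.read_text())["response"]
    response = generate()
    path.write_text(json.dumps({"response": response}))
    return response


# Hypothetical scratch location, kept separate from any real cache.
scratch = Path(tempfile.mkdtemp()) / "smoke_cache"
calls = []


def fake_generate():
    # Stand-in for an expensive model call.
    calls.append(1)
    return "hello"


first = cached_generate(scratch, "prompt-0", fake_generate)
second = cached_generate(scratch, "prompt-0", fake_generate)  # served from disk
calls_before_delete = len(calls)

shutil.rmtree(scratch)  # deleting the scratch cache forces fresh generation
third = cached_generate(scratch, "prompt-0", fake_generate)
calls_after_delete = len(calls)
shutil.rmtree(scratch.parent)  # clean up the temporary directory
```

The generator runs once for the first two calls and a second time only after the scratch directory is removed.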

Run:

pixi run -e cuda python -m olmo_tap.final_evals.elo.scripts.smoke_test_generate

Functions

main()

olmo_tap.final_evals.elo.scripts.smoke_test_generate.main() -> None