olmo_tap.final_evals.elo.scripts.smoke_test_generate
Smoke test the four-entrant generation pipeline on a 3-prompt slice.
Confirms that:
- all four entrants load and produce non-empty responses;
- per-prompt seeding aligns the draft head across the three Hydra entrants;
- bypass_jury entrants record zero resampled positions;
- the full PoE entrant resamples on at least some prompts;
- the vanilla-HF entrant is deterministic across identical-seed runs.
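The checks above could be expressed roughly as follows. This is a minimal sketch, not the module's code: the names EntrantOutput, per_prompt_seed, check_outputs and check_deterministic, the entrant keys, the SHA-256 seed-derivation scheme, and the assumption that draft-head alignment shows up as identical draft tokens are all hypothetical. Requires Python 3.9+.

```python
# Hypothetical sketch of the smoke-test checks; names, fields and the seeding
# scheme are illustrative assumptions, not this module's actual API.
import hashlib
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EntrantOutput:
    """Assumed per-prompt record emitted by one entrant."""
    response: str
    resampled_positions: list[int] = field(default_factory=list)
    draft_tokens: list[int] = field(default_factory=list)


def _require(cond: bool, msg: str) -> None:
    # Explicit raise so the checks still fire under `python -O`.
    if not cond:
        raise AssertionError(msg)


def per_prompt_seed(base_seed: int, prompt_id: str) -> int:
    # Stable across processes (unlike built-in hash()), so every entrant
    # receives the same seed for the same prompt.
    digest = hashlib.sha256(f"{base_seed}:{prompt_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def check_outputs(outputs: dict[str, list[EntrantOutput]], hydra: list[str],
                  bypass: list[str], poe: str) -> None:
    # Every entrant produced a non-empty response for every prompt.
    for name, rows in outputs.items():
        _require(all(r.response.strip() for r in rows), f"{name}: empty response")
    # Aligned seeds should give the Hydra entrants identical draft tokens.
    for i in range(len(outputs[hydra[0]])):
        drafts = {tuple(outputs[name][i].draft_tokens) for name in hydra}
        _require(len(drafts) == 1, f"prompt {i}: Hydra draft heads diverged")
    # bypass_jury entrants never resample.
    for name in bypass:
        _require(all(not r.resampled_positions for r in outputs[name]),
                 f"{name}: bypass_jury entrant resampled")
    # The full PoE entrant resamples on at least one prompt.
    _require(any(r.resampled_positions for r in outputs[poe]),
             f"{poe}: full PoE entrant never resampled")


def check_deterministic(generate: Callable[[str, int], str],
                        prompts: list[str], base_seed: int) -> None:
    # Two identical-seed runs must produce identical responses.
    def run() -> list[str]:
        return [generate(p, per_prompt_seed(base_seed, p)) for p in prompts]
    _require(run() == run(), "vanilla entrant is not deterministic")


if __name__ == "__main__":
    # Toy stand-in for a seeded generator: output depends only on the seed.
    def toy_generate(prompt: str, seed: int) -> str:
        return f"{prompt}: {random.Random(seed).randint(0, 10**6)}"

    check_deterministic(toy_generate, ["p0", "p1", "p2"], base_seed=1234)
    print("determinism check passed")
```

Deriving the seed from a hash of (base seed, prompt) rather than from iteration order keeps entrants aligned even if they visit prompts in different orders.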
The cache directory is forced to a smoke-test-only location so the real response cache is untouched. Re-runs of the smoke test reuse that scratch cache; delete it to force fresh generation.
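A scratch cache with reuse-until-deleted semantics might look like the sketch below. It assumes a hypothetical cached_generate helper and a one-JSON-file-per-(entrant, prompt) layout; the module's real cache location and on-disk format are not documented here.

```python
# Hypothetical sketch of a scratch response cache; the helper name, directory
# layout and JSON format are assumptions, not this module's implementation.
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def cached_generate(cache_dir: Path, entrant: str, prompt_id: str,
                    generate: Callable[[], str]) -> tuple[str, bool]:
    """Return (response, from_cache), calling generate() only on a miss."""
    path = cache_dir / entrant / f"{prompt_id}.json"
    if path.exists():
        return json.loads(path.read_text())["response"], True
    response = generate()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"response": response}))
    return response, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Smoke-test-only location, kept apart from the real response cache.
        smoke_cache = Path(tmp) / "smoke_cache"
        print(cached_generate(smoke_cache, "vanilla_hf", "p0", lambda: "hi"))   # ('hi', False)
        print(cached_generate(smoke_cache, "vanilla_hf", "p0", lambda: "new"))  # ('hi', True)
        shutil.rmtree(smoke_cache)  # deleting the scratch cache forces regeneration
        print(cached_generate(smoke_cache, "vanilla_hf", "p0", lambda: "new"))  # ('new', False)
```

Keying files by entrant and prompt means a re-run regenerates only missing entries, while removing the whole directory resets everything.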
Run:
pixi run -e cuda python -m olmo_tap.final_evals.elo.scripts.smoke_test_generate
Functions