olmo_tap.final_evals.elo.types
Shared, dependency-light data types for the Elo tournament pipeline.
These primitives are imported by generate (which adds the GPU /
torch-heavy bits), match_builder, run_tournament, and the
tests. Keeping them in a torch-free module lets the orchestrator and
the unit tests run inside the default pixi env without the CUDA stack.
Functions
| load_prompt_bank | Read a JSONL prompt bank into a list of Prompt. |
| prompt_seed | Deterministic per-prompt seed used by the generation harness. |
Classes
| GeneratedResponse | Single (entrant, prompt) generation record persisted to the cache. |
| Prompt | One row of the prompt bank, normalised to a small typed shape. |
- class olmo_tap.final_evals.elo.types.GeneratedResponse(entrant_id: str, prompt_id: str, response_text: str, p_correct: float | None, diagnostics: dict[str, Any], timestamp: str)
Bases: object
Single (entrant, prompt) generation record persisted to the cache.
- p_correct
p_correct from the uncertainty head when the entrant requests the second-pass capture, otherwise None. The configuration-level Elo run leaves this None for all entrants; the field is kept for forward compatibility with uncertainty-aware tournaments.
- Type: float | None
- diagnostics
Per-call metadata (PoE diagnostics or the HuggingFace mirror of the same schema).
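As an illustration of the record shape, the following is a minimal sketch that mirrors the documented fields in a stand-in dataclass and round-trips one record through JSON. The name GeneratedResponseSketch, the example field values, the ISO 8601 timestamp, and the use of JSON as the cache format are all assumptions made for this example; the source does not specify how the cache is serialised.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class GeneratedResponseSketch:
    """Hypothetical stand-in with the same fields as GeneratedResponse."""

    entrant_id: str
    prompt_id: str
    response_text: str
    p_correct: Optional[float]  # None unless the second-pass capture ran
    diagnostics: dict[str, Any]
    timestamp: str


record = GeneratedResponseSketch(
    entrant_id="entrant-a",           # hypothetical entrant name
    prompt_id="prompt-001",           # hypothetical prompt id
    response_text="An example answer.",
    p_correct=None,                   # configuration-level runs leave this None
    diagnostics={},                   # per-call metadata would go here
    timestamp=datetime.now(timezone.utc).isoformat(),  # assumed ISO 8601
)

# Assumed persistence format: one JSON object per record.
line = json.dumps(asdict(record))
restored = GeneratedResponseSketch(**json.loads(line))
```

Because every field is a plain string, float, None, or dict, the record survives the round trip unchanged, which is what makes a type like this convenient to cache.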
- class olmo_tap.final_evals.elo.types.Prompt(prompt_id: str, text: str, source: str = '', subject: str | None = None, gold_answer: str | None = None, expected_behavior: str | None = None, tags: tuple[str, ...] = ())
Bases: object
One row of the prompt bank, normalised to a small typed shape.
- olmo_tap.final_evals.elo.types.load_prompt_bank(path: Path) -> list[Prompt]
Read a JSONL prompt bank into a list of Prompt.
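A loader of this kind could look roughly like the sketch below. It is not the module's implementation: PromptSketch and load_prompt_bank_sketch are hypothetical names, and the example assumes that each JSONL row uses keys matching the documented Prompt field names, that blank lines are skipped, and that tags arrive as a JSON list and are converted to a tuple.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)  # frozen is an assumption, not documented
class PromptSketch:
    """Hypothetical stand-in with the same fields as Prompt."""

    prompt_id: str
    text: str
    source: str = ""
    subject: Optional[str] = None
    gold_answer: Optional[str] = None
    expected_behavior: Optional[str] = None
    tags: tuple[str, ...] = ()


def load_prompt_bank_sketch(path: Path) -> list[PromptSketch]:
    """Read one JSON object per line and normalise it to PromptSketch."""
    prompts: list[PromptSketch] = []
    with path.open(encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if not raw:  # assumed: tolerate blank lines
                continue
            row = json.loads(raw)
            prompts.append(
                PromptSketch(
                    prompt_id=str(row["prompt_id"]),
                    text=row["text"],
                    source=row.get("source", ""),
                    subject=row.get("subject"),
                    gold_answer=row.get("gold_answer"),
                    expected_behavior=row.get("expected_behavior"),
                    tags=tuple(row.get("tags", ())),  # JSON list -> tuple
                )
            )
    return prompts
```

Only the standard library is needed here, which is consistent with the module's stated goal of staying importable without torch.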
- olmo_tap.final_evals.elo.types.prompt_seed(prompt_id: str) -> int
Deterministic per-prompt seed used by the generation harness.
A SHA-256 hash of the prompt id, reduced mod 2**32. This gives every prompt its own fixed seed, so the random draft-head selection inside PoE lines up across the Hydra entrants on a given prompt while still varying across prompts.
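The seed derivation described above can be sketched as follows. This is an illustrative reconstruction, not the module's code: prompt_seed_sketch is a hypothetical name, and the UTF-8 encoding of the id and the big-endian reading of the full digest are assumptions. The second half uses the standard random module and a made-up head count purely to show why a shared seed makes two entrants draw the same sequence.

```python
import hashlib
import random


def prompt_seed_sketch(prompt_id: str) -> int:
    """SHA-256 of the prompt id, reduced mod 2**32 (assumed byte handling)."""
    digest = hashlib.sha256(prompt_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 2**32


# Two entrants answering the same prompt seed their generators identically,
# so their random head choices line up. NUM_HEADS is a hypothetical value.
NUM_HEADS = 4
seed = prompt_seed_sketch("prompt-001")
entrant_a_rng = random.Random(seed)
entrant_b_rng = random.Random(seed)
picks_a = [entrant_a_rng.randrange(NUM_HEADS) for _ in range(5)]
picks_b = [entrant_b_rng.randrange(NUM_HEADS) for _ in range(5)]
```

Hashing the id, rather than using Python's built-in hash(), keeps the seed stable across processes, since string hashing is randomised per interpreter run by default.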