olmo_tap.final_evals.elo.types

Shared, dependency-light data types for the Elo tournament pipeline.

These primitives are imported by generate (which adds the GPU- and torch-heavy parts), match_builder, run_tournament, and the tests. Keeping them in a torch-free module lets the orchestrator and the unit tests run in the default pixi environment without the CUDA stack.

Functions

`load_prompt_bank`(path)	Read a JSONL prompt bank into a list of `Prompt`.
`prompt_seed`(prompt_id)	Deterministic per-prompt seed used by the generation harness.

Classes

`GeneratedResponse`(entrant_id, prompt_id, ...)	Single (entrant, prompt) generation record persisted to the cache.
`Prompt`(prompt_id, text[, source, subject, ...])	One row of the prompt bank, normalised to a small typed shape.

class olmo_tap.final_evals.elo.types.GeneratedResponse(entrant_id: str, prompt_id: str, response_text: str, p_correct: float | None, diagnostics: dict[str, Any], timestamp: str)

Bases: object

Single (entrant, prompt) generation record persisted to the cache.

entrant_id

Stable id from EntrantSpec.

Type: str

prompt_id

Stable id from the prompt bank.

Type: str

response_text

Decoded response text returned to the judge.

Type: str

p_correct

Value of p_correct from the uncertainty head when the entrant requests the second-pass capture; otherwise None. The configuration-level Elo run leaves this as None for all entrants; the field is kept for forward compatibility with uncertainty-aware tournaments.

Type: float | None

diagnostics

Per-call metadata (PoE diagnostics or the HuggingFace mirror of the same schema).

Type: dict[str, Any]

timestamp

ISO-8601 UTC timestamp of generation.

Type: str
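As an illustration of the record shape described above, the following is a minimal sketch, not the module's actual code. GeneratedResponseSketch and utc_timestamp are hypothetical stand-ins, and the JSON-lines round trip is only an assumption about how a cache entry might be persisted; the real cache format is not specified here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class GeneratedResponseSketch:
    """Hypothetical stand-in mirroring the documented fields."""

    entrant_id: str
    prompt_id: str
    response_text: str
    p_correct: Optional[float]
    diagnostics: dict[str, Any]
    timestamp: str


def utc_timestamp() -> str:
    # ISO-8601 string with an explicit UTC offset (+00:00).
    return datetime.now(timezone.utc).isoformat()


record = GeneratedResponseSketch(
    entrant_id="entrant-a",          # illustrative id, not a real entrant
    prompt_id="prompt-001",          # illustrative id, not from a real bank
    response_text="An example response.",
    p_correct=None,                  # left as None, as in the configuration-level run
    diagnostics={},
    timestamp=utc_timestamp(),
)

# Assumed persistence format: one JSON object per line.
line = json.dumps(asdict(record))
restored = GeneratedResponseSketch(**json.loads(line))
```

Because every field is a plain string, float, None, or dict, the record survives a JSON round trip without custom encoders, which fits the goal of keeping this module dependency-light.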

class olmo_tap.final_evals.elo.types.Prompt(prompt_id: str, text: str, source: str = '', subject: str | None = None, gold_answer: str | None = None, expected_behavior: str | None = None, tags: tuple[str, ...] = ())

Bases: object

One row of the prompt bank, normalised to a small typed shape.

expected_behavior: str | None = None

gold_answer: str | None = None

prompt_id: str

source: str = ''

subject: str | None = None

tags: tuple[str, ...] = ()

text: str

olmo_tap.final_evals.elo.types.load_prompt_bank(path: Path) → list[Prompt]

Read a JSONL prompt bank into a list of Prompt.
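A JSONL loader of this kind could look roughly like the sketch below. It is written under assumptions: PromptSketch and load_prompt_bank_sketch are hypothetical names, the JSON keys are assumed to match the Prompt field names, and skipping blank lines and converting tags to a tuple are illustrative choices rather than documented behaviour.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PromptSketch:
    """Hypothetical stand-in with the same fields and defaults as Prompt."""

    prompt_id: str
    text: str
    source: str = ""
    subject: Optional[str] = None
    gold_answer: Optional[str] = None
    expected_behavior: Optional[str] = None
    tags: tuple[str, ...] = ()


def load_prompt_bank_sketch(path: Path) -> list[PromptSketch]:
    prompts: list[PromptSketch] = []
    with path.open(encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if not raw:
                continue  # assumption: blank lines are tolerated and skipped
            row = json.loads(raw)
            prompts.append(
                PromptSketch(
                    prompt_id=row["prompt_id"],  # required fields raise KeyError if absent
                    text=row["text"],
                    source=row.get("source", ""),
                    subject=row.get("subject"),
                    gold_answer=row.get("gold_answer"),
                    expected_behavior=row.get("expected_behavior"),
                    # JSON arrays decode to lists; the field is typed as a tuple.
                    tags=tuple(row.get("tags", ())),
                )
            )
    return prompts
```

Only the standard library is used, so a loader written this way stays importable without torch.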

olmo_tap.final_evals.elo.types.prompt_seed(prompt_id: str) → int

Deterministic per-prompt seed used by the generation harness.

The seed is the SHA-256 hash of the prompt id, reduced mod 2**32. This gives every prompt its own fixed seed, so the random draft-head selection inside PoE lines up across the Hydra entrants on a given prompt while still varying across prompts.
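The hashing scheme can be sketched as follows. This is an illustration only: prompt_seed_sketch is a hypothetical name, and the UTF-8 encoding and big-endian interpretation of the digest are assumptions, so the actual function may produce different values for the same id.

```python
import hashlib


def prompt_seed_sketch(prompt_id: str) -> int:
    # Assumption: the id is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(prompt_id.encode("utf-8")).digest()
    # Assumption: the 32-byte digest is read as a big-endian integer,
    # then reduced to the range [0, 2**32).
    return int.from_bytes(digest, "big") % 2**32


# Same id always yields the same seed; different ids yield different
# seeds except in the rare case of a collision in the low 32 bits.
seed_a = prompt_seed_sketch("prompt-001")
seed_b = prompt_seed_sketch("prompt-002")
```

Unlike Python's built-in hash(), which is salted per process for strings, a SHA-256-based seed is stable across runs and machines, which is what allows entrants generated in separate processes to share the same seed for a given prompt.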