olmo_tap.final_evals.elo.types

Shared, dependency-light data types for the Elo tournament pipeline.

These primitives are imported by generate (which adds the GPU- and torch-heavy parts), match_builder, run_tournament, and the tests. Keeping them in a torch-free module lets the orchestrator and the unit tests run in the default pixi env without the CUDA stack.

Functions

load_prompt_bank(path)

Read a JSONL prompt bank into a list of Prompt.

prompt_seed(prompt_id)

Deterministic per-prompt seed used by the generation harness.

Classes

GeneratedResponse(entrant_id, prompt_id, ...)

Single (entrant, prompt) generation record persisted to the cache.

Prompt(prompt_id, text[, source, subject, ...])

One row of the prompt bank, normalised to a small typed shape.

class olmo_tap.final_evals.elo.types.GeneratedResponse(entrant_id: str, prompt_id: str, response_text: str, p_correct: float | None, diagnostics: dict[str, Any], timestamp: str)

Bases: object

Single (entrant, prompt) generation record persisted to the cache.

entrant_id

Stable id from EntrantSpec.

Type:

str

prompt_id

Stable id from the prompt bank.

Type:

str

response_text

Decoded response text returned to the judge.

Type:

str

p_correct

p_correct from the uncertainty head when the entrant requests the second-pass capture, otherwise None. The configuration-level Elo run leaves this None for all entrants; the field is kept for forward compatibility with uncertainty-aware tournaments.

Type:

float | None

diagnostics

Per-call metadata (PoE diagnostics or the HuggingFace mirror of the same schema).

Type:

dict[str, Any]

timestamp

ISO-8601 UTC timestamp of generation.

Type:

str

diagnostics: dict[str, Any]
entrant_id: str
p_correct: float | None
prompt_id: str
response_text: str
timestamp: str
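
As an illustration of the record shape described above, the following is a minimal sketch that builds one record and round-trips it through a JSON line. GeneratedResponseSketch is a local stand-in that mirrors the documented fields, not the real class. The ids, the empty diagnostics dict, and the JSON-lines serialisation are assumptions, since this page does not specify the cache format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass
class GeneratedResponseSketch:
    """Stand-in mirroring the documented fields (not the real class)."""

    entrant_id: str
    prompt_id: str
    response_text: str
    p_correct: float | None
    diagnostics: dict[str, Any]
    timestamp: str


record = GeneratedResponseSketch(
    entrant_id="entrant-a",   # hypothetical id
    prompt_id="prompt-001",   # hypothetical id
    response_text="Example answer.",
    p_correct=None,           # no second-pass capture requested
    diagnostics={},           # real runs would carry per-call metadata
    timestamp=datetime.now(timezone.utc).isoformat(),  # ISO-8601, UTC
)

# Assumed serialisation: one JSON object per record.
line = json.dumps(asdict(record))
restored = GeneratedResponseSketch(**json.loads(line))
```

Because every field is a plain str, float, dict, or None, the record survives the round trip unchanged under this assumed encoding.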
class olmo_tap.final_evals.elo.types.Prompt(prompt_id: str, text: str, source: str = '', subject: str | None = None, gold_answer: str | None = None, expected_behavior: str | None = None, tags: tuple[str, ...] = ())

Bases: object

One row of the prompt bank, normalised to a small typed shape.

expected_behavior: str | None = None
gold_answer: str | None = None
prompt_id: str
source: str = ''
subject: str | None = None
tags: tuple[str, ...] = ()
text: str
olmo_tap.final_evals.elo.types.load_prompt_bank(path: Path) -> list[Prompt]

Read a JSONL prompt bank into a list of Prompt.
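
A minimal sketch of what such a loader could look like, assuming one JSON object per line whose keys match the Prompt field names. PromptSketch and load_prompt_bank_sketch are hypothetical stand-ins. The blank-line skipping, the defaulting of missing keys, and the list-to-tuple conversion for tags are assumptions rather than documented behaviour.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PromptSketch:
    """Stand-in mirroring the documented Prompt fields (not the real class)."""

    prompt_id: str
    text: str
    source: str = ""
    subject: str | None = None
    gold_answer: str | None = None
    expected_behavior: str | None = None
    tags: tuple[str, ...] = ()


def load_prompt_bank_sketch(path: Path) -> list[PromptSketch]:
    """Read a JSONL file into PromptSketch rows (assumed key names)."""
    prompts: list[PromptSketch] = []
    with path.open(encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if not raw:
                continue  # assumption: blank lines are ignored
            row = json.loads(raw)
            prompts.append(
                PromptSketch(
                    prompt_id=str(row["prompt_id"]),
                    text=row["text"],
                    source=row.get("source", ""),
                    subject=row.get("subject"),
                    gold_answer=row.get("gold_answer"),
                    expected_behavior=row.get("expected_behavior"),
                    # JSON has no tuple type, so tags arrive as a list
                    tags=tuple(row.get("tags", ())),
                )
            )
    return prompts
```

Only prompt_id and text are required in this sketch, matching the two fields that have no default in the documented signature.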

olmo_tap.final_evals.elo.types.prompt_seed(prompt_id: str) -> int

Deterministic per-prompt seed used by the generation harness.

The seed is the SHA-256 hash of the prompt id, taken mod 2**32. Each prompt therefore gets its own fixed seed, so the random draft-head selection inside PoE lines up across the Hydra entrants on a given prompt while still varying across prompts.
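
The hash-then-reduce scheme described above can be sketched as follows. prompt_seed_sketch is a hypothetical re-implementation: the UTF-8 encoding of the id and the big-endian interpretation of the digest are assumptions, since this page states only "SHA-256 ... mod 2**32". The head names and the use of random.Random are likewise illustrative only.

```python
import hashlib
import random


def prompt_seed_sketch(prompt_id: str) -> int:
    """Map a prompt id to a fixed seed in [0, 2**32)."""
    # Assumption: the id is hashed as UTF-8 bytes.
    digest = hashlib.sha256(prompt_id.encode("utf-8")).digest()
    # Assumption: the digest is read as a big-endian integer before reduction.
    return int.from_bytes(digest, "big") % (2**32)


# Two entrants seeding from the same prompt id make the same random draw.
heads = ["head_0", "head_1", "head_2"]  # hypothetical draft heads
pick_a = random.Random(prompt_seed_sketch("prompt-001")).choice(heads)
pick_b = random.Random(prompt_seed_sketch("prompt-001")).choice(heads)
```

Unlike Python's built-in hash() on strings, which is salted per process by default, a SHA-256 digest is stable across runs and machines, so a seed derived this way is reproducible.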