olmo_tap.final_evals.elo.entrants

Entrant definitions for the configuration-level Elo tournament.

Defines the four configurations being compared, a dispatch table for the generation harness, and the loader that materialises each entrant on GPU. The EntrantSpec instances are deliberately serialisable (no torch / model objects) so the full tournament configuration can be hashed for cache keys and logged in the run manifest without GPU dependencies; the loaded resources live in LoadedEntrant.
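
For orientation, the sketch below shows one plausible shape for the module's registry and dispatch table. Only the entrant ids, field names, and documented field semantics are taken from this page; the _ENTRANTS name and the concrete field values chosen for each entrant are illustrative assumptions, not the module's actual table.

    from olmo_tap.final_evals.elo.entrants import EntrantSpec

    # Hypothetical registry keyed by stable entrant id; get_entrant() below
    # presumably performs a lookup of this kind.
    _ENTRANTS = {
        "base_olmo": EntrantSpec(
            entrant_id="base_olmo",
            loader="vanilla_hf",   # plain AutoModelForCausalLM path
            rob_checkpoint=None,   # no robustness merge
            bypass_jury=False,
            temperature=None,      # greedy: the base OLMo benchmark convention
        ),
        "full_poe": EntrantSpec(
            entrant_id="full_poe",
            loader="custom_poe",
            rob_checkpoint=-1,     # final robustness checkpoint (assumed here)
            bypass_jury=False,     # full speculative decoding with verifier acceptance
            temperature=0.98,      # matches the production PoE path
        ),
    }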

Functions

build_entrant(spec[, max_new_tokens])

Materialise the model + tokenizer pair for an entrant.

get_entrant(entrant_id)

Look up an EntrantSpec by its stable id.

Classes

EntrantSpec(entrant_id, loader, ...[, ...])

Structural description of a tournament entrant.

LoadedEntrant(spec, tokenizer[, hf_model, ...])

A materialised entrant: spec + the model resources needed to generate.

class olmo_tap.final_evals.elo.entrants.EntrantSpec(entrant_id: Literal['base_olmo', 'security_only', 'security_plus_robustness', 'full_poe'], loader: Literal['vanilla_hf', 'custom_poe'], rob_checkpoint: int | None, bypass_jury: bool, temperature: float | None, needs_uncertainty: bool = False)

Bases: object

Structural description of a tournament entrant.

The fields are deliberately serialisable (no torch / model objects) so the full tournament configuration can be hashed for cache keys and logged in the run manifest without GPU dependencies.

entrant_id

Stable identifier used in caches, manifests, and reports.

Type:

Literal['base_olmo', 'security_only', 'security_plus_robustness', 'full_poe']

loader

Which model-loading recipe to use. vanilla_hf is the plain HuggingFace AutoModelForCausalLM path used by the base OLMo entrant. custom_poe mirrors load_custom_poe() from robustness_sweep.py for the Hydra entrants.

Type:

Literal['vanilla_hf', 'custom_poe']

rob_checkpoint

Robustness LoRA checkpoint to merge. None skips the robustness merge entirely (security-only entrant). -1 selects the final checkpoint. Any other value is interpreted as a step index.

Type:

int | None
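
The three-way convention above maps naturally onto a small resolver. The helper below is a hypothetical sketch: the function name, the step-<N> directory layout, and the Path-based lookup are all assumptions made for illustration.

    from pathlib import Path

    def resolve_rob_checkpoint(rob_checkpoint: int | None, ckpt_dir: Path) -> Path | None:
        """Hypothetical mapping of the rob_checkpoint convention to a path."""
        if rob_checkpoint is None:
            return None  # security-only entrant: skip the robustness merge
        steps = sorted(int(p.name.removeprefix("step-")) for p in ckpt_dir.glob("step-*"))
        if rob_checkpoint == -1:
            return ckpt_dir / f"step-{steps[-1]}"   # -1 selects the final checkpoint
        return ckpt_dir / f"step-{rob_checkpoint}"  # otherwise a step index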

bypass_jury

When True, the PoE jury is short-circuited and every draft token is accepted; this is used to push a single head's sampled output through the PoE codepath. Set to False for full speculative decoding with verifier acceptance.

Type:

bool

temperature

Sampling temperature. None selects greedy decoding (the base OLMo benchmark convention). Hydra entrants use 0.98 to match the production PoE path.

Type:

float | None

needs_uncertainty

When True, PoE captures the witness hidden state and computes p_correct via the second-pass uncertainty head. The configuration-level Elo report does not consume p_correct directly, so all four entrants set this to False for now.

Type:

bool

bypass_jury: bool
entrant_id: Literal['base_olmo', 'security_only', 'security_plus_robustness', 'full_poe']
loader: Literal['vanilla_hf', 'custom_poe']
needs_uncertainty: bool = False
rob_checkpoint: int | None
temperature: float | None
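
Because every field is a primitive, an EntrantSpec (rendered above as a dataclass) can be serialised and hashed without touching torch. A minimal sketch of the cache-key idea follows; the exact key derivation used by the tournament is not documented here, so this version is only an assumption.

    import dataclasses
    import hashlib
    import json

    def spec_cache_key(spec) -> str:
        # asdict() is loss-free because EntrantSpec holds only primitives.
        payload = json.dumps(dataclasses.asdict(spec), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
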
class olmo_tap.final_evals.elo.entrants.LoadedEntrant(spec: EntrantSpec, tokenizer: PreTrainedTokenizerBase, hf_model: PreTrainedModel | None = None, hydra: HydraTransformer | None = None, poe: PoE | None = None)

Bases: object

A materialised entrant: spec + the model resources needed to generate.

spec

The EntrantSpec this loader instance was built from.

Type:

olmo_tap.final_evals.elo.entrants.EntrantSpec

tokenizer

Tokenizer paired with the model.

Type:

transformers.tokenization_utils_base.PreTrainedTokenizerBase

hf_model

Vanilla HuggingFace causal-LM, populated only when spec.loader == "vanilla_hf". None otherwise.

Type:

transformers.modeling_utils.PreTrainedModel | None

hydra

Underlying HydraTransformer, populated only when spec.loader == "custom_poe". None otherwise.

Type:

olmo_tap.hydra.HydraTransformer | None

poe

PoE wrapping hydra, populated only when spec.loader == "custom_poe". None otherwise.

Type:

olmo_tap.inference.poe.PoE | None

hf_model: PreTrainedModel | None = None
hydra: HydraTransformer | None = None
poe: PoE | None = None
spec: EntrantSpec
tokenizer: PreTrainedTokenizerBase
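
Exactly one generation path is populated per entrant, so a harness can dispatch on spec.loader. The helper below is a hypothetical sketch: generate_with_cache is named in the build_entrant() docs, but its signature here is assumed, as is the hf_model.generate() argument handling.

    def generate_text(entrant, prompt: str, max_new_tokens: int = 256) -> str:
        """Hypothetical harness helper dispatching on the loader kind."""
        ids = entrant.tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
        if entrant.spec.loader == "vanilla_hf":
            kwargs = {"max_new_tokens": max_new_tokens}
            if entrant.spec.temperature is None:
                kwargs["do_sample"] = False  # greedy decoding
            else:
                kwargs["do_sample"] = True
                kwargs["temperature"] = entrant.spec.temperature
            out = entrant.hf_model.generate(ids, **kwargs)
            new_tokens = out[0, ids.shape[1]:]
        else:  # "custom_poe"
            new_tokens = entrant.poe.generate_with_cache(ids)  # signature assumed
        return entrant.tokenizer.decode(new_tokens, skip_special_tokens=True)
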
olmo_tap.final_evals.elo.entrants.build_entrant(spec: EntrantSpec, max_new_tokens: int = 256) → LoadedEntrant

Materialise the model + tokenizer pair for an entrant.

Both loader paths land their tensors on CUDA in bfloat16 so generation is comparable across entrants. The custom_poe path additionally wraps the loaded HydraTransformer in PoE so the eval harness can call generate_with_cache uniformly.
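
As a concrete illustration of the vanilla_hf half of this contract, the snippet below loads a stock causal LM on CUDA in bfloat16. The model id is a placeholder, not the checkpoint this module actually uses, and the custom_poe half (mirroring load_custom_poe()) is not reproduced here.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "allenai/OLMo-2-1124-7B"  # placeholder id for illustration only

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    hf_model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    ).to("cuda")
    hf_model.eval()  # generation-only: no gradients needed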

olmo_tap.final_evals.elo.entrants.get_entrant(entrant_id: str) → EntrantSpec

Look up an EntrantSpec by its stable id.
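
Putting the two public functions together, a typical call site might look like the following; the entrant id is one of the documented Literal values and the token budget is illustrative.

    from olmo_tap.final_evals.elo.entrants import build_entrant, get_entrant

    spec = get_entrant("security_plus_robustness")
    entrant = build_entrant(spec, max_new_tokens=128)  # lands on CUDA in bfloat16

    # Exactly one generation path should be populated, per LoadedEntrant's docs.
    assert (entrant.hf_model is None) != (entrant.poe is None)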