olmo_tap.final_evals.elo.entrants¶
Entrant definitions for the configuration-level Elo tournament.
Defines the four configurations being compared, a dispatch table for the
generation harness, and the loader that materialises each entrant on
GPU. The EntrantSpec instances are deliberately serialisable
(no torch / model objects) so the full tournament configuration can be
hashed for cache keys and logged in the run manifest without GPU
dependencies; the loaded resources live in LoadedEntrant.
Functions
| build_entrant(spec[, max_new_tokens]) | Materialise the model + tokenizer pair for an entrant. |
| get_entrant(entrant_id) | Look up an EntrantSpec by its stable id. |
Classes
| EntrantSpec(entrant_id, loader, rob_checkpoint, bypass_jury, temperature[, needs_uncertainty]) | Structural description of a tournament entrant. |
| LoadedEntrant(spec, tokenizer[, hf_model, hydra, poe]) | A materialised entrant: spec + the model resources needed to generate. |
- class olmo_tap.final_evals.elo.entrants.EntrantSpec(entrant_id: Literal['base_olmo', 'security_only', 'security_plus_robustness', 'full_poe'], loader: Literal['vanilla_hf', 'custom_poe'], rob_checkpoint: int | None, bypass_jury: bool, temperature: float | None, needs_uncertainty: bool = False)[source]¶
Bases: object
Structural description of a tournament entrant.
The fields are deliberately serialisable (no torch / model objects) so the full tournament configuration can be hashed for cache keys and logged in the run manifest without GPU dependencies.
- entrant_id¶
Stable identifier used in caches, manifests, and reports.
- Type:
Literal[‘base_olmo’, ‘security_only’, ‘security_plus_robustness’, ‘full_poe’]
- loader¶
Which model-loading recipe to use.
vanilla_hf is the plain HuggingFace AutoModelForCausalLM path used by the base OLMo entrant. custom_poe mirrors load_custom_poe() from robustness_sweep.py for the Hydra entrants.
- Type:
Literal[‘vanilla_hf’, ‘custom_poe’]
- rob_checkpoint¶
Robustness LoRA checkpoint to merge.
None skips the robustness merge entirely (security-only entrant). -1 selects the final checkpoint. Otherwise a step index.
- Type:
int | None
- bypass_jury¶
When True the PoE jury is short-circuited and every draft token is accepted. Used to get a single head’s sampled output through the PoE codepath. Set False for full speculative decoding with verifier acceptance.
- Type:
bool
- temperature¶
Sampling temperature.
None selects greedy decoding (the base OLMo benchmark convention). Hydra entrants use 0.98 to match the production PoE path.
- Type:
float | None
- needs_uncertainty¶
When True PoE captures the witness hidden state and computes p_correct via the second-pass uncertainty head. The configuration-level Elo report does not consume p_correct directly, so all four entrants set this to False for now.
- Type:
bool
- class olmo_tap.final_evals.elo.entrants.LoadedEntrant(spec: EntrantSpec, tokenizer: PreTrainedTokenizerBase, hf_model: PreTrainedModel | None = None, hydra: HydraTransformer | None = None, poe: PoE | None = None)[source]¶
Bases: object
A materialised entrant: spec + the model resources needed to generate.
- spec¶
The EntrantSpec this loaded entrant was built from.
- Type:
EntrantSpec
- tokenizer¶
Tokenizer paired with the model.
- Type:
transformers.tokenization_utils_base.PreTrainedTokenizerBase
- hf_model¶
Vanilla HuggingFace causal-LM, populated only when spec.loader == "vanilla_hf". None otherwise.
- Type:
transformers.modeling_utils.PreTrainedModel | None
- hydra¶
Underlying HydraTransformer, populated only when spec.loader == "custom_poe". None otherwise.
- Type:
HydraTransformer | None
- poe¶
PoE wrapping hydra, populated only when spec.loader == "custom_poe". None otherwise.
- Type:
olmo_tap.inference.poe.PoE | None
- hf_model: PreTrainedModel | None = None¶
- hydra: HydraTransformer | None = None¶
- poe: PoE | None = None¶
- spec: EntrantSpec¶
- tokenizer: PreTrainedTokenizerBase¶
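The populated-fields pattern described above (hf_model for the vanilla_hf path; hydra and poe for the custom_poe path) can be checked with an invariant sketch. This is a hypothetical helper; only the field semantics documented above are assumed:

```python
def check_loaded_entrant(loader: str, hf_model, hydra, poe) -> None:
    """Assert that exactly the resources for the chosen loader are populated."""
    if loader == "vanilla_hf":
        ok = hf_model is not None and hydra is None and poe is None
    elif loader == "custom_poe":
        ok = hf_model is None and hydra is not None and poe is not None
    else:
        raise ValueError(f"unknown loader {loader!r}")
    if not ok:
        raise ValueError(f"resource fields inconsistent with loader {loader!r}")
```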
- olmo_tap.final_evals.elo.entrants.build_entrant(spec: EntrantSpec, max_new_tokens: int = 256) LoadedEntrant[source]¶
Materialise the model + tokenizer pair for an entrant.
Both loader paths land their tensors on CUDA in bfloat16 so generation is comparable across entrants. The custom_poe path additionally wraps the loaded HydraTransformer in PoE so the eval harness can call generate_with_cache uniformly.
- olmo_tap.final_evals.elo.entrants.get_entrant(entrant_id: str) EntrantSpec[source]¶
Look up an EntrantSpec by its stable id.
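A registry-style lookup consistent with this API could look like the following sketch. The registry contents and helper name are illustrative; only the four documented entrant ids are assumed:

```python
# Hypothetical registry keyed by the four documented entrant ids.
_ENTRANT_IDS = ("base_olmo", "security_only",
                "security_plus_robustness", "full_poe")
_REGISTRY = {eid: {"entrant_id": eid} for eid in _ENTRANT_IDS}


def get_entrant_sketch(entrant_id: str) -> dict:
    """Look up a spec by stable id, failing loudly on unknown ids."""
    try:
        return _REGISTRY[entrant_id]
    except KeyError:
        raise KeyError(
            f"unknown entrant {entrant_id!r}; expected one of {_ENTRANT_IDS}"
        ) from None
```

Failing loudly on unknown ids keeps typos from silently falling through to a default entrant in cached tournament runs.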