olmo_tap.experiments.robustness.data

Data loading for robustness head supervised finetuning on MedMCQA.

Functions

`format_example`(question, mcq_options)	Wrap a raw MedMCQA question with preamble.
`load_cached_shard`(config)	Load precomputed clean/poisoned pairs + masks from GCG cache.

Classes

CachedShardDataset(clean, poisoned, ...)

Loads precomputed clean/poisoned token IDs and masks from GCG cache.

class olmo_tap.experiments.robustness.data.CachedShardDataset(clean: Tensor, poisoned: Tensor, clean_mask: Tensor, poisoned_mask: Tensor)

Bases: Dataset

Loads precomputed clean/poisoned token IDs and masks from GCG cache.
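The class body is not shown on this page, so the following is only a minimal sketch of how a dataset over four row-aligned tensors could look. The class name `PairedShardDatasetSketch`, the dict-shaped items, and the toy tensor shapes are assumptions made for illustration; the real `CachedShardDataset` may return items in a different form.

```python
import torch
from torch import Tensor
from torch.utils.data import DataLoader, Dataset


class PairedShardDatasetSketch(Dataset):
    """Hypothetical stand-in for CachedShardDataset: four row-aligned tensors."""

    def __init__(self, clean: Tensor, poisoned: Tensor,
                 clean_mask: Tensor, poisoned_mask: Tensor) -> None:
        # All four tensors must describe the same examples, one per row.
        n = clean.shape[0]
        if not all(t.shape[0] == n for t in (poisoned, clean_mask, poisoned_mask)):
            raise ValueError("all tensors must share the same first dimension")
        self.clean, self.poisoned = clean, poisoned
        self.clean_mask, self.poisoned_mask = clean_mask, poisoned_mask

    def __len__(self) -> int:
        return self.clean.shape[0]

    def __getitem__(self, idx: int) -> dict[str, Tensor]:
        # Assumed item layout; the real class may return a tuple instead.
        return {
            "clean": self.clean[idx],
            "poisoned": self.poisoned[idx],
            "clean_mask": self.clean_mask[idx],
            "poisoned_mask": self.poisoned_mask[idx],
        }


# Toy data: 6 examples, 8 token positions each (random IDs, all-ones masks).
clean = torch.randint(0, 100, (6, 8))
poisoned = torch.randint(0, 100, (6, 8))
masks = torch.ones(6, 8, dtype=torch.long)
dataset = PairedShardDatasetSketch(clean, poisoned, masks, masks.clone())

# The default collate function stacks each dict entry along a new batch axis.
loader = DataLoader(dataset, batch_size=2, shuffle=False)
batch = next(iter(loader))
print(batch["clean"].shape)  # torch.Size([2, 8])
```

Keeping the clean and poisoned rows in one item means each batch carries matched pairs, which is presumably what a robustness objective comparing the two needs.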

olmo_tap.experiments.robustness.data.format_example(question: str, mcq_options: list[str]) → str

Wrap a raw MedMCQA question with preamble.
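The exact preamble and layout used by `format_example` are not documented here. As a rough illustration of the idea, the sketch below wraps a question and its options in a fixed instruction; the function name `format_example_sketch`, the `PREAMBLE` text, the lettered option format, and the trailing `Answer:` cue are all hypothetical.

```python
from string import ascii_uppercase

# Hypothetical preamble; the module's actual wording is not shown on this page.
PREAMBLE = "Answer the following multiple-choice medical question."


def format_example_sketch(question: str, mcq_options: list[str]) -> str:
    """Illustrative stand-in: preamble, then the question, then lettered options."""
    lines = [PREAMBLE, "", f"Question: {question.strip()}"]
    # Label options A, B, C, ... in the order they are given.
    for letter, option in zip(ascii_uppercase, mcq_options):
        lines.append(f"{letter}. {option.strip()}")
    lines.append("Answer:")
    return "\n".join(lines)


prompt = format_example_sketch(
    "Which vitamin deficiency causes scurvy?",
    ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
)
print(prompt)
```

MedMCQA items carry four options, so the sample call passes a four-element list; the sketch itself accepts any number of options up to 26.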

olmo_tap.experiments.robustness.data.load_cached_shard(config: TrainingConfig) → DataLoader

Load precomputed clean/poisoned pairs + masks from GCG cache.