olmo_tap.experiments.robustness.data

Data loading for supervised fine-tuning of the robustness head on MedMCQA.

Functions

format_example(question, mcq_options)

Wrap a raw MedMCQA question with a preamble.

load_cached_shard(config)

Load precomputed clean/poisoned pairs + masks from GCG cache.

Classes

CachedShardDataset(clean, poisoned, ...)

Loads precomputed clean/poisoned token IDs and masks from GCG cache.

class olmo_tap.experiments.robustness.data.CachedShardDataset(clean: Tensor, poisoned: Tensor, clean_mask: Tensor, poisoned_mask: Tensor)

Bases: Dataset

Loads precomputed clean/poisoned token IDs and masks from GCG cache.
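As a rough illustration of how a dataset built from four aligned tensors might behave, here is a minimal sketch. It is not the project's implementation: the class name `PairedShardSketch`, the dict-shaped items, and the toy tensor shapes are all assumptions made for this example. The only things taken from the signature above are the four constructor arguments and the `Dataset` base class.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class PairedShardSketch(Dataset):
    """Hypothetical stand-in for CachedShardDataset; not the project's code."""

    def __init__(self, clean, poisoned, clean_mask, poisoned_mask):
        # Assumption: all four tensors share the same leading (example) dimension.
        tensors = (clean, poisoned, clean_mask, poisoned_mask)
        if len({t.shape[0] for t in tensors}) != 1:
            raise ValueError("all tensors must hold the same number of examples")
        self.clean = clean
        self.poisoned = poisoned
        self.clean_mask = clean_mask
        self.poisoned_mask = poisoned_mask

    def __len__(self):
        return self.clean.shape[0]

    def __getitem__(self, idx):
        # Assumption: items are dicts; the real class may return a tuple instead.
        return {
            "clean": self.clean[idx],
            "poisoned": self.poisoned[idx],
            "clean_mask": self.clean_mask[idx],
            "poisoned_mask": self.poisoned_mask[idx],
        }


# Toy data: 6 examples of 5 tokens each (random IDs, all-ones masks).
clean = torch.randint(0, 100, (6, 5))
poisoned = torch.randint(0, 100, (6, 5))
mask = torch.ones(6, 5, dtype=torch.long)

dataset = PairedShardSketch(clean, poisoned, mask, mask.clone())
loader = DataLoader(dataset, batch_size=4, shuffle=False)
first_batch = next(iter(loader))  # dict of stacked tensors, one entry per key
```

Keeping the clean and poisoned tensors index-aligned means each batch pairs a clean sequence with its poisoned counterpart, which is presumably what a robustness objective would compare.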

olmo_tap.experiments.robustness.data.format_example(question: str, mcq_options: list[str]) → str

Wrap a raw MedMCQA question with a preamble.
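The exact preamble and layout are not documented here, so the following is only a sketch of what such a wrapper could look like. The preamble wording, the `A`-`D` option letters, the trailing `Answer:` cue, and the name `format_example_sketch` are assumptions for illustration, not the module's actual output format.

```python
# Assumed preamble wording; the real module may use different text.
PREAMBLE = "Answer the following multiple-choice medical question."


def format_example_sketch(question: str, mcq_options: list[str]) -> str:
    """Hypothetical illustration of wrapping a question and options in a prompt."""
    letters = "ABCD"  # MedMCQA questions have four options
    if len(mcq_options) > len(letters):
        raise ValueError("expected at most four options")
    lines = [PREAMBLE, "", f"Question: {question}"]
    # Label each option with its letter, one option per line.
    lines += [f"{letter}. {option}" for letter, option in zip(letters, mcq_options)]
    lines.append("Answer:")
    return "\n".join(lines)


prompt = format_example_sketch(
    "Which vitamin deficiency causes scurvy?",
    ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
)
print(prompt)
```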

olmo_tap.experiments.robustness.data.load_cached_shard(config: TrainingConfig) → DataLoader

Load precomputed clean/poisoned pairs + masks from GCG cache.