olmo_tap.experiments.security.data

Data loading for security head supervised finetuning on MedMCQA.

Functions

`format_question`(question, mcq_options)	Wrap a raw MedMCQA question with a preamble.
`load_shard`(config)	Load a MedMCQA shard, tokenize the prompts, and return the training DataLoader (`train_dl`).
`preprocess_example`(example, tokenizer, ...)	Tokenize the question prompt and store the ground-truth answer token ID.

olmo_tap.experiments.security.data.format_question(question: str, mcq_options: list[str]) → str: Wrap a raw MedMCQA question with a preamble.
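
The preamble text and option layout used by `format_question` are not documented here. As a rough illustration of what wrapping a question with a preamble might look like, the sketch below uses a hypothetical `PREAMBLE` string and a hypothetical `format_question_sketch` helper that labels the options A, B, C, and so on. Neither name exists in the module, and the real prompt wording may differ.

```python
from __future__ import annotations

from string import ascii_uppercase

# Hypothetical preamble; the wording used by the real module is an assumption.
PREAMBLE = "Answer the following multiple-choice medical question with a single letter."


def format_question_sketch(question: str, mcq_options: list[str]) -> str:
    """Illustrative stand-in for format_question, not the module's implementation."""
    # Pair each option with a letter label in order: A, B, C, ...
    option_lines = [
        f"{letter}. {option}" for letter, option in zip(ascii_uppercase, mcq_options)
    ]
    # Preamble, blank line, question, lettered options, then an answer cue.
    parts = [PREAMBLE, "", f"Question: {question.strip()}", *option_lines, "Answer:"]
    return "\n".join(parts)


prompt = format_question_sketch(
    "Which vitamin deficiency causes scurvy?",
    ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
)
print(prompt)
```

Ending the prompt with an answer cue is one common choice for single-token supervision, since the next token the model produces is then the answer letter. Whether the module does this is not stated.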

olmo_tap.experiments.security.data.load_shard(config: TrainingConfig) → tuple[DataLoader, int, int, int, int]: Load a MedMCQA shard, tokenize the prompts, and return the training DataLoader (`train_dl`) together with four integers.

olmo_tap.experiments.security.data.preprocess_example(example: dict[str, str], tokenizer: PreTrainedTokenizerBase, max_seq_len: int, token_ids: list[int]) → dict: Tokenize the question prompt and store the ground-truth answer token ID.
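
The keys of `example` and of the returned dict are not documented here, so the following is only a sketch under stated assumptions: the example carries a ready-made prompt under a hypothetical `"prompt"` key and the correct option index under a hypothetical `"answer_index"` key, and `token_ids[i]` is the token ID of the i-th answer letter. A toy whitespace tokenizer stands in for a `PreTrainedTokenizerBase` so the block runs without downloading a model.

```python
from __future__ import annotations

from typing import Any


class ToyTokenizer:
    """Whitespace tokenizer standing in for a Hugging Face tokenizer (illustration only)."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    def __call__(
        self, text: str, truncation: bool = False, max_length: int | None = None
    ) -> dict[str, list[int]]:
        # Assign each new whitespace-separated token the next free integer ID.
        ids = [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.split()]
        if truncation and max_length is not None:
            ids = ids[:max_length]
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}


def preprocess_example_sketch(
    example: dict[str, str], tokenizer: Any, max_seq_len: int, token_ids: list[int]
) -> dict:
    """Illustrative stand-in for preprocess_example, not the module's implementation."""
    # Tokenize the prompt, truncating to at most max_seq_len tokens.
    encoded = tokenizer(example["prompt"], truncation=True, max_length=max_seq_len)
    # "answer_index" is a hypothetical key; values are strings per the dict[str, str] hint.
    answer_index = int(example["answer_index"])
    return {
        "input_ids": encoded["input_ids"],
        "attention_mask": encoded["attention_mask"],
        # Map the option index to the token ID of its answer letter.
        "answer_token_id": token_ids[answer_index],
    }


tokenizer = ToyTokenizer()
example = {
    "prompt": "Question: Which vitamin deficiency causes scurvy? Answer:",
    "answer_index": "2",
}
row = preprocess_example_sketch(example, tokenizer, max_seq_len=4, token_ids=[101, 102, 103, 104])
print(row)
```

Storing a single answer token ID per example keeps the training target to one position, which fits a multiple-choice setup where only the answer letter is supervised. Note that right-side truncation, as in this sketch, can cut off the end of a long prompt, so `max_seq_len` would need to be large enough for the full question.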