olmo_tap.experiments.security.data

Data loading for security head supervised finetuning on MedMCQA.

Functions

format_question(question, mcq_options)

Wrap a raw MedMCQA question with a preamble.

load_shard(config)

Load a MedMCQA shard, tokenize the prompts, and return the training DataLoader (train_dl) along with four integers.

preprocess_example(example, tokenizer, ...)

Tokenize the question prompt and store the ground-truth answer token ID.

olmo_tap.experiments.security.data.format_question(question: str, mcq_options: list[str]) -> str

Wrap a raw MedMCQA question with a preamble.
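
The preamble text and option layout are not documented on this page. As a rough sketch only, assuming options are labelled A through D and using a made-up preamble string (`PREAMBLE` and `format_question_sketch` are hypothetical names, not part of olmo_tap), a function with this signature might look like:

```python
from string import ascii_uppercase

# Hypothetical preamble; the wording actually used by olmo_tap is not shown here.
PREAMBLE = "Answer the following multiple-choice medical question with a single letter."


def format_question_sketch(question: str, mcq_options: list[str]) -> str:
    """Illustrative stand-in for format_question, not the library's code."""
    # Label each option with a letter: "A. ...", "B. ...", and so on.
    option_lines = [
        f"{letter}. {option}" for letter, option in zip(ascii_uppercase, mcq_options)
    ]
    # Assemble preamble, blank line, question, options, and a trailing answer cue.
    return "\n".join(
        [PREAMBLE, "", f"Question: {question.strip()}", *option_lines, "Answer:"]
    )


prompt = format_question_sketch(
    "Which vitamin deficiency causes scurvy?",
    ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
)
print(prompt)
```

Ending the prompt with an answer cue is one common choice for single-token answer prediction; whether the real function does this is an assumption.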

olmo_tap.experiments.security.data.load_shard(config: TrainingConfig) -> tuple[DataLoader, int, int, int, int]

Load a MedMCQA shard, tokenize the prompts, and return the training DataLoader (train_dl) along with four integers.

olmo_tap.experiments.security.data.preprocess_example(example: dict[str, str], tokenizer: PreTrainedTokenizerBase, max_seq_len: int, token_ids: list[int]) -> dict

Tokenize the question prompt and store the ground-truth answer token ID.
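
The keys of `example` and of the returned dict are not listed on this page. The sketch below shows one plausible shape under stated assumptions: the example carries a formatted `"prompt"` and a correct-option index `"cop"` stored as a string, and `token_ids[i]` is the vocabulary ID of the i-th answer letter. `ToyTokenizer` is a hypothetical stand-in that mimics the Hugging Face call convention so the snippet runs without downloading a model.

```python
from typing import Optional


class ToyTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer (not a real vocabulary)."""

    def __call__(
        self, text: str, truncation: bool = True, max_length: Optional[int] = None
    ) -> dict:
        # Map each whitespace-separated word to a deterministic fake ID.
        ids = [sum(ord(ch) for ch in word) % 1000 for word in text.split()]
        if truncation and max_length is not None:
            ids = ids[:max_length]
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}


def preprocess_example_sketch(
    example: dict, tokenizer, max_seq_len: int, token_ids: list
) -> dict:
    """Illustrative stand-in for preprocess_example, not the library's code."""
    # Assumed keys: "prompt" (formatted question) and "cop" (option index as a string).
    encoded = tokenizer(example["prompt"], truncation=True, max_length=max_seq_len)
    return {
        "input_ids": encoded["input_ids"],
        "attention_mask": encoded["attention_mask"],
        # Assumed: token_ids[i] is the token ID of the i-th answer letter.
        "answer_token_id": token_ids[int(example["cop"])],
    }


row = preprocess_example_sketch(
    {"prompt": "Question: Which vitamin deficiency causes scurvy? Answer:", "cop": "2"},
    ToyTokenizer(),
    max_seq_len=4,
    token_ids=[317, 330, 343, 356],
)
print(row)
```

Storing a single answer token ID per example, rather than a full label sequence, suits a setup where the loss is computed only on the answer position. Note that truncating from the right, as this toy tokenizer does, can cut off the end of a long prompt.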