olmo_tap.experiments.utils.config

Config classes to support training and inference.

Classes

ExperimentConfig(seed, model, train, ...)
    Master config holding the HydraLoRAConfig and TrainingConfig used in training.

HydraLoRAConfig(weights_dir, model_size, ...)
    Supports loading a Hydra model for inference or training.

TrainingConfig([learning_rate, batch_size, ...])
    Config to store training-specific parameters.

class olmo_tap.experiments.utils.config.ExperimentConfig(seed: int, model: HydraLoRAConfig = <factory>, train: TrainingConfig = <factory>, wandb_project: str = 'hydra', wandb_run_name: str | None = None, device: str = 'cuda')[source]

Bases: object

Master config holding the HydraLoRAConfig and TrainingConfig used in training.

device: str = 'cuda'
model: HydraLoRAConfig
seed: int
train: TrainingConfig
wandb_project: str = 'hydra'
wandb_run_name: str | None = None
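
Since model and train have factory defaults, only seed is required to construct the config. A minimal sketch, assuming the <factory> defaults build plain HydraLoRAConfig() and TrainingConfig() instances with the values documented below; the run-name override is hypothetical:

from olmo_tap.experiments.utils.config import ExperimentConfig

cfg = ExperimentConfig(seed=42)   # model and train fall back to their factories

print(cfg.wandb_project)          # 'hydra'
print(cfg.model.lora_r)           # 16, see HydraLoRAConfig below
print(cfg.train.batch_size)       # 16, see TrainingConfig below

# Hypothetical override of the top-level fields for a debug run:
run_cfg = ExperimentConfig(seed=0, wandb_run_name="debug-run", device="cpu")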
class olmo_tap.experiments.utils.config.HydraLoRAConfig(weights_dir: str = '/vol/bitbucket/tjt25/olmo2-7b-instruct-weights', model_size: str = '7b', n_heads_final: int = 5, n_heads_training: int = 1, heads_depth: int = 3, vocab_size: int = 100352, lora_r: int = 16, lora_alpha: int = 32, target_modules: list[str] = <factory>, device: str = 'cuda')[source]

Bases: object

Supports loading a Hydra model for inference or training. NOTE: n_heads_final records, for book-keeping, the number of heads the final Hydra model is intended to have; n_heads_training is the actual number of heads loaded at training time.

device: str = 'cuda'
heads_depth: int = 3
lora_alpha: int = 32
lora_r: int = 16
model_size: str = '7b'
n_heads_final: int = 5
n_heads_training: int = 1
target_modules: list[str]
vocab_size: int = 100352
weights_dir: str = '/vol/bitbucket/tjt25/olmo2-7b-instruct-weights'
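
A construction sketch illustrating the n_heads_final / n_heads_training split described in the note above. The values shown are the documented defaults; target_modules has a factory default not shown on this page, so it is left untouched:

from olmo_tap.experiments.utils.config import HydraLoRAConfig

# Book-keep a five-head final Hydra model, but load only one head for training.
lora_cfg = HydraLoRAConfig(
    n_heads_final=5,
    n_heads_training=1,
    heads_depth=3,
    lora_r=16,
    lora_alpha=32,
)
assert lora_cfg.n_heads_training <= lora_cfg.n_heads_final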
class olmo_tap.experiments.utils.config.TrainingConfig(learning_rate: float = 0.0001, batch_size: int = 16, num_epochs: int = 1, max_seq_len: int = 256, num_workers: int = 4, shard_id: int = 0, weights_dir: str = '/vol/bitbucket/tjt25/olmo2-7b-instruct-weights', warmup_steps: int = 100, lr_schedule: str = 'cosine', output_dir: str = 'experiments/uncertainty/outputs', checkpoint_every_n_steps: int = 250)[source]

Bases: object

Config to store training-specific parameters.

A_token_id: int
B_token_id: int
C_token_id: int
D_token_id: int
batch_size: int = 16
checkpoint_every_n_steps: int = 250
learning_rate: float = 0.0001
lr_schedule: str = 'cosine'
max_seq_len: int = 256
num_epochs: int = 1
num_shards: int
num_workers: int = 4
output_dir: str = 'experiments/uncertainty/outputs'
seed: int
shard_id: int = 0
warmup_steps: int = 100
weights_dir: str = '/vol/bitbucket/tjt25/olmo2-7b-instruct-weights'
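
A sketch using only the keyword arguments that appear in the signature above. A_token_id through D_token_id, num_shards, and seed are annotated without defaults and are absent from the signature, so this assumes they are populated elsewhere (for instance, seed propagated from ExperimentConfig):

from olmo_tap.experiments.utils.config import TrainingConfig

train_cfg = TrainingConfig(
    learning_rate=1e-4,            # default 0.0001
    batch_size=16,
    num_epochs=1,
    max_seq_len=256,
    lr_schedule="cosine",          # paired with warmup_steps=100
    checkpoint_every_n_steps=250,
)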