olmo_tap.final_evals
Modules
Tournament 1 Elo evaluation harness.
Evaluate accuracy across robustness weight checkpoints with PoE.
Evaluate the calibration of the uncertainty head using PoE.
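The module summaries above refer to PoE-based accuracy and calibration evaluation, but this page does not show how either is computed. The following is a rough illustration only. It assumes that "PoE" means a Product of Experts, where per-expert class distributions are multiplied and then renormalized, and it measures calibration with standard equal-width-bin expected calibration error (ECE). The function names `product_of_experts` and `expected_calibration_error` are hypothetical and are not part of the `olmo_tap.final_evals` API.

```python
import math
from typing import Sequence


def product_of_experts(expert_probs: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical helper: combine per-expert class distributions (assumed PoE).

    Sums log-probabilities across experts for each class, then renormalizes
    with a max-shifted softmax so the result is a valid distribution.
    """
    num_classes = len(expert_probs[0])
    # Clamp to avoid log(0) when an expert assigns zero probability.
    log_scores = [
        sum(math.log(max(p[c], 1e-12)) for p in expert_probs)
        for c in range(num_classes)
    ]
    peak = max(log_scores)
    exps = [math.exp(s - peak) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], num_bins: int = 10
) -> float:
    """Hypothetical helper: equal-width-bin ECE.

    Returns the sample-weighted mean of |accuracy - mean confidence| per bin.
    """
    n = len(confidences)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(acc - avg_conf)
    return ece
```

For example, two experts that predict `[0.6, 0.4]` and `[0.7, 0.3]` combine to roughly `[0.778, 0.222]`. Under the same assumptions, accuracy would be the fraction of examples whose combined argmax matches the label, and the combined top-class probability would be the confidence passed to the ECE helper.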