olmo_tap.benchmarks.plotting
Plot the three-config benchmark output produced by
olmo_tap.benchmarks.inference.
A single figure with three subplots:
- TTFT distribution — KDE + histogram of prefill (time-to-first-token) timings for baseline and hydra_naive. PoE is excluded because prefill is bundled inside its full-generation call.
- Decode latency vs KV position — per-step decode latency (ms) over the benchmarked KV positions. PoE is rendered as horizontal dashed lines, one per γ, at its per-token equivalent latency (call median / accepted tokens) so it sits on the same axis as the per-step rows.
- TPS vs KV position — tokens per second, same shape. PoE is rendered as horizontal dashed lines at its effective TPS per γ.
Colour convention: orange = baseline, blue = naive Hydra, green family = PoE (darker greens for larger γ).
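For orientation, a minimal sketch of how the PoE reference lines could be drawn with matplotlib. The poe_runs layout, its values, and the colour ramp are illustrative assumptions, not this module's actual internals:

    import matplotlib.pyplot as plt
    import numpy as np

    # Hypothetical PoE measurements: gamma -> (median full-call latency in ms,
    # median accepted tokens per call). Values are made up for illustration.
    poe_runs = {2: (180.0, 1.6), 4: (260.0, 2.9), 8: (410.0, 4.8)}

    fig, ax = plt.subplots()
    greens = plt.cm.Greens(np.linspace(0.4, 0.9, len(poe_runs)))  # darker green = larger gamma
    for color, (gamma, (call_median_ms, accepted)) in zip(greens, sorted(poe_runs.items())):
        # Per-token equivalent latency: whole-call median divided by accepted
        # tokens, putting PoE on the same per-step axis as the other configs.
        ax.axhline(call_median_ms / accepted, linestyle="--", color=color,
                   label=f"PoE γ={gamma}")
    ax.set_xlabel("KV position")
    ax.set_ylabel("decode latency (ms)")
    ax.legend()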
Usually invoked indirectly — olmo_tap.benchmarks.inference.main() calls
plot_results() after writing results.json.
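To re-plot an existing run by hand, load the JSON and call plot_results() directly. A minimal sketch; it assumes output_dir accepts a pathlib.Path (pass a plain string if not):

    import json
    from pathlib import Path

    from olmo_tap.benchmarks.plotting import plot_results

    output_dir = Path("benchmark_out")  # must already exist and contain results.json
    results = json.loads((output_dir / "results.json").read_text())
    plot_results(results, output_dir)   # writes benchmark_out/graph.png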
Functions
plot_decode_curve(ax_latency, ax_tps, decode_results, label, color)
plot_results(results, output_dir): Render the three-config comparison figure to graph.png.
- olmo_tap.benchmarks.plotting.plot_decode_curve(ax_latency, ax_tps, decode_results, label, color)[source]
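The axes-level helper behind both decode panels. The exact decode_results schema is not documented on this page; the sketch below assumes a {kv_position: latency_ms} mapping and shows equivalent logic, not the actual implementation:

    def plot_decode_curve(ax_latency, ax_tps, decode_results, label, color):
        """Sketch only. Assumes decode_results maps KV position -> per-step
        decode latency in ms; the real schema may differ."""
        positions = sorted(decode_results)
        latencies_ms = [decode_results[p] for p in positions]
        ax_latency.plot(positions, latencies_ms, label=label, color=color)
        # Tokens/second is the reciprocal of per-token latency (ms -> s).
        ax_tps.plot(positions, [1000.0 / ms for ms in latencies_ms],
                    label=label, color=color)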
- olmo_tap.benchmarks.plotting.plot_results(results, output_dir)[source]

  Render the three-config comparison figure to graph.png.

  Parameters:
    results – Benchmark results dict as written to results.json by olmo_tap.benchmarks.inference.main(). Recognised top-level keys: baseline, hydra_naive, hydra_poe. Missing keys are skipped silently — a partial run still plots.
    output_dir – Directory to write graph.png into. Must exist.
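Because missing keys are skipped, a single config can be re-plotted by subsetting the results dict (results and output_dir as in the earlier loading example):

    # Only the baseline key is present; the hydra_naive / hydra_poe rows are
    # simply absent, and the figure still renders.
    plot_results({"baseline": results["baseline"]}, output_dir)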