feat: power & thermal benchmark infrastructure #164

Merged: john-rocky merged 1 commit into main from feat/power-bench-infra on Apr 30, 2026

@john-rocky
Owner

Summary

Adds mJ/token energy measurement, 30s thermal trajectory sampling, and CSV export to the sample app's Bench tool — plus the methodology doc and device-agnostic runbook.

Code (Examples/CoreMLLLMChat)

  • BenchmarkResult gains thermalTrajectory, mJPerToken, timeToFair, timeToSerious, drainedPerHour, batteryCapacityWh (iPhone 17 Pro default 14.03 Wh, override per-device), and a csv() exporter with per-30s rows + # summary block.
  • Bench menu adds power-reporting presets: 2 min (speed), 15 min (power), 30 min, 60 min. 15 min is the minimum that gives a defensible mJ/tok given the gauge's 1 % resolution.
  • Result summary now includes Energy/token, Time→fair, Time→serious, CSV filename, and the per-30s thermal trajectory. CSV auto-saved to Documents/bench-<unix_ts>.csv.
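
As a rough illustration of how the energy and drain figures above can be derived from the battery gauge, here is a minimal sketch. The function names and signatures are hypothetical and are not taken from the PR; only the 14.03 Wh default and the "drained percent × capacity" idea come from the description. The sample numbers are made up.

```swift
import Foundation

/// Hypothetical helper (not the PR's actual code): converts a battery drop
/// read from the gauge into energy per generated token, in millijoules.
func milliJoulesPerToken(drainedPercent: Double,
                         batteryCapacityWh: Double = 14.03, // iPhone 17 Pro nominal, per the PR
                         tokens: Int) -> Double? {
    // A zero drain or zero tokens gives no usable figure.
    guard tokens > 0, drainedPercent > 0 else { return nil }
    let wattHours = drainedPercent / 100.0 * batteryCapacityWh
    let milliJoules = wattHours * 3600.0 * 1000.0 // 1 Wh = 3600 J, 1 J = 1000 mJ
    return milliJoules / Double(tokens)
}

/// Hypothetical helper: extrapolates the observed drain to percent per hour.
func drainedPerHour(drainedPercent: Double, durationSeconds: Double) -> Double? {
    guard durationSeconds > 0 else { return nil }
    return drainedPercent * 3600.0 / durationSeconds
}

// Made-up example: 4 % drained over 15 minutes while generating 27,000 tokens.
if let mj = milliJoulesPerToken(drainedPercent: 4, tokens: 27_000),
   let perHour = drainedPerHour(drainedPercent: 4, durationSeconds: 900) {
    print(String(format: "%.1f mJ/tok, %.1f %%/hr", mj, perHour))
}
```

On a device with a different battery, pass its capacity explicitly instead of relying on the default.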

Docs

  • docs/POWER_BENCHMARK_PLAN.md — metric tiers, test matrix, head-to-head protocol against other on-device LLM engines
  • docs/POWER_BENCH_RUNBOOK.md — shareable guide for any iPhone 15 Pro+
  • docs/BENCHMARKING.md — new mJ/tok / %/hr / thermal-trajectory section with the API reference. Legacy J/tok derivation kept under a "legacy derivation" header.

Honest caveat documented in both docs: mJ/tok comes from iOS's 1 % battery gauge, not a lab power meter — trust only for runs ≥ 10 min.
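
To see why short runs are hard to trust, it helps to look at how much energy one gauge step represents. The sketch below assumes the 14.03 Wh capacity quoted above and assumes that the difference between two gauge readings can be off by up to one full step; both the helper name and that error model are illustrative, not part of the PR.

```swift
import Foundation

let batteryCapacityWh = 14.03 // iPhone 17 Pro nominal, per the PR

// Energy represented by a single 1 % gauge step: about 505 J.
let joulesPerGaugeStep = 0.01 * batteryCapacityWh * 3600.0

/// Hypothetical worst-case relative error of a drain measurement, assuming
/// the observed drop can be wrong by up to one gauge step.
func relativeGaugeError(drainedPercent: Double) -> Double? {
    guard drainedPercent > 0 else { return nil }
    return 1.0 / drainedPercent
}

print(String(format: "one gauge step = %.0f J", joulesPerGaugeStep))
for drained in [1.0, 2.0, 5.0, 10.0] {
    if let err = relativeGaugeError(drainedPercent: drained) {
        print(String(format: "drain %.0f %% -> up to +/- %.0f %% error", drained, err * 100))
    }
}
```

Under this model, a run that drains only 1 to 2 % carries an uncertainty comparable to the measurement itself, which is why longer presets are preferred for power reporting.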

Extracted from feature/power-benchmark (commit 78def26). The README Power & Thermal table and the cross-vocab drafter commit (2f3010c) on that branch are intentionally left out — README has been reorganized since, and the table values were _pending_ placeholders.

Test plan

  • Build Examples/CoreMLLLMChat in Release on iPhone (15 Pro / 17 Pro)
  • Run Bench → 2 min (speed) — sanity check tok/s + CSV save
  • Run Bench → 15 min (power) — verify mJ/tok is non-zero and the trajectory has ~30 entries
  • Confirm CSV is retrievable via Files app under the CoreMLLLMChat Documents directory
  • Verify summary shows Time→fair / Time→serious correctly when device is cold (never) vs warm

Add mJ/token energy measurement, 30s thermal trajectory sampling, and
CSV export to the sample app's Bench tool, plus the methodology doc and
device-agnostic runbook.

LLMRunner.BenchmarkResult gains:
- thermalTrajectory: ThermalSample(t, state, batteryLevel) at 30s
- mJPerToken: derived from drainedPercent × batteryCapacityWh
  (iPhone 17 Pro nominal capacity 14.03 Wh; override per-device)
- timeToFair / timeToSerious: first elapsed second at which the thermal
  state reached fair / serious, respectively
- drainedPerHour: extrapolated from run duration
- csv(): single-string export with per-30s thermal + battery rows and
  a # summary block; written by the sample app to Documents/
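
A minimal sketch of how the trajectory-derived fields listed above might fit together. The types here are hypothetical stand-ins: the field names t, state, and batteryLevel follow the commit message, while ThermalLevel, timeTo, the CSV column names, and the "# key=value" summary layout are assumptions made for illustration. It also assumes "time to fair" means the first sample at or above fair.

```swift
import Foundation

// Hypothetical stand-in for the system thermal state, ordered by severity.
enum ThermalLevel: Int, Comparable {
    case nominal = 0, fair, serious, critical
    static func < (a: ThermalLevel, b: ThermalLevel) -> Bool { a.rawValue < b.rawValue }
}

// Hypothetical stand-in for the PR's ThermalSample(t, state, batteryLevel).
struct ThermalSample {
    let t: Double            // seconds since the run started
    let state: ThermalLevel
    let batteryLevel: Double // 0.0 ... 1.0
}

/// Time of the first sample whose state is at least `level`, or nil if never reached.
func timeTo(_ level: ThermalLevel, in trajectory: [ThermalSample]) -> Double? {
    return trajectory.first(where: { $0.state >= level })?.t
}

/// One row per sample, followed by "# key=value" summary lines (assumed layout).
func csv(_ trajectory: [ThermalSample], summary: [(String, String)]) -> String {
    var lines = ["t_s,thermal_state,battery_level"]
    for s in trajectory {
        let level = String(format: "%.2f", s.batteryLevel)
        lines.append("\(Int(s.t)),\(s.state),\(level)")
    }
    for (key, value) in summary {
        lines.append("# \(key)=\(value)")
    }
    return lines.joined(separator: "\n")
}

/// Formats an optional time as whole seconds, or "never" when the state was not reached.
func label(_ t: Double?) -> String {
    guard let t = t else { return "never" }
    return String(Int(t))
}

// Made-up trajectory sampled every 30 s.
let samples = [
    ThermalSample(t: 0, state: .nominal, batteryLevel: 1.00),
    ThermalSample(t: 30, state: .nominal, batteryLevel: 1.00),
    ThermalSample(t: 60, state: .fair, batteryLevel: 0.99),
]
let summary = [
    ("time_to_fair_s", label(timeTo(.fair, in: samples))),
    ("time_to_serious_s", label(timeTo(.serious, in: samples))),
]
print(csv(samples, summary: summary))
```

Keeping the summary as comment-prefixed lines lets spreadsheet tools and most CSV parsers skip it while still shipping a single self-describing file.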

Bench menu (Examples/CoreMLLLMChat) adds presets aimed at power
reporting: 2 min (speed), 15 min (power, the minimum that gives a
defensible mJ/tok given the gauge's 1% resolution), 30 min, 60 min.
The result summary now includes Energy/token, Time→fair, Time→serious,
the CSV filename, and the per-30s thermal trajectory.

Docs:
- docs/POWER_BENCHMARK_PLAN.md: metric tiers, test matrix, head-to-head
  protocol against other on-device LLM engines
- docs/POWER_BENCH_RUNBOOK.md: shareable guide for any iPhone 15 Pro+
- docs/BENCHMARKING.md: new mJ/tok / %/hr / thermal-trajectory section
  with the API reference; legacy J/tok derivation kept

Honest caveat documented in both files: mJ/tok comes from iOS's 1%
battery gauge, not a lab power meter — trust only for runs ≥ 10 min.

Extracted from feature/power-benchmark (commit 78def26). The README
Power & Thermal table and the cross-vocab drafter commit (2f3010c) on
that branch are intentionally left out — README has been reorganized
since and the table values were placeholders pending a real run.

@john-rocky force-pushed the feat/power-bench-infra branch from c8a7aa3 to 9a1bd8d on April 30, 2026 02:56
@john-rocky merged commit 930f45a into main on Apr 30, 2026