feat: power & thermal benchmark infrastructure #164
Merged
Add mJ/token energy measurement, 30s thermal trajectory sampling, and CSV export to the sample app's Bench tool, plus the methodology doc and device-agnostic runbook.

LLMRunner.BenchmarkResult gains:
- thermalTrajectory: ThermalSample(t, state, batteryLevel) at 30s
- mJPerToken: derived from drainedPercent × batteryCapacityWh (iPhone 17 Pro nominal capacity 14.03 Wh; override per-device)
- timeToFair / timeToSerious: first elapsed second the thermal state transitioned away from nominal
- drainedPerHour: extrapolated from run duration
- csv(): single-string export with per-30s thermal + battery rows and a # summary block; written by the sample app to Documents/

Bench menu (Examples/CoreMLLLMChat) adds presets aimed at power reporting: 2 min (speed), 15 min (power, the minimum that gives a defensible mJ/tok given the gauge's 1% resolution), 30 min, 60 min. The result summary now includes Energy/token, Time→fair, Time→serious, the CSV filename, and the per-30s thermal trajectory.

Docs:
- docs/POWER_BENCHMARK_PLAN.md: metric tiers, test matrix, head-to-head protocol against other on-device LLM engines
- docs/POWER_BENCH_RUNBOOK.md: shareable guide for any iPhone 15 Pro+
- docs/BENCHMARKING.md: new mJ/tok / %/hr / thermal-trajectory section with the API reference; legacy J/tok derivation kept

Honest caveat documented in both files: mJ/tok comes from iOS's 1% battery gauge, not a lab power meter; trust only for runs ≥ 10 min.

Extracted from feature/power-benchmark (commit 78def26). The README Power & Thermal table and the cross-vocab drafter commit (2f3010c) on that branch are intentionally left out: README has been reorganized since and the table values were placeholders pending a real run.
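The energy figures above reduce to a few lines of arithmetic. The following is a minimal sketch, not the code merged in this PR: `EnergyEstimate` and `estimateEnergy` are hypothetical names, and returning `nil` when the gauge has not moved is an assumption made here for illustration. Only the inputs (drained percent, battery capacity in Wh, token count, run duration) come from the description.

```swift
// Hypothetical container for the derived metrics; not the PR's BenchmarkResult.
struct EnergyEstimate {
    let joules: Double             // total energy drawn from the battery
    let mJPerToken: Double         // millijoules per generated token
    let drainedPerHour: Double     // battery percent per hour, extrapolated
    let relativeGaugeError: Double // effect of a 1-point gauge error on the result
}

// Assumption: drainedPercent is in whole-percent units (3 means 3 %),
// and 14.03 Wh is the iPhone 17 Pro nominal capacity quoted above.
func estimateEnergy(drainedPercent: Double,
                    batteryCapacityWh: Double = 14.03,
                    tokens: Int,
                    durationSeconds: Double) -> EnergyEstimate? {
    // A 0 % drain is below the gauge's 1 % resolution, so no estimate is possible.
    guard drainedPercent > 0, tokens > 0, durationSeconds > 0 else { return nil }

    // 1 Wh = 3600 J, so energy = fraction drained × capacity × 3600.
    let joules = drainedPercent / 100.0 * batteryCapacityWh * 3600.0
    let mJPerToken = joules * 1000.0 / Double(tokens)

    // Linear extrapolation of the observed drain to one hour.
    let drainedPerHour = drainedPercent * 3600.0 / durationSeconds

    // If the gauge reading is off by one point, the result shifts by 1/drainedPercent.
    let relativeGaugeError = 1.0 / drainedPercent

    return EnergyEstimate(joules: joules,
                          mJPerToken: mJPerToken,
                          drainedPerHour: drainedPerHour,
                          relativeGaugeError: relativeGaugeError)
}
```

For example, a 15-minute run that drains 3 % while generating 18,000 tokens gives about 1515 J, 84.18 mJ/token, and 12 %/hr. A one-point gauge error would move that result by roughly a third, which illustrates why short runs are not trustworthy.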
Summary
Adds mJ/token energy measurement, 30s thermal trajectory sampling, and CSV export to the sample app's Bench tool, plus the methodology doc and device-agnostic runbook.

Code (Examples/CoreMLLLMChat)
- BenchmarkResult gains thermalTrajectory, mJPerToken, timeToFair, timeToSerious, drainedPerHour, batteryCapacityWh (iPhone 17 Pro default 14.03 Wh, override per-device), and a csv() exporter with per-30s rows + # summary block.
- Bench menu adds presets aimed at power reporting: 2 min (speed), 15 min (power, the minimum that gives a defensible mJ/tok given the gauge's 1 % resolution), 30 min, 60 min.
- The sample app writes the CSV to Documents/bench-<unix_ts>.csv.

Docs
- docs/POWER_BENCHMARK_PLAN.md: metric tiers, test matrix, head-to-head protocol against other on-device LLM engines
- docs/POWER_BENCH_RUNBOOK.md: shareable guide for any iPhone 15 Pro+
- docs/BENCHMARKING.md: new mJ/tok / %/hr / thermal-trajectory section with the API reference. Legacy J/tok derivation kept under a "legacy derivation" header.

Honest caveat documented in both docs: mJ/tok comes from iOS's 1 % battery gauge, not a lab power meter; trust only for runs ≥ 10 min.

Extracted from feature/power-benchmark (commit 78def26). The README Power & Thermal table and the cross-vocab drafter commit (2f3010c) on that branch are intentionally left out: README has been reorganized since, and the table values were _pending_ placeholders.

Test plan
- … Examples/CoreMLLLMChat in Release on iPhone (15 Pro / 17 Pro)
- … mJ/tok is non-zero and the trajectory has ~30 entries
- … CoreMLLLMChat … Documents directory
- … never) vs warm
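
To make the thermal fields and the CSV layout concrete, here is a rough sketch under stated assumptions. `ThermalLevel` is a stand-in for the system thermal state, `timeTo` and the free function `csv` are hypothetical helpers, and the column names and `# key=value` summary format are illustrative guesses, not the format the PR actually writes. The only details taken from the description are the `ThermalSample(t, state, batteryLevel)` shape, the per-30s rows, and the `# summary` block.

```swift
// Stand-in for the system thermal state; ordered so levels can be compared.
enum ThermalLevel: Int, Comparable {
    case nominal = 0, fair, serious, critical
    static func < (a: ThermalLevel, b: ThermalLevel) -> Bool {
        a.rawValue < b.rawValue
    }
}

// Mirrors the ThermalSample(t, state, batteryLevel) shape from the description.
struct ThermalSample {
    let t: Double            // elapsed seconds since the run started
    let state: ThermalLevel
    let batteryLevel: Double // assumed to be a 0.0...1.0 fraction
}

// Assumption: "time to X" is the elapsed time of the first sample at or above X.
// Returns nil if the run never reached that level.
func timeTo(_ level: ThermalLevel, in trajectory: [ThermalSample]) -> Double? {
    trajectory.first(where: { $0.state >= level })?.t
}

// Builds a single CSV string: a header, one row per sample, then a
// commented summary block. Column names here are illustrative.
func csv(trajectory: [ThermalSample], summary: [(String, String)]) -> String {
    var lines = ["t_s,thermal_state,battery_level"]
    for s in trajectory {
        lines.append("\(Int(s.t)),\(s.state),\(s.batteryLevel)")
    }
    lines.append("# summary")
    for (key, value) in summary {
        lines.append("# \(key)=\(value)")
    }
    return lines.joined(separator: "\n")
}
```

With one sample every 30 s, a 15-minute run yields about 30 rows, which matches the "~30 entries" check in the test plan. Prefixing summary lines with `#` lets tools that support comment lines skip them while keeping the whole result in one file.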