feat(ci): ane-residency-gate CLI (#160)
Merged
Standalone executable that loads each chunk{1..4}.mlpackage in a model
directory, queries MLComputePlan for per-op device placement, and exits
non-zero if any chunk's ANE op fraction drops below a threshold
(default 99.5%). Writes/diffs a JSON baseline so PRs can detect silent
ANE→CPU/GPU drift in committed mlpackages.
Why: it's the lossless-conversion regression signal we were missing.
Wall-clock benches catch big drops, but a single op flipping off ANE
can shave 5–10% throughput and survive a PR unnoticed.
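
The gate rule described above can be sketched in a few lines. This is a minimal illustration, not the code in main.swift: the ChunkTally type, its field names, and the op counts are hypothetical, and the real CLI derives its counts from MLComputePlan rather than from literals.

```swift
import Foundation

// Hypothetical per-chunk tally; the actual ChunkResult fields are not shown in this PR.
struct ChunkTally {
    let name: String
    let aneOps: Int     // ops placed on the ANE
    let totalOps: Int   // all counted ops (assumed to exclude load-time constants)

    // Fraction of counted ops resident on the ANE; an empty chunk counts as fully resident.
    var aneFraction: Double {
        totalOps == 0 ? 1.0 : Double(aneOps) / Double(totalOps)
    }
}

/// Returns the chunks whose ANE fraction is strictly below `threshold`.
func failingChunks(_ chunks: [ChunkTally], threshold: Double = 0.995) -> [ChunkTally] {
    chunks.filter { $0.aneFraction < threshold }
}

// Illustrative numbers only.
let tallies = [
    ChunkTally(name: "chunk1", aneOps: 1000, totalOps: 1000),
    ChunkTally(name: "chunk2", aneOps: 990, totalOps: 1000),
]
let failures = failingChunks(tallies)
for chunk in failures {
    print("FAIL \(chunk.name): ANE fraction \(chunk.aneFraction)")
}
// A real CLI would pass this value to exit(); it is kept as a constant here.
let exitCode: Int32 = failures.isEmpty ? 0 : 1
```

With a strict less-than comparison, a threshold of 1.0 fails any chunk that has even one non-ANE op, and passes only chunks that are fully ANE-resident.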

Changes:
- Sources/ane-residency-gate/main.swift: argv parser, JSON baseline
  read/write, threshold gate, PASS/FAIL exit codes
- Package.swift: add AneResidencyGate executable + product
- Sources/CoreMLLLM/ComputePlanAudit.swift:
  * make ComputePlanAudit + ChunkResult public so the CLI can call it
  * add audit() that returns [ChunkResult] without console logging
  * extract isConstantOp() helper that strips iosNN.* namespace prefix,
    so palettized weight-loads (ios18.constexpr_lut_to_dense) are
    correctly classified as load-time constants in walkDrafterBlock
    (walkBlock already had this; this brings the drafter path in line)
- docs/ane_residency_baseline_2026-04-19.json: 4-chunk Gemma 4 baseline
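
As a rough illustration of the namespace strip in isConstantOp(), the sketch below removes a leading iosNN. prefix before classifying an operator name. The helper strippingOSNamespace and the rule for what counts as a constant ("const" or any "constexpr_" name) are assumptions made for this example; the real helper in ComputePlanAudit.swift may use a different rule.

```swift
import Foundation

/// Strips a leading `iosNN.` namespace (for example "ios18.") from an operator name.
/// Names without that exact shape are returned unchanged.
func strippingOSNamespace(_ operatorName: String) -> String {
    guard operatorName.hasPrefix("ios"),
          let dot = operatorName.firstIndex(of: ".") else { return operatorName }
    // Characters between "ios" and the first dot must be a non-empty run of digits.
    let digitsStart = operatorName.index(operatorName.startIndex, offsetBy: 3)
    let digits = operatorName[digitsStart..<dot]
    guard !digits.isEmpty, digits.allSatisfy({ $0.isNumber }) else { return operatorName }
    return String(operatorName[operatorName.index(after: dot)...])
}

/// Assumed classification rule: plain `const` and any `constexpr_*` op are load-time constants.
func isConstantOp(_ operatorName: String) -> Bool {
    let base = strippingOSNamespace(operatorName)
    return base == "const" || base.hasPrefix("constexpr_")
}

print(isConstantOp("ios18.constexpr_lut_to_dense"))
print(isConstantOp("ios18.matmul"))
```

Routing both walkBlock and walkDrafterBlock through one helper like this keeps the two paths from disagreeing about which ops to exclude from the ANE fraction.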

Usage:
  swift run ane-residency-gate \
    --model-dir /path/to/gemma4-e2b \
    --threshold 0.995 \
    --baseline docs/ane_residency_baseline_2026-04-19.json \
    --write-baseline /tmp/current.json
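
The baseline comparison behind --baseline can be pictured as a per-chunk diff of two JSON snapshots. The sketch below is an assumption-heavy illustration: the BaselineEntry schema, its field names, and the drift function are hypothetical and do not reflect the actual layout of docs/ane_residency_baseline_2026-04-19.json.

```swift
import Foundation

// Hypothetical baseline schema; the committed JSON file may be laid out differently.
struct BaselineEntry: Codable, Equatable {
    let chunk: String
    let aneOps: Int
    let totalOps: Int
}

/// Returns one message per chunk whose counts differ from the baseline.
/// Assumes chunk names are unique within the baseline.
func drift(baseline: [BaselineEntry], current: [BaselineEntry]) -> [String] {
    let previousByChunk = Dictionary(uniqueKeysWithValues: baseline.map { ($0.chunk, $0) })
    var messages: [String] = []
    for entry in current {
        guard let previous = previousByChunk[entry.chunk] else {
            messages.append("\(entry.chunk): not in baseline")
            continue
        }
        if previous != entry {
            messages.append("\(entry.chunk): ANE ops \(previous.aneOps)/\(previous.totalOps) -> \(entry.aneOps)/\(entry.totalOps)")
        }
    }
    return messages
}

// Illustrative data only.
let baselineJSON = Data("""
[{"chunk":"chunk1","aneOps":100,"totalOps":100},
 {"chunk":"chunk2","aneOps":200,"totalOps":200}]
""".utf8)
let baseline = try JSONDecoder().decode([BaselineEntry].self, from: baselineJSON)
let current = [
    BaselineEntry(chunk: "chunk1", aneOps: 100, totalOps: 100),
    BaselineEntry(chunk: "chunk2", aneOps: 199, totalOps: 200),
]
let report = drift(baseline: baseline, current: current)
report.forEach { print($0) }
```

Diffing counts rather than only checking the threshold is what lets a PR notice a single op moving off the ANE even when the chunk still clears 99.5%.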

Summary
- Adds the ane-residency-gate executable: loads each chunk{1..4}.mlpackage in a model dir, queries MLComputePlan per-op, and exits non-zero if any chunk's ANE op fraction is below threshold (default 99.5%).
- Makes ComputePlanAudit + a new ChunkResult type public, adds a non-logging audit() API, and unifies the iosNN.* namespace strip so walkDrafterBlock matches walkBlock's constant-op detection.

Extracted from feat/litert-perf-adoptions (commit 8598388). The other items in that branch (S1/S2/T1/T3/T4/T5) will land in separate PRs.

Test plan
- swift build on macos-15
- swift run ane-residency-gate --model-dir ~/Documents/CoreMLLLM/gemma4-e2b --baseline docs/ane_residency_baseline_2026-04-19.json shows 0 drift on the committed bundle
- --threshold 1.0 fails (chunk3 has 1 non-ANE op, chunk4 has 9), which confirms the gate triggers