
feat(ci): ane-residency-gate CLI #160

Merged
john-rocky merged 1 commit into main from feat/ane-residency-gate-cli
Apr 30, 2026



@john-rocky
Owner

Summary

  • New ane-residency-gate executable: loads each chunk{1..4}.mlpackage in a model dir, queries MLComputePlan per-op, exits non-zero if any chunk's ANE op fraction is below threshold (default 99.5%).
  • Reads/writes a JSON baseline so PRs can diff committed mlpackages and detect silent ANE→CPU/GPU drift that wall-clock benches would miss.
  • Makes ComputePlanAudit + a new ChunkResult type public, adds a non-logging audit() API, and unifies the iosNN.* namespace strip so walkDrafterBlock matches walkBlock's constant-op detection.
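
The gate decision in the first bullet can be sketched in a few lines of Swift. This is a minimal illustration, not the PR's code: `ChunkResidency`, `failingChunks` and `gateExitCode` are hypothetical names, and the public `ChunkResult` type may carry different fields. It assumes each chunk has already been reduced to an ANE op count and a total op count, and it treats an empty chunk as fully resident.

```swift
import Foundation

/// Hypothetical per-chunk summary; the PR's public `ChunkResult` may differ.
struct ChunkResidency {
    let name: String
    let aneOps: Int
    let totalOps: Int

    /// Fraction of counted ops placed on the ANE (assumed 1.0 for an empty chunk).
    var aneFraction: Double {
        totalOps == 0 ? 1.0 : Double(aneOps) / Double(totalOps)
    }
}

/// Returns the chunks whose ANE fraction falls below `threshold`.
func failingChunks(_ chunks: [ChunkResidency], threshold: Double = 0.995) -> [ChunkResidency] {
    chunks.filter { $0.aneFraction < threshold }
}

/// Prints a PASS/FAIL line per chunk and maps the result to an exit code (0 = PASS, 1 = FAIL).
func gateExitCode(_ chunks: [ChunkResidency], threshold: Double = 0.995) -> Int32 {
    let failures = failingChunks(chunks, threshold: threshold)
    for chunk in chunks {
        let status = failures.contains { $0.name == chunk.name } ? "FAIL" : "PASS"
        print("\(status) \(chunk.name): ANE fraction \(String(format: "%.4f", chunk.aneFraction))")
    }
    return failures.isEmpty ? 0 : 1
}
```

A `main.swift` would end with something like `exit(gateExitCode(results))`, so CI sees a non-zero status whenever any chunk fails.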

Extracted from feat/litert-perf-adoptions (commit 8598388). The other items in that branch (S1/S2/T1/T3/T4/T5) will land in separate PRs.

Test plan

  • swift build on macos-15
  • swift run ane-residency-gate --model-dir ~/Documents/CoreMLLLM/gemma4-e2b --baseline docs/ane_residency_baseline_2026-04-19.json shows 0 drift on the committed bundle
  • --threshold 1.0 fails (chunk3 has 1 non-ANE op, chunk4 has 9), confirming that the gate triggers
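
The drift check in the first test-plan item amounts to comparing two baselines key by key. A minimal sketch follows, assuming a hypothetical schema that maps each chunk name to its non-ANE op count; the layout of the committed `docs/ane_residency_baseline_2026-04-19.json` is not shown in this PR and may differ. Writing with sorted keys keeps the JSON stable, so a real placement change shows up as a small, reviewable git diff.

```swift
import Foundation

/// Hypothetical baseline schema: chunk name -> number of non-ANE ops.
struct ResidencyBaseline: Codable, Equatable {
    var nonANEOps: [String: Int]
}

/// Describes every chunk whose non-ANE op count differs between baseline and current run.
func drift(baseline: ResidencyBaseline, current: ResidencyBaseline) -> [String] {
    let names = Set(baseline.nonANEOps.keys).union(current.nonANEOps.keys).sorted()
    return names.compactMap { name -> String? in
        let before = baseline.nonANEOps[name]
        let after = current.nonANEOps[name]
        guard before != after else { return nil }  // unchanged chunk: no drift
        let beforeText = before.map { String($0) } ?? "missing"
        let afterText = after.map { String($0) } ?? "missing"
        return "\(name): \(beforeText) -> \(afterText)"
    }
}

/// Encodes with sorted keys so repeated runs produce byte-identical JSON.
func encodeBaseline(_ baseline: ResidencyBaseline) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(baseline)
}

func decodeBaseline(_ data: Data) throws -> ResidencyBaseline {
    try JSONDecoder().decode(ResidencyBaseline.self, from: data)
}
```

An empty `drift(...)` result corresponds to the "0 drift" outcome; any returned line names the chunk and the before/after counts.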

Standalone executable that loads each chunk{1..4}.mlpackage in a model
directory, queries MLComputePlan for per-op device placement, and exits
non-zero if any chunk's ANE op fraction drops below a threshold
(default 99.5%). Writes/diffs a JSON baseline so PRs can detect silent
ANE→CPU/GPU drift in committed mlpackages.
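
Querying per-op placement with `MLComputePlan` might look roughly like the sketch below. It is not `ComputePlanAudit`: `Placement`, `tally` and `placements(forPackageAt:)` are hypothetical names, it assumes macOS 14.4 / iOS 17.4 or later (where `MLComputePlan` is available), and it assumes that ops for which `deviceUsage(for:)` returns `nil` are not dispatched and can be skipped. It does not exclude load-time constants. The CoreML part sits behind `#if canImport(CoreML)` so the pure counting helper still builds on other platforms.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Hypothetical placement categories for this sketch.
enum Placement: Equatable { case ane, gpu, cpu }

/// Counts ANE placements out of all counted ops.
func tally(_ placements: [Placement]) -> (ane: Int, total: Int) {
    (placements.filter { $0 == .ane }.count, placements.count)
}

#if canImport(CoreML)
@available(macOS 14.4, iOS 17.4, *)
func placements(forPackageAt packageURL: URL) async throws -> [Placement] {
    // MLComputePlan loads a compiled model, so compile the .mlpackage first.
    let compiledURL = try await MLModel.compileModel(at: packageURL)
    let config = MLModelConfiguration()
    config.computeUnits = .all  // assumption: allow GPU so GPU fallback is visible too
    let plan = try await MLComputePlan.load(contentsOf: compiledURL, configuration: config)
    guard case .program(let program) = plan.modelStructure else { return [] }

    var result: [Placement] = []
    // Recursively visit every op, including ops nested in control-flow blocks.
    func walk(_ block: MLModelStructure.Program.Block) {
        for op in block.operations {
            if let usage = plan.deviceUsage(for: op) {  // nil: assumed not dispatched
                switch usage.preferred {
                case .neuralEngine: result.append(.ane)
                case .gpu: result.append(.gpu)
                case .cpu: result.append(.cpu)
                @unknown default: result.append(.cpu)
                }
            }
            for inner in op.blocks { walk(inner) }
        }
    }
    for function in program.functions.values { walk(function.block) }
    return result
}
#endif
```

Running this once per `chunkN.mlpackage` and feeding `tally(...)` into the threshold check gives one ANE fraction per chunk.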

Why: it's the lossless-conversion regression signal we were missing.
Wall-clock benches catch big drops, but a single op flipping off ANE
can shave 5–10% throughput and survive a PR unnoticed.

Changes:
- Sources/ane-residency-gate/main.swift: argv parser, JSON baseline
  read/write, threshold gate, PASS/FAIL exit codes
- Package.swift: add AneResidencyGate executable + product
- Sources/CoreMLLLM/ComputePlanAudit.swift:
  * make ComputePlanAudit + ChunkResult public so the CLI can call it
  * add audit() that returns [ChunkResult] without console logging
  * extract isConstantOp() helper that strips iosNN.* namespace prefix,
    so palettized weight-loads (ios18.constexpr_lut_to_dense) are
    correctly classified as load-time constants in walkDrafterBlock
    (walkBlock already had this; this brings the drafter path in line)
- docs/ane_residency_baseline_2026-04-19.json: 4-chunk Gemma 4 baseline
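
The constant-op classification in the `isConstantOp()` bullet can be sketched as a small string helper. This is an illustration under assumptions, not the PR's implementation: it strips an `iosNN.` opset prefix only when `NN` is all ASCII digits, then treats `const` and any `constexpr_*` operator as a load-time constant. The real helper may recognize a different set of names.

```swift
import Foundation

/// Strips an `iosNN.` opset prefix (e.g. "ios18.") from a MIL operator name.
func strippingOpsetPrefix(_ name: String) -> Substring {
    guard name.hasPrefix("ios"), let dot = name.firstIndex(of: ".") else { return Substring(name) }
    let version = name[name.index(name.startIndex, offsetBy: 3)..<dot]
    // Only strip when the part between "ios" and "." is a version number.
    guard !version.isEmpty, version.allSatisfy({ $0.isASCII && $0.isNumber }) else {
        return Substring(name)
    }
    return name[name.index(after: dot)...]
}

/// Hypothetical classifier: `const` and any `constexpr_*` op count as load-time constants.
func isConstantOp(_ operatorName: String) -> Bool {
    let base = strippingOpsetPrefix(operatorName)
    return base == "const" || base.hasPrefix("constexpr_")
}
```

Normalizing the name once and then matching on the bare op name is what lets both walkers agree on palettized weight loads regardless of the opset prefix.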

Usage:
  swift run ane-residency-gate \
    --model-dir /path/to/gemma4-e2b \
    --threshold 0.995 \
    --baseline docs/ane_residency_baseline_2026-04-19.json \
    --write-baseline /tmp/current.json
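
The four flags in the usage line map onto a small hand-rolled parser. The sketch below is hypothetical (`GateOptions`, `parseGateOptions` and the error cases are illustrative names, not the PR's `main.swift`); it assumes every flag takes exactly one value and that thresholds outside 0...1 should be rejected.

```swift
import Foundation

/// Hypothetical option set mirroring the flags in the usage example.
struct GateOptions: Equatable {
    var modelDir: String?
    var threshold = 0.995  // default from the PR description
    var baseline: String?
    var writeBaseline: String?
}

enum GateOptionError: Error, Equatable {
    case missingValue(String)
    case badThreshold(String)
    case unknownFlag(String)
}

/// Parses `--flag value` pairs; `args` excludes the program name.
func parseGateOptions(_ args: [String]) throws -> GateOptions {
    let known = ["--model-dir", "--threshold", "--baseline", "--write-baseline"]
    var options = GateOptions()
    var it = args.makeIterator()
    while let flag = it.next() {
        guard known.contains(flag) else { throw GateOptionError.unknownFlag(flag) }
        guard let value = it.next() else { throw GateOptionError.missingValue(flag) }
        switch flag {
        case "--model-dir": options.modelDir = value
        case "--threshold":
            guard let t = Double(value), (0.0...1.0).contains(t) else {
                throw GateOptionError.badThreshold(value)
            }
            options.threshold = t
        case "--baseline": options.baseline = value
        default: options.writeBaseline = value  // "--write-baseline"
        }
    }
    return options
}
```

A `main.swift` could call `try parseGateOptions(Array(CommandLine.arguments.dropFirst()))` and print usage on any thrown `GateOptionError`.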
@john-rocky force-pushed the feat/ane-residency-gate-cli branch from 03d48dc to 1322a3d on April 30, 2026 02:55
@john-rocky merged commit 0a073fb into main on Apr 30, 2026