docs(bonsai): post-mortem + ANE decode-state lessons by john-rocky · Pull Request #163 · john-rocky/CoreML-LLM

john-rocky · 2026-04-29T23:53:42Z

Summary

The Qwen3 architecture support shipped in #162 came out of an investigation into porting prism-ml/Ternary-Bonsai-1.7B to ANE. Bonsai didn't ship — its per-(row, block) ternary scales can't be faithfully represented by ANEC (error -14 on per-block LUT palettization, and any stock-API approximation collapses the scales into a rank-1 outer product). This PR captures the post-mortem and the reusable lessons that came out of the failed port.
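To see why a rank-1 outer product cannot stand in for independent per-(row, block) scales, here is a minimal sketch in plain Python. It is an illustration only, not the conversion code from this repo and not what Core ML does internally: the helper names (`rank1_approx`, `max_rel_error`), the toy 8×4 scale grid, and the mean-based rank-1 fit (simpler than an SVD-optimal one) are all assumptions chosen for clarity.

```python
import random


def rank1_approx(scales):
    """Simple rank-1 fit: approx[r][b] = row_mean[r] * block_mean[b] / global_mean.

    Hypothetical helper. It is exact when the scales really are an outer
    product, and lossy otherwise.
    """
    rows, blocks = len(scales), len(scales[0])
    row_mean = [sum(row) / blocks for row in scales]
    block_mean = [sum(scales[r][b] for r in range(rows)) / rows
                  for b in range(blocks)]
    global_mean = sum(row_mean) / rows
    return [[row_mean[r] * block_mean[b] / global_mean for b in range(blocks)]
            for r in range(rows)]


def max_rel_error(scales, approx):
    """Largest relative difference between any scale and its approximation."""
    return max(abs(s - a) / abs(s)
               for s_row, a_row in zip(scales, approx)
               for s, a in zip(s_row, a_row))


rng = random.Random(0)
rows, blocks = 8, 4  # toy sizes, not the real model dimensions

# Independent per-(row, block) scales: every entry is its own free parameter.
independent = [[rng.uniform(0.5, 1.5) for _ in range(blocks)]
               for _ in range(rows)]

# Scales that genuinely factor into one vector per row and one per block.
u = [rng.uniform(0.5, 1.5) for _ in range(rows)]
v = [rng.uniform(0.5, 1.5) for _ in range(blocks)]
separable = [[u[r] * v[b] for b in range(blocks)] for r in range(rows)]

err_independent = max_rel_error(independent, rank1_approx(independent))
err_separable = max_rel_error(separable, rank1_approx(separable))

print(f"independent scales, max relative error: {err_independent:.3f}")
print(f"separable scales,   max relative error: {err_separable:.2e}")
```

The separable grid is recovered to floating-point precision, while the independent grid is not. A rank-1 form has only rows + blocks free parameters, whereas independent scales need rows × blocks, so the information is lost however the factors are chosen.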

What lands

docs/TERNARY_BONSAI.md — full post-mortem: what was tried, why each path failed, what the right path is (mlx-lm with prism-ml/Ternary-Bonsai-1.7B-mlx-2bit).
docs/DECODE_STATE_LAYOUTS.md — ANE decode-state catalog. The headline finding: per-step decode cost on ANE is O(state_length), not weight-bandwidth (halving ctx 2048→1024 = 2.56× speedup, halving weights INT8→INT4 at same ctx = +12% only). Includes the mask-based rotating buffer pattern, palettization traps, and ternary-on-ANE checklist.
docs/GEMMA4_ROTATING_BUFFER_PORT.md — design note for applying the mask-based rotating buffer to Gemma 4's full-attention layers.
docs/NEXT_MODELS.md — shortlist: Qwen3-1.7B, Gemma 3 4B QAT, Llama-3.2, SmolLM3.
docs/ADDING_MODELS.md — new §4.5 KV state-layout checklist.
docs/ANE_OPTIMIZATION_SURVEY.md — cross-ref to the ctx > weight-bandwidth finding.
conversion/experiments/bonsai/ — 8 research scripts (oracle, ternary surgery, SWA comparisons, decode-chunks builder) kept as breadcrumbs.
conversion/config.py — NOTE comment in MODEL_REGISTRY explaining why Bonsai is intentionally absent.
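The mask-based rotating buffer itself is described in docs/DECODE_STATE_LAYOUTS.md, which is not reproduced on this page. As a rough guess at the general shape of such a pattern, the sketch below keeps a fixed-size state and writes each new value by blending with a one-hot mask instead of slicing at a dynamic index. The function names, the scalar per-slot values, and the window size of 4 are all hypothetical; the real implementation may differ.

```python
def one_hot(slot, window):
    """Mask with 1.0 at `slot` and 0.0 elsewhere (hypothetical helper)."""
    return [1.0 if i == slot else 0.0 for i in range(window)]


def masked_write(state, new_value, step):
    """Overwrite slot `step % window` by blending, keeping the state shape fixed.

    The update is state * (1 - mask) + new_value * mask, so no dynamic
    indexing or resizing is needed: the oldest slot is simply replaced.
    """
    window = len(state)
    mask = one_hot(step % window, window)
    return [s * (1.0 - m) + new_value * m for s, m in zip(state, mask)]


window = 4                 # assumed window size for the demo
state = [0.0] * window     # stands in for one row of a KV cache

# Six decode steps write the values 1.0 .. 6.0; steps 4 and 5 wrap around.
for step in range(6):
    state = masked_write(state, float(step + 1), step)

print(state)
```

Because the state length never changes, per-step work in a layout like this depends on the chosen window rather than on how many tokens have been decoded, which fits the state-length finding reported in the list above.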

Extracted from feat/qwen3-bonsai-investigation (commit 56ee545). Companion to #162 (Qwen3 architecture).

Test plan

Doc links resolve (no broken cross-refs between the four new docs)
python conversion/convert.py --list still works (config.py NOTE is a Python comment, no syntax change)
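The first test-plan item can be automated. The following is a minimal sketch of a relative-link checker for markdown files, assuming plain `[text](target)` links; `broken_links` and the temporary demo files are hypothetical and are not part of this repo.

```python
import re
import tempfile
from pathlib import Path

# Matches inline markdown links and captures the target: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(docs_dir):
    """Return (file name, target) pairs for relative links to missing files."""
    broken = []
    for md in sorted(Path(docs_dir).rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            # External URLs and same-page anchors are out of scope here.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            path = target.split("#", 1)[0]  # drop any #fragment
            if not (md.parent / path).exists():
                broken.append((md.name, target))
    return broken


# Demo on throwaway files: A.md links to B.md (present) and C.md (missing).
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "A.md").write_text(
        "See [layouts](B.md#masks) and [missing](C.md).", encoding="utf-8")
    (docs / "B.md").write_text("# Layouts", encoding="utf-8")
    result = broken_links(docs)

print(result)
```

Pointing `broken_links` at the repo's `docs/` directory and expecting an empty list would cover the cross-reference check. Anchor fragments are not validated in this sketch.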

Adds the post-mortem for the ternary Bonsai investigation and the ANE decode-path lessons it produced. The Qwen3 architecture support that came out of the same investigation lands separately.

Why Bonsai didn't ship:
- prism-ml/Ternary-Bonsai-1.7B's compression depends on per-(row, block) independent scales (g=64). ANEC rejects that LUT granularity with error -14, and the stock-API per-block approximation factorizes scales into a rank-1 outer product, defeating the model's design.
- For Apple Silicon, the GPU path (mlx-lm with the official Ternary-Bonsai-1.7B-mlx-2bit) is the only honest option.

What lands as reusable infrastructure:
- docs/TERNARY_BONSAI.md: full post-mortem (what was tried, why each failed, what the right path is)
- docs/DECODE_STATE_LAYOUTS.md: ANE decode-state catalog (mask-based rotating buffer pattern, ctx > weight bandwidth result, palettization traps, ternary-on-ANE checklist)
- docs/GEMMA4_ROTATING_BUFFER_PORT.md: design note for porting our mask-based rotating buffer to Gemma 4's full-attention layers
- docs/NEXT_MODELS.md: shortlist (Qwen3-1.7B, Gemma 3 4B QAT, Llama-3.2, SmolLM3) for the next port
- docs/ADDING_MODELS.md: §4.5 KV state-layout checklist
- docs/ANE_OPTIMIZATION_SURVEY.md: cross-reference to the ctx>weights finding
- conversion/experiments/bonsai/: research scripts (oracle, ternary surgery, SWA comparisons, decode-chunks builder) retained as breadcrumbs in case anyone retraces the path
- conversion/config.py: NOTE comment in MODEL_REGISTRY explaining why Bonsai is intentionally absent and pointing readers to the doc

Extracted from feat/qwen3-bonsai-investigation (commit 56ee545).

john-rocky merged commit 48a245c into main Apr 30, 2026
1 check passed

john-rocky merged 1 commit into main from docs/bonsai-postmortem
