
Release v0.3.6#66

Merged
youssofal merged 8 commits into main from codex/release-v0.3.6
May 14, 2026


@youssofal
Owner

Summary

Release v0.3.6 as the production patch over v0.3.5.

This branch carries the bounded-memory/OpenCode fixes, ports Tune into the packaged CLI, fixes verified-default onboarding/model labeling, and hardens mtplx bench tune so chip diagnostics show the exact model path and generation-window telemetry when available.

Core Pillars

  • Decode TPS: protected by the existing runtime KPI and focused MTP depth tests; no release change intentionally alters draft/verify semantics after the measured memory fix.
  • Prefill/TTFT: bounded KV reservation preserves prompt-context allocation and only caps huge initial new-token reserve.
  • Memory: AIME-shaped max_tokens=65536 no longer reserves the full decode window up front, and anonymous one-off sessions do not retain full-capacity live cache refs.
  • CLI UX: checked through actual packaged commands, including mtplx, mtplx-tune, OpenCode, Pi, and bench tune dry-run paths.
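The Memory pillar above describes capping the up-front new-token reserve while keeping the prompt-context allocation intact. A minimal sketch of that idea, with a hypothetical function name and cap value (neither is taken from the mtplx source):

```python
def initial_kv_reserve(prompt_tokens: int, max_new_tokens: int,
                       new_token_cap: int = 2048) -> int:
    """Number of KV-cache slots to allocate when a session starts.

    The prompt context is always reserved in full; only the speculative
    new-token reserve is capped, so an AIME-shaped max_tokens=65536
    request no longer pins the whole decode window up front. The cache
    is assumed to grow on demand once decoding passes the cap.
    """
    return prompt_tokens + min(max_new_tokens, new_token_cap)
```

Under these assumptions, a 1200-token prompt with max_tokens=65536 reserves 3248 slots at start instead of 66736, while short requests (max_new_tokens below the cap) are unaffected.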

User-Facing Changes

  • mtplx tune, mtplx-tune, and mtplx bench tune are available from the release package.
  • mtplx start verified default now points at the installed Optimized Speed/Q4 artifact instead of prompting a bogus install.
  • Tune advice is shown before measurements start, not after the benchmark has already run.
  • bench tune prints exact model source notes, has --no-telemetry, waits between candidates, and scopes telemetry to generation windows when samples land inside decode.
  • README now avoids claiming the speed multiplier is hardware-independent.
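The bench tune change above scopes telemetry to generation windows only when samples actually land inside decode. A minimal sketch of that filtering logic, assuming timestamped (time, watts) readings and (start, end) decode intervals; the function name and shapes are illustrative, not the actual mtplx API:

```python
def scoped_power(samples, windows):
    """Average GPU power over telemetry samples inside decode windows.

    samples: list of (timestamp_s, watts) readings.
    windows: list of (start_s, end_s) generation intervals.
    Returns None when no sample lands inside any window, so the caller
    can fall back to unscoped (whole-run) reporting instead.
    """
    inside = [watts for t, watts in samples
              if any(start <= t <= end for start, end in windows)]
    return sum(inside) / len(inside) if inside else None
```

Returning None rather than an empty-window average matches the "when samples land inside decode" wording: scoped figures are only reported when at least one reading falls inside a generation window.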

Validation

  • python3 -m compileall -q mtplx tests scripts
  • uv run --extra dev python -m ruff check
  • uv run --extra dev python -m pytest -q
  • uv run --extra dev python -m twine check dist/*
  • scripts/fresh_venv_smoke.sh
  • git diff --check
  • mtplx --version -> mtplx 0.3.6 (0.3.6)
  • mtplx start opencode --dry-run --json --model models/example --yes
  • mtplx start pi --dry-run --json --model models/example --yes
  • mtplx-tune --model models/not-loaded-in-dry-run --dry-run --yes
  • mtplx bench tune --model models/not-loaded-in-dry-run --dry-run --json --yes --no-telemetry

Real Hardware Evidence Already Run On This Branch

  • bench tune against /Users/youssof/.mtplx/hf-upload/Qwen3.6-27B-MTPLX-Optimized
  • D3 selected at 54.51 tok/s, 2.22x AR
  • D3 telemetry: scope=generation, gpu=72.0W, GPU=98.4%, window=3.5s
  • Fans restored to auto afterward.

Known Non-Claims

@youssofal youssofal merged commit 1d6a7b7 into main May 14, 2026
3 checks passed
@youssofal youssofal deleted the codex/release-v0.3.6 branch May 14, 2026 23:49