
Release v0.3.6#66

Merged
youssofal merged 8 commits into main from codex/release-v0.3.6
May 14, 2026


@youssofal
Owner

Summary

Release v0.3.6 as the production patch over v0.3.5.

This branch carries the bounded-memory/OpenCode fixes, ports Tune into the packaged CLI, fixes verified-default onboarding/model labeling, and hardens mtplx bench tune so chip diagnostics show the exact model path and generation-window telemetry when available.

Core Pillars

  • Decode TPS: protected by the existing runtime KPI and focused MTP depth tests; no release change intentionally alters draft/verify semantics after the measured memory fix.
  • Prefill/TTFT: bounded KV reservation preserves prompt-context allocation and only caps huge initial new-token reserve.
  • Memory: AIME-shaped max_tokens=65536 no longer reserves the full decode window up front, and anonymous one-off sessions do not retain full-capacity live cache refs.
  • CLI UX: checked through actual packaged commands, including mtplx, mtplx-tune, OpenCode, Pi, and bench tune dry-run paths.
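The Memory pillar above describes capping the up-front new-token reserve while keeping the prompt-context allocation intact. A minimal sketch of that idea, with a hypothetical function name and cap value (neither is taken from the mtplx source):

```python
def initial_kv_reserve(prompt_tokens: int, max_new_tokens: int,
                       new_token_cap: int = 2048) -> int:
    """Number of KV-cache slots to allocate when a session starts.

    The prompt context is always reserved in full; only the speculative
    new-token reserve is capped, so an AIME-shaped max_tokens=65536
    request no longer pins the whole decode window up front. The cache
    is assumed to grow on demand once decoding passes the cap.
    """
    return prompt_tokens + min(max_new_tokens, new_token_cap)
```

Under these assumptions, a 1200-token prompt with max_tokens=65536 reserves 3248 slots at start instead of 66736, while short requests (max_new_tokens below the cap) are unaffected.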

User-Facing Changes

  • mtplx tune, mtplx-tune, and mtplx bench tune are available from the release package.
  • mtplx start verified default now points at the installed Optimized Speed/Q4 artifact instead of prompting a bogus install.
  • Tune advice is shown before measurements start, not after the benchmark has already run.
  • bench tune prints exact model source notes, has --no-telemetry, waits between candidates, and scopes telemetry to generation windows when samples land inside decode.
  • README now avoids claiming the speed multiplier is hardware-independent.
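The bench tune change above scopes telemetry to generation windows only when samples actually land inside decode. A minimal sketch of that filtering logic, assuming timestamped (time, watts) readings and (start, end) decode intervals; the function name and shapes are illustrative, not the actual mtplx API:

```python
def scoped_power(samples, windows):
    """Average GPU power over telemetry samples inside decode windows.

    samples: list of (timestamp_s, watts) readings.
    windows: list of (start_s, end_s) generation intervals.
    Returns None when no sample lands inside any window, so the caller
    can fall back to unscoped (whole-run) reporting instead.
    """
    inside = [watts for t, watts in samples
              if any(start <= t <= end for start, end in windows)]
    return sum(inside) / len(inside) if inside else None
```

Returning None rather than an empty-window average matches the "when samples land inside decode" wording: scoped figures are only reported when at least one reading falls inside a generation window.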

Validation

  • python3 -m compileall -q mtplx tests scripts
  • uv run --extra dev python -m ruff check
  • uv run --extra dev python -m pytest -q
  • uv run --extra dev python -m twine check dist/*
  • scripts/fresh_venv_smoke.sh
  • git diff --check
  • mtplx --version -> mtplx 0.3.6 (0.3.6)
  • mtplx start opencode --dry-run --json --model models/example --yes
  • mtplx start pi --dry-run --json --model models/example --yes
  • mtplx-tune --model models/not-loaded-in-dry-run --dry-run --yes
  • mtplx bench tune --model models/not-loaded-in-dry-run --dry-run --json --yes --no-telemetry

Real Hardware Evidence Already Run On This Branch

  • bench tune against /Users/youssof/.mtplx/hf-upload/Qwen3.6-27B-MTPLX-Optimized
  • D3 selected at 54.51 tok/s, 2.22x AR
  • D3 telemetry: scope=generation, gpu=72.0W, GPU=98.4%, window=3.5s
  • Fans restored to auto afterward.

Known Non-Claims

@youssofal youssofal merged commit 1d6a7b7 into main May 14, 2026
3 checks passed
@youssofal youssofal deleted the codex/release-v0.3.6 branch May 14, 2026 23:49