Bump dflash-mlx from 0.1.0 to 0.1.6 by dependabot[bot] · Pull Request #69 · youssofal/MTPLX

dependabot · 2026-05-16T10:33:00Z

Bumps dflash-mlx from 0.1.0 to 0.1.6.

Release notes

dflash-mlx v0.1.6

Large runtime, server, and agentic-workflow release since v0.1.5, including the v0.1.5.1 fixes.

Highlights

Reworked runtime ownership around typed runtime config, RuntimeBundle, ServerRuntime, target adapters, draft loading, cache management, and observability.

Default verify policy is now adaptive; fixed DFlash verification is available as --verify-mode dflash.

Added explicit verify modes: adaptive, dflash, ddtree, and off.

Added DDTree branch verification mode for Qwen target paths.

Added internal CopySpec candidate reuse for repeated-token continuation from prompt/generated history.

Added target-owned Qwen and Gemma4 backend routing, with unknown model families failing closed instead of falling into generic logic.

Added Gemma4 adapter support for cache construction, logits, hidden capture, GQA routing, and guarded prefix snapshots.

Added minimal Qwen3-Next fused-GDN projection support in Qwen target verification paths. This is source-level support, not a fully optimized public target claim.

Moved long-context attention routing behind target adapters; public split-SDPA switches are gone.

Productized verify_qmm through runtime config and target capabilities, with stock MLX fallback for unsupported shapes.

Large registered DFlash drafts now default to in-memory w4; use --draft-quant none for a bf16 (non-quantized) A/B comparison.

Added handling for older Apple chips: quantized DFlash drafts use fp16 floating-point tensors on chips that emulate BF16.

Prefix cache is now a managed L1+L2 snapshot service with stable-prefix lookup, L2 promotion, validation, budgets, and server metrics.

Added explicit target-only fallback when DFlash context limits are exceeded, with fallback state and physical prefill accounting.

Hardened the OpenAI-compatible server for OpenCode, aider, Continue, Open WebUI, LM Studio through its OpenAI-compatible adapter, and other OpenAI-compatible clients.

Added stricter Chat Completions tool-call handling, including streamed delta.tool_calls, Qwen XML spans, Gemma4 spans, JSON fallback, and fail-fast validation for malformed or undeclared tool calls.

Added minimal non-streaming /v1/responses compatibility for text input and function-call tools.

Added live /metrics, structured diagnostics, memory reporting, request summaries, and prefix-cache observability.

Added agentic trace/replay lab tooling for real OpenAI-compatible client sessions such as OpenCode/pi.

Short-output target-only AR fast path is now opt-in with --fastpath-max-tokens N; default serving keeps requests on the DFlash path.

Switched license to Apache-2.0.
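
The fail-fast handling of malformed or undeclared tool calls mentioned above can be illustrated with a small client-side sketch. This is not dflash-mlx's implementation: `validate_tool_call` is a hypothetical helper, and the dictionary shapes assume the usual OpenAI Chat Completions layout (`tools[].function.name`, `tool_calls[].function.arguments` as a JSON string).

```python
import json


def validate_tool_call(call: dict, declared_tools: list) -> dict:
    """Return the parsed arguments, or raise ValueError on a bad tool call.

    Hypothetical helper for illustration only; shapes follow the
    OpenAI Chat Completions convention (an assumption, not the server's code).
    """
    # Collect the names of every declared function tool.
    declared = {
        tool["function"]["name"]
        for tool in declared_tools
        if tool.get("type") == "function"
    }
    function = call.get("function") or {}
    name = function.get("name")
    if name not in declared:
        raise ValueError(f"undeclared tool call: {name!r}")

    raw_arguments = function.get("arguments")
    if not isinstance(raw_arguments, str):
        raise ValueError(f"arguments for {name!r} must be a JSON string")
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed arguments for {name!r}: {exc}") from exc
    if not isinstance(arguments, dict):
        raise ValueError(f"arguments for {name!r} must decode to a JSON object")
    return arguments


tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {}}}]
good_call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
parsed = validate_tool_call(good_call, tools)
```

Raising immediately, rather than passing a half-parsed call downstream, is the behaviour the release notes describe as fail-fast; the exact error format the server returns is not stated in the notes.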

Breaking Changes

--verify-mode auto was removed. Use --verify-mode dflash for fixed DFlash verification.

Public --split-sdpa and --no-split-sdpa controls were removed; attention routing is now target-owned.

dflash profiles and old profile/env resolution behavior were removed.

Old top-level generation invocation is rejected; use dflash generate.

Removed legacy benchmark modes and old diagnostic aliases; use documented benchmark flags and --diagnostics.

Runtime internals moved under dflash_mlx/runtime/; old runtime import paths are gone.

/v1/responses is intentionally limited: no streaming, multimodal input, Responses-native reasoning/text/truncation controls, tool_choice, parallel_tool_calls, previous_response_id, or persistent store.

Function-specific Chat Completions tool_choice and parallel_tool_calls: false are rejected.

target_fa_window > 0 disables prefix cache/L2 by design.

Bumping to 0.1.6 invalidates older L2 prefix snapshots through runtime-version validation, so they rebuild.
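
Clients that still send the rejected request options could check their payloads before calling the server. The sketch below is a hypothetical pre-flight check, not part of dflash-mlx; the field names (`tool_choice`, `parallel_tool_calls`, `stream`, `previous_response_id`, `reasoning`, `text`, `truncation`) are assumed to follow the OpenAI API shapes, and multimodal input and the persistent store are not checked.

```python
def preflight_chat_request(payload: dict) -> list:
    """List Chat Completions options that the notes say are rejected."""
    problems = []
    # A dict-valued tool_choice names one specific function (assumed OpenAI shape).
    if isinstance(payload.get("tool_choice"), dict):
        problems.append("function-specific tool_choice is rejected")
    # Only an explicit False is called out in the notes.
    if payload.get("parallel_tool_calls") is False:
        problems.append("parallel_tool_calls: false is rejected")
    return problems


# Controls the notes list as unavailable on /v1/responses.
RESPONSES_UNSUPPORTED = (
    "tool_choice",
    "parallel_tool_calls",
    "previous_response_id",
    "reasoning",
    "text",
    "truncation",
)


def preflight_responses_request(payload: dict) -> list:
    """List /v1/responses options outside the documented minimal surface."""
    problems = []
    if payload.get("stream"):
        problems.append("streaming is not supported on /v1/responses")
    for key in RESPONSES_UNSUPPORTED:
        # Conservative assumption: any non-null value counts as use of the control.
        if payload.get(key) is not None:
            problems.append(f"{key} is not supported on /v1/responses")
    return problems


chat_problems = preflight_chat_request(
    {
        "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
        "parallel_tool_calls": False,
    }
)
responses_problems = preflight_responses_request(
    {"input": "hello", "stream": True, "previous_response_id": "resp_123"}
)
```

Returning a list of problems instead of raising lets a client log every incompatible option in one pass before deciding whether to strip it or abort.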

Upgrade Notes

Use explicit runtime flags instead of old profiles.

Use --fastpath-max-tokens N only when you intentionally want target-only AR for very short server responses.

Treat tools/benchmarks/agentic_trace as diagnostic/lab tooling, not as the public benchmark surface.

Public benchmark claims should continue to come from dflash benchmark.
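
For scripts that still pass the removed flags, the flag changes listed under Breaking Changes can be sketched as an argument rewrite. `migrate_args` is a hypothetical helper written for this illustration; it only touches the flags named in the notes (`--verify-mode auto`, `--split-sdpa`, `--no-split-sdpa`) and assumes nothing else about the dflash CLI.

```python
REMOVED_FLAGS = {"--split-sdpa", "--no-split-sdpa"}


def migrate_args(argv: list) -> list:
    """Rewrite pre-0.1.6 flags following the Breaking Changes list.

    Hypothetical helper: maps `--verify-mode auto` to `--verify-mode dflash`
    and drops the removed split-SDPA switches. All other arguments pass through.
    """
    migrated = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in REMOVED_FLAGS:
            # Attention routing is target-owned now, so the switch is dropped.
            i += 1
            continue
        if arg == "--verify-mode" and i + 1 < len(argv) and argv[i + 1] == "auto":
            migrated += ["--verify-mode", "dflash"]
            i += 2
            continue
        if arg == "--verify-mode=auto":
            # Keep the caller's `=` spelling; only the value changes.
            migrated.append("--verify-mode=dflash")
            i += 1
            continue
        migrated.append(arg)
        i += 1
    return migrated


old_args = ["--verify-mode", "auto", "--split-sdpa", "--fastpath-max-tokens", "32"]
new_args = migrate_args(old_args)
```

Mapping `auto` to `dflash` follows the replacement the notes recommend for fixed DFlash verification; dropping `--verify-mode` entirely would instead pick up the new adaptive default.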

v0.1.5.1 — benchmark hotfix

Hotfix

... (truncated)

Commits

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [dflash-mlx](https://github.com/bstnxbt/dflash-mlx) from 0.1.0 to 0.1.6.
- [Release notes](https://github.com/bstnxbt/dflash-mlx/releases)
- [Commits](https://github.com/bstnxbt/dflash-mlx/commits/v0.1.6)

---
updated-dependencies:
- dependency-name: dflash-mlx
  dependency-version: 0.1.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot Bot added the dependencies and python labels May 16, 2026

dependabot Bot requested a review from youssofal as a code owner May 16, 2026 10:33

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/dflash-mlx-0.1.6
