Bump dflash-mlx from 0.1.0 to 0.1.6 #69

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/dflash-mlx-0.1.6

dependabot[bot] commented on behalf of github on May 16, 2026

Bumps dflash-mlx from 0.1.0 to 0.1.6.

Release notes

Sourced from dflash-mlx's releases.

dflash-mlx v0.1.6

Large runtime, server, and agentic-workflow release since v0.1.5, including the v0.1.5.1 fixes.

Highlights

  • Reworked runtime ownership around typed runtime config, RuntimeBundle, ServerRuntime, target adapters, draft loading, cache management, and observability.
  • Default verify policy is now adaptive; fixed DFlash verification is available as --verify-mode dflash.
  • Added explicit verify modes: adaptive, dflash, ddtree, and off.
  • Added DDTree branch verification mode for Qwen target paths.
  • Added internal CopySpec candidate reuse for repeated-token continuation from prompt/generated history.
  • Added target-owned Qwen and Gemma4 backend routing, with unknown model families failing closed instead of falling into generic logic.
  • Added Gemma4 adapter support for cache construction, logits, hidden capture, GQA routing, and guarded prefix snapshots.
  • Added minimal Qwen3-Next fused-GDN projection support in Qwen target verification paths. This is source-level support, not a fully optimized public target claim.
  • Moved long-context attention routing behind target adapters; public split-SDPA switches are gone.
  • Productized verify_qmm through runtime config and target capabilities, with stock MLX fallback for unsupported shapes.
  • Large registered DFlash drafts now default to in-memory w4; use --draft-quant none for bf16/non-quant A/B.
  • Added old Apple chip handling so quantized DFlash drafts use fp16 floating tensors on BF16-emulated chips.
  • Prefix cache is now a managed L1+L2 snapshot service with stable-prefix lookup, L2 promotion, validation, budgets, and server metrics.
  • Added explicit target-only fallback when DFlash context limits are exceeded, with fallback state and physical prefill accounting.
  • Hardened the OpenAI-compatible server for OpenCode, aider, Continue, Open WebUI, LM Studio through its OpenAI-compatible adapter, and other OpenAI-compatible clients.
  • Added stricter Chat Completions tool-call handling, including streamed delta.tool_calls, Qwen XML spans, Gemma4 spans, JSON fallback, and fail-fast validation for malformed or undeclared tool calls.
  • Added minimal non-streaming /v1/responses compatibility for text input and function-call tools.
  • Added live /metrics, structured diagnostics, memory reporting, request summaries, and prefix-cache observability.
  • Added agentic trace/replay lab tooling for real OpenAI-compatible client sessions such as OpenCode/pi.
  • Short-output target-only AR fast path is now opt-in with --fastpath-max-tokens N; default serving keeps requests on the DFlash path.
  • Switched license to Apache-2.0.

Breaking Changes

  • --verify-mode auto was removed. Use --verify-mode dflash for fixed DFlash verification.
  • Public --split-sdpa and --no-split-sdpa controls were removed; attention routing is now target-owned.
  • dflash profiles and old profile/env resolution behavior were removed.
  • Old top-level generation invocation is rejected; use dflash generate.
  • Removed legacy benchmark modes and old diagnostic aliases; use documented benchmark flags and --diagnostics.
  • Runtime internals moved under dflash_mlx/runtime/; old runtime import paths are gone.
  • /v1/responses is intentionally limited: no streaming, multimodal input, Responses-native reasoning/text/truncation controls, tool_choice, parallel_tool_calls, previous_response_id, or persistent store.
  • Function-specific Chat Completions tool_choice and parallel_tool_calls: false are rejected.
  • target_fa_window > 0 disables prefix cache/L2 by design.
  • Bumping to 0.1.6 invalidates older L2 prefix snapshots through runtime-version validation, so they rebuild.
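
For clients that already talk to an OpenAI-style Responses endpoint, the /v1/responses limits above can be turned into a pre-flight check. The following is only a sketch: the helper name is hypothetical, the field names are assumed to match the public OpenAI Responses request body (the release notes name the features, not the JSON keys), and the notes do not say whether the server rejects or ignores these fields.

```python
# Hypothetical client-side helper; not part of dflash-mlx.
# Assumption: JSON keys mirror the public OpenAI Responses API request body.
UNSUPPORTED_RESPONSES_FIELDS = (
    "stream",                # no streaming
    "reasoning",             # Responses-native reasoning controls
    "text",                  # Responses-native text controls
    "truncation",            # Responses-native truncation controls
    "tool_choice",
    "parallel_tool_calls",
    "previous_response_id",
    "store",                 # no persistent store
)


def unsupported_responses_fields(payload: dict) -> list:
    """Return the keys in `payload` that the v0.1.6 notes list as unsupported."""
    found = []
    for name in UNSUPPORTED_RESPONSES_FIELDS:
        value = payload.get(name)
        # Absent, None, and False are treated as "feature not requested".
        if value is None or value is False:
            continue
        found.append(name)
    return found


if __name__ == "__main__":
    request = {"model": "example-model", "input": "hello", "stream": True}
    print(unsupported_responses_fields(request))
```

Multimodal input is deliberately not checked here, because the notes do not describe how the server distinguishes text input items from other input shapes.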

Upgrade Notes

  • Use explicit runtime flags instead of old profiles.
  • Use --fastpath-max-tokens N only when you intentionally want target-only AR for very short server responses.
  • Treat tools/benchmarks/agentic_trace as diagnostic/lab tooling, not as the public benchmark surface.
  • Public benchmark claims should continue to come from dflash benchmark.
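
Before merging, it may help to scan any launch scripts for the flags that the Breaking Changes section reports as removed. The sketch below is a hypothetical helper, not something dflash-mlx ships. It only knows the three removals named in the release notes, and support for the `--verify-mode=auto` spelling is an assumption about how the flag might be written.

```python
# Hypothetical pre-merge check; not part of dflash-mlx.
# Flags and replacement advice are taken from the v0.1.6 release notes.
REMOVED_FLAGS = {
    "--split-sdpa": "removed; attention routing is now target-owned",
    "--no-split-sdpa": "removed; attention routing is now target-owned",
}

VERIFY_AUTO_ADVICE = "--verify-mode auto: removed; use --verify-mode dflash"


def find_removed_flags(argv: list) -> list:
    """Return one message per removed flag found in a list of CLI arguments."""
    problems = []
    for i, arg in enumerate(argv):
        if arg in REMOVED_FLAGS:
            problems.append(f"{arg}: {REMOVED_FLAGS[arg]}")
        elif arg == "--verify-mode=auto":
            # Assumption: the "=" spelling may appear in existing scripts.
            problems.append(VERIFY_AUTO_ADVICE)
        elif arg == "--verify-mode" and argv[i + 1:i + 2] == ["auto"]:
            # The slice avoids an IndexError when the flag is the last argument.
            problems.append(VERIFY_AUTO_ADVICE)
    return problems


if __name__ == "__main__":
    for message in find_removed_flags(["--verify-mode", "auto", "--split-sdpa"]):
        print(message)
```

An empty result only means none of these three removals were found; it does not cover the removed profiles, benchmark modes, or import paths.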

v0.1.5.1 — benchmark hotfix

Hotfix

... (truncated)

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [dflash-mlx](https://github.com/bstnxbt/dflash-mlx) from 0.1.0 to 0.1.6.
- [Release notes](https://github.com/bstnxbt/dflash-mlx/releases)
- [Commits](https://github.com/bstnxbt/dflash-mlx/commits/v0.1.6)

---
updated-dependencies:
- dependency-name: dflash-mlx
  dependency-version: 0.1.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies and python labels on May 16, 2026
dependabot[bot] requested a review from youssofal as a code owner on May 16, 2026 10:33
Labels

  • dependencies: Pull requests that update a dependency file
  • python: Pull requests that update python code
