Restrict think-state scan to assistant prefill tail #1277
Open
eilidhmae wants to merge 1 commit into
Conversation
ResponseGenerator._tokenize chose initial_state for the generation
state machine by scanning the entire tokenized prompt for the model's
<think> / </think> tokens. When a user message contained a literal
"<think>" with no matching "</think>", the scan flipped initial_state
to "reasoning" for the whole generation, routing every emitted token
to message.reasoning and leaving message.content absent.
The intent of the scan is to detect whether the chat template injected
a <think> in the assistant prefill — a tail-of-prompt question. The
same function already has a tail-bounded scan (last 11 tokens) below
this one for segment finalisation, so the unbounded form was the
outlier. Bound the initial-state scan with start=max(0, len(prompt)-11)
to match.
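For illustration, here is a minimal sketch of the scan logic described above, assuming a helper of this shape; this is not the actual mlx-lm source, and `think_id` / `end_think_id` are placeholders for the model's <think> / </think> token ids:

```python
def choose_initial_state(prompt, think_id, end_think_id):
    """Sketch of initial-state selection; `prompt` is the tokenized prompt."""
    # Buggy form: scanning all of `prompt` means a literal "<think>" typed
    # anywhere in a user message flips the state for the whole generation.
    # Fixed form: only the last 11 tokens -- the assistant prefill tail,
    # which the chat template controls -- can flip the state.
    state = "content"
    start = max(0, len(prompt) - 11)
    for tok in prompt[start:]:
        if tok == think_id:
            state = "reasoning"
        elif tok == end_think_id:
            state = "content"
    return state
```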
Repro without this fix: send a chat completion to a server running a
model whose tokenizer has <think>/</think> in its vocab (e.g. any
Qwen3-family checkpoint), with a user message containing the literal
string "<think>" anywhere. finish_reason is "stop", completion_tokens
is nonzero, but message.content is absent and message.reasoning carries
the output. With the fix, output lands in message.content as expected.
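As a concrete shape for that repro, a hedged sketch against the OpenAI-compatible chat endpoint that mlx_lm.server exposes; the host, port, and model name here are assumptions:

```python
import requests

# Assumes a local mlx_lm.server instance; adjust host/port/model to taste.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "Qwen/Qwen3-0.6B",
        "messages": [
            {"role": "user", "content": "What does a literal <think> tag do?"}
        ],
    },
)
msg = resp.json()["choices"][0]["message"]
# Without the fix: no "content" key; the text sits in msg["reasoning"].
# With the fix: msg["content"] carries the output.
print(msg.get("content"), msg.get("reasoning"))
```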
Adds two regression tests in tests/test_server.py:
- test_tokenize_user_think_does_not_flip_initial_state (the bug case)
- test_tokenize_prefill_think_flips_initial_state (positive guard so
a future over-bound doesn't silently break the intended behaviour).
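In spirit, the two tests pin down the behaviour sketched below; this is written against the hypothetical `choose_initial_state` helper from the sketch above, not the real ResponseGenerator._tokenize signature:

```python
THINK, END_THINK = 7, 8  # hypothetical token ids for <think> / </think>

def test_user_think_does_not_flip_initial_state():
    # Literal <think> deep in the user turn, well before the 11-token tail.
    prompt = [1, 2, THINK, 3, 4] + [5] * 20
    assert choose_initial_state(prompt, THINK, END_THINK) == "content"

def test_prefill_think_flips_initial_state():
    # <think> injected by the chat template at the very end of the prompt.
    prompt = [5] * 20 + [THINK]
    assert choose_initial_state(prompt, THINK, END_THINK) == "reasoning"
```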
eilidhmae added a commit to eilidhmae/pi-tools that referenced this pull request on May 15, 2026:
Three verdict-extracting grep pipelines in tools/bash/adversary-pass.sh and tools/bash/adversary-scan.sh die when $REVIEW / $OUT are empty: grep returns 1, pipefail propagates the failure through the command substitution, and `set -e` aborts the script before the `[[ -z "$VERDICT" ]] && VERDICT="UNKNOWN"` fallbacks can run. In practice this silently dropped one review per affected file and killed the rest of the scan batch; Phase 3's install.sh case hit this because mlx_lm.server 0.31.3 bifurcated output into message.reasoning for prompts containing literal `<think>` tokens (see ml-explore/mlx-lm#1277).

Append `|| true` to each grep pipeline so empty input cleanly yields an empty assignment, and the existing UNKNOWN-default fallbacks become reachable. PEER_VERDICT (adversary-pass.sh:333) already had the equivalent `|| echo "UNKNOWN"` guard; this commit brings the other three sites in line.
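For context, a standalone sketch of the failure mode and the fix; the variable names follow the commit text, the shell options match what it describes, and the grep pattern is illustrative:

```bash
#!/usr/bin/env bash
set -euo pipefail

REVIEW=""  # an empty review: the case that used to kill the batch

# Before: grep exits 1 on no match, pipefail carries that status out of
# the command substitution, and set -e aborts the script here, so the
# UNKNOWN fallback below never runs.
#   VERDICT=$(printf '%s' "$REVIEW" | grep -o 'VERDICT: [A-Z]*' | head -1)

# After: `|| true` turns the no-match case into an empty assignment.
VERDICT=$(printf '%s' "$REVIEW" | grep -o 'VERDICT: [A-Z]*' | head -1 || true)
[[ -z "$VERDICT" ]] && VERDICT="UNKNOWN"
echo "$VERDICT"  # prints UNKNOWN
```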