Beginning of output truncated when --thinking

If I add the `--thinking` flag to this, then the client doesn't get the beginning `<think>` tag from the model.

For tool calls the client actually gets the right output at first, but then model doesn't see it in its history that it sent the first line or so, and then begins trying to omit the first line intentionally to match what it sees that it thinks worked.

Without thinking, tool calls succeed and the output looks correct in the client.

```sh
./SwiftLM \
   --model ~/.lmstudio/models/mlx-community/Qwen3.6-27B-OptiQ-4bit/ \
   --host 0.0.0.0 \
   --port 11234 \
   --api-key "$(head -n 1 ~/.config/agents/tokens.txt | tr -d ' \t\r\n')" \
   --gpu-layers auto \
   --prefill-size 64 \
   --ctx-size 130000
```

I doubt it's relevant, but I'm on a Mac Studio M2 Max 64gb (also `--prefill-size 128` was too high and I was getting GPU timeouts).

**sidenote**: vllm-mlx has the same problem. I wonder if it's a config or usage issue of MLX. llama w/ metal and LM Studio (GGUF or MLX) don't have this problem.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Beginning of output truncated when --thinking #108

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Beginning of output truncated when --thinking #108

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions