fix(llm): add cap to tool-call JSON and thinking accumulation buffers in SSE drainer #4727

@bug-ops

Description

The SSE stream drainer in crates/zeph-llm/src/sse.rs accumulates tool-call JSON input and thinking text without any size bound:

  • state.tool_block (line 322): json.push_str(&delta.partial_json) — no cap
  • state.thinking_block (line 333): t.push_str(&delta.thinking) — no cap

By contrast, the compaction buffer has an explicit 32 KiB cap (MAX_COMPACTION_BUF = 32 * 1024, lines 170 and 309) with a warning log and excess discard.

A malicious or pathological server response could stream an unbounded input_json_delta or thinking_delta sequence, causing unbounded heap growth.

The compaction buffer is already protected against this. The same protection should be applied to the tool-call JSON and thinking accumulation buffers.

Expected Behavior

Add MAX_TOOL_JSON_BUF (e.g., 4 MiB) and MAX_THINKING_BUF (e.g., 1 MiB) constants, and apply the same guard/discard pattern used for the compaction buffer to both accumulation sites.
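
A minimal sketch of what such a guard might look like. The constant names and sizes are the ones proposed above; push_capped is a hypothetical helper, not an existing function in sse.rs. The existing compaction guard is not reproduced here, so truncating the overflowing chunk at the cap (rather than dropping it whole) is an assumption.

```rust
/// Proposed caps; the sizes are the example values from this issue.
const MAX_TOOL_JSON_BUF: usize = 4 * 1024 * 1024;
const MAX_THINKING_BUF: usize = 1024 * 1024;

/// Hypothetical helper: appends `chunk` to `buf` without letting `buf`
/// grow past `cap` bytes. Returns `false` if any part of `chunk` was
/// discarded, so the caller can log a warning once.
fn push_capped(buf: &mut String, chunk: &str, cap: usize) -> bool {
    let remaining = cap.saturating_sub(buf.len());
    if chunk.len() <= remaining {
        buf.push_str(chunk);
        return true;
    }
    // Back up to a char boundary so the slice below cannot panic
    // in the middle of a multi-byte UTF-8 sequence.
    let mut end = remaining;
    while !chunk.is_char_boundary(end) {
        end -= 1;
    }
    buf.push_str(&chunk[..end]);
    false
}
```

The two call sites would then become something like push_capped(json, &delta.partial_json, MAX_TOOL_JSON_BUF) and push_capped(t, &delta.thinking, MAX_THINKING_BUF), with a warning logged when the helper returns false. Truncated tool-call JSON will not parse, so for that buffer it may be preferable to treat an overflow as a stream error instead of silently continuing.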

Actual Behavior

input_json_delta and thinking_delta accumulate without bound.

Environment

Logs / Evidence

// crates/zeph-llm/src/sse.rs:322
if let Some((_, _, _, ref mut json)) = state.tool_block {
    json.push_str(&delta.partial_json);  // no cap
}
// line 333
if let Some((ref mut t, _)) = state.thinking_block {
    t.push_str(&delta.thinking);  // no cap
}

Compare with capped compaction at line 170.

Metadata

Labels

P3 (Research, medium-high complexity), bug (Something isn't working), llm (zeph-llm crate: Ollama, Claude)
