Skip to content

fix(harmony): normalize gpt-oss tool transcripts#1547

Open
ai-jz wants to merge 4 commits into
mainfrom
ai-jz/gpt-oss-harmony-tool-transcript
Open

fix(harmony): normalize gpt-oss tool transcripts#1547
ai-jz wants to merge 4 commits into
mainfrom
ai-jz/gpt-oss-harmony-tool-transcript

Conversation

@ai-jz
Copy link
Copy Markdown
Collaborator

@ai-jz ai-jz commented May 26, 2026

Description

Problem

In GPT-OSS tool-calling flows through the SMG gRPC path, the request can succeed with HTTP 200 and finish_reason="stop", but the final assistant message after a tool result can be malformed. The visible message.content may contain repeated fragments, invisible-character loops, or model self-repair / analysis-style text that should not be exposed as the final answer.

The same request shape did not reproduce through the native vLLM OpenAI HTTP path with the same vLLM version. That points to a transcript construction / parsing difference in the SMG gRPC path rather than a backend crash or transport error.

Solution

This PR keeps the fix as a focused Harmony round-trip update. For the reproduced SMG+vLLM GPT-OSS failure mode, the changes address distinct parts of the broken tool round trip:

  • Normalize JSON tool-call arguments so the assistant tool-call message carried into the next turn matches the compact OpenAI/vLLM-style shape.
  • Mark Chat and Responses tool-result messages with the Harmony commentary channel so SMG does not feed GPT-OSS a channel-less tool result.
  • Drop channel-less or unknown-channel parser residuals so malformed post-tool continuations do not escape as successful final text.

In failure-chain terms, non-canonical tool-call args plus a channel-less tool result can produce a malformed post-tool continuation, and the old parser could surface malformed residual text as normal final content. A separate reasoning_effort ablation did not affect this failure mode, so that change was removed.

I considered splitting these into smaller PRs. The latest leave-one-out testing shows argument normalization is required for the reproduced symptom, and earlier parser-side-only testing showed that parser fixes alone were not enough without the builder-side tool-result channel fix. Parser hardening is included as the response-boundary guard for the same malformed Harmony residual class; a follow-up leave-one-out run with argument normalization plus tool-result channeling passed 3/3 without that hardening, so I am open to splitting the hardening into a separate PR if maintainers prefer.

Changes

  • Add JSON argument normalization in the Harmony parser for assistant function calls and incomplete commentary tool calls.
  • Preserve non-JSON or malformed argument strings unchanged.
  • Set channel="commentary" on Chat and Responses tool-result messages.
  • Drop channel-less or unknown-channel parser residuals instead of appending them to final text.
  • Add targeted unit tests for parser normalization, parser hardening, and tool-result metadata.

Test Plan

cargo +nightly fmt --all -- --check
cargo test -p smg --lib routers::grpc::harmony::parser::tests -- --nocapture
cargo test -p smg --lib routers::grpc::harmony::builder::tests -- --nocapture

Additional validation was run on an H100 node against GPT-OSS 120B served by vLLM gRPC:

  • Unpatched SMG in front of vLLM gRPC reproduced the malformed post-tool final response in 7/7 runs.
  • Direct vLLM OpenAI HTTP with the same vLLM version did not reproduce the failure in 7/7 runs.
  • A replay ablation isolated JSON argument formatting as the strongest trigger: replaying SMG-style pretty JSON tool arguments failed 3/3, while compacting only tool_calls[].function.arguments passed 3/3.
  • A reasoning_effort-only ablation failed 3/3 runs, so case-insensitive Chat reasoning-effort parsing is not included in this PR.
  • An args-only patch still produced malformed or empty final output after the tool result.
  • A parser-only patch still produced an unstable post-tool response, including one run that hit finish_reason="length" with a large leaked reasoning_content.
  • The final PR patch without reasoning_effort normalization passed 3/3 runs using lowercase reasoning_effort input.
  • A full-minus-argument-normalization patch reproduced malformed post-tool output, including invisible/non-breaking-space artifacts and self-analysis leakage.
  • A full-minus-parser-hardening patch passed 3/3 in the same run, so parser hardening is treated as a response-boundary safety fix rather than a demonstrated requirement for this exact sampled repro.

Reproduction artifacts are intentionally not included in this public PR.

Notes For Reviewers

The main behavior question is parser hardening: this PR drops channel-less or unknown-channel residual Harmony content instead of treating it as normal assistant final content. That is intentional because malformed Harmony residuals should not be exposed as a successful final answer.

The H100 ablations support argument normalization as the strongest trigger fix and tool-result commentary as part of transcript fidelity. Parser hardening is intentionally defensive: malformed channel-less or unknown-channel Harmony residuals should not be returned as successful final text, even if the prompt-side fixes prevent the sampled repro.

ai-jz added 2 commits May 25, 2026 19:10
Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
@github-actions github-actions Bot added grpc gRPC client and router changes model-gateway Model gateway crate changes labels May 26, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

📝 Walkthrough

Walkthrough

This PR adds case-insensitive chat reasoning-effort parsing, ensures tool-result Harmony messages use channel="commentary", normalizes tool/function-call arguments as compact JSON when appropriate, and enforces strict channel-based routing so untagged content is dropped rather than treated as final output.

Changes

Harmony Message Lifecycle: Request Building and Response Parsing

Layer / File(s) Summary
Reasoning effort parsing
model_gateway/src/routers/grpc/harmony/builder.rs
Adds parse_chat_reasoning_effort helper that normalizes case, maps "minimal" to Low, and defaults unknown values to Medium. Replaces inline matching in ChatCompletion system-message construction.
Tool-result channel assignment and tests
model_gateway/src/routers/grpc/harmony/builder.rs
Updates tool-result Harmony messages from Responses FunctionToolCall and FunctionCallOutput, and from ChatCompletion ChatMessage::Tool and ChatMessage::Function to set channel="commentary" while preserving recipient="assistant" and author.name as functions.{...}. Tests added to assert these metadata values and include required protocol imports.
JSON argument normalization for tool calls
model_gateway/src/routers/grpc/harmony/parser.rs
Adds normalize_tool_arguments to parse and compact JSON tool/function-call arguments when content_type indicates JSON; on parse/serialize failure or non-JSON, original string is preserved. Applied when building FunctionCallResponse and extracting incomplete commentary. Unit tests added.
Channel-based routing and streaming tightening
model_gateway/src/routers/grpc/harmony/parser.rs
Refactors parsing and streaming handling to only accept explicit analysis, commentary, and final channels; drops and logs messages/deltas with None or unknown channels instead of appending them to final_text as a fallback. Incomplete content is appended only when on the final channel.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • lightseekorg/smg#592: Overlaps on parser streaming/channel delta accumulation and commentary/tool-call handling changes.

Suggested labels

tests

Suggested reviewers

  • CatherineSue
  • key4ng
  • slin1237

Poem

I hop through lines of tidy lore,
Compacting JSON at the core,
Tool voices now in "commentary" sing,
Reasoning effort dressed in spring,
Untagged echoes softly disappear. 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(harmony): normalize gpt-oss tool transcripts' accurately describes the main changes: normalizing tool transcripts (JSON argument normalization and tool-result channel metadata) in the Harmony module for GPT-OSS handling.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ai-jz/gpt-oss-harmony-tool-transcript

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ai-jz ai-jz changed the title Fix GPT-OSS Harmony tool transcript handling fix(harmony): normalize gpt-oss tool transcripts May 26, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors reasoning effort parsing, updates tool result messages to use the commentary channel, introduces JSON tool argument compaction, and drops channel-less or unknown channel content instead of treating it as final text. The review feedback suggests optimizing performance by avoiding heap allocations from to_ascii_lowercase() in both parse_chat_reasoning_effort and normalize_tool_arguments using case-insensitive checks. Additionally, it recommends downgrading the log level from warn! to debug! when dropping incomplete content to prevent log noise in production and maintain consistency.

Comment on lines +236 to +245
fn parse_chat_reasoning_effort(effort: &str) -> ReasoningEffort {
match effort.to_ascii_lowercase().as_str() {
"high" => ReasoningEffort::High,
"medium" => ReasoningEffort::Medium,
"low" => ReasoningEffort::Low,
// Harmony does not support minimal reasoning effort.
"minimal" => ReasoningEffort::Low,
_ => ReasoningEffort::Medium,
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The parse_chat_reasoning_effort function converts the input string to lowercase using to_ascii_lowercase(), which performs a heap allocation. Since this function is called on the request path, we can avoid this allocation entirely by using eq_ignore_ascii_case to perform case-insensitive comparisons.

Suggested change
fn parse_chat_reasoning_effort(effort: &str) -> ReasoningEffort {
match effort.to_ascii_lowercase().as_str() {
"high" => ReasoningEffort::High,
"medium" => ReasoningEffort::Medium,
"low" => ReasoningEffort::Low,
// Harmony does not support minimal reasoning effort.
"minimal" => ReasoningEffort::Low,
_ => ReasoningEffort::Medium,
}
}
fn parse_chat_reasoning_effort(effort: &str) -> ReasoningEffort {
if effort.eq_ignore_ascii_case("high") {
ReasoningEffort::High
} else if effort.eq_ignore_ascii_case("medium") {
ReasoningEffort::Medium
} else if effort.eq_ignore_ascii_case("low") {
ReasoningEffort::Low
} else if effort.eq_ignore_ascii_case("minimal") {
ReasoningEffort::Low
} else {
ReasoningEffort::Medium
}
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I am leaving this one as-is for readability. This request-path helper only handles one small enum-like string per request, and the allocation is not material compared with the surrounding request parsing/building work. The current match form is easier to scan and mirrors the previous logic.

Comment on lines +67 to +70
let should_parse_as_json = match content_type {
Some(content_type) => content_type.to_ascii_lowercase().contains("json"),
None => true,
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The normalize_tool_arguments function calls to_ascii_lowercase() on the content_type string, which performs a heap allocation. Since this is called for every tool call during response parsing, we can optimize this by performing a case-insensitive substring check without any allocations.

        let should_parse_as_json = match content_type {
            Some(ct) => ct.as_bytes().windows(4).any(|w| {
                w[0].to_ascii_lowercase() == b'j'
                    && w[1].to_ascii_lowercase() == b's'
                    && w[2].to_ascii_lowercase() == b'o'
                    && w[3].to_ascii_lowercase() == b'n'
            }),
            None => true,
        };

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I am going to skip this micro-optimization for now. Keeping the content-type check as a straightforward case-insensitive substring check is more readable, and this path immediately does JSON parsing/serialization when it matches, so avoiding this small allocation is unlikely to matter without profiling evidence.

Comment on lines +105 to +115
None => {
tracing::warn!(
"Dropping incomplete Harmony content without a channel instead of treating it as final text"
);
}
_ => {
tracing::warn!(
channel = ?current_channel,
"Dropping incomplete Harmony content with unknown channel instead of treating it as final text"
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using tracing::warn! for dropping incomplete content when finalizing a stream can lead to excessive log noise in production. Incomplete content is a common and expected occurrence when a stream is cut off (e.g., due to max_tokens limit or client cancellation). Maintain consistency with existing logging practices within the same crate by using tracing::debug! or tracing::info! instead, matching the behavior in parse_chunk.

Suggested change
None => {
tracing::warn!(
"Dropping incomplete Harmony content without a channel instead of treating it as final text"
);
}
_ => {
tracing::warn!(
channel = ?current_channel,
"Dropping incomplete Harmony content with unknown channel instead of treating it as final text"
);
}
None => {
tracing::debug!(
"Dropping incomplete Harmony content without a channel instead of treating it as final text"
);
}
_ => {
tracing::debug!(
channel = ?current_channel,
"Dropping incomplete Harmony content with unknown channel instead of treating it as final text"
);
}
References
  1. Maintain consistency with existing logging practices within the same crate for similar functionalities, even if a different logging level might seem more appropriate in isolation.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in ac6d57e. I kept the behavior change but lowered incomplete residual logging from warn to debug to avoid expected stream-finalization noise.

Comment on lines +420 to +423
None => {
tracing::debug!(
"Dropping Harmony streaming delta without a channel instead of treating it as final text"
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: This logs at debug! level, but the equivalent paths in parse_messages (line 243) and handle_incomplete_content (line 106) log at warn!. In production, streaming is the common path, so channel-less content arriving here would be the hardest to spot in logs. Consider using warn! for consistency so all three drop sites surface at the same level.

Suggested change
None => {
tracing::debug!(
"Dropping Harmony streaming delta without a channel instead of treating it as final text"
);
None => {
tracing::warn!(
"Dropping Harmony streaming delta without a channel instead of treating it as final text"
);
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I am going to keep this at debug for now. The non-streaming incomplete residual drops were also lowered to debug in ac6d57e, so the levels are now consistent in the lower-noise direction. If maintainers prefer warning on malformed Harmony residuals, that can be adjusted together across the parser paths.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-focused PR that fixes Harmony transcript fidelity across four related issues. The changes are logically consistent and well-tested.

Summary: 0 🔴 Important, 1 🟡 Nit, 1 🟣 Pre-existing

  • 🟡 parser.rs streaming log level: The channel-less drop in parse_chunk logs at debug! while the equivalent non-streaming paths use warn!. Suggested elevating for consistency.
  • 🟣 parser.rs streaming unknown-channel silent drop: The _ => {} catch-all in parse_chunk silently drops unknown channels, unlike parse_messages which now warns. Pre-existing but worth adding a log for parity.

The commentary-channel annotation, JSON normalization, case-insensitive reasoning effort, and channel-less content dropping all look correct and well-motivated.

Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
@ai-jz
Copy link
Copy Markdown
Collaborator Author

ai-jz commented May 26, 2026

Updated the PR body to use the repository template headers and kept the public description free of private reproduction artifacts.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/harmony/parser.rs (1)

98-103: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Append analysis fragments instead of replacing them.

Both branches overwrite analysis, so only the last "analysis" segment survives when Harmony emits multiple analysis messages or leaves trailing incomplete analysis content. That drops reasoning in both parse_complete() and finalize().

💡 Proposed fix
                 Some("analysis") => {
-                    *analysis = Some(current_content);
+                    match analysis.as_mut() {
+                        Some(existing) => {
+                            existing.push('\n');
+                            existing.push_str(&current_content);
+                        }
+                        None => *analysis = Some(current_content),
+                    }
                 }
...
                 Some("analysis") => {
                     // Process each content item
                     // For Chat API, we join them into a single reasoning_content
                     let text = Self::extract_text_from_content(&msg.content);

                     if !text.is_empty() {
-                        analysis = Some(text);
+                        match analysis.as_mut() {
+                            Some(existing) => {
+                                existing.push('\n');
+                                existing.push_str(&text);
+                            }
+                            None => analysis = Some(text),
+                        }
                     }
                 }

Also applies to: 205-213

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@model_gateway/src/routers/grpc/harmony/parser.rs` around lines 98 - 103, The
code currently replaces the analysis buffer instead of appending when handling
chunks (matching current_channel == Some("analysis")), which causes earlier
analysis fragments to be lost; in the branch that sets/updates the analysis
variable (the code that assigns to *analysis = Some(current_content)) change it
to append to the existing Option<String> (e.g., if analysis.is_some() then
push_str(&current_content) into the existing String else set
Some(current_content)), and apply the same append-fix to the other occurrence
referenced (the parse_complete()/finalize() related branch around the 205-213
region) so all analysis fragments are concatenated rather than overwritten.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@model_gateway/src/routers/grpc/harmony/parser.rs`:
- Around line 98-103: The code currently replaces the analysis buffer instead of
appending when handling chunks (matching current_channel == Some("analysis")),
which causes earlier analysis fragments to be lost; in the branch that
sets/updates the analysis variable (the code that assigns to *analysis =
Some(current_content)) change it to append to the existing Option<String> (e.g.,
if analysis.is_some() then push_str(&current_content) into the existing String
else set Some(current_content)), and apply the same append-fix to the other
occurrence referenced (the parse_complete()/finalize() related branch around the
205-213 region) so all analysis fragments are concatenated rather than
overwritten.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4e6eed76-f5ff-4321-8447-60f56de2ae77

📥 Commits

Reviewing files that changed from the base of the PR and between d983a8e and ff36136.

📒 Files selected for processing (2)
  • model_gateway/src/routers/grpc/harmony/builder.rs
  • model_gateway/src/routers/grpc/harmony/parser.rs

@ai-jz
Copy link
Copy Markdown
Collaborator Author

ai-jz commented May 26, 2026

Review round summary: updated the PR description to match the repository template, accepted the incomplete-residual log-level suggestion in ac6d57e, and skipped the two allocation micro-optimizations because they reduce readability without profiling evidence. I also kept the streaming channel-less drop at debug so all residual-drop paths stay low-noise; happy to adjust all of them together if maintainers prefer warnings.

@ai-jz
Copy link
Copy Markdown
Collaborator Author

ai-jz commented May 26, 2026

Follow-up on the CodeRabbit analysis-fragment note: I verified the code path. The overwrite behavior for multiple analysis fragments was pre-existing, and this PR's behavior change is focused on tool transcript fidelity, tool-result channel metadata, JSON argument normalization, reasoning_effort parsing, and dropping malformed channel-less final text. I am leaving analysis-fragment aggregation for a separate follow-up so this PR stays focused on the reproduced GPT-OSS tool-call symptom.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 26, 2026

Hi @ai-jz, the DCO sign-off check has failed. All commits must include a Signed-off-by line.

To fix existing commits:

# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease

To sign off future commits automatically:

  • Use git commit -s every time, or
  • VSCode: enable Git: Always Sign Off in Settings
  • PyCharm: enable Sign-off commit in the Commit tool window

Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
@ai-jz ai-jz force-pushed the ai-jz/gpt-oss-harmony-tool-transcript branch from 03210a1 to 517f64a Compare May 26, 2026 05:39
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 517f64a1a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +242 to +246
None => {
tracing::warn!(
"Dropping Harmony message without a channel instead of treating it as final text"
);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve fallback for non-canonical final channels

Dropping assistant messages when channel is None or unrecognized causes successful generations to return empty user-visible output. parse_messages used to fall back to appending these texts to final_text, which is important because this parser already handles other known channel-format deviations (see the recipient-first tool-call workaround in the same function). With this change, any model output that uses a variant like an unchanneled/unknown final segment will now produce finish_reason="stop" but no message.content (and similarly empty Responses output).

Useful? React with 👍 / 👎.

@CatherineSue
Copy link
Copy Markdown
Member

CatherineSue commented May 26, 2026

I agree the channel fix and the incomplete message drop is correct. I don't agree on compact json as that is pure an cosmetic fix for the model capabilities and sglang doesn't do it, but it is ok to add such aesthetic fix.

Couple of things need to be addressed:

  1. Reference the official sources of gpt-oss and harmony stating something like # Valid channels: analysis, commentary, final. Channel must be included for every message.
  2. Regarding the compact JSON, citing the vLLM code file where it does the compact JSON.
  3. Cite the exact code in vLLM that drops the in-complete messages or unknown channels.
  4. The fix is missing a e2e test case.

);
}

fn assert_tool_result_metadata(message: &HarmonyMessage, function_name: &str) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think all the test cases here are necessary.

CatherineSue added a commit that referenced this pull request May 26, 2026
Adds an integration test in TestChatCompletionGptOss (inherited by the
120B variant) that exercises the actual user-visible symptom of the
SMG harmony bug: zero-width-space runs, NBSP runs, long whitespace
runs, ellipsis loops, or English self-analysis text leaking into
choices[].message.content on a successful HTTP 200 response after a
tool round trip.

The test drives a multi-turn flow that mirrors the structural shape
of the customer-derived empirical trigger documented in the harmony
bug report:

- Style-constrained system prompt (no markdown / parens / digits)
- 6-tool nested-args schema (cancel_subscription_items shape matches
  the empirical trigger)
- Multi-turn confirmation pattern
- Minimal {\"ok\": true} tool result
- reasoning_effort=high

Detection regex is tuned for zero false positives against the direct-
vLLM HTTP control corpus (0/7) at the cost of some catch rate against
the broken-path corpus (6/11); the 5 missed broken-path cases use
scattered NBSP between words, which is ambiguous with normal model
output and cannot be distinguished without false positives.

Reliability is best-effort: a synthetic English prompt may not trigger
as deterministically as the customer's 78KB Portuguese prompt. The
test asserts hard properties of the post-tool content that the SMG
harmony fix in #1547 is meant to guarantee; when it fails, it fails
loudly with the exact symptom string for triage.

Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
CatherineSue added a commit that referenced this pull request May 26, 2026
Adds an integration test in TestChatCompletionGptOss (inherited by
TestChatCompletionGptOss120B) that drives a multi-turn tool-call flow
through the gateway and asserts the post-tool choices[].message.content
is non-empty and contains no:

- Runs of invisible characters
- Long whitespace runs
- Ellipsis loops
- Leaked English self-analysis markers

These are the integrity properties the gpt-oss Harmony round-trip fix
in PR #1547 is meant to guarantee. On failure the assertion reports
the matched marker for triage.

Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

grpc gRPC client and router changes model-gateway Model gateway crate changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants