Skip to content

fix(openai): ensure "json" keyword in messages for strict providers (fixes #663)#671

Closed
shellybotmoyer wants to merge 4 commits into
plastic-labs:mainfrom
shellybotmoyer:fix/openai-json-keyword
Closed

fix(openai): ensure "json" keyword in messages for strict providers (fixes #663)#671
shellybotmoyer wants to merge 4 commits into
plastic-labs:mainfrom
shellybotmoyer:fix/openai-json-keyword

Conversation

@shellybotmoyer
Copy link
Copy Markdown
Contributor

@shellybotmoyer shellybotmoyer commented May 11, 2026

Summary

Alibaba/Qwen strictly enforce the OpenAI spec requirement that the word "json" must appear somewhere in the messages whenever response_format is set (whether json_object or json_schema). Most providers are lenient, so this has been a hidden compatibility issue that only surfaces when routing json_mode=True calls through Qwen models.

Proposed Fix

  • Add _ensure_json_keyword() static method to OpenAIBackend
  • When json_mode=True in complete() and stream(), call _ensure_json_keyword(messages) before the API call
  • If any message already contains "json" (case-insensitive), return messages unchanged (fast path)
  • Otherwise, append a brief instruction to the last system message, or add a new system message if none exists

Changes

  • src/llm/backends/openai.py: added _ensure_json_keyword() and wired it into both complete() and stream() when json_mode=True

Verification

  • Fast path preserves existing messages when "json" is already present
  • System-message injection avoids mutating the messages list in-place
  • No effect on non-JSON calls (gated on extra_params.get("json_mode"))

Closes #663

Summary by CodeRabbit

Release Notes

  • New Features

    • Configure alternate API endpoints for Anthropic, OpenAI, and Gemini providers without changing model settings.
    • Improved document processing to gracefully handle embedding generation failures while preserving data.
  • Bug Fixes

    • Fixed OpenAI embedding dimension alignment with configured vector settings.
    • Enhanced rate-limit detection for embedding requests.
    • Corrected JSON mode formatting for OpenAI responses.

Review Change Stack

Hermes Agent and others added 4 commits May 7, 2026 19:14
Catch embedding errors in representation pipeline

Make embedding Optional in DocumentCreate schema

Skip dedup when embedding is None

Fail-fast on 429 RESOURCE_EXHAUSTED
…) calls (plastic-labs#625)

OpenAI-compatible embedding providers that support the dimensions parameter (e.g., DashScope/Bailian text-embedding-v4) received wrong-dimensionality embeddings because dimensions=self.vector_dimensions was never passed. This caused Embedding dimension mismatch errors (e.g., Expected 1536, got 1024). Added dimensions=self.vector_dimensions to all 3 embeddings.create() call sites in the OpenAI transport path.
…_URL to LLMSettings and default clients

The override client path (ModelOverrideSettings.base_url) already supported
per-service base_url overrides, but the top-level default client constructors
(get_openai_client, get_anthropic_client, get_gemini_client, CLIENTS dict)
ignored base_url env vars entirely.

This caused self-hosted / OpenRouter / vLLM users to silently hit the
hardcoded OpenAI endpoint (api.openai.com) even when LLM_OPENAI_BASE_URL
was set, producing 401 errors.

Changes:
- Add ANTHROPIC_BASE_URL, OPENAI_BASE_URL, GEMINI_BASE_URL fields to
  LLMSettings (env_prefix=LLM_, so LLM_OPENAI_BASE_URL etc. work)
- Pass the base_url into all default client constructors (cached and
  module-level CLIENTS entries)
- Use genai_types.HttpOptions for Gemini base_url wiring

Fixes plastic-labs#641
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

Walkthrough

PR adds per-provider base URL configuration for LLM clients (Anthropic, OpenAI, Gemini), makes document embeddings optional throughout the pipeline to enable later backfilling, enhances OpenAI backend to enforce JSON keyword for strict providers like Qwen/Alibaba, and adds dimension specification and quota detection to the embedding client.

Changes

Provider Configuration & Graceful Embedding Handling

Layer / File(s) Summary
Configuration & Schema
src/config.py, src/schemas/internal.py
LLMSettings adds optional ANTHROPIC_BASE_URL, OPENAI_BASE_URL, and GEMINI_BASE_URL fields. DocumentCreate.embedding changed from required list[float] to optional list[float] | None with default None.
Provider Registry & Client Initialization
src/llm/registry.py
Default client builders (get_anthropic_client, get_openai_client, get_gemini_client) and module-level CLIENTS registry now wire base URLs from settings. Gemini uses conditional HttpOptions when base URL is configured.
Embedding Client Enhancements
src/embedding_client.py
OpenAI embedding requests now include dimensions=self.vector_dimensions in embed(), simple_batch_embed(), and _process_batch(). simple_batch_embed() detects quota exhaustion errors ("429" / "RESOURCE_EXHAUSTED") and re-raises immediately.
OpenAI JSON Mode Enforcement
src/llm/backends/openai.py
New _ensure_json_keyword() helper scans messages for "json" keyword. When json_mode=True in complete() and stream(), messages are transformed to append "Respond in valid JSON format." instruction to system message if keyword is absent, satisfying strict provider requirements.
Document & Representation Handling
src/crud/document.py, src/crud/representation.py
Document deduplication now skips when embedding is None. RepresentationManager.save_representation() treats non-ValueError embedding generation failures as non-fatal, logging warning and proceeding with embeddings=None for later backfilling. _save_representation_internal() accepts optional embeddings; document construction uses conditional assignment via enumerate() instead of strict zip().

🎯 3 (Moderate) | ⏱️ ~25 minutes


🐰 Hop along now, with URLs so fine,
Embeddings optional, they'll align,
JSON words whispered to Qwen's ear,
Dimensions dancing, graceful and clear—
Configuration flows like rabbit's cheer! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR introduces several out-of-scope changes beyond the linked issue #663: base URL configuration for providers (config.py, registry.py), embedding handling changes (crud/*, embedding_client.py), and document schema modifications (schemas/internal.py). Scope this PR to address only issue #663 (json keyword enforcement for OpenAI backend). Defer unrelated changes (base URLs, embedding logic, document schema) to separate PRs with their own issue tracking and documentation.
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'fix(openai): ensure "json" keyword in messages for strict providers (fixes #663)' clearly and specifically describes the main change: adding logic to ensure the 'json' keyword appears in messages for OpenAI-compatible providers that strictly enforce it.
Linked Issues check ✅ Passed The PR comprehensively addresses issue #663 by implementing the _ensure_json_keyword() helper that detects the 'json' keyword in messages and injects it if missing, exactly as proposed in the issue for strict providers like Qwen.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/crud/representation.py (1)

166-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Potential IndexError when embedding count is shorter than observation count.

Line 189 assumes a 1:1 embedding list. If the provider returns a partial embedding set, this crashes and defeats the new “save without embeddings” resilience path.

💡 Suggested fix
         try:
             embeddings = await embedding_client.simple_batch_embed(observation_texts)
@@
             embeddings = None
+
+        if embeddings is not None and len(embeddings) != len(all_observations):
+            logger.warning(
+                "Embedding count mismatch (%d/%d); saving without embeddings for backfill",
+                len(embeddings),
+                len(all_observations),
+            )
+            embeddings = None
-                    embedding=embeddings[i] if embeddings is not None else None,
+                    embedding=embeddings[i] if embeddings is not None else None,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/crud/representation.py` around lines 166 - 190, The loop building
schemas.DocumentCreate entries can raise IndexError when embeddings is shorter
than all_observations; change the embedding selection used when constructing
schemas.DocumentCreate to safely check embeddings length (e.g., compute
embedding_value = embeddings[i] if embeddings is not None and i <
len(embeddings) else None) and pass that embedding_value into
schemas.DocumentCreate so missing embeddings simply become None and the "save
without embeddings" path remains functional; update the block that references
embeddings[i] in the loop that iterates over
all_observations/documents_to_create accordingly.
src/llm/backends/openai.py (1)

148-206: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

json_mode is skipped on structured-output paths.

When response_format is a model/type (and in stream when any response_format is set), the code path bypasses _ensure_json_keyword(...). That leaves strict providers vulnerable to the same 400 rejection this PR is fixing.

💡 Suggested fix
         params = self._build_params(
             model=model,
             messages=messages,
@@
             extra_params=extra_params,
         )
+
+        if extra_params and extra_params.get("json_mode"):
+            messages = self._ensure_json_keyword(messages)
+            params["messages"] = messages

         if isinstance(response_format, type):
             params["response_format"] = response_format
@@
-        if extra_params and extra_params.get("json_mode"):
+        if extra_params and extra_params.get("json_mode"):
             params["response_format"] = {"type": "json_object"}
-            messages = self._ensure_json_keyword(messages)
-            params["messages"] = messages
         params["stream"] = True
         params["stream_options"] = {"include_usage": True}
+        if extra_params and extra_params.get("json_mode"):
+            messages = self._ensure_json_keyword(messages)
+            params["messages"] = messages
+
         if isinstance(response_format, type):
@@
-        elif extra_params and extra_params.get("json_mode"):
+        elif extra_params and extra_params.get("json_mode"):
             params["response_format"] = {"type": "json_object"}
-            messages = self._ensure_json_keyword(messages)
-            params["messages"] = messages

Also applies to: 244-260

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 148 - 206, The structured-output
branch (when response_format is a type) and the streaming/structured paths skip
honoring extra_params["json_mode"], so callers with json_mode can still get
400s; fix by checking extra_params.get("json_mode") early in both structured
paths and, when true, set params["response_format"] = {"type":"json_object"},
call messages = self._ensure_json_keyword(messages) and assign
params["messages"]=messages before calling self._client.chat.completions.parse
or before falling back to
_create_structured_response/_parse_or_repair_structured_content; update the same
logic around the non-type response_format/streaming section (the later block
referenced at lines ~244-260) so both branches consistently handle json_mode.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 349-355: _in _ensure_json_keyword, the code mutates caller-owned
message dicts and can crash when a message's "content" is not a string; to fix,
build and return a new list of message dicts (e.g., copy each message dict
before modifying) instead of only copying the list, locate the loop that
iterates messages and replace the in-place update with creating a fresh dict for
the system message and preserving others as-is, and ensure you coerce or safely
handle non-string content (use str(content) or treat non-string as empty before
appending the suffix) when appending the suffix variable to avoid TypeError.

---

Outside diff comments:
In `@src/crud/representation.py`:
- Around line 166-190: The loop building schemas.DocumentCreate entries can
raise IndexError when embeddings is shorter than all_observations; change the
embedding selection used when constructing schemas.DocumentCreate to safely
check embeddings length (e.g., compute embedding_value = embeddings[i] if
embeddings is not None and i < len(embeddings) else None) and pass that
embedding_value into schemas.DocumentCreate so missing embeddings simply become
None and the "save without embeddings" path remains functional; update the block
that references embeddings[i] in the loop that iterates over
all_observations/documents_to_create accordingly.

In `@src/llm/backends/openai.py`:
- Around line 148-206: The structured-output branch (when response_format is a
type) and the streaming/structured paths skip honoring
extra_params["json_mode"], so callers with json_mode can still get 400s; fix by
checking extra_params.get("json_mode") early in both structured paths and, when
true, set params["response_format"] = {"type":"json_object"}, call messages =
self._ensure_json_keyword(messages) and assign params["messages"]=messages
before calling self._client.chat.completions.parse or before falling back to
_create_structured_response/_parse_or_repair_structured_content; update the same
logic around the non-type response_format/streaming section (the later block
referenced at lines ~244-260) so both branches consistently handle json_mode.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62f27ad5-82f0-42d5-92d2-010869820ed8

📥 Commits

Reviewing files that changed from the base of the PR and between a4ae372 and b91c439.

📒 Files selected for processing (7)
  • src/config.py
  • src/crud/document.py
  • src/crud/representation.py
  • src/embedding_client.py
  • src/llm/backends/openai.py
  • src/llm/registry.py
  • src/schemas/internal.py

Comment on lines +349 to +355
for i in range(len(messages) - 1, -1, -1):
if messages[i].get("role") == "system":
messages = list(messages)
existing = messages[i].get("content", "")
suffix = "\n\nRespond in valid JSON format."
messages[i]["content"] = existing + suffix if existing else suffix
return messages
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_ensure_json_keyword still mutates caller-owned dicts and can crash on non-string content.

Line 351 copies only the list, not message dicts, so updating Line 354 mutates the original payload. Also, existing + suffix fails if content is not a string.

💡 Suggested fix
         for i in range(len(messages) - 1, -1, -1):
             if messages[i].get("role") == "system":
-                messages = list(messages)
-                existing = messages[i].get("content", "")
+                copied = [dict(msg) for msg in messages]
+                existing = copied[i].get("content")
                 suffix = "\n\nRespond in valid JSON format."
-                messages[i]["content"] = existing + suffix if existing else suffix
-                return messages
+                if isinstance(existing, str):
+                    copied[i]["content"] = existing + suffix if existing else suffix
+                    return copied
+                # Non-string system content: avoid type errors and append a safe hint message.
+                return copied + [
+                    {"role": "system", "content": "Respond in valid JSON format."}
+                ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 349 - 355, _in _ensure_json_keyword,
the code mutates caller-owned message dicts and can crash when a message's
"content" is not a string; to fix, build and return a new list of message dicts
(e.g., copy each message dict before modifying) instead of only copying the
list, locate the loop that iterates messages and replace the in-place update
with creating a fresh dict for the system message and preserving others as-is,
and ensure you coerce or safely handle non-string content (use str(content) or
treat non-string as empty before appending the suffix) when appending the suffix
variable to avoid TypeError.

@shellybotmoyer
Copy link
Copy Markdown
Contributor Author

Superseded by #664 (kevin-ho), which is a cleaner single-commit fix for the same issue. Closing in favor of the focused PR.

@shellybotmoyer
Copy link
Copy Markdown
Contributor Author

Superseded by #664.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] json_mode=True fails on Alibaba/Qwen providers — "messages must contain the word 'json'"

1 participant