fix(openai): ensure "json" keyword in messages for strict providers (fixes #663)#671
fix(openai): ensure "json" keyword in messages for strict providers (fixes #663)#671shellybotmoyer wants to merge 4 commits into
Conversation
Catch embedding errors in representation pipeline Make embedding Optional in DocumentCreate schema Skip dedup when embedding is None Fail-fast on 429 RESOURCE_EXHAUSTED
…) calls (plastic-labs#625) OpenAI-compatible embedding providers that support the dimensions parameter (e.g., DashScope/Bailian text-embedding-v4) received wrong-dimensionality embeddings because dimensions=self.vector_dimensions was never passed. This caused Embedding dimension mismatch errors (e.g., Expected 1536, got 1024). Added dimensions=self.vector_dimensions to all 3 embeddings.create() call sites in the OpenAI transport path.
…_URL to LLMSettings and default clients The override client path (ModelOverrideSettings.base_url) already supported per-service base_url overrides, but the top-level default client constructors (get_openai_client, get_anthropic_client, get_gemini_client, CLIENTS dict) ignored base_url env vars entirely. This caused self-hosted / OpenRouter / vLLM users to silently hit the hardcoded OpenAI endpoint (api.openai.com) even when LLM_OPENAI_BASE_URL was set, producing 401 errors. Changes: - Add ANTHROPIC_BASE_URL, OPENAI_BASE_URL, GEMINI_BASE_URL fields to LLMSettings (env_prefix=LLM_, so LLM_OPENAI_BASE_URL etc. work) - Pass the base_url into all default client constructors (cached and module-level CLIENTS entries) - Use genai_types.HttpOptions for Gemini base_url wiring Fixes plastic-labs#641
WalkthroughPR adds per-provider base URL configuration for LLM clients (Anthropic, OpenAI, Gemini), makes document embeddings optional throughout the pipeline to enable later backfilling, enhances OpenAI backend to enforce JSON keyword for strict providers like Qwen/Alibaba, and adds dimension specification and quota detection to the embedding client. ChangesProvider Configuration & Graceful Embedding Handling
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/crud/representation.py (1)
166-190:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPotential
IndexErrorwhen embedding count is shorter than observation count.Line 189 assumes a 1:1 embedding list. If the provider returns a partial embedding set, this crashes and defeats the new “save without embeddings” resilience path.
💡 Suggested fix
try: embeddings = await embedding_client.simple_batch_embed(observation_texts) @@ embeddings = None + + if embeddings is not None and len(embeddings) != len(all_observations): + logger.warning( + "Embedding count mismatch (%d/%d); saving without embeddings for backfill", + len(embeddings), + len(all_observations), + ) + embeddings = None- embedding=embeddings[i] if embeddings is not None else None, + embedding=embeddings[i] if embeddings is not None else None,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/crud/representation.py` around lines 166 - 190, The loop building schemas.DocumentCreate entries can raise IndexError when embeddings is shorter than all_observations; change the embedding selection used when constructing schemas.DocumentCreate to safely check embeddings length (e.g., compute embedding_value = embeddings[i] if embeddings is not None and i < len(embeddings) else None) and pass that embedding_value into schemas.DocumentCreate so missing embeddings simply become None and the "save without embeddings" path remains functional; update the block that references embeddings[i] in the loop that iterates over all_observations/documents_to_create accordingly.src/llm/backends/openai.py (1)
148-206:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
json_modeis skipped on structured-output paths.When
response_formatis a model/type (and in stream when anyresponse_formatis set), the code path bypasses_ensure_json_keyword(...). That leaves strict providers vulnerable to the same 400 rejection this PR is fixing.💡 Suggested fix
params = self._build_params( model=model, messages=messages, @@ extra_params=extra_params, ) + + if extra_params and extra_params.get("json_mode"): + messages = self._ensure_json_keyword(messages) + params["messages"] = messages if isinstance(response_format, type): params["response_format"] = response_format @@ - if extra_params and extra_params.get("json_mode"): + if extra_params and extra_params.get("json_mode"): params["response_format"] = {"type": "json_object"} - messages = self._ensure_json_keyword(messages) - params["messages"] = messagesparams["stream"] = True params["stream_options"] = {"include_usage": True} + if extra_params and extra_params.get("json_mode"): + messages = self._ensure_json_keyword(messages) + params["messages"] = messages + if isinstance(response_format, type): @@ - elif extra_params and extra_params.get("json_mode"): + elif extra_params and extra_params.get("json_mode"): params["response_format"] = {"type": "json_object"} - messages = self._ensure_json_keyword(messages) - params["messages"] = messagesAlso applies to: 244-260
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/backends/openai.py` around lines 148 - 206, The structured-output branch (when response_format is a type) and the streaming/structured paths skip honoring extra_params["json_mode"], so callers with json_mode can still get 400s; fix by checking extra_params.get("json_mode") early in both structured paths and, when true, set params["response_format"] = {"type":"json_object"}, call messages = self._ensure_json_keyword(messages) and assign params["messages"]=messages before calling self._client.chat.completions.parse or before falling back to _create_structured_response/_parse_or_repair_structured_content; update the same logic around the non-type response_format/streaming section (the later block referenced at lines ~244-260) so both branches consistently handle json_mode.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 349-355: _in _ensure_json_keyword, the code mutates caller-owned
message dicts and can crash when a message's "content" is not a string; to fix,
build and return a new list of message dicts (e.g., copy each message dict
before modifying) instead of only copying the list, locate the loop that
iterates messages and replace the in-place update with creating a fresh dict for
the system message and preserving others as-is, and ensure you coerce or safely
handle non-string content (use str(content) or treat non-string as empty before
appending the suffix) when appending the suffix variable to avoid TypeError.
---
Outside diff comments:
In `@src/crud/representation.py`:
- Around line 166-190: The loop building schemas.DocumentCreate entries can
raise IndexError when embeddings is shorter than all_observations; change the
embedding selection used when constructing schemas.DocumentCreate to safely
check embeddings length (e.g., compute embedding_value = embeddings[i] if
embeddings is not None and i < len(embeddings) else None) and pass that
embedding_value into schemas.DocumentCreate so missing embeddings simply become
None and the "save without embeddings" path remains functional; update the block
that references embeddings[i] in the loop that iterates over
all_observations/documents_to_create accordingly.
In `@src/llm/backends/openai.py`:
- Around line 148-206: The structured-output branch (when response_format is a
type) and the streaming/structured paths skip honoring
extra_params["json_mode"], so callers with json_mode can still get 400s; fix by
checking extra_params.get("json_mode") early in both structured paths and, when
true, set params["response_format"] = {"type":"json_object"}, call messages =
self._ensure_json_keyword(messages) and assign params["messages"]=messages
before calling self._client.chat.completions.parse or before falling back to
_create_structured_response/_parse_or_repair_structured_content; update the same
logic around the non-type response_format/streaming section (the later block
referenced at lines ~244-260) so both branches consistently handle json_mode.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 62f27ad5-82f0-42d5-92d2-010869820ed8
📒 Files selected for processing (7)
src/config.pysrc/crud/document.pysrc/crud/representation.pysrc/embedding_client.pysrc/llm/backends/openai.pysrc/llm/registry.pysrc/schemas/internal.py
| for i in range(len(messages) - 1, -1, -1): | ||
| if messages[i].get("role") == "system": | ||
| messages = list(messages) | ||
| existing = messages[i].get("content", "") | ||
| suffix = "\n\nRespond in valid JSON format." | ||
| messages[i]["content"] = existing + suffix if existing else suffix | ||
| return messages |
There was a problem hiding this comment.
_ensure_json_keyword still mutates caller-owned dicts and can crash on non-string content.
Line 351 copies only the list, not message dicts, so updating Line 354 mutates the original payload. Also, existing + suffix fails if content is not a string.
💡 Suggested fix
for i in range(len(messages) - 1, -1, -1):
if messages[i].get("role") == "system":
- messages = list(messages)
- existing = messages[i].get("content", "")
+ copied = [dict(msg) for msg in messages]
+ existing = copied[i].get("content")
suffix = "\n\nRespond in valid JSON format."
- messages[i]["content"] = existing + suffix if existing else suffix
- return messages
+ if isinstance(existing, str):
+ copied[i]["content"] = existing + suffix if existing else suffix
+ return copied
+ # Non-string system content: avoid type errors and append a safe hint message.
+ return copied + [
+ {"role": "system", "content": "Respond in valid JSON format."}
+ ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/llm/backends/openai.py` around lines 349 - 355, _in _ensure_json_keyword,
the code mutates caller-owned message dicts and can crash when a message's
"content" is not a string; to fix, build and return a new list of message dicts
(e.g., copy each message dict before modifying) instead of only copying the
list, locate the loop that iterates messages and replace the in-place update
with creating a fresh dict for the system message and preserving others as-is,
and ensure you coerce or safely handle non-string content (use str(content) or
treat non-string as empty before appending the suffix) when appending the suffix
variable to avoid TypeError.
|
Superseded by #664 (kevin-ho), which is a cleaner single-commit fix for the same issue. Closing in favor of the focused PR. |
|
Superseded by #664. |
Summary
Alibaba/Qwen strictly enforce the OpenAI spec requirement that the word "json" must appear somewhere in the messages whenever
response_formatis set (whetherjson_objectorjson_schema). Most providers are lenient, so this has been a hidden compatibility issue that only surfaces when routingjson_mode=Truecalls through Qwen models.Proposed Fix
_ensure_json_keyword()static method toOpenAIBackendjson_mode=Trueincomplete()andstream(), call_ensure_json_keyword(messages)before the API callChanges
src/llm/backends/openai.py: added_ensure_json_keyword()and wired it into bothcomplete()andstream()whenjson_mode=TrueVerification
extra_params.get("json_mode"))Closes #663
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes