fix(openai): ensure "json" keyword in messages for strict providers (fixes #663) by shellybotmoyer · Pull Request #671 · plastic-labs/honcho

shellybotmoyer · 2026-05-11T14:21:05Z

Summary

Alibaba/Qwen strictly enforce the OpenAI spec requirement that the word "json" must appear somewhere in the messages whenever response_format is set (whether json_object or json_schema). Most providers are lenient, so this has been a hidden compatibility issue that only surfaces when routing json_mode=True calls through Qwen models.

Proposed Fix

Add _ensure_json_keyword() static method to OpenAIBackend
When json_mode=True in complete() and stream(), call _ensure_json_keyword(messages) before the API call
If any message already contains "json" (case-insensitive), return messages unchanged (fast path)
Otherwise, append a brief instruction to the last system message, or add a new system message if none exists

Changes

src/llm/backends/openai.py: added _ensure_json_keyword() and wired it into both complete() and stream() when json_mode=True

Verification

Fast path preserves existing messages when "json" is already present
System-message injection avoids mutating the messages list in-place
No effect on non-JSON calls (gated on extra_params.get("json_mode"))

Closes #663

Summary by CodeRabbit

Release Notes

New Features
- Configure alternate API endpoints for Anthropic, OpenAI, and Gemini providers without changing model settings.
- Improved document processing to gracefully handle embedding generation failures while preserving data.
Bug Fixes
- Fixed OpenAI embedding dimension alignment with configured vector settings.
- Enhanced rate-limit detection for embedding requests.
- Corrected JSON mode formatting for OpenAI responses.

Catch embedding errors in representation pipeline Make embedding Optional in DocumentCreate schema Skip dedup when embedding is None Fail-fast on 429 RESOURCE_EXHAUSTED

…) calls (plastic-labs#625) OpenAI-compatible embedding providers that support the dimensions parameter (e.g., DashScope/Bailian text-embedding-v4) received wrong-dimensionality embeddings because dimensions=self.vector_dimensions was never passed. This caused Embedding dimension mismatch errors (e.g., Expected 1536, got 1024). Added dimensions=self.vector_dimensions to all 3 embeddings.create() call sites in the OpenAI transport path.

…_URL to LLMSettings and default clients The override client path (ModelOverrideSettings.base_url) already supported per-service base_url overrides, but the top-level default client constructors (get_openai_client, get_anthropic_client, get_gemini_client, CLIENTS dict) ignored base_url env vars entirely. This caused self-hosted / OpenRouter / vLLM users to silently hit the hardcoded OpenAI endpoint (api.openai.com) even when LLM_OPENAI_BASE_URL was set, producing 401 errors. Changes: - Add ANTHROPIC_BASE_URL, OPENAI_BASE_URL, GEMINI_BASE_URL fields to LLMSettings (env_prefix=LLM_, so LLM_OPENAI_BASE_URL etc. work) - Pass the base_url into all default client constructors (cached and module-level CLIENTS entries) - Use genai_types.HttpOptions for Gemini base_url wiring Fixes plastic-labs#641

…ixes plastic-labs#663)

coderabbitai · 2026-05-11T14:23:09Z

Walkthrough

PR adds per-provider base URL configuration for LLM clients (Anthropic, OpenAI, Gemini), makes document embeddings optional throughout the pipeline to enable later backfilling, enhances OpenAI backend to enforce JSON keyword for strict providers like Qwen/Alibaba, and adds dimension specification and quota detection to the embedding client.

Changes

Provider Configuration & Graceful Embedding Handling

Layer / File(s)	Summary
Configuration & Schema `src/config.py`, `src/schemas/internal.py`	`LLMSettings` adds optional `ANTHROPIC_BASE_URL`, `OPENAI_BASE_URL`, and `GEMINI_BASE_URL` fields. `DocumentCreate.embedding` changed from required `list[float]` to optional `list[float] \| None` with default `None`.
Provider Registry & Client Initialization `src/llm/registry.py`	Default client builders (`get_anthropic_client`, `get_openai_client`, `get_gemini_client`) and module-level `CLIENTS` registry now wire base URLs from settings. Gemini uses conditional `HttpOptions` when base URL is configured.
Embedding Client Enhancements `src/embedding_client.py`	OpenAI embedding requests now include `dimensions=self.vector_dimensions` in `embed()`, `simple_batch_embed()`, and `_process_batch()`. `simple_batch_embed()` detects quota exhaustion errors (`"429"` / `"RESOURCE_EXHAUSTED"`) and re-raises immediately.
OpenAI JSON Mode Enforcement `src/llm/backends/openai.py`	New `_ensure_json_keyword()` helper scans messages for "json" keyword. When `json_mode=True` in `complete()` and `stream()`, messages are transformed to append "Respond in valid JSON format." instruction to system message if keyword is absent, satisfying strict provider requirements.
Document & Representation Handling `src/crud/document.py`, `src/crud/representation.py`	Document deduplication now skips when `embedding is None`. `RepresentationManager.save_representation()` treats non-`ValueError` embedding generation failures as non-fatal, logging warning and proceeding with `embeddings=None` for later backfilling. `_save_representation_internal()` accepts optional embeddings; document construction uses conditional assignment via `enumerate()` instead of strict `zip()`.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 Hop along now, with URLs so fine,
Embeddings optional, they'll align,
JSON words whispered to Qwen's ear,
Dimensions dancing, graceful and clear—
Configuration flows like rabbit's cheer! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	⚠️ Warning	The PR introduces several out-of-scope changes beyond the linked issue `#663`: base URL configuration for providers (config.py, registry.py), embedding handling changes (crud/*, embedding_client.py), and document schema modifications (schemas/internal.py).	Scope this PR to address only issue `#663` (json keyword enforcement for OpenAI backend). Defer unrelated changes (base URLs, embedding logic, document schema) to separate PRs with their own issue tracking and documentation.
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title 'fix(openai): ensure "json" keyword in messages for strict providers (fixes `#663`)' clearly and specifically describes the main change: adding logic to ensure the 'json' keyword appears in messages for OpenAI-compatible providers that strictly enforce it.
Linked Issues check	✅ Passed	The PR comprehensively addresses issue `#663` by implementing the _ensure_json_keyword() helper that detects the 'json' keyword in messages and injects it if missing, exactly as proposed in the issue for strict providers like Qwen.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/crud/representation.py (1)

166-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Potential IndexError when embedding count is shorter than observation count.

Line 189 assumes a 1:1 embedding list. If the provider returns a partial embedding set, this crashes and defeats the new “save without embeddings” resilience path.

💡 Suggested fix

         try:
             embeddings = await embedding_client.simple_batch_embed(observation_texts)
@@
             embeddings = None
+
+        if embeddings is not None and len(embeddings) != len(all_observations):
+            logger.warning(
+                "Embedding count mismatch (%d/%d); saving without embeddings for backfill",
+                len(embeddings),
+                len(all_observations),
+            )
+            embeddings = None

-                    embedding=embeddings[i] if embeddings is not None else None,
+                    embedding=embeddings[i] if embeddings is not None else None,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/crud/representation.py` around lines 166 - 190, The loop building
schemas.DocumentCreate entries can raise IndexError when embeddings is shorter
than all_observations; change the embedding selection used when constructing
schemas.DocumentCreate to safely check embeddings length (e.g., compute
embedding_value = embeddings[i] if embeddings is not None and i <
len(embeddings) else None) and pass that embedding_value into
schemas.DocumentCreate so missing embeddings simply become None and the "save
without embeddings" path remains functional; update the block that references
embeddings[i] in the loop that iterates over
all_observations/documents_to_create accordingly.

src/llm/backends/openai.py (1)

148-206: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

json_mode is skipped on structured-output paths.

When response_format is a model/type (and in stream when any response_format is set), the code path bypasses _ensure_json_keyword(...). That leaves strict providers vulnerable to the same 400 rejection this PR is fixing.

💡 Suggested fix

         params = self._build_params(
             model=model,
             messages=messages,
@@
             extra_params=extra_params,
         )
+
+        if extra_params and extra_params.get("json_mode"):
+            messages = self._ensure_json_keyword(messages)
+            params["messages"] = messages

         if isinstance(response_format, type):
             params["response_format"] = response_format
@@
-        if extra_params and extra_params.get("json_mode"):
+        if extra_params and extra_params.get("json_mode"):
             params["response_format"] = {"type": "json_object"}
-            messages = self._ensure_json_keyword(messages)
-            params["messages"] = messages

         params["stream"] = True
         params["stream_options"] = {"include_usage": True}
+        if extra_params and extra_params.get("json_mode"):
+            messages = self._ensure_json_keyword(messages)
+            params["messages"] = messages
+
         if isinstance(response_format, type):
@@
-        elif extra_params and extra_params.get("json_mode"):
+        elif extra_params and extra_params.get("json_mode"):
             params["response_format"] = {"type": "json_object"}
-            messages = self._ensure_json_keyword(messages)
-            params["messages"] = messages

Also applies to: 244-260

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 148 - 206, The structured-output
branch (when response_format is a type) and the streaming/structured paths skip
honoring extra_params["json_mode"], so callers with json_mode can still get
400s; fix by checking extra_params.get("json_mode") early in both structured
paths and, when true, set params["response_format"] = {"type":"json_object"},
call messages = self._ensure_json_keyword(messages) and assign
params["messages"]=messages before calling self._client.chat.completions.parse
or before falling back to
_create_structured_response/_parse_or_repair_structured_content; update the same
logic around the non-type response_format/streaming section (the later block
referenced at lines ~244-260) so both branches consistently handle json_mode.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 349-355: _in _ensure_json_keyword, the code mutates caller-owned
message dicts and can crash when a message's "content" is not a string; to fix,
build and return a new list of message dicts (e.g., copy each message dict
before modifying) instead of only copying the list, locate the loop that
iterates messages and replace the in-place update with creating a fresh dict for
the system message and preserving others as-is, and ensure you coerce or safely
handle non-string content (use str(content) or treat non-string as empty before
appending the suffix) when appending the suffix variable to avoid TypeError.

---

Outside diff comments:
In `@src/crud/representation.py`:
- Around line 166-190: The loop building schemas.DocumentCreate entries can
raise IndexError when embeddings is shorter than all_observations; change the
embedding selection used when constructing schemas.DocumentCreate to safely
check embeddings length (e.g., compute embedding_value = embeddings[i] if
embeddings is not None and i < len(embeddings) else None) and pass that
embedding_value into schemas.DocumentCreate so missing embeddings simply become
None and the "save without embeddings" path remains functional; update the block
that references embeddings[i] in the loop that iterates over
all_observations/documents_to_create accordingly.

In `@src/llm/backends/openai.py`:
- Around line 148-206: The structured-output branch (when response_format is a
type) and the streaming/structured paths skip honoring
extra_params["json_mode"], so callers with json_mode can still get 400s; fix by
checking extra_params.get("json_mode") early in both structured paths and, when
true, set params["response_format"] = {"type":"json_object"}, call messages =
self._ensure_json_keyword(messages) and assign params["messages"]=messages
before calling self._client.chat.completions.parse or before falling back to
_create_structured_response/_parse_or_repair_structured_content; update the same
logic around the non-type response_format/streaming section (the later block
referenced at lines ~244-260) so both branches consistently handle json_mode.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62f27ad5-82f0-42d5-92d2-010869820ed8

📥 Commits

Reviewing files that changed from the base of the PR and between a4ae372 and b91c439.

📒 Files selected for processing (7)

src/config.py
src/crud/document.py
src/crud/representation.py
src/embedding_client.py
src/llm/backends/openai.py
src/llm/registry.py
src/schemas/internal.py

coderabbitai · 2026-05-11T14:27:38Z

+        for i in range(len(messages) - 1, -1, -1):
+            if messages[i].get("role") == "system":
+                messages = list(messages)
+                existing = messages[i].get("content", "")
+                suffix = "\n\nRespond in valid JSON format."
+                messages[i]["content"] = existing + suffix if existing else suffix
+                return messages


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_ensure_json_keyword still mutates caller-owned dicts and can crash on non-string content.

Line 351 copies only the list, not message dicts, so updating Line 354 mutates the original payload. Also, existing + suffix fails if content is not a string.

💡 Suggested fix

for i in range(len(messages) - 1, -1, -1): if messages[i].get("role") == "system": - messages = list(messages) - existing = messages[i].get("content", "") + copied = [dict(msg) for msg in messages] + existing = copied[i].get("content") suffix = "\n\nRespond in valid JSON format." - messages[i]["content"] = existing + suffix if existing else suffix - return messages + if isinstance(existing, str): + copied[i]["content"] = existing + suffix if existing else suffix + return copied + # Non-string system content: avoid type errors and append a safe hint message. + return copied + [ + {"role": "system", "content": "Respond in valid JSON format."} + ]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/backends/openai.py` around lines 349 - 355, _in _ensure_json_keyword, the code mutates caller-owned message dicts and can crash when a message's "content" is not a string; to fix, build and return a new list of message dicts (e.g., copy each message dict before modifying) instead of only copying the list, locate the loop that iterates messages and replace the in-place update with creating a fresh dict for the system message and preserving others as-is, and ensure you coerce or safely handle non-string content (use str(content) or treat non-string as empty before appending the suffix) when appending the suffix variable to avoid TypeError.

shellybotmoyer · 2026-05-13T11:04:33Z

Superseded by #664 (kevin-ho), which is a cleaner single-commit fix for the same issue. Closing in favor of the focused PR.

shellybotmoyer · 2026-05-13T11:04:34Z

Superseded by #664.

Hermes Agent and others added 4 commits May 7, 2026 19:14

fix: embedding resilience - save docs without embeddings on API failure

6959895

Catch embedding errors in representation pipeline Make embedding Optional in DocumentCreate schema Skip dedup when embedding is None Fail-fast on 429 RESOURCE_EXHAUSTED

fix(openai): ensure "json" keyword in messages for strict providers (f…

b91c439

…ixes plastic-labs#663)

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

shellybotmoyer closed this May 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(openai): ensure "json" keyword in messages for strict providers (fixes #663)#671

fix(openai): ensure "json" keyword in messages for strict providers (fixes #663)#671
shellybotmoyer wants to merge 4 commits into
plastic-labs:mainfrom
shellybotmoyer:fix/openai-json-keyword

shellybotmoyer commented May 11, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 11, 2026 •

edited

Loading

❌ Failed checks (2 warnings)

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 11, 2026

Uh oh!

shellybotmoyer commented May 13, 2026

Uh oh!

shellybotmoyer commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shellybotmoyer commented May 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Proposed Fix

Changes

Verification

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (2 warnings)

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

shellybotmoyer commented May 13, 2026

Uh oh!

shellybotmoyer commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shellybotmoyer commented May 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 11, 2026 •

edited

Loading