fix: Add fallback for OpenAI-compatible APIs without Structured Outputs #990
ambicuity wants to merge 2 commits into datalab-to:master
Conversation
Fixes datalab-to#985

When using OpenAI-compatible APIs (e.g., DeepSeek) that don't support the response_format parameter (Structured Outputs), the LLM step was failing with a 'response_format type is unavailable' error.

Changes:
- Add openai_disable_structured_output config option for users who know their API doesn't support Structured Outputs
- Implement validate_response() to parse JSON from plain text responses
- Automatically detect when Structured Outputs fails and fall back to plain chat.completions.create() with the schema in the prompt
- Backward compatible: existing OpenAI API users are unaffected
I have read the CLA Document and I hereby sign the CLA
Pull request overview
Adds resilience when using OpenAI-compatible providers that reject Structured Outputs (response_format) by introducing a configurable and automatic fallback to plain chat completions plus JSON parsing.
Changes:
- Added openai_disable_structured_output config flag to force-disable Structured Outputs for incompatible providers.
- Implemented a plain-text JSON parsing fallback (validate_response) and a fallback call path that embeds the schema in the prompt.
- Added automatic fallback on certain 400 BadRequest errors indicating response_format is unsupported.
```python
not use_fallback
and "response_format" in error_msg
or "unavailable" in error_msg
```
The BadRequestError fallback condition has incorrect operator precedence: ... and "response_format" in error_msg or "unavailable" in error_msg will trigger whenever "unavailable" appears, even when use_fallback is already true. This can cause misleading logs and repeated retries in fallback mode for unrelated BadRequestError causes (e.g., model unavailable). Group the condition so the fallback only triggers when not already in fallback, e.g. if (not use_fallback) and ("response_format" in error_msg or "unavailable" in error_msg): (and consider tightening the "unavailable" match to response_format-specific messages).
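The precedence issue can be demonstrated in isolation (variable values here are illustrative):

```python
# Python's "and" binds tighter than "or", so the original condition
#   not use_fallback and "response_format" in error_msg or "unavailable" in error_msg
# parses as (not use_fallback and ...) or ("unavailable" in error_msg).
use_fallback = True              # already in fallback mode
error_msg = "model unavailable"  # unrelated BadRequest cause

buggy = (
    not use_fallback
    and "response_format" in error_msg
    or "unavailable" in error_msg
)
fixed = (not use_fallback) and (
    "response_format" in error_msg or "unavailable" in error_msg
)
```

Here `buggy` is True (the fallback branch fires again for an unrelated error) while `fixed` is False, which is the intended behavior.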
```diff
-not use_fallback
-and "response_format" in error_msg
-or "unavailable" in error_msg
+(not use_fallback)
+and (
+    "response_format" in error_msg
+    or "unavailable" in error_msg
+)
```
```python
def _call_with_structured_output(
    self,
    client: openai.OpenAI,
    messages: list,
    response_schema: type[BaseModel],
    timeout: int,
) -> dict | None:
    """Call OpenAI API using Structured Outputs (response_format)."""
```
_call_with_structured_output is annotated to return dict | None, but it actually returns a 2-tuple (parsed_result, total_tokens). This makes the helper misleading for callers/type checkers and inconsistent with how it’s used in __call__ via tuple-unpacking. Update the return type annotation (and docstring) to reflect the tuple return (e.g., tuple[dict | None, int]) and keep it consistent with _call_with_fallback.
```python
def _call_with_fallback(
    self,
    client: openai.OpenAI,
    messages: list,
    response_schema: type[BaseModel],
    timeout: int,
) -> dict | None:
    """
```
_call_with_fallback is annotated to return dict | None, but it returns a 2-tuple (parsed, total_tokens) (see the return at the end of the method). Align the return type annotation (and docstring) with the actual return value (e.g., tuple[dict | None, int]) to avoid confusion and improve static analysis.
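A minimal sketch of the corrected contract; the helper name and body below are illustrative stand-ins, not the PR's code:

```python
from __future__ import annotations

import json


def call_with_fallback_demo(payload: str) -> tuple[dict | None, int]:
    """Return (parsed_result, total_tokens); parsed_result is None on failure."""
    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        parsed = None
    # Stand-in for response.usage.total_tokens
    total_tokens = len(payload.split())
    return parsed, total_tokens


# Callers tuple-unpack, matching how __call__ consumes the helpers
parsed, total_tokens = call_with_fallback_demo('{"ok": true}')
```

Annotating `tuple[dict | None, int]` on both helpers keeps the tuple-unpacking call sites consistent for type checkers.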
```diff
+     # Track if we should use fallback mode
+     use_fallback = self.openai_disable_structured_output

      total_tries = max_retries + 1
      for tries in range(1, total_tries + 1):
          try:
-             response = client.beta.chat.completions.parse(
-                 extra_headers={
-                     "X-Title": "Marker",
-                     "HTTP-Referer": "https://github.com/datalab-to/marker",
-                 },
-                 model=self.openai_model,
-                 messages=messages,
-                 timeout=timeout,
-                 response_format=response_schema,
-             )
-             response_text = response.choices[0].message.content
-             total_tokens = response.usage.total_tokens
+             if use_fallback:
+                 result, total_tokens = self._call_with_fallback(
+                     client, messages, response_schema, timeout
+                 )
+             else:
+                 result, total_tokens = self._call_with_structured_output(
+                     client, messages, response_schema, timeout
+                 )
+
+             if result is None:
+                 logger.warning("LLM did not return a valid response")
+                 return {}

              if block:
                  block.update_metadata(
                      llm_tokens_used=total_tokens, llm_request_count=1
                  )
-             return json.loads(response_text)
+             return result

          except BadRequestError as e:
+             # Check if this is a "response_format not supported" error
+             error_msg = str(e).lower()
+             if (
+                 not use_fallback
+                 and "response_format" in error_msg
+                 or "unavailable" in error_msg
+             ):
+                 logger.warning(
+                     f"Structured Outputs not supported by this API, falling back to plain completions: {e}"
+                 )
+                 use_fallback = True
+                 # Retry immediately with fallback
```
This PR introduces new fallback behavior (disable structured outputs via config and automatic fallback on specific 400s), but there are no unit tests covering: (1) --openai_disable_structured_output forcing the plain-completions path, (2) a BadRequestError on structured output switching to fallback, and (3) validate_response() correctly parsing fenced JSON. Adding tests with a mocked OpenAI client would help prevent regressions across providers.
Addresses Copilot review feedback. Adds tests covering:
- validate_response() JSON parsing with various formats
- openai_disable_structured_output config option
- Automatic fallback on BadRequestError for unsupported response_format
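One way to exercise the automatic fallback without a live API is a small fake client; everything below (FakeClient, FakeBadRequestError, call_llm) is a hypothetical stand-in for the PR's code and the openai SDK, sketched for illustration:

```python
import json


class FakeBadRequestError(Exception):
    """Stand-in for openai.BadRequestError."""


class FakeClient:
    """Mocked client: the structured call fails, the plain call succeeds."""

    def __init__(self):
        self.calls = []

    def parse(self, **kwargs):  # stand-in for beta.chat.completions.parse
        self.calls.append("structured")
        raise FakeBadRequestError("response_format type is unavailable")

    def create(self, **kwargs):  # stand-in for chat.completions.create
        self.calls.append("plain")
        return '{"answer": 42}'


def call_llm(client, use_fallback=False, max_retries=1):
    """Simplified version of the PR's retry loop with automatic fallback."""
    for _ in range(max_retries + 1):
        try:
            if use_fallback:
                return json.loads(client.create())
            return json.loads(client.parse())
        except FakeBadRequestError as e:
            msg = str(e).lower()
            # Properly grouped condition, per the review suggestion
            if (not use_fallback) and (
                "response_format" in msg or "unavailable" in msg
            ):
                use_fallback = True  # retry immediately with fallback


client = FakeClient()
result = call_llm(client)
```

The first attempt hits the structured path and fails; the retry takes the plain path, so `result` is the parsed JSON and `client.calls` records both attempts in order.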
OK, verified correct, you can mark it as finished. Thanks!
Summary
Fixes #985
When using OpenAI-compatible APIs (e.g., DeepSeek) that don't support the response_format parameter (Structured Outputs), the LLM step was failing with a 'response_format type is unavailable' error.

Changes
- Add openai_disable_structured_output config option for users who know their API doesn't support Structured Outputs
- Add validate_response() method to parse JSON from plain text responses (pattern from ClaudeService)
- Automatic fallback to plain chat.completions.create() with the schema in the prompt

Backward Compatibility
Existing OpenAI API users are unaffected; Structured Outputs is still tried first. The fallback only activates when:
- The user explicitly passes --openai_disable_structured_output, OR
- A BadRequestError indicates response_format is unsupported

Usage
```shell
# For APIs that don't support Structured Outputs
marker_single document.pdf --use_llm --openai_disable_structured_output \
  --openai_base_url https://api.deepseek.com \
  --openai_model deepseek-reasoner \
  --openai_api_key YOUR_KEY
```