
fix: support multimodal image content in OpenAI provider #14

Open
buuzzy wants to merge 1 commit into codeany-ai:main from buuzzy:fix/openai-multimodal-image


@buuzzy buuzzy commented Apr 17, 2026

The OpenAI provider drops { type: 'image' } content blocks during message conversion, so images are never sent to the model when using OpenAI-compatible endpoints.

Changes:

  • convertUserMessage() now converts Anthropic-style image blocks ({ type: 'image', source: { type: 'base64', media_type, data } }) to OpenAI's image_url format
  • Also supports URL-based image sources
  • Text-only messages keep simple string format for backward compatibility
  • query() prompt type widened to string | any[] for multimodal content arrays
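
A minimal sketch of the conversion described in the list above, not the code from this PR. The function name `convertUserContent`, the type names, and the newline used to join text blocks are assumptions made for illustration; `tool_result` handling is left out. The block shapes follow the public Anthropic Messages and OpenAI Chat Completions formats.

```typescript
// Anthropic-style input blocks.
type TextBlock = { type: 'text'; text: string };
type ImageSource =
  | { type: 'base64'; media_type: string; data: string }
  | { type: 'url'; url: string };
type ImageBlock = { type: 'image'; source: ImageSource };
type InputBlock = TextBlock | ImageBlock;

// OpenAI Chat Completions content parts.
type OpenAIPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string; detail: 'high' } };
type OpenAIUserMessage = { role: 'user'; content: string | OpenAIPart[] };

// base64 sources become data URLs; URL sources pass through unchanged.
function toImageUrl(source: ImageSource): string {
  return source.type === 'base64'
    ? `data:${source.media_type};base64,${source.data}`
    : source.url;
}

// Hypothetical stand-in for convertUserMessage(); tool_result blocks omitted.
function convertUserContent(content: string | InputBlock[]): OpenAIUserMessage {
  if (typeof content === 'string') return { role: 'user', content };

  const hasImage = content.some((block) => block.type === 'image');
  if (!hasImage) {
    // Text-only messages keep the plain string form.
    // Joining with '\n' is an assumption of this sketch.
    const text = content
      .filter((block): block is TextBlock => block.type === 'text')
      .map((block) => block.text)
      .join('\n');
    return { role: 'user', content: text };
  }

  const parts = content.map((block): OpenAIPart =>
    block.type === 'image'
      ? {
          type: 'image_url',
          image_url: { url: toImageUrl(block.source), detail: 'high' },
        }
      : { type: 'text', text: block.text },
  );
  return { role: 'user', content: parts };
}
```

Returning a string whenever no image is present means existing text-only callers see the same request body as before; only messages that contain an image switch to the array form.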

The OpenAI provider's convertUserMessage() only handled 'text' and
'tool_result' content blocks, silently dropping 'image' blocks.

Changes:
- Convert Anthropic-style image blocks to OpenAI image_url format
- Support base64 and URL image sources
- Use detail:'high' for best recognition quality
- Fall back to string content when no images present
- Accept string | any[] prompt in query() for multimodal content
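
One way a `string | any[]` prompt could be handled downstream is to normalize it to a block array before conversion. The sketch below is an assumption for illustration only: `normalizePrompt` and `PromptBlock` are hypothetical names, and the actual `query()` signature is not shown in this PR description.

```typescript
// Hypothetical, slightly stricter alternative to any[] for prompt blocks.
type PromptBlock = { type: string; [key: string]: unknown };

// Wrap a plain string in a single text block; pass arrays through unchanged.
function normalizePrompt(prompt: string | PromptBlock[]): PromptBlock[] {
  return typeof prompt === 'string'
    ? [{ type: 'text', text: prompt }]
    : prompt;
}

// Example multimodal prompt a caller might build (the URL is a placeholder).
const multimodalPrompt: PromptBlock[] = [
  { type: 'text', text: 'Describe this image.' },
  { type: 'image', source: { type: 'url', url: 'https://example.com/cat.png' } },
];

const normalizedText = normalizePrompt('Describe this image.');
const normalizedBlocks = normalizePrompt(multimodalPrompt);
```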