Skip to content

fix: pass images as multimodal content instead of file-based Read tool#54

Open
buuzzy wants to merge 1 commit intoworkany-ai:devfrom
buuzzy:fix/multimodal-image-passthrough
Open

fix: pass images as multimodal content instead of file-based Read tool#54
buuzzy wants to merge 1 commit intoworkany-ai:devfrom
buuzzy:fix/multimodal-image-passthrough

Conversation

@buuzzy
Copy link
Copy Markdown

@buuzzy buuzzy commented Apr 17, 2026

Problem: Images saved to disk → agent told to Read them → SDK Read tool returns [Image file: path (size bytes)] placeholder → model never sees the image.

Fix: Pass images directly as { type: 'image', source: { type: 'base64', ... } } content blocks. The LLM provider converts these to the appropriate API format (OpenAI image_url / Anthropic native).

Depends on open-agent-sdk#14 for OpenAI-compatible endpoints.

Images were saved to disk and the agent was instructed to use the Read
tool to view them. But the SDK's Read tool returns only a placeholder
string for image files, so the model never sees the actual image.

Now images are passed directly as multimodal content blocks in the
prompt, which the LLM provider converts to the appropriate API format.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant