fix: pass images as multimodal content instead of file-based Read tool by buuzzy · Pull Request #54 · workany-ai/workany

buuzzy · 2026-04-17T01:20:01Z

Problem: Images saved to disk → agent told to Read them → SDK Read tool returns [Image file: path (size bytes)] placeholder → model never sees the image.

Fix: Pass images directly as { type: 'image', source: { type: 'base64', ... } } content blocks. The LLM provider converts these to the appropriate API format (OpenAI image_url / Anthropic native).
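The block shape quoted above can be built from a file on disk roughly as follows. This is a minimal sketch, not the code in this PR: the helper names `imageBlockFromBytes`, `imageBlockFromFile` and `buildPrompt`, and the extension-to-media-type table, are hypothetical. The `media_type` field and the four image types follow the Anthropic Messages API image block format.

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

// Media types accepted by the Anthropic-style base64 image block.
type ImageMediaType = "image/png" | "image/jpeg" | "image/gif" | "image/webp";

interface Base64ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: ImageMediaType; data: string };
}

interface TextBlock {
  type: "text";
  text: string;
}

type ContentBlock = TextBlock | Base64ImageBlock;

// Hypothetical lookup table: infer the media type from the file extension.
const MEDIA_TYPES: Record<string, ImageMediaType> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Hypothetical helper: encode raw image bytes as a base64 content block.
export function imageBlockFromBytes(bytes: Uint8Array, fileName: string): Base64ImageBlock {
  const mediaType = MEDIA_TYPES[extname(fileName).toLowerCase()];
  if (!mediaType) throw new Error(`Unsupported image type: ${fileName}`);
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data: Buffer.from(bytes).toString("base64") },
  };
}

// Hypothetical helper: read an image saved to disk and wrap it as a content block,
// so the model receives the pixels instead of a Read-tool placeholder string.
export function imageBlockFromFile(path: string): Base64ImageBlock {
  return imageBlockFromBytes(readFileSync(path), path);
}

// Hypothetical helper: the user prompt is the instruction text followed by the image blocks.
export function buildPrompt(text: string, images: Base64ImageBlock[]): ContentBlock[] {
  return [{ type: "text", text }, ...images];
}
```

Putting the text block first and the images after it is only one possible ordering; the PR description does not say which order it uses.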

Depends on open-agent-sdk#14 for OpenAI-compatible endpoints.
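For OpenAI-compatible endpoints, the same blocks have to be rewritten as Chat Completions content parts, where an image becomes an `image_url` part. The sketch below shows one way to do that by inlining the base64 data as a `data:` URL. It is an assumption about the general approach, not the implementation in open-agent-sdk#14, and `toOpenAIParts` is a hypothetical name. OpenAI's own API accepts data URLs in `image_url.url`; whether a given "compatible" endpoint does is also an assumption.

```typescript
// Anthropic-style blocks, as quoted in the PR description.
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } };

// OpenAI Chat Completions content parts.
type OpenAIPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Hypothetical converter: map each block to the equivalent OpenAI content part.
export function toOpenAIParts(blocks: AnthropicBlock[]): OpenAIPart[] {
  return blocks.map((block): OpenAIPart => {
    if (block.type === "text") {
      return { type: "text", text: block.text };
    }
    // Embed the image bytes directly in the request as a data URL.
    const { media_type, data } = block.source;
    return { type: "image_url", image_url: { url: `data:${media_type};base64,${data}` } };
  });
}
```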

Previously, images were saved to disk and the agent was instructed to view them with the Read tool. However, the SDK's Read tool returns only a placeholder string for image files, so the model never saw the actual image. Images are now passed directly as multimodal content blocks in the prompt, and the LLM provider converts them to the appropriate API format.

buuzzy wants to merge 1 commit into workany-ai:dev from buuzzy:fix/multimodal-image-passthrough

buuzzy commented Apr 17, 2026
