fix: vision_load crash recovery and robust image message handling#1020

Open
hurtdidit wants to merge 1 commit into agent0ai:development from hurtdidit:fix/vision-load-crash-recovery

Conversation

@hurtdidit
Contributor

Summary

Fixes a critical bug where the vision_load tool crashes chat sessions permanently when a provider rejects image content (HTTP 400). Once crashed, the chat cannot recover — every subsequent message re-triggers the same error because the raw image data persists in serialized history.

This 4-part fix adds defensive handling at multiple layers to prevent crashes and enable self-healing recovery.

Problem

Root Cause

When vision_load processes an image, it creates a RawMessage with base64-encoded image data in OpenAI multimodal format (image_url content blocks). This works fine for providers that support it natively, but fails catastrophically when:

  1. The provider rejects the image format (HTTP 400)
  2. The image is too large or in an unsupported format
  3. An OpenAI-compatible proxy routing to a non-OpenAI model doesn't translate the image format correctly
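To make the failure mode concrete, here is a minimal sketch of the kind of multimodal payload involved. The helper name `build_image_message` is hypothetical; the `image_url` content-block shape follows the OpenAI multimodal format the PR describes, which is exactly the structure a non-OpenAI provider may reject with a 400.

```python
import base64

def build_image_message(image_bytes: bytes, mime: str = "image/png") -> dict:
    # Hypothetical sketch of the raw multimodal message vision_load
    # persists into chat history: a content LIST of typed blocks,
    # including a base64 data URL, rather than a plain string.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": "Image attached:"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }
```

Because this dict is serialized into history as-is, the rejected payload is re-sent on every subsequent model call until something removes it.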

Crash Sequence

vision_load adds RawMessage with base64 image to chat history
    ↓
Next monologue loop: call_chat_model() sends entire history to LLM
    ↓
Provider returns HTTP 400 ("Bad Request" / format rejection)
    ↓  
retry_critical_exception() retries with SAME history → SAME 400
    ↓
handle_critical_exception() kills the monologue loop
    ↓
Image data persists in history JSON on disk
    ↓
Next user message → new monologue → sends same history → same 400
    ↓
💥 PERMANENT CRASH LOOP — chat is unrecoverable

Additional Issues Found

  • summarize_messages() (history.py): Had a FIXME noting that vision bytes get sent to the utility LLM during context compression. While output_text() returns preview strings for RawMessage, there was no explicit guard.
  • _merge_outputs() (history.py): When group_messages_abab() merges consecutive same-role messages, RawMessage dicts can get mixed with text content, creating invalid message structures.

Fix (4 Parts)

Part 1: Explicit raw message handling in summarize_messages() (history.py)

  • Replaced the FIXME with proper handling
  • Before building text for the utility model, checks _is_raw_message() on each message
  • Raw messages are replaced with their preview string (e.g., "<non-text content>")
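The guard described above can be sketched as follows. The function names and the list-vs-string content check are assumptions for illustration, not the actual `history.py` implementation; the key idea is that multimodal content is swapped for its preview string before any text is handed to the utility model.

```python
def _is_raw_message(msg: dict) -> bool:
    # Assumed predicate: a "raw" message carries a list of typed
    # content blocks (e.g. image_url) rather than a plain string.
    return isinstance(msg.get("content"), list)

def prepare_for_summary(messages: list[dict]) -> list[str]:
    # Build the text fed to the utility LLM during compression,
    # replacing raw messages with a preview so base64 image bytes
    # never reach the summarizer.
    texts: list[str] = []
    for msg in messages:
        if _is_raw_message(msg):
            texts.append("<non-text content>")
        else:
            texts.append(str(msg.get("content", "")))
    return texts
```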

Part 2: Safe message merging in _merge_outputs() (history.py)

  • Added guard at the top: if either operand is a RawMessage, convert to text preview via _stringify_content() before merging
  • Prevents group_messages_abab() from creating invalid mixed-format message structures
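A minimal sketch of that merge guard, with hypothetical names (`_stringify_content`, `merge_outputs` stand in for the real `history.py` helpers): both operands are normalized to plain strings before concatenation, so an ABAB merge can never splice a content list into a text message.

```python
def _stringify_content(content) -> str:
    # Assumed behavior: collapse a multimodal content list into a
    # short text preview; pass plain strings through unchanged.
    if isinstance(content, list):
        return "<non-text content>"
    return str(content)

def merge_outputs(a, b) -> str:
    # Guard from Part 2: convert either raw operand to its preview
    # first, so the merged result is always a valid plain string.
    return _stringify_content(a) + "\n" + _stringify_content(b)
```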

Part 3: Error recovery in vision_load.py

  • Wrapped hist_add_message() call in try/except
  • On failure: logs error and adds a descriptive text fallback via hist_add_tool_result()
  • Prevents corrupted image data from persisting if the history add itself fails
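The recovery path in Part 3 might look roughly like this sketch. `add_image_with_fallback` and its parameters are hypothetical (the real code wraps `hist_add_message()` inside the tool's `after_execution`); the point is that a failing add is logged and replaced by a plain-text tool result instead of leaving corrupt image data in history.

```python
def add_image_with_fallback(history: list, raw_msg: dict, add_fn, log=print) -> None:
    # Part 3 sketch (hypothetical names): wrap the history add in
    # try/except; on failure, log and append a descriptive text
    # fallback rather than persisting the raw image message.
    try:
        add_fn(history, raw_msg)
    except Exception as exc:
        log(f"vision_load: could not attach image ({exc}); adding text fallback")
        history.append({"role": "tool", "content": "<image could not be attached>"})
```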

Part 4: Self-healing crash loop recovery in agent.py

This is the critical fix that breaks the permanent crash loop.

  • Added _strip_raw_images_from_history() method that scans history.current.messages and history.topics for RawMessage content, replacing each with a text preview via msg.set_summary()
  • Modified both exception handlers in monologue() (inner message loop + outer monologue loop):
    • On BadRequestError / 400 status → calls _strip_raw_images_from_history()
    • If images were stripped → continue (retry without bad images)
    • If no images found → falls through to existing retry_critical_exception() logic
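The stripping-and-retry logic above can be sketched as below. This is a simplified assumption of the shape of `_strip_raw_images_from_history()` and the handler decision (the real method also walks `history.topics` and uses `msg.set_summary()`); the return-count-then-decide pattern is what breaks the crash loop.

```python
def strip_raw_images_from_history(messages: list) -> int:
    # Replace every raw (list-form, multimodal) message content with a
    # text preview; return the count so the caller knows whether a
    # retry without the offending images is worthwhile.
    stripped = 0
    for msg in messages:
        if isinstance(msg.get("content"), list):
            msg["content"] = "<non-text content>"
            stripped += 1
    return stripped

def on_provider_error(messages: list, status_code: int) -> str:
    # Hypothetical handler shape: on a 400, strip images and retry;
    # otherwise fall through to the existing critical-exception path.
    if status_code == 400 and strip_raw_images_from_history(messages) > 0:
        return "retry"
    return "escalate"
```

On the second pass with the same 400 (nothing left to strip), the handler escalates normally, so genuinely unrelated bad requests still reach `retry_critical_exception()`.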

Files Changed

File                          Changes    Description
python/helpers/history.py     ~+23       Parts 1 & 2: summarize_messages guard, _merge_outputs guard
python/tools/vision_load.py   ~+21/-5    Part 3: try/except error recovery in after_execution
agent.py                      ~+58       Part 4: _strip_raw_images_from_history + exception handler guards

Testing

Test Case                                             Result
Fresh chat + image load                               ✅ Works normally
Image that triggers provider 400                      ✅ Part 4 catches error, strips image, chat continues
Previously crashed/stuck chat                         ✅ Recovered — user can continue chatting
Long conversation after image (compression trigger)   ✅ Parts 1 & 2 handle summarization/merging safely
Multiple images in conversation                       ✅ All stripped on error, chat survives

Tested with OpenAI-compatible proxy routing to a non-OpenAI model (triggers the 400 rejection).

Reproduction Steps (Without Fix)

  1. Configure Agent Zero with an LLM provider that rejects certain image formats (e.g., OpenAI-compatible proxy to a non-OpenAI model)
  2. Start a new chat
  3. Use vision_load to attach an image
  4. Observe: provider returns 400 → chat crashes
  5. Try sending another message → same 400 → permanent crash
  6. Restart container → same chat still crashes on any message

Backward Compatibility

  • No API changes
  • No configuration changes required
  • Purely additive error handling — existing functionality unchanged for providers that accept images normally
  • The fix gracefully degrades: if images work fine, none of these code paths activate

Related

  • Resolves the FIXME comment at history.py line ~218 ("vision bytes are sent to utility LLM")
  • Improves resilience for all LLM providers, not just the specific proxy configuration that exposed the bug

Fixes a critical bug where vision_load permanently crashes chat sessions
when a provider rejects image content (HTTP 400). The raw image data
persists in serialized history, causing every subsequent message to
re-trigger the same error in an unrecoverable loop.

4-part fix:
1. history.py: Explicit raw message handling in summarize_messages()
   - Replaces FIXME; guards against vision bytes reaching utility LLM
2. history.py: Safe merge guard in _merge_outputs()
   - Prevents RawMessage/text content mixing during ABAB grouping
3. vision_load.py: Error recovery with text fallback
   - try/except around hist_add_message with graceful degradation
4. agent.py: Self-healing crash loop recovery (key fix)
   - _strip_raw_images_from_history() scans and replaces RawMessages
   - Both monologue exception handlers detect BadRequestError/400,
     strip images, and retry cleanly instead of entering crash loop

Tested with OpenAI-compatible proxy routing to non-OpenAI model.
Confirmed: fresh chats work, crash-triggering images recover gracefully,
and previously stuck chats become usable again.
@hurtdidit
Contributor Author

I have an image that consistently causes the crash (a screenshot; I suspect characters in the file name are to blame). If you'd like me to send it to you for easier testing/replication, just DM me on Discord.
