Add image-to-image editing support for ByteDance models by adambalogh · Pull Request #107 · OpenGradient/tee-gateway

adambalogh · 2026-06-23T21:41:30Z

Summary

Adds support for image-to-image editing on ByteDance Seedream/Seedance image generation models by extracting reference images from user messages and forwarding them to the provider's /images/generations endpoint.

Changes

Enhanced _extract_image_prompt(): Refactored to properly handle multimodal message content (lists of content blocks). Now correctly extracts only text parts from messages with attached images, avoiding base64 image data being spliced into the prompt.
New _extract_reference_images() function: Collects reference image URLs and base64 data URIs from user message content blocks. Supports both OpenAI-style image_url blocks and LangChain-style image blocks with inline base64 or URLs.
Updated generate_images() signature: Added optional reference_images parameter to accept a list of image URLs/data URIs. Implements ByteDance-specific logic to forward these as the image field in the request payload (single image as string, multiple as array, clamped to 10 max per provider limits). Non-ByteDance providers ignore the parameter.
Updated call sites: Both _create_image_generation_response() and generate() now extract and pass reference images to generate_images().
Test coverage: Added four new tests validating:
- Single reference image forwarded as bare string
- Multiple reference images forwarded as array
- Reference images clamped to 10-image limit
- Reference images ignored for non-ByteDance providers (xAI/Z.ai)
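The forwarding rules listed above (bare string for one image, array for several, clamp at 10, ignored for other providers) can be sketched roughly as follows. This is an illustration only, not the code in this PR: `build_image_payload`, the `provider` string values, and `MAX_REFERENCE_IMAGES` are hypothetical names chosen for the example.

```python
from typing import Any, Dict, List, Optional

# Assumption: the 10-image cap mentioned in the PR description.
MAX_REFERENCE_IMAGES = 10


def build_image_payload(
    model: str,
    prompt: str,
    provider: str,
    reference_images: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Shape a hypothetical /images/generations request body."""
    payload: Dict[str, Any] = {"model": model, "prompt": prompt}
    # Only ByteDance payloads receive the `image` field; other providers
    # get a plain text-to-image request and the references are dropped.
    if provider == "bytedance" and reference_images:
        clamped = reference_images[:MAX_REFERENCE_IMAGES]
        # A single reference goes out as a bare string, several as an array.
        payload["image"] = clamped[0] if len(clamped) == 1 else clamped
    return payload
```

Keeping the provider check in one place means a caller can always pass `reference_images` and let the payload builder decide whether they are usable.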

Implementation Details

Reference images are extracted from the latest user turns in the message history; in practice only the most recent turn carries active references.
Base64 images are converted to data: URIs with appropriate MIME type (defaults to image/png).
The image field is only added to ByteDance payloads to avoid rejection by other providers' text-to-image endpoints.
Enables image-to-image editing workflows where follow-up prompts like "add a hat" build on previously generated images rather than generating fresh images from prompt text alone.
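A minimal sketch of the extraction described above, assuming OpenAI-style blocks shaped like `{"type": "image_url", "image_url": {"url": ...}}` and LangChain-style blocks shaped like `{"type": "image", "url": ...}` or `{"type": "image", "data": ..., "mime_type": ...}`. The exact block keys and the helper names (`block_to_reference`, `extract_reference_images`) are assumptions for illustration and may differ from the gateway's actual implementation.

```python
from typing import Any, Dict, List, Optional

DEFAULT_MIME = "image/png"  # default MIME type stated in the PR description


def block_to_reference(block: Any) -> Optional[str]:
    """Return a URL or data: URI for an image block, or None otherwise."""
    if not isinstance(block, dict):
        return None
    if block.get("type") == "image_url":
        image_url = block.get("image_url")
        url = image_url.get("url") if isinstance(image_url, dict) else image_url
        return url if isinstance(url, str) and url else None
    if block.get("type") == "image":
        url = block.get("url")
        if isinstance(url, str) and url:
            return url
        data = block.get("data")
        if isinstance(data, str) and data:
            # Inline base64 becomes a data: URI with a MIME type.
            mime = block.get("mime_type") or DEFAULT_MIME
            return f"data:{mime};base64,{data}"
    return None


def extract_reference_images(messages: List[Dict[str, Any]]) -> List[str]:
    """Collect image references from the content blocks of user messages."""
    refs: List[str] = []
    for message in messages:
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no image blocks
        for block in content:
            ref = block_to_reference(block)
            if ref is not None:
                refs.append(ref)
    return refs
```

Text blocks and assistant messages are skipped entirely, so only image references from user turns reach the provider request.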

https://claude.ai/code/session_01AwcCXVgsXGqGQGzKWXhMfj

Endpoint-based image-generation models (Seedream, Seedance) are served via the dedicated /images/generations endpoint, which previously only received a text prompt. Two bugs meant follow-up edits like "add a hat" silently ignored the previously generated image:

1. generate_images() never forwarded any input image, so image-to-image editing was impossible: the client attaches the prior image to the latest user turn, but the gateway dropped it.
2. _extract_image_prompt() stringified multimodal list content with str(), splicing the base64 image blob into the prompt text instead of extracting the actual prompt.

Now the controller pulls reference images out of the user turns and forwards them to ByteDance's `image` field (URL or base64 data URI, array up to 10), and the prompt extractor reads only text parts.

Chat-path image-output models (Gemini "nano banana") already worked because they receive the full multimodal history natively; this brings the images-endpoint models to parity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AwcCXVgsXGqGQGzKWXhMfj
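To illustrate the second bug, here is a small sketch contrasting `str()` on multimodal content with a text-only extractor. `extract_prompt_text` is a hypothetical stand-in for the fixed prompt extractor, and the content-block shape is an assumption based on the OpenAI-style format.

```python
from typing import Any, List


def extract_prompt_text(content: Any) -> str:
    """Return only the textual parts of a message's content."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts: List[str] = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and block.get("type") == "text":
                text = block.get("text")
                if isinstance(text, str):
                    parts.append(text)
        # Join the text parts; image blocks never reach the prompt.
        return " ".join(p.strip() for p in parts if p.strip())
    return ""


content = [
    {"type": "text", "text": "add a hat"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
]

buggy_prompt = str(content)                  # includes the base64 payload
fixed_prompt = extract_prompt_text(content)  # text parts only
```

With the `str()` approach the whole list representation, including the data URI, ends up in the prompt; the text-only pass leaves just the user's instruction.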

…ytes (#110)

The endpoint-based image-generation code (xAI Grok, ByteDance Seedream/Seedance, Z.ai GLM-Image) had grown convoluted and inconsistent, spread across llm_backend.py and chat_controller.py with two near-duplicate streaming/non-streaming responders, two separate message-walk helpers, and a return type that was sometimes inline bytes and sometimes a raw provider URL.

Changes:
- New tee_gateway/image_generation.py owns the whole flow: request shaping, the provider call, URL→inline-bytes fetching, and the signed chat-completion responders. llm_backend.py and chat_controller.py just route to it.
- generate_images() now ALWAYS returns data: URIs. Providers that hand back a hosted URL (Z.ai, Seedance) are fetched inside the enclave and inlined, so the client always receives bytes, never a raw URL. This also matches what the chat-app already expects (it caches data URIs; raw URLs weren't cached). Image-to-image editing already rides inline as a data: URI on the user turn.
- Per-provider request quirks (response_format, n support, reference-image editing, extra params like size/watermark) move out of branchy if/elif code into declarative fields on ModelConfig in model_registry.py.
- The two streaming/non-streaming responders collapse onto one shared core (_run_image_generation), and the two message-walk helpers collapse into a single _extract_image_inputs pass returning (prompt, reference_images).

Net: ~300 lines removed from the two original files, image logic isolated.

Tests updated to patch the new URL fetch and assert inline-bytes output; added direct coverage for _fetch_url_as_data_uri.

Claude-Session: https://claude.ai/code/session_01BHGhsd68znPWoooHvtLFD5
Co-authored-by: Claude <noreply@anthropic.com>
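The move from if/elif branching to declarative per-model fields might look something like the sketch below. The class name `ImageModelConfig`, its field names, and the example values (including the `watermark` setting) are hypothetical; the real ModelConfig in model_registry.py may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ImageModelConfig:
    """Hypothetical declarative description of one provider's quirks."""
    response_format: str = "b64_json"
    supports_n: bool = True
    supports_reference_images: bool = False
    extra_params: Dict[str, Any] = field(default_factory=dict)


def shape_request(
    config: ImageModelConfig, model: str, prompt: str, n: int = 1
) -> Dict[str, Any]:
    """Build a request body from config fields instead of provider branches."""
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "response_format": config.response_format,
    }
    if config.supports_n:
        payload["n"] = n
    # Provider-specific extras (e.g. size, watermark) are merged in as data.
    payload.update(config.extra_params)
    return payload


# Illustrative configs only; the values are placeholders, not the registry's.
default_config = ImageModelConfig()
ep_config = ImageModelConfig(
    response_format="url",
    supports_n=False,
    supports_reference_images=True,
    extra_params={"watermark": False},
)
```

Adding a provider then becomes a matter of declaring a new config instance rather than threading another branch through the request-shaping code.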

…l fix

Responds to the Copilot review on #107 plus a regression the main merge surfaced.

image_generation.py:
- _fetch_url_as_data_uri is now hardened against SSRF/egress abuse: http(s) schemes only, IP-literal hosts in private/loopback/link-local/non-global ranges rejected, redirects capped, and the body streamed with a 25 MiB cap (declared Content-Length and actual bytes both checked) instead of buffering an unbounded response. It is only ever called with provider-response URLs, never client input; the docstring now states that invariant.
- _extract_image_inputs returns only the latest user turn's reference images (each turn replaces the set; a text-only turn clears it) so stale images from earlier edits don't pile up toward the provider's 10-image cap. Text and reference values are coerced/filtered to strings so malformed input can't break downstream JSON serialization.
- generate_images filters reference_images to non-empty strings before clamping/forwarding.

model_registry.py:
- Fix a regression from merging main: Seedream 5.0 Lite (added in #109) is an "ep-" ModelArk deployment endpoint that relied on the old startswith("ep-") auto-detection for the URL/no-n/watermark payload, which the #110 refactor replaced with explicit fields. It was defaulting to b64_json+n. Both ep- models now share _BYTEDANCE_EP_IMAGE_PARAMS and the url/no-n/reference config.

Tests: streaming-aware fetch tests + SSRF/size-cap coverage, _extract_image_inputs parsing/latest-turn/robustness tests, and a Seedream 5.0 Lite payload regression guard. 26 tests pass; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BHGhsd68znPWoooHvtLFD5
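The scheme, IP-literal, and size-cap checks described for _fetch_url_as_data_uri could be approximated with the standard library as below. This is a sketch under assumptions, not the gateway's code: `is_fetchable_url` and `read_capped` are hypothetical helpers, hostnames are not resolved (so DNS-based bypasses are out of scope here), and redirect capping is not shown.

```python
import ipaddress
from typing import Iterable, Optional
from urllib.parse import urlsplit

MAX_BYTES = 25 * 1024 * 1024  # 25 MiB cap from the commit message


def is_fetchable_url(url: str) -> bool:
    """Allow http(s) URLs whose host is not a non-global IP literal."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal; this sketch does no DNS lookup
    # Rejects private, loopback, link-local and other non-global ranges.
    return ip.is_global


def read_capped(
    chunks: Iterable[bytes],
    declared_length: Optional[int] = None,
    limit: int = MAX_BYTES,
) -> bytes:
    """Accumulate streamed chunks, enforcing the cap on both counts."""
    if declared_length is not None and declared_length > limit:
        raise ValueError("declared Content-Length exceeds cap")
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > limit:
            raise ValueError("response body exceeds cap")
    return bytes(buf)
```

Checking the declared length first lets an oversized response be rejected before any body is read, while the running check covers servers that omit or understate Content-Length.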

adambalogh marked this pull request as ready for review June 24, 2026 14:15

adambalogh requested a review from Copilot June 24, 2026 14:15

Copilot started reviewing on behalf of adambalogh June 24, 2026 14:16 View session

This comment was marked as outdated.

adambalogh and others added 2 commits June 24, 2026 10:44

Merge branch 'main' into claude/quirky-thompson-7uutda

754f7be

adambalogh requested a review from Copilot June 24, 2026 14:51

Copilot started reviewing on behalf of adambalogh June 24, 2026 14:51 View session

This comment was marked as duplicate.

adambalogh merged commit d4af6bf into main Jun 24, 2026
8 checks passed

adambalogh merged 4 commits into main from claude/quirky-thompson-7uutda

3 participants
