Skip to content

Extract image generation logic into dedicated module#110

Merged
adambalogh merged 1 commit into
claude/quirky-thompson-7uutdafrom
claude/nice-clarke-rcagb3
Jun 24, 2026
Merged

Extract image generation logic into dedicated module#110
adambalogh merged 1 commit into
claude/quirky-thompson-7uutdafrom
claude/nice-clarke-rcagb3

Conversation

@adambalogh

Copy link
Copy Markdown
Contributor

Summary

Refactor image generation functionality from llm_backend.py and chat_controller.py into a new dedicated image_generation.py module. This improves code organization by centralizing all endpoint-based image generation logic (xAI Aurora, ByteDance Seedream/Seedance, Z.ai GLM-Image) in one place.

Key Changes

  • New module tee_gateway/image_generation.py: Owns the complete image generation flow:

    • generate_images() — shapes provider requests, handles response format conversion (b64_json → data URI, hosted URLs → fetched inline bytes)
    • create_image_generation_response() / create_image_generation_streaming_response() — surfaces results on /v1/chat/completions with images out-of-band
    • _fetch_url_as_data_uri() — fetches provider-hosted image URLs into the enclave so clients always receive data: URIs
    • _extract_image_inputs() — extracts text prompt and reference images from chat messages for image-to-image editing
  • Moved from llm_backend.py:

    • generate_images() function (now with URL-fetch capability)
    • _IMAGE_GENERATION_PATH constant
  • Moved from chat_controller.py:

    • _extract_image_prompt() → renamed to _extract_image_inputs() with enhanced reference-image extraction
    • _extract_reference_images() → merged into _extract_image_inputs()
    • _create_image_generation_response()create_image_generation_response()
    • _create_image_generation_streaming_response()create_image_generation_streaming_response()
  • Enhanced model_registry.py: Added per-provider image generation configuration fields:

    • image_response_format — controls request format ("b64_json" vs "url" vs None)
    • image_send_n — whether endpoint accepts OpenAI-style n parameter
    • image_supports_reference — whether endpoint accepts reference images for image-to-image editing
    • image_extra_params — static extra params (e.g., size, watermark) merged into requests
  • Updated chat_controller.py: Imports image generation functions from new module; call sites unchanged

  • Updated tests: test_image_generation.py now imports from image_generation module; added tests for _fetch_url_as_data_uri() URL-fetch behavior

Implementation Details

  • Provider-specific request quirks (response format, n support, reference-image editing, extra params) are now declaratively configured in the model registry rather than hardcoded in the generation logic, keeping the core module flat and maintainable
  • All provider-hosted image URLs are fetched into the enclave and converted to data: URIs, ensuring clients never see raw provider URLs
  • Image-to-image editing (reference images) is now properly extracted from multimodal chat messages and forwarded to providers that support it
  • Billing remains flat per generated image, computed via compute_session_cost() with image_count parameter

https://claude.ai/code/session_01BHGhsd68znPWoooHvtLFD5

…ytes

The endpoint-based image-generation code (xAI Grok, ByteDance
Seedream/Seedance, Z.ai GLM-Image) had grown convoluted and inconsistent,
spread across llm_backend.py and chat_controller.py with two near-duplicate
streaming/non-streaming responders, two separate message-walk helpers, and a
return type that was sometimes inline bytes and sometimes a raw provider URL.

Changes:

- New tee_gateway/image_generation.py owns the whole flow: request shaping,
  the provider call, URL→inline-bytes fetching, and the signed chat-completion
  responders. llm_backend.py and chat_controller.py just route to it.

- generate_images() now ALWAYS returns data: URIs. Providers that hand back a
  hosted URL (Z.ai, Seedance) are fetched inside the enclave and inlined, so
  the client always receives bytes — never a raw URL. This also matches what
  the chat-app already expects (it caches data URIs; raw URLs weren't cached).
  Image-to-image editing already rides inline as a data: URI on the user turn.

- Per-provider request quirks (response_format, n support, reference-image
  editing, extra params like size/watermark) move out of branchy if/elif code
  into declarative fields on ModelConfig in model_registry.py.

- The two streaming/non-streaming responders collapse onto one shared core
  (_run_image_generation), and the two message-walk helpers collapse into a
  single _extract_image_inputs pass returning (prompt, reference_images).

Net: ~300 lines removed from the two original files, image logic isolated.
Tests updated to patch the new URL fetch and assert inline-bytes output;
added direct coverage for _fetch_url_as_data_uri.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BHGhsd68znPWoooHvtLFD5
@adambalogh adambalogh marked this pull request as ready for review June 24, 2026 14:44
@adambalogh adambalogh merged commit 56565d0 into claude/quirky-thompson-7uutda Jun 24, 2026
5 checks passed
adambalogh pushed a commit that referenced this pull request Jun 24, 2026
…l fix

Responds to the Copilot review on #107 plus a regression the main merge
surfaced.

image_generation.py:
- _fetch_url_as_data_uri is now hardened against SSRF/egress abuse: http(s)
  schemes only, IP-literal hosts in private/loopback/link-local/non-global
  ranges rejected, redirects capped, and the body streamed with a 25 MiB cap
  (declared Content-Length and actual bytes both checked) instead of buffering
  an unbounded response. It is only ever called with provider-response URLs,
  never client input; the docstring now states that invariant.
- _extract_image_inputs returns only the latest user turn's reference images
  (each turn replaces the set; a text-only turn clears it) so stale images from
  earlier edits don't pile up toward the provider's 10-image cap. Text and
  reference values are coerced/filtered to strings so malformed input can't
  break downstream JSON serialization.
- generate_images filters reference_images to non-empty strings before
  clamping/forwarding.

model_registry.py:
- Fix a regression from merging main: Seedream 5.0 Lite (added in #109) is an
  "ep-" ModelArk deployment endpoint that relied on the old startswith("ep-")
  auto-detection for the URL/no-n/watermark payload, which the #110 refactor
  replaced with explicit fields. It was defaulting to b64_json+n. Both ep-
  models now share _BYTEDANCE_EP_IMAGE_PARAMS and the url/no-n/reference config.

Tests: streaming-aware fetch tests + SSRF/size-cap coverage, _extract_image_inputs
parsing/latest-turn/robustness tests, and a Seedream 5.0 Lite payload regression
guard. 26 tests pass; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BHGhsd68znPWoooHvtLFD5
adambalogh added a commit that referenced this pull request Jun 24, 2026
* Forward reference images to Seedream/Seedance image editing

Endpoint-based image-generation models (Seedream, Seedance) are served
via the dedicated /images/generations endpoint, which previously only
received a text prompt. Two bugs meant follow-up edits like "add a hat"
silently ignored the previously generated image:

1. generate_images() never forwarded any input image, so image-to-image
   editing was impossible — the client attaches the prior image to the
   latest user turn, but the gateway dropped it.
2. _extract_image_prompt() stringified multimodal list content with
   str(), splicing the base64 image blob into the prompt text instead of
   extracting the actual prompt.

Now the controller pulls reference images out of the user turns and
forwards them to ByteDance's `image` field (URL or base64 data URI, array
up to 10), and the prompt extractor reads only text parts. Chat-path
image-output models (Gemini "nano banana") already worked because they
receive the full multimodal history natively; this brings the
images-endpoint models to parity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AwcCXVgsXGqGQGzKWXhMfj

* Refactor image generation into its own module; always return inline bytes (#110)

The endpoint-based image-generation code (xAI Grok, ByteDance
Seedream/Seedance, Z.ai GLM-Image) had grown convoluted and inconsistent,
spread across llm_backend.py and chat_controller.py with two near-duplicate
streaming/non-streaming responders, two separate message-walk helpers, and a
return type that was sometimes inline bytes and sometimes a raw provider URL.

Changes:

- New tee_gateway/image_generation.py owns the whole flow: request shaping,
  the provider call, URL→inline-bytes fetching, and the signed chat-completion
  responders. llm_backend.py and chat_controller.py just route to it.

- generate_images() now ALWAYS returns data: URIs. Providers that hand back a
  hosted URL (Z.ai, Seedance) are fetched inside the enclave and inlined, so
  the client always receives bytes — never a raw URL. This also matches what
  the chat-app already expects (it caches data URIs; raw URLs weren't cached).
  Image-to-image editing already rides inline as a data: URI on the user turn.

- Per-provider request quirks (response_format, n support, reference-image
  editing, extra params like size/watermark) move out of branchy if/elif code
  into declarative fields on ModelConfig in model_registry.py.

- The two streaming/non-streaming responders collapse onto one shared core
  (_run_image_generation), and the two message-walk helpers collapse into a
  single _extract_image_inputs pass returning (prompt, reference_images).

Net: ~300 lines removed from the two original files, image logic isolated.
Tests updated to patch the new URL fetch and assert inline-bytes output;
added direct coverage for _fetch_url_as_data_uri.


Claude-Session: https://claude.ai/code/session_01BHGhsd68znPWoooHvtLFD5

Co-authored-by: Claude <noreply@anthropic.com>

* Address PR review: harden image URL fetch, latest-turn refs, ep- model fix

Responds to the Copilot review on #107 plus a regression the main merge
surfaced.

image_generation.py:
- _fetch_url_as_data_uri is now hardened against SSRF/egress abuse: http(s)
  schemes only, IP-literal hosts in private/loopback/link-local/non-global
  ranges rejected, redirects capped, and the body streamed with a 25 MiB cap
  (declared Content-Length and actual bytes both checked) instead of buffering
  an unbounded response. It is only ever called with provider-response URLs,
  never client input; the docstring now states that invariant.
- _extract_image_inputs returns only the latest user turn's reference images
  (each turn replaces the set; a text-only turn clears it) so stale images from
  earlier edits don't pile up toward the provider's 10-image cap. Text and
  reference values are coerced/filtered to strings so malformed input can't
  break downstream JSON serialization.
- generate_images filters reference_images to non-empty strings before
  clamping/forwarding.

model_registry.py:
- Fix a regression from merging main: Seedream 5.0 Lite (added in #109) is an
  "ep-" ModelArk deployment endpoint that relied on the old startswith("ep-")
  auto-detection for the URL/no-n/watermark payload, which the #110 refactor
  replaced with explicit fields. It was defaulting to b64_json+n. Both ep-
  models now share _BYTEDANCE_EP_IMAGE_PARAMS and the url/no-n/reference config.

Tests: streaming-aware fetch tests + SSRF/size-cap coverage, _extract_image_inputs
parsing/latest-turn/robustness tests, and a Seedream 5.0 Lite payload regression
guard. 26 tests pass; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BHGhsd68znPWoooHvtLFD5

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants