improvement: new image interface modeled after VLM interface by makiroll1125 · Pull Request #304 · CraftOS-dev/CraftBot

makiroll1125 · 2026-06-01T04:04:28Z

Image Generation Feature — Code Overview

This document explains the primary function of each changed file in the improvement/generate-image branch. The feature adds image generation as a first-class capability alongside CraftBot's existing LLM and VLM support, following the same layered architecture.

Architecture at a Glance

The feature is structured in three layers:

agent_core (provider-agnostic library)
  └── InterfaceType.IMAGE_GEN
  └── MODEL_REGISTRY entries
  └── ModelFactory support
  └── ImageGenInterface (core engine)

app (CraftBot application wrappers)
  └── app/image_gen_interface.py  — hooks into CraftBot state
  └── app/config.py               — reads settings.json
  └── app/agent_base.py           — lifecycle management
  └── app/internal_action_interface.py — action entry point
  └── app/data/action/generate_image.py — the callable action

UI (frontend settings)
  └── modelSettingsSlice.ts / selectors / ModelSettings.tsx / model_settings.py / browser_adapter.py

The pattern is identical to how VLM works. Every file has a VLM counterpart; image gen simply adds a parallel track.

agent_core Layer

`agent_core/core/models/types.py`

Defines the InterfaceType enum. A single value was added:

IMAGE_GEN = "image_gen"

This enum is the key used everywhere (registry lookups, factory dispatch, validation) to refer to the image generation interface type.

`agent_core/core/models/model_registry.py`

A dictionary mapping provider → InterfaceType → default model name. Two entries were added for IMAGE_GEN:

openai → "gpt-image-2"
gemini → "gemini-3.1-flash-image-preview"
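
As a rough illustration of the enum and registry described above, the lookup could be sketched as follows. Only the `IMAGE_GEN` value and the two default model names come from this PR; the `LLM`/`VLM` values, the `anthropic` entry, and the `default_model` helper are assumptions made for the example.

```python
from enum import Enum
from typing import Dict, Optional


class InterfaceType(Enum):
    LLM = "llm"              # assumed pre-existing value
    VLM = "vlm"              # assumed pre-existing value
    IMAGE_GEN = "image_gen"  # value added in this PR


# provider -> interface type -> default model name.
# "anthropic" is a hypothetical provider with no image generation entry.
MODEL_REGISTRY: Dict[str, Dict[InterfaceType, str]] = {
    "openai": {InterfaceType.IMAGE_GEN: "gpt-image-2"},
    "gemini": {InterfaceType.IMAGE_GEN: "gemini-3.1-flash-image-preview"},
    "anthropic": {},
}


def default_model(provider: str, interface: InterfaceType) -> Optional[str]:
    """Return the registry default, or None when the combination is unsupported."""
    return MODEL_REGISTRY.get(provider, {}).get(interface)
```

A `None` result is what lets callers such as the factory and the action distinguish "provider has no image generation" from a configured model.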

`agent_core/core/models/factory.py`

ModelFactory.create() gained a small guard: if the registry returns None for the requested interface and provider combination, and the caller hasn't opted into deferred init, it raises a clear ValueError listing the supported providers instead of failing later with a confusing error.

No special image-gen code path was needed as the existing OpenAI and Gemini client construction already covers it.
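
The guard might look roughly like the sketch below. It is not the actual `ModelFactory.create()` code: `resolve_model`, the `deferred` flag name, the trimmed registry, and the error wording are all assumptions chosen to show the fail-fast behavior.

```python
from typing import Dict, Optional

# Hypothetical, trimmed-down registry: provider -> default image generation model.
IMAGE_GEN_DEFAULTS: Dict[str, Optional[str]] = {
    "openai": "gpt-image-2",
    "gemini": "gemini-3.1-flash-image-preview",
    "anthropic": None,  # assumed provider without image generation
}


def resolve_model(provider: str, deferred: bool = False) -> Optional[str]:
    """Fail fast when a provider has no registered image generation model."""
    model = IMAGE_GEN_DEFAULTS.get(provider)
    if model is None and not deferred:
        # Name the providers that would work, so the error is actionable.
        supported = sorted(p for p, m in IMAGE_GEN_DEFAULTS.items() if m)
        raise ValueError(
            f"Provider {provider!r} does not support image_gen. "
            f"Supported providers: {', '.join(supported)}"
        )
    return model
```

With deferred init the caller accepts a `None` model and is expected to resolve it later, so the guard stays out of the way.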

`agent_core/core/impl/image_gen/interface.py` (new)

The core engine. This is the most substantial new file. It:

Accepts provider, model, api_key, base_url, and optional hooks for token counting and usage reporting (same constructor signature as VLMInterface)
Initializes via ModelFactory.create() (same as VLM)
Dispatches to _openai_generate() or _gemini_generate() based on provider
Handles resolution (1K/2K/4K), aspect ratio, negative prompts, reference images, and safety filters
Saves output images
Exposes reinitialize(), which swaps providers at runtime without re-creating the object

VLMInterface has an identical structure: ModelFactory initialization, provider dispatch, sync and async public methods, and the same hooks. The main difference is the operation (describing images vs. generating them).
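
The provider dispatch can be pictured with a toy class like the one below. `ImageGenSketch` is a hypothetical stand-in, its parameter names and defaults are assumptions, and the two private methods return made-up paths instead of calling any provider SDK.

```python
from typing import Callable, Dict, List


class ImageGenSketch:
    """Toy stand-in for ImageGenInterface showing provider dispatch only."""

    def __init__(self, provider: str, model: str) -> None:
        self.provider = provider
        self.model = model
        # Map each provider name to the private method that handles it.
        self._handlers: Dict[str, Callable[..., List[str]]] = {
            "openai": self._openai_generate,
            "gemini": self._gemini_generate,
        }

    def generate_image(
        self, prompt: str, resolution: str = "1K", aspect_ratio: str = "1:1"
    ) -> List[str]:
        handler = self._handlers.get(self.provider)
        if handler is None:
            raise ValueError(f"Unsupported image generation provider: {self.provider!r}")
        return handler(prompt, resolution=resolution, aspect_ratio=aspect_ratio)

    def reinitialize(self, provider: str, model: str) -> None:
        """Swap provider and model in place, keeping the same object."""
        self.provider = provider
        self.model = model

    # Placeholders: real implementations would call the provider and save files.
    def _openai_generate(self, prompt: str, *, resolution: str, aspect_ratio: str) -> List[str]:
        return [f"openai_{resolution}.png"]

    def _gemini_generate(self, prompt: str, *, resolution: str, aspect_ratio: str) -> List[str]:
        return [f"gemini_{resolution}.png"]
```

Because dispatch happens on every call, a `reinitialize()` takes effect on the next `generate_image()` without any caller needing a new reference.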

`agent_core/core/impl/image_gen/__init__.py` and `agent_core/core/image_gen_interface.py` (new)

__init__.py exports ImageGenInterface from the impl module; image_gen_interface.py re-exports it at the agent_core.core level. Identical in structure to the VLM equivalents (vlm_interface.py, impl/vlm/__init__.py).

`agent_core/__init__.py`

Added ImageGenInterface to the package's public exports (__all__), making it importable as from agent_core import ImageGenInterface. Same treatment as VLMInterface.

app Layer

`app/image_gen_interface.py` (new)

CraftBot-specific subclass of the core ImageGenInterface. Its sole job is to inject the three CraftBot state hooks at construction time:

get_token_count / set_token_count — persist per-session token usage to the STATE singleton
report_usage — emit usage events to the billing/usage reporter

The core ImageGenInterface knows nothing about CraftBot's state system; this wrapper bridges that gap. Mirrors app/vlm_interface.py exactly.
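
A minimal sketch of that bridging, assuming the hooks are plain callables passed to the constructor. `CoreImageGen`, `AppImageGen`, the `_State` class, `USAGE_EVENTS`, and the `record` helper are hypothetical names; only the three hook names come from the description above.

```python
from typing import Callable, List, Optional


class CoreImageGen:
    """Stand-in for the core class: it only sees the callables it is given."""

    def __init__(
        self,
        provider: str,
        get_token_count: Optional[Callable[[], int]] = None,
        set_token_count: Optional[Callable[[int], None]] = None,
        report_usage: Optional[Callable[[dict], None]] = None,
    ) -> None:
        self.provider = provider
        # Fall back to no-op hooks so the core class works without an app.
        self._get_token_count = get_token_count or (lambda: 0)
        self._set_token_count = set_token_count or (lambda count: None)
        self._report_usage = report_usage or (lambda event: None)

    def record(self, tokens: int) -> None:
        """Hypothetical helper: update the running total and emit a usage event."""
        self._set_token_count(self._get_token_count() + tokens)
        self._report_usage({"provider": self.provider, "tokens": tokens})


class _State:
    """Stand-in for the application's STATE singleton."""

    token_count = 0


STATE = _State()
USAGE_EVENTS: List[dict] = []  # stand-in for the usage reporter


class AppImageGen(CoreImageGen):
    """App-level subclass whose only job is to inject the three hooks."""

    def __init__(self, provider: str) -> None:
        super().__init__(
            provider,
            get_token_count=lambda: STATE.token_count,
            set_token_count=lambda count: setattr(STATE, "token_count", count),
            report_usage=USAGE_EVENTS.append,
        )
```

Note that the hooks only have an effect if the core class actually calls them after a generation, which is what `record` stands in for here.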

`app/config.py`

Two new config accessors added:

get_image_gen_provider() — reads model.image_gen_provider from settings.json (default: "openai")
get_image_gen_model() — reads model.image_gen_model (optional override; None means use registry default)

The default settings dict also gets both keys. Pattern is identical to get_vlm_provider() / get_vlm_model().
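
The two accessors could be sketched as below. To keep the example self-contained they take the parsed settings as a dict argument, which is an assumption; the real functions are described as reading settings.json themselves.

```python
from typing import Optional

# Assumed shape of the relevant part of the default settings.
DEFAULT_SETTINGS = {
    "model": {"image_gen_provider": "openai", "image_gen_model": None},
}


def get_image_gen_provider(settings: dict) -> str:
    """Return model.image_gen_provider, defaulting to "openai"."""
    return settings.get("model", {}).get("image_gen_provider") or "openai"


def get_image_gen_model(settings: dict) -> Optional[str]:
    """Return model.image_gen_model, or None to mean "use the registry default"."""
    return settings.get("model", {}).get("image_gen_model") or None
```

Treating an empty string the same as a missing key means a cleared model override in the UI falls back to the registry default.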

`app/agent_base.py`

Manages the lifecycle of the ImageGenInterface instance for a running agent:

Constructor: reads image_gen_provider and image_gen_model from config, creates ImageGenInterface with deferred=True (doesn't hit the API until first use), passes it to InternalActionInterface.initialize()
reinitialize_image_gen(): creates a fresh ImageGenInterface instance and atomically replaces both self.image_gen and InternalActionInterface.image_gen_interface. The fresh-instance approach means any in-flight actions that hold a reference to the old instance complete cleanly. Mirrors the pattern used by the existing reinitialize_llm().
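
The fresh-instance swap can be sketched as follows. `FakeImageGen`, `ActionRegistry`, and `Agent` are hypothetical stand-ins for ImageGenInterface, InternalActionInterface, and AgentBase; the method signature follows the one visible in the review diff, and the failure behavior (keep the old instance, return False) is an assumption.

```python
from typing import Optional

SUPPORTED = {"openai", "gemini"}


class FakeImageGen:
    """Stand-in interface that rejects unsupported providers at construction."""

    def __init__(self, provider: str) -> None:
        if provider not in SUPPORTED:
            raise ValueError(f"Unsupported provider: {provider!r}")
        self.provider = provider


class ActionRegistry:
    """Stand-in for the class-level registry that actions read from."""

    image_gen_interface: Optional[FakeImageGen] = None


class Agent:
    def __init__(self, provider: str = "openai") -> None:
        self.image_gen = FakeImageGen(provider)
        ActionRegistry.image_gen_interface = self.image_gen

    def reinitialize_image_gen(self, provider: Optional[str] = None) -> bool:
        target = provider or self.image_gen.provider
        try:
            fresh = FakeImageGen(target)  # build first, swap only on success
        except ValueError:
            return False                  # old instance stays in place
        self.image_gen = fresh
        ActionRegistry.image_gen_interface = fresh
        return True
```

Anything that captured the old instance before the swap keeps a valid object, which is why in-flight work is unaffected.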

`app/internal_action_interface.py`

The shared class-level registry that actions use to reach CraftBot services without importing the full agent. Two changes:

Added image_gen_interface: Optional[ImageGenInterface] class variable
Added generate_image(**kwargs) classmethod that delegates to cls.image_gen_interface.generate_image(**kwargs)

Pattern is identical to how describe_image() is wired through the VLM interface.

`app/data/action/generate_image.py`

The @action-decorated function that the agent calls. It:

Returns early in simulated_mode (for tests)
Checks the registry to confirm the current provider supports image generation — returns a user-friendly error if not
Validates that prompt is non-empty
Delegates to InternalActionInterface.generate_image() with normalized parameters
Returns a {"status": "success", "image_paths": [...]} dict

Follows the same pattern as app/data/action/describe_image.py (the VLM equivalent): thin action wrapper, provider guard, delegate to interface, return dict.
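
Put together, the action flow and the class-level delegation might look like the sketch below. The `@action` decorator is omitted, `FakeInterface` and the `provider` argument are hypothetical, and the error dict shape is an assumption; only the success dict shape is taken from the description.

```python
from typing import Any, Dict, List, Optional

IMAGE_GEN_PROVIDERS = {"openai", "gemini"}  # assumed to be derived from the registry


class FakeInterface:
    """Stand-in for the app-level ImageGenInterface."""

    def generate_image(self, **kwargs: Any) -> List[str]:
        return ["out/image_0.png"]


class InternalActionInterface:
    image_gen_interface: Optional[FakeInterface] = None

    @classmethod
    def generate_image(cls, **kwargs: Any) -> List[str]:
        if cls.image_gen_interface is None:
            raise RuntimeError("image generation interface is not initialized")
        return cls.image_gen_interface.generate_image(**kwargs)


def generate_image(input_data: Dict[str, Any], provider: str = "openai") -> Dict[str, Any]:
    # 1. Tests can short-circuit without touching any provider.
    if input_data.get("simulated_mode"):
        return {"status": "success", "image_paths": []}
    # 2. Provider guard: return a readable error instead of raising.
    if provider not in IMAGE_GEN_PROVIDERS:
        return {"status": "error", "message": f"{provider} does not support image generation"}
    # 3. Validate and normalize the prompt.
    prompt = str(input_data.get("prompt") or "").strip()
    if not prompt:
        return {"status": "error", "message": "prompt must not be empty"}
    # 4. Delegate to the shared interface and wrap the result.
    paths = InternalActionInterface.generate_image(prompt=prompt)
    return {"status": "success", "image_paths": paths}
```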

`app/main.py`

Passes the configured image_gen_provider and image_gen_model through to the AgentBase constructor at startup, so the agent initializes with the right provider from the moment it starts.

UI / Settings Layer

The UI layer lets users configure the image generation provider and API key separately from the LLM provider. Each piece mirrors what was already in place for VLM.

`app/ui_layer/settings/model_settings.py`

Backend settings API. Changes:

get_available_providers() now includes has_image_gen: bool and image_gen_model: str|None on each ProviderInfo, derived from the registry. The frontend uses has_image_gen to filter the provider dropdown.
get_model_settings() returns image_gen_provider and image_gen_model alongside existing fields.
update_model_settings() accepts and saves both, with validation: it rejects any image_gen_provider value not present in the registry before touching settings.json.

`app/ui_layer/adapters/browser_adapter.py`

Handles the WebSocket model_settings_update message from the frontend. Added extraction of imageGenProvider / imageGenModel from the message payload, saving them via update_model_settings(), then calling agent.reinitialize_image_gen() so the running agent immediately switches to the new provider without a restart.

`app/ui_layer/browser/frontend/src/store/slices/modelSettingsSlice.ts`

Redux slice managing model settings state. Added:

imageGenProvider: string and currentImageGenModel: string state fields
setImageGenProvider and setCurrentImageGenModel actions
ProviderInfo interface extended with has_image_gen and image_gen_model
Both model_settings_get and model_settings_update socket message handlers updated to populate the new fields

`app/ui_layer/browser/frontend/src/store/selectors/modelSettings.ts`

Adds selectImageGenProvider and selectCurrentImageGenModel selectors for the two new state fields. Same pattern as the existing LLM/VLM selectors.

`app/ui_layer/browser/frontend/src/pages/Settings/ModelSettings.tsx`

Adds an "Image Generation" section to the Settings page (after VLM, before Slow Mode). Contains:

A provider dropdown filtered to only providers with has_image_gen: true
An API key field (shown only when the provider requires_api_key) with configured/required badge
A model override text input (auto-populated from the provider's default when the provider is switched)
A Save button that sends a model_settings_update socket message

The section is self-contained with its own local state (newImageGenProvider, newImageGenApiKey, newImageGenModel, imageGenHasChanges, isImageGenSaving) and save handler, matching the existing LLM provider section's structure.

ahmad-ajmal · 2026-06-04T02:19:30Z

+            target_model = None
+
+        try:
+            ctx = ModelFactory.create(


Could we not use the same helper function for __init__ and reinitialize?
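
One way the suggestion could look, as a sketch only: both entry points call a single private helper. `ImageGenSketch`, `_setup`, and `DEFAULT_MODELS` are hypothetical names, and the ModelFactory call is replaced by a dictionary lookup.

```python
from typing import Dict, Optional

DEFAULT_MODELS: Dict[str, str] = {
    "openai": "gpt-image-2",
    "gemini": "gemini-3.1-flash-image-preview",
}


class ImageGenSketch:
    def __init__(self, provider: str, model: Optional[str] = None) -> None:
        self._setup(provider, model)

    def reinitialize(self, provider: str, model: Optional[str] = None) -> bool:
        """Re-run the same setup; report failure instead of raising."""
        try:
            self._setup(provider, model)
        except ValueError:
            return False
        return True

    def _setup(self, provider: str, model: Optional[str]) -> None:
        """Single place that resolves the model and assigns state."""
        if provider not in DEFAULT_MODELS:
            raise ValueError(f"Unsupported provider: {provider!r}")
        resolved = model or DEFAULT_MODELS[provider]
        # Assign only after everything has been resolved, so a failed
        # setup leaves the previous provider and model untouched.
        self.provider = provider
        self.model = resolved
```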

ahmad-ajmal · 2026-06-04T02:30:16Z

-    openai_key = get_api_key("openai")
-    gemini_key = get_api_key("gemini")
+    image_gen = iai.InternalActionInterface.image_gen_interface
+    current_provider = get_image_gen_provider()


We need a fallback: if the user is on a provider that does not have image_gen but has a token for one that does, use that provider, with priority (google, open ai, then others). We would also need this logic in the image_gen_interface.
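
A possible shape for that fallback, sketched under assumptions: `pick_image_gen_provider` is a hypothetical helper, "google" is mapped to the `gemini` provider key, and "others" are tried in alphabetical order.

```python
from typing import Dict, Iterable, Optional

# Reviewer's priority "google, open ai"; mapping google -> "gemini" is an assumption.
PRIORITY = ("gemini", "openai")


def pick_image_gen_provider(
    current: str,
    api_keys: Dict[str, str],
    supported: Iterable[str],
) -> Optional[str]:
    """Keep the current provider if it can generate images, else fall back."""
    supported_set = set(supported)
    if current in supported_set:
        return current
    # Preferred providers first, then any remaining ones in a stable order.
    others = sorted(supported_set - set(PRIORITY))
    for candidate in (*PRIORITY, *others):
        if candidate in supported_set and api_keys.get(candidate):
            return candidate
    return None  # nothing usable: the caller should surface an error
```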

ahmad-ajmal · 2026-06-04T02:42:43Z

+        self._init_api_key = api_key
+        self._init_base_url = base_url
+
+        self._get_token_count = get_token_count or (lambda: 0)


These aren't used anywhere so the image_gen won't count any usage

ahmad-ajmal · 2026-06-04T02:44:50Z

+            if result.get("success") and image_gen_provider:
+                try:
+                    agent = self._controller.agent
+                    agent.reinitialize_image_gen(image_gen_provider)


update_model_settings saves the new provider to disk before reinitialize_image_gen runs, and the reinit only swaps the instance if ok. So if you switch to a provider whose key isn't configured, reinit fails → settings say the new provider, but the live image_gen_interface still points at the old one.
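
One way to avoid that mismatch, as a sketch rather than the actual adapter code, is to attempt the reinit first and persist only when it succeeds. `FakeAgent`, `apply_image_gen_provider`, and the in-memory `settings` dict are hypothetical stand-ins.

```python
from typing import Dict, Set


class FakeAgent:
    """Stand-in agent: reinit succeeds only for providers with a configured key."""

    def __init__(self, provider: str, configured: Set[str]) -> None:
        self.provider = provider
        self._configured = configured

    def reinitialize_image_gen(self, provider: str) -> bool:
        if provider not in self._configured:
            return False  # keep the old live instance
        self.provider = provider
        return True


def apply_image_gen_provider(agent: FakeAgent, settings: Dict[str, dict], provider: str) -> bool:
    """Swap the live interface first; write settings only after a successful swap."""
    if not agent.reinitialize_image_gen(provider):
        return False  # settings untouched, so disk and live state still agree
    settings.setdefault("model", {})["image_gen_provider"] = provider
    return True
```

The alternative is to keep the save-then-reinit order and roll the setting back when the reinit fails; either way the saved provider and the live instance stay consistent.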

ahmad-ajmal · 2026-06-04T02:46:44Z

+
+        try:
+            api_key = self._gemini_client._api_key
+            client = genai.Client(api_key=api_key)


This creates a new client on every call to this action. Why not just use self._gemini_client?

ahmad-ajmal · 2026-06-04T02:51:57Z

                )
        return llm_ok and vlm_ok

+    def reinitialize_image_gen(self, provider: str | None = None) -> bool:


There's another reinitialize inside the interface.py. Is this duplicated code? Could you use that as the helper and this as the wrapper?

ahmad-ajmal

I selected the model during the hard onboarding phase, but this is still blank in my settings.
Also, I think having this separate is fine; then we don't need to worry about the fallback. Please ask @zfoong what he thinks about this.

Also, please update the provider versions in the requirements.txt file, since the current ones don't have the required values.

Please also make sure that you've run the ruff lint checks.

improvement: base logic for image interface

a365aac

makiroll1125 requested a review from ahmad-ajmal June 1, 2026 04:04

makiroll1125 self-assigned this Jun 1, 2026

ahmad-ajmal reviewed Jun 4, 2026

ahmad-ajmal requested changes Jun 4, 2026

makiroll1125 wants to merge 1 commit into V1.3.3 from improvement/generate-image

2 participants
