improvement: new image interface modeled after VLM interface #304
makiroll1125 wants to merge 1 commit into
target_model = None

try:
    ctx = ModelFactory.create(
Could we not use the same helper function for `__init__` and `reinitialize`?
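Here is a minimal sketch of that suggestion. `ImageGenInterfaceSketch`, `_configure`, and the `make_client` callable are all hypothetical. `make_client` stands in for `ModelFactory.create()`, whose real signature isn't shown in this thread. Both `__init__` and `reinitialize` route through the one helper. The helper builds the new client before touching any attribute, so a failed reinit leaves the old state intact.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in for ModelFactory.create(); the real signature may differ.
ClientFactory = Callable[[str, Optional[str]], Any]


class ImageGenInterfaceSketch:
    """Illustrative only: one helper shared by __init__ and reinitialize."""

    def __init__(self, provider: str, model: Optional[str] = None,
                 make_client: ClientFactory = lambda p, m: (p, m)) -> None:
        self._make_client = make_client
        self._configure(provider, model)

    def _configure(self, provider: str, model: Optional[str]) -> None:
        # Build first: if this raises, no attribute has been changed yet.
        client = self._make_client(provider, model)
        self.provider, self.model, self.client = provider, model, client

    def reinitialize(self, provider: str, model: Optional[str] = None) -> bool:
        try:
            self._configure(provider, model)
        except Exception:
            return False  # keep the previous provider/client
        return True
```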
openai_key = get_api_key("openai")
gemini_key = get_api_key("gemini")
image_gen = iai.InternalActionInterface.image_gen_interface
current_provider = get_image_gen_provider()
We need a fallback: if the user is on a model that doesn't have image_gen but does have a token for a model that does, pick that model by priority (Google, OpenAI, then others). The same logic would also need to go into the image_gen_interface.
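One possible shape for the fallback, as a minimal sketch. `resolve_image_gen_provider` is a hypothetical helper, and mapping "Google" to the registry's `gemini` key is an assumption. The helper keeps the current provider when it supports image generation and has a key. Otherwise it walks the priority list, then any remaining supported providers in alphabetical order.

```python
from typing import Iterable, Mapping, Optional

# Assumed priority from the review comment: Google (registry key "gemini"), OpenAI, then others.
DEFAULT_PRIORITY = ("gemini", "openai")


def resolve_image_gen_provider(
    current: str,
    supports_image_gen: Iterable[str],
    api_keys: Mapping[str, Optional[str]],
    priority: Iterable[str] = DEFAULT_PRIORITY,
) -> Optional[str]:
    """Return a provider that supports image generation and has a key, or None."""
    supported = set(supports_image_gen)
    # Keep the user's current provider when it already works.
    if current in supported and api_keys.get(current):
        return current
    ordered = list(priority)
    # Any other supported providers come after the preferred ones, in a stable order.
    ordered += sorted(p for p in supported if p not in ordered)
    for provider in ordered:
        if provider in supported and api_keys.get(provider):
            return provider
    return None
```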
self._init_api_key = api_key
self._init_base_url = base_url

self._get_token_count = get_token_count or (lambda: 0)
These aren't used anywhere, so image_gen won't count any usage.
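The following minimal sketch shows how the hooks could be wired into the generate path. It assumes a hypothetical `backend` callable that returns `(image_paths, tokens_used)`. This thread does not establish whether the OpenAI and Gemini image endpoints report a usable token count. The usage-event dict shape is also illustrative.

```python
from typing import Callable, List, Optional, Tuple


class UsageCountingGenerator:
    """Hypothetical sketch: call the token/usage hooks on every generation."""

    def __init__(
        self,
        backend: Callable[[str], Tuple[List[str], int]],
        get_token_count: Optional[Callable[[], int]] = None,
        set_token_count: Optional[Callable[[int], None]] = None,
        report_usage: Optional[Callable[[dict], None]] = None,
    ) -> None:
        self._backend = backend  # assumed to return (image_paths, tokens_used)
        self._get_token_count = get_token_count or (lambda: 0)
        self._set_token_count = set_token_count or (lambda n: None)
        self._report_usage = report_usage or (lambda event: None)

    def generate_image(self, prompt: str) -> List[str]:
        paths, tokens = self._backend(prompt)
        # Add to the running session total, then emit one usage event.
        self._set_token_count(self._get_token_count() + tokens)
        self._report_usage({"kind": "image_gen", "tokens": tokens, "images": len(paths)})
        return paths
```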
if result.get("success") and image_gen_provider:
    try:
        agent = self._controller.agent
        agent.reinitialize_image_gen(image_gen_provider)
`update_model_settings` saves the new provider to disk before `reinitialize_image_gen` runs, and the reinit only swaps the instance if it succeeds. So if you switch to a provider whose key isn't configured, the reinit fails: settings say the new provider, but the live `image_gen_interface` still points at the old one.
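One way to avoid the mismatch is to attempt the live swap first and persist the setting only if the swap succeeded. The sketch below relies on hypothetical names. `apply_image_gen_provider` is invented for illustration, and its `reinitialize` and `save` callables stand in for `reinitialize_image_gen` and the settings writer.

```python
from typing import Callable, Dict


def apply_image_gen_provider(
    new_provider: str,
    settings: Dict[str, str],
    reinitialize: Callable[[str], bool],
    save: Callable[[Dict[str, str]], None],
) -> bool:
    """Switch providers, persisting the setting only if the live swap worked."""
    if not reinitialize(new_provider):
        # Leave settings.json untouched so disk and runtime stay consistent.
        return False
    # Copy rather than mutate, so callers' dicts are never half-updated.
    updated = dict(settings, image_gen_provider=new_provider)
    save(updated)
    return True
```

There is a trade-off here. If `save` itself fails after a successful swap, the runtime stays ahead of disk until the next save. The alternative is to save first and roll back on a failed reinit, which costs a second write.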
try:
    api_key = self._gemini_client._api_key
    client = genai.Client(api_key=api_key)
This creates a new client on every call to this action. Why not just use `self._gemini_client`?
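Here is a minimal sketch of building the client once and reusing it. `GeminiImageBackend` and `client_factory` are hypothetical. In the PR, the factory would be something like the `genai.Client(api_key=...)` call in the snippet, or the code could simply use the existing `self._gemini_client`.

```python
from typing import Any, Callable, Optional


class GeminiImageBackend:
    """Hypothetical sketch: create the SDK client once, then reuse it."""

    def __init__(self, api_key: str, client_factory: Callable[[str], Any]) -> None:
        self._api_key = api_key
        # e.g. lambda key: genai.Client(api_key=key); injected so tests can swap it.
        self._client_factory = client_factory
        self._client: Optional[Any] = None

    @property
    def client(self) -> Any:
        # Built lazily on first access; every later access returns the same object.
        if self._client is None:
            self._client = self._client_factory(self._api_key)
        return self._client
```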
)
return llm_ok and vlm_ok

def reinitialize_image_gen(self, provider: str | None = None) -> bool:
There's another `reinitialize` inside interface.py. Is this duplicated code? Could you use that one as the helper and this one as the wrapper?
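The sketch below shows the wrapper/helper split the reviewer describes, using hypothetical names. `AgentSketch` stands in for the agent, and `ActionRegistry` stands in for `InternalActionInterface`. The agent method only resolves the target provider and delegates to the interface's own `reinitialize()`. Treating `provider=None` as "keep the current provider" is an assumption.

```python
from typing import Optional, Protocol


class ReinitializableImageGen(Protocol):
    provider: str

    def reinitialize(self, provider: str, model: Optional[str] = None) -> bool: ...


class ActionRegistry:
    # Hypothetical stand-in for InternalActionInterface's class-level slot.
    image_gen_interface: Optional[ReinitializableImageGen] = None


class AgentSketch:
    def __init__(self, image_gen: ReinitializableImageGen) -> None:
        self.image_gen = image_gen
        ActionRegistry.image_gen_interface = image_gen

    def reinitialize_image_gen(self, provider: Optional[str] = None,
                               model: Optional[str] = None) -> bool:
        # Thin wrapper: the interface owns the rebuild logic.
        target = provider or self.image_gen.provider
        # The same object is registered, so actions see the change without re-assignment.
        return self.image_gen.reinitialize(target, model)
```

This mutates the shared instance in place. As a result, it gives up the fresh-instance property that the overview describes for in-flight actions, and that trade-off would need weighing.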
ahmad-ajmal left a comment
I selected the model during the hard onboarding phase, but this is still blank in my settings.
Also, I think keeping this separate is fine; then we don't need to worry about the fallback. Please ask @zfoong what he thinks about this.
Also, please update the provider versions in requirements.txt file since the current ones don't have the required values.
Please also make sure that you've run the ruff lint checks.
Image Generation Feature: Code Overview
This document explains the primary function of each changed file in the `improvement/generate-image` branch. The feature adds image generation as a first-class capability alongside CraftBot's existing LLM and VLM support, following the same layered architecture.

Architecture at a Glance
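The feature is structured in three layers: `agent_core` (the reusable core, which knows nothing about CraftBot's state), `app` (CraftBot-specific wiring), and UI / Settings (user-facing configuration).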
The pattern is identical to how VLM works. Every file has a VLM counterpart; image gen simply adds a parallel track.
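The following is a compressed sketch of that layering. The class names are hypothetical. `CoreImageGen` stands in for `ImageGenInterface`, `AppImageGen` for the `app/image_gen_interface.py` subclass, and `AppState` for CraftBot's `STATE` singleton. The core accepts optional hooks and defaults them to no-ops, and the app layer only injects its own.

```python
from typing import Callable, List, Optional


class CoreImageGen:
    """Stand-in for agent_core's ImageGenInterface: no knowledge of app state."""

    def __init__(self, provider: str,
                 get_token_count: Optional[Callable[[], int]] = None,
                 set_token_count: Optional[Callable[[int], None]] = None,
                 report_usage: Optional[Callable[[dict], None]] = None) -> None:
        self.provider = provider
        # Missing hooks fall back to no-ops, so the core runs standalone.
        self._get_token_count = get_token_count or (lambda: 0)
        self._set_token_count = set_token_count or (lambda n: None)
        self._report_usage = report_usage or (lambda event: None)


class AppState:
    """Hypothetical stand-in for CraftBot's STATE singleton."""
    image_gen_tokens: int = 0
    usage_events: List[dict] = []


class AppImageGen(CoreImageGen):
    """Stand-in for app/image_gen_interface.py: it only injects app hooks."""

    def __init__(self, provider: str) -> None:
        super().__init__(
            provider,
            get_token_count=lambda: AppState.image_gen_tokens,
            set_token_count=lambda n: setattr(AppState, "image_gen_tokens", n),
            report_usage=AppState.usage_events.append,
        )
```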
agent_core Layer
`agent_core/core/models/types.py`
Defines the `InterfaceType` enum. A single value was added: `IMAGE_GEN`. This enum is the key used everywhere (registry lookups, factory dispatch, validation) to refer to the image generation interface type.
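Here is a minimal sketch of the enum. The `str` mixin and every member value (including `"image_gen"`) are assumptions, since this overview only shows the `IMAGE_GEN` member name. `parse_interface_type` is a hypothetical helper.

```python
from enum import Enum


class InterfaceType(str, Enum):
    # Member values are assumed for illustration.
    LLM = "llm"
    VLM = "vlm"
    IMAGE_GEN = "image_gen"


def parse_interface_type(raw: str) -> InterfaceType:
    # Turns a plain string (e.g. from config) back into the enum key used for lookups.
    return InterfaceType(raw)
```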
`agent_core/core/models/model_registry.py`
A dictionary mapping provider → `InterfaceType` → default model name. Two entries were added for `IMAGE_GEN`:
- `openai` → `"gpt-image-2"`
- `gemini` → `"gemini-3.1-flash-image-preview"`

`agent_core/core/models/factory.py`
`ModelFactory.create()` gained a small guard: if the registry returns `None` for the requested interface+provider combination and the caller hasn't opted into deferred init, it raises a clear `ValueError` listing the supported providers instead of failing later with a confusing error. No special image-gen code path was needed, because the existing OpenAI and Gemini client construction already covers it.
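The following minimal sketch combines the registry lookup with the factory guard. The two `IMAGE_GEN` model names come from this overview. Several parts are assumptions: `resolve_default_model`, where the `deferred` flag is checked, the enum values, and the `anthropic` entry (a hypothetical provider without image generation).

```python
from enum import Enum
from typing import Dict, Optional


class InterfaceType(str, Enum):
    VLM = "vlm"              # assumed value
    IMAGE_GEN = "image_gen"  # assumed value


# provider -> InterfaceType -> default model (only the image-gen side shown).
MODEL_REGISTRY: Dict[str, Dict[InterfaceType, str]] = {
    "openai": {InterfaceType.IMAGE_GEN: "gpt-image-2"},
    "gemini": {InterfaceType.IMAGE_GEN: "gemini-3.1-flash-image-preview"},
    "anthropic": {},  # hypothetical provider with no image-gen entry
}


def resolve_default_model(provider: str, interface: InterfaceType,
                          deferred: bool = False) -> Optional[str]:
    model = MODEL_REGISTRY.get(provider, {}).get(interface)
    if model is None and not deferred:
        # Fail early with the list of providers that do support this interface.
        supported = sorted(p for p, m in MODEL_REGISTRY.items() if interface in m)
        raise ValueError(
            f"Provider {provider!r} does not support {interface.value}; "
            f"supported providers: {', '.join(supported)}"
        )
    return model
```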
`agent_core/core/impl/image_gen/interface.py` (new)
The core engine and the most substantial new file. It:
- Accepts `provider`, `model`, `api_key`, `base_url`, and optional hooks for token counting and usage reporting (same constructor signature as `VLMInterface`)
- Creates its client through `ModelFactory.create()` (same as VLM)
- Dispatches to `_openai_generate()` or `_gemini_generate()` based on provider
- Provides `reinitialize()`, which allows swapping providers at runtime without re-creating the object

`VLMInterface` has the identical structure: `ModelFactory` init, provider dispatch, sync+async public methods, and the same hooks. The main difference is the operation (describing vs. generating images).

`agent_core/core/impl/image_gen/__init__.py` and `agent_core/core/image_gen_interface.py` (new)
`__init__.py` exports `ImageGenInterface` from the impl module; `image_gen_interface.py` re-exports it at the `agent_core.core` level. Both are identical in structure to the VLM equivalents (`vlm_interface.py`, `impl/vlm/__init__.py`).

`agent_core/__init__.py`
Added `ImageGenInterface` to the package's public exports (`__all__`), making it importable as `from agent_core import ImageGenInterface`. Same treatment as `VLMInterface`.

app Layer
`app/image_gen_interface.py` (new)
CraftBot-specific subclass of the core `ImageGenInterface`. Its sole job is to inject the three CraftBot state hooks at construction time:
- `get_token_count` / `set_token_count`: persist per-session token usage to the `STATE` singleton
- `report_usage`: emit usage events to the billing/usage reporter

The core `ImageGenInterface` knows nothing about CraftBot's state system; this wrapper bridges that gap. Mirrors `app/vlm_interface.py` exactly.

`app/config.py`
Two new config accessors were added:
- `get_image_gen_provider()`: reads `model.image_gen_provider` from settings.json (default: `"openai"`)
- `get_image_gen_model()`: reads `model.image_gen_model` (optional override; `None` means use the registry default)

The default settings dict also gets both keys. The pattern is identical to `get_vlm_provider()` / `get_vlm_model()`.

`app/agent_base.py`
Manages the lifecycle of the `ImageGenInterface` instance for a running agent:
- On construction: reads `image_gen_provider` and `image_gen_model` from config, creates `ImageGenInterface` with `deferred=True` (doesn't hit the API until first use), and passes it to `InternalActionInterface.initialize()`
- `reinitialize_image_gen()`: creates a fresh `ImageGenInterface` instance and atomically replaces both `self.image_gen` and `InternalActionInterface.image_gen_interface`. The fresh-instance approach means any in-flight actions that hold a reference to the old instance complete cleanly. This mirrors the pattern used by the existing `reinitialize_llm()`.

`app/internal_action_interface.py`
The shared class-level registry that actions use to reach CraftBot services without importing the full agent. Two changes:
- An `image_gen_interface: Optional[ImageGenInterface]` class variable
- A `generate_image(**kwargs)` classmethod that delegates to `cls.image_gen_interface.generate_image(**kwargs)`

The pattern is identical to how `describe_image()` is wired through the VLM interface.

`app/data/action/generate_image.py`
The `@action`-decorated function that the agent calls. It:
- Honors `simulated_mode` (for tests)
- Checks that `prompt` is non-empty
- Calls `InternalActionInterface.generate_image()` with normalized parameters
- Returns a `{"status": "success", "image_paths": [...]}` dict

It follows the same pattern as `app/data/action/describe_image.py` (the VLM equivalent): thin action wrapper, provider guard, delegate to the interface, return a dict.

`app/main.py`
Passes the configured `image_gen_provider` and `image_gen_model` through to the `AgentBase` constructor at startup, so the agent initializes with the right provider from the moment it starts.

UI / Settings Layer
The UI layer lets users configure the image generation provider and API key separately from the LLM provider. Each piece mirrors what was already in place for VLM.
`app/ui_layer/settings/model_settings.py`
Backend settings API. Changes:
- `get_available_providers()` now includes `has_image_gen: bool` and `image_gen_model: str | None` on each `ProviderInfo`, derived from the registry. The frontend uses `has_image_gen` to filter the provider dropdown.
- `get_model_settings()` returns `image_gen_provider` and `image_gen_model` alongside the existing fields.
- `update_model_settings()` accepts and saves both, with validation: it rejects any `image_gen_provider` value not present in the registry before touching settings.json.

`app/ui_layer/adapters/browser_adapter.py`
Handles the WebSocket `model_settings_update` message from the frontend. It now extracts `imageGenProvider` / `imageGenModel` from the message payload, saves them via `update_model_settings()`, then calls `agent.reinitialize_image_gen()` so the running agent switches to the new provider immediately, without a restart.

`app/ui_layer/browser/frontend/src/store/slices/modelSettingsSlice.ts`
Redux slice managing model settings state. Added:
- `imageGenProvider: string` and `currentImageGenModel: string` state fields
- `setImageGenProvider` and `setCurrentImageGenModel` actions
- `ProviderInfo` interface extended with `has_image_gen` and `image_gen_model`
- `model_settings_get` and `model_settings_update` socket message handlers updated to populate the new fields

`app/ui_layer/browser/frontend/src/store/selectors/modelSettings.ts`
Adds `selectImageGenProvider` and `selectCurrentImageGenModel` selectors for the two new state fields. Same pattern as the existing LLM/VLM selectors.

`app/ui_layer/browser/frontend/src/pages/Settings/ModelSettings.tsx`
Adds an "Image Generation" section to the Settings page (after VLM, before Slow Mode). Contains:
- A provider dropdown filtered to providers with `has_image_gen: true`
- An API key input (when the provider `requires_api_key`) with a configured/required badge
- A save action that sends a `model_settings_update` socket message

The section is self-contained, with its own local state (`newImageGenProvider`, `newImageGenApiKey`, `newImageGenModel`, `imageGenHasChanges`, `isImageGenSaving`) and save handler, matching the structure of the existing LLM provider section.
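The following backend-side sketch shows how `has_image_gen` and `image_gen_model` could be derived for each provider, and how the dropdown filter then behaves. `ProviderInfo` here carries only three fields. `build_provider_infos` and `image_gen_dropdown` are hypothetical helpers, and the input mapping is assumed to come from the registry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProviderInfo:
    # Only the fields relevant here; the real ProviderInfo carries more.
    name: str
    has_image_gen: bool
    image_gen_model: Optional[str]


def build_provider_infos(image_gen_defaults: Dict[str, Optional[str]]) -> List[ProviderInfo]:
    """Derive image-gen fields from a provider -> default image model mapping."""
    return [
        ProviderInfo(name=p, has_image_gen=m is not None, image_gen_model=m)
        for p, m in sorted(image_gen_defaults.items())
    ]


def image_gen_dropdown(infos: List[ProviderInfo]) -> List[str]:
    # Same rule as the frontend filter: only providers with has_image_gen: true.
    return [i.name for i in infos if i.has_image_gen]
```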