gen.py is a thin CLI client that talks to the server over HTTP. Start the server first (uv run web/server.py --config config.toml); gen.py cannot generate anything without it.
# Check server status
uv run scripts/gen.py status
# FLUX.2 image generation
uv run scripts/gen.py flux2 --prompt "a cat sleeping in sunlight" --seed 42
# Z-Image generation
uv run scripts/gen.py zimage --prompt "a mountain" --width 512 --height 512
# LTX-2 video (always streaming)
uv run scripts/gen.py ltx2 --prompt "ocean waves" --num-frames 33 --seed 42
# Qwen-Image T2I
uv run scripts/gen.py qwen --prompt "a bird" --seed 42
# Custom server URL
uv run scripts/gen.py --server http://localhost:9000 flux2 --prompt "test"
gen.py global flags
| Flag | Description |
| --- | --- |
| --server | Server base URL (default: http://127.0.0.1:7860) |
| --output | Output directory for saved files (default: outputs/gen/) |
| --timeout | Request timeout in seconds (default: 300) |
| --no-save | Print metadata only, don't save file |
| --json | Output raw JSON response instead of saving file |
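For scripted use, the invocations above can be assembled programmatically. The following is a minimal sketch that builds an argument list suitable for `subprocess.run`, placing global flags before the subcommand as in the custom server URL example. The helper name `build_gen_command` is hypothetical and not part of the project; it only emits flags listed in this document and does not validate them.

```python
import shlex


def build_gen_command(subcommand, *, server=None, timeout=None, json_output=False, **flags):
    """Assemble an argv list for scripts/gen.py (hypothetical helper, not part of the project)."""
    argv = ["uv", "run", "scripts/gen.py"]
    # Global flags go before the subcommand, mirroring the --server example.
    if server is not None:
        argv += ["--server", server]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    if json_output:
        argv.append("--json")
    argv.append(subcommand)
    # Subcommand flags: the keyword num_frames=33 becomes "--num-frames 33".
    for name, value in flags.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # boolean switch such as --stream
        elif value is not False and value is not None:
            argv += [flag, str(value)]
    return argv


cmd = build_gen_command(
    "ltx2",
    server="http://localhost:9000",
    prompt="ocean waves",
    num_frames=33,
    seed=42,
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run the client without going through a shell, so prompts containing quotes or spaces need no extra escaping.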
gen.py flux2 subcommand
| Flag | Description |
| --- | --- |
| --prompt | Prompt text (required) |
| --width | Image width in pixels |
| --height | Image height in pixels |
| --num-steps | Number of inference steps |
| --seed | Random seed |
| --guidance | Guidance scale |
| --model-name | Model variant name (e.g., klein-4b) |
| --upsample-prompt / --no-upsample-prompt | Enable/disable prompt upsampling |
| --loras | LoRA specs (path:scale), space-separated |
| --block-offload | Enable block-level offload |
| --max-text-length | Maximum text sequence length |
| --reference-images | Input image path(s) for image editing |
| --stream | Use streaming SSE endpoint |
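The `path:scale` format used by --loras can be illustrated with a small parser. This is a sketch of one plausible interpretation, not the project's actual parsing code: the function name `parse_lora_spec` is hypothetical, and falling back to a scale of 1.0 when no scale is given is an assumption that gen.py may not share.

```python
def parse_lora_spec(spec, default_scale=1.0):
    """Split 'path:scale' into (path, scale).

    Assumption: the scale is whatever follows the LAST colon, so paths that
    contain colons themselves (e.g. Windows drive letters) still work.
    """
    path, sep, scale_text = spec.rpartition(":")
    if sep:
        try:
            return path, float(scale_text)
        except ValueError:
            pass  # text after the last colon is not a number: treat the whole spec as a path
    return spec, default_scale  # assumed default, not confirmed by the source


# Space-separated specs, as they would appear after --loras.
specs = "styles/ink.safetensors:0.8 detail.safetensors".split()
parsed = [parse_lora_spec(s) for s in specs]
print(parsed)
```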
gen.py zimage subcommand
| Flag | Description |
| --- | --- |
| --prompt | Prompt text (required) |
| --width | Image width in pixels |
| --height | Image height in pixels |
| --steps | Number of inference steps |
| --seed | Random seed |
| --guidance-scale | CFG scale |
| --shift | Scheduler shift/mu |
| --hidden-layer | Encoder hidden layer to extract from |
| --template | Template name |
| --negative-prompt | Negative prompt |
| --loras | LoRA specs (path:scale), space-separated |
| --stream | Use streaming SSE endpoint |
gen.py ltx2 subcommand
| Flag | Description |
| --- | --- |
| --prompt | Prompt text (required) |
| --width | Video width in pixels |
| --height | Video height in pixels |
| --num-frames | Number of frames (must be 8n+1) |
| --fps | Frames per second |
| --seed | Random seed |
| --guidance-scale | Guidance scale |
| --use-two-stage / --no-two-stage | Enable/disable two-stage generation |
| --stage1-steps | Stage 1 inference steps |
| --stage2-steps | Stage 2 inference steps |
| --negative-prompt | Negative prompt |
| --stg-scale | Spatio-Temporal Guidance scale |
| --loras | LoRA specs, space-separated |
| --lora-path | Single LoRA path |
| --lora-scale | Single LoRA scale |
| --distilled-lora-path | Distilled LoRA path |
| --distilled-lora-scale | Distilled LoRA scale |
| --enhance-prompt | Enable prompt enhancement |
| --ge-gamma | GE gamma value |
| --fbcache-threshold | FBCache block-skip threshold |
| --use-distilled-sigmas | Use distilled sigma schedule |
| --enable-audio | Enable audio generation |
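The 8n+1 constraint on --num-frames means valid values are 1, 9, 17, 25, 33, 41, and so on. The sketch below checks a frame count and rounds an arbitrary request up to the next valid value. Both function names are hypothetical, and whether the server accepts the smallest case (n = 0, a single frame) is an assumption.

```python
def is_valid_frame_count(num_frames):
    """True if num_frames has the form 8n+1 for some integer n >= 0."""
    return num_frames >= 1 and (num_frames - 1) % 8 == 0


def next_valid_frame_count(num_frames):
    """Smallest value of the form 8n+1 that is >= num_frames."""
    n = max(0, (num_frames + 6) // 8)  # integer ceiling of (num_frames - 1) / 8
    return 8 * n + 1


for requested in (33, 40, 41, 48):
    print(requested, is_valid_frame_count(requested), next_valid_frame_count(requested))
```

Rounding up rather than to the nearest value keeps the clip at least as long as requested; a caller that prefers shorter clips could subtract 8 when the result exceeds the request.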
gen.py qwen subcommand
| Flag | Description |
| --- | --- |
| --prompt | Prompt text (required) |
| --width | Image width in pixels |
| --height | Image height in pixels |
| --steps | Number of inference steps |
| --cfg-scale | CFG scale |
| --seed | Random seed |
| --negative-prompt | Negative prompt |
| --max-sequence-length | Maximum sequence length |
server CLI: web/server.py
Server-side flags for web/server.py. These configure model loading, device placement, optimization, and pipeline-specific parameters. All of these can also be set in config.toml.
uv run web/server.py --config config.toml
deprecated: scripts/generate.py
Deprecated. Use scripts/gen.py instead. generate.py is a standalone generation script that loads models directly. It remains functional but is no longer the recommended workflow. gen.py talks to the running server and benefits from persistent model loading, lazy-load, and the full API surface.
CLI flags for scripts/generate.py and web/server.py. Both use the same base parser from cli.py. Use --model-type to select which pipeline to run.
model and config
| Flag | Description |
| --- | --- |
| --model-type | Model type: zimage (default), flux2, ltx2, qwenimage-layered, qwenimage-t2i, qwenimage-edit |
| --model-path | Path to Z-Image model |
| --qwen-image-model-path | Path to Qwen-Image-Layered model |
| --config | Path to TOML config file |
| --profile | Config profile to use (default: "default") |
| --templates-dir | Path to templates directory |
# Deprecated -- config and profile are server-side flags:
# uv run web/server.py --config config.toml --profile rtx4090
# Then use gen.py: uv run scripts/gen.py zimage --prompt "A cat"
uv run scripts/generate.py --config config.toml --profile rtx4090 "A cat"
device placement
| Flag | Description |
| --- | --- |
| --text-encoder-device | cpu/cuda/mps/auto |
| --dit-device | cpu/cuda/mps/auto |
| --vae-device | cpu/cuda/mps/auto |
api backend
| Flag | Description |
| --- | --- |
| --api-url | URL for heylookitsanllm API |
| --api-model | Model ID for API backend |
| --local-encoder | Force local encoder even when --api-url is set (for A/B testing) |
optimization
| Flag | Description |
| --- | --- |
| --cpu-offload | Enable CPU offload for transformer |
| --flash-attn | Enable Flash Attention |
| --compile | Compile transformer with torch.compile |
| --debug | Enable debug logging (embedding stats, token IDs) |
pytorch native
| Flag | Description |
| --- | --- |
| --attention-backend | auto/flash_attn_2/flash_attn_3/sage/xformers/sdpa |
| --use-custom-scheduler | Use pure PyTorch FlowMatchScheduler |
| --tiled-vae | Enable tiled VAE decode for 2K+ images |
| --tile-size | Tile size in pixels (default: 512) |
| --tile-overlap | Overlap between tiles (default: 64) |
| --embedding-cache | Enable embedding cache for repeated prompts |
| --cache-size | Max cached embeddings (default: 100) |
| --long-prompt-mode | How to handle prompts >1504 tokens: truncate/interpolate/pool/attention_pool |
| --hidden-layer | Which hidden layer to extract embeddings from (default: -2, penultimate) |
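To give a feel for how --tile-size and --tile-overlap interact, the sketch below computes tile start offsets along one image axis using the documented defaults (512 and 64). It is a generic overlapping-tile layout, not the project's implementation: the function name `tile_starts` is hypothetical, and how the real decoder positions tiles or blends the overlapping regions is not specified here.

```python
def tile_starts(length, tile_size=512, overlap=64):
    """Start offsets of overlapping tiles that together cover [0, length)."""
    if overlap >= tile_size:
        raise ValueError("overlap must be smaller than tile_size")
    if length <= tile_size:
        return [0]  # a single tile already covers the whole axis
    stride = tile_size - overlap  # each tile advances by this much
    starts = list(range(0, length - tile_size, stride))
    starts.append(length - tile_size)  # final tile sits flush with the far edge
    return starts


# Offsets along one axis of a 2048-pixel image with the default settings.
print(tile_starts(2048))
```

Applying the same function to width and height and taking the Cartesian product of the two offset lists yields the full 2D tile grid. A larger overlap means more tiles and more decode work, in exchange for a wider band in which neighbouring tiles can be blended.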
dype (high-resolution)
| Flag | Description |
| --- | --- |
| --dype | Enable DyPE position extrapolation for high-res generation |
Input image path(s) for image editing (multiple allowed)
| Flag | Description |
| --- | --- |
| --flux2-offload | Enable CPU offload |
| --flux2-no-offload | Disable CPU offload |
| --flux2-block-offload | Enable block-level offload (incompatible with torch.compile and torchao) |
# Deprecated -- use gen.py instead:
# uv run scripts/gen.py flux2 --prompt "A cat sitting on a windowsill" --seed 42
uv run scripts/generate.py --model-type flux2 \
    --flux2-model-name klein-4b \
    --flux2-model-path /path/to/flux2-klein \
    "A cat sitting on a windowsill"
LTX-2
Use --model-type ltx2 to select.
| Flag | Description |
| --- | --- |
| --ltx2-model-path | Path to LTX-2 model directory |
| --ltx2-encoder-model-id | Gemma3 encoder model ID or path |
| --ltx2-num-frames | Number of frames to generate (must be 8n+1) |
| --ltx2-fps | Frames per second for output video |
| --ltx2-guidance-scale | Guidance scale |
| --ltx2-steps | Number of inference steps |
| --ltx2-lora-path | Path to LoRA weights |
| --ltx2-lora-scale | LoRA scale factor |
| --ltx2-audio | Enable audio generation |
| --ltx2-output | Output file path |
| --ltx2-save-embeddings | Save precomputed embeddings to file |
| --ltx2-load-embeddings | Load precomputed embeddings from file |
LTX-2 optimization flags
| Flag | Description |
| --- | --- |
| --ltx2-text-encoder-device | Device for Gemma3 encoder: cpu/cuda |
| --ltx2-transformer-device | Device for DiT: cpu/cuda |
| --ltx2-vae-device | Device for VAE: cpu/cuda |
| --ltx2-quantize | Transformer quantization: none/fp8 |
| --ltx2-skip-cleanup | Skip model cleanup after generation (keep in memory) |
| --ltx2-gemma-variant | Gemma3 encoder variant: bf16/8bit/q4-qat |
# Deprecated -- use gen.py instead:
# uv run scripts/gen.py ltx2 --prompt "A golden retriever playing fetch in a park" --num-frames 41 --seed 42
uv run scripts/generate.py --model-type ltx2 \
    --ltx2-model-path /path/to/ltx-video-2 \
    --ltx2-num-frames 41 --ltx2-fps 24 \
    --ltx2-gemma-variant q4-qat \
    "A golden retriever playing fetch in a park"
qwen-image (all variants)
Unified configuration for all Qwen-Image variants. Use --model-type to select: