cli flags reference

last updated: 2026-03-03

generation client: `scripts/gen.py` (recommended)

gen.py is a thin CLI client that talks to a running server over HTTP, so start the server first: `uv run web/server.py --config config.toml`.

# Check server status
uv run scripts/gen.py status

# FLUX.2 image generation
uv run scripts/gen.py flux2 --prompt "a cat sleeping in sunlight" --seed 42

# Z-Image generation
uv run scripts/gen.py zimage --prompt "a mountain" --width 512 --height 512

# LTX-2 video (always streaming)
uv run scripts/gen.py ltx2 --prompt "ocean waves" --num-frames 33 --seed 42

# Qwen-Image T2I
uv run scripts/gen.py qwen --prompt "a bird" --seed 42

# Custom server URL
uv run scripts/gen.py --server http://localhost:9000 flux2 --prompt "test"

gen.py global flags

Flag	Description
`--server`	Server base URL (default: `http://127.0.0.1:7860`)
`--output`	Output directory for saved files (default: `outputs/gen/`)
`--timeout`	Request timeout in seconds (default: 300)
`--no-save`	Print metadata only, don't save file
`--json`	Output raw JSON response instead of saving file
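
When gen.py is driven from another script, the command line can be assembled programmatically. The sketch below is only an illustration: `build_gen_command` is a hypothetical helper, not part of the project, and it assumes that global flags go before the subcommand, as in the custom server URL example above.

```python
import shlex


def build_gen_command(subcommand, prompt, global_flags=None, sub_flags=None):
    """Return an argv list for `uv run scripts/gen.py` (hypothetical helper)."""
    argv = ["uv", "run", "scripts/gen.py"]
    # Global flags such as --server or --json are placed before the subcommand.
    for flag, value in (global_flags or {}).items():
        argv.append(flag)
        if value is not True:  # True marks a switch that takes no value
            argv.append(str(value))
    argv += [subcommand, "--prompt", prompt]
    # Subcommand flags such as --seed follow the subcommand name.
    for flag, value in (sub_flags or {}).items():
        argv.append(flag)
        if value is not True:
            argv.append(str(value))
    return argv


cmd = build_gen_command(
    "flux2",
    "a cat sleeping in sunlight",
    global_flags={"--server": "http://localhost:9000", "--json": True},
    sub_flags={"--seed": 42},
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

With the server running, the resulting list can be passed to `subprocess.run(cmd, check=True)`.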

gen.py flux2 subcommand

Flag	Description
`--prompt`	Prompt text (required)
`--width`	Image width in pixels
`--height`	Image height in pixels
`--num-steps`	Number of inference steps
`--seed`	Random seed
`--guidance`	Guidance scale
`--model-name`	Model variant name (e.g., klein-4b)
`--upsample-prompt` / `--no-upsample-prompt`	Enable/disable prompt upsampling
`--loras`	LoRA specs (path:scale), space-separated
`--block-offload`	Enable block-level offload
`--max-text-length`	Maximum text sequence length
`--reference-images`	Input image path(s) for image editing
`--stream`	Use streaming SSE endpoint

gen.py zimage subcommand

Flag	Description
`--prompt`	Prompt text (required)
`--width`	Image width in pixels
`--height`	Image height in pixels
`--steps`	Number of inference steps
`--seed`	Random seed
`--guidance-scale`	CFG scale
`--shift`	Scheduler shift/mu
`--hidden-layer`	Encoder hidden layer to extract from
`--template`	Template name
`--negative-prompt`	Negative prompt
`--loras`	LoRA specs (path:scale), space-separated
`--stream`	Use streaming SSE endpoint

gen.py ltx2 subcommand

Flag	Description
`--prompt`	Prompt text (required)
`--width`	Video width in pixels
`--height`	Video height in pixels
`--num-frames`	Number of frames (must be 8n+1)
`--fps`	Frames per second
`--seed`	Random seed
`--guidance-scale`	Guidance scale
`--use-two-stage` / `--no-two-stage`	Enable/disable two-stage generation
`--stage1-steps`	Stage 1 inference steps
`--stage2-steps`	Stage 2 inference steps
`--negative-prompt`	Negative prompt
`--stg-scale`	Spatio-Temporal Guidance scale
`--loras`	LoRA specs, space-separated
`--lora-path`	Single LoRA path
`--lora-scale`	Single LoRA scale
`--distilled-lora-path`	Distilled LoRA path
`--distilled-lora-scale`	Distilled LoRA scale
`--enhance-prompt`	Enable prompt enhancement
`--ge-gamma`	GE gamma value
`--fbcache-threshold`	FBCache block-skip threshold
`--use-distilled-sigmas`	Use distilled sigma schedule
`--enable-audio`	Enable audio generation
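
The `8n+1` constraint on `--num-frames` is easy to check before sending a request. The following is a minimal sketch, assuming n may be any non-negative integer; both function names are hypothetical and not part of gen.py.

```python
def is_valid_frame_count(num_frames):
    """True if num_frames has the form 8n + 1 for some integer n >= 0."""
    return num_frames >= 1 and (num_frames - 1) % 8 == 0


def nearest_valid_frame_count(num_frames):
    """Round to the closest 8n + 1 value (ties round up, minimum 1)."""
    n = max(0, (num_frames - 1 + 4) // 8)
    return 8 * n + 1


for frames in (33, 40, 41):
    print(frames, is_valid_frame_count(frames), nearest_valid_frame_count(frames))
```

Both values used in this document's examples, 33 and 41, satisfy the constraint (n = 4 and n = 5).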

gen.py qwen subcommand

Flag	Description
`--prompt`	Prompt text (required)
`--width`	Image width in pixels
`--height`	Image height in pixels
`--steps`	Number of inference steps
`--cfg-scale`	CFG scale
`--seed`	Random seed
`--negative-prompt`	Negative prompt
`--max-sequence-length`	Maximum sequence length

server CLI: `web/server.py`

Server-side flags for web/server.py. These configure model loading, device placement, optimization, and pipeline-specific parameters. All of these can also be set in config.toml.

uv run web/server.py --config config.toml

deprecated: `scripts/generate.py`

Deprecated: use scripts/gen.py instead. generate.py is a standalone generation script that loads models directly. It still works but is no longer the recommended workflow, because gen.py talks to the running server and so benefits from persistent model loading, lazy loading, and the full API surface.

The flags below apply to both scripts/generate.py and web/server.py, which share the same base parser from cli.py. Use --model-type to select which pipeline to run.

model and config

Flag	Description
`--model-type`	Model type: zimage (default), flux2, ltx2, qwenimage-layered, qwenimage-t2i, qwenimage-edit
`--model-path`	Path to Z-Image model
`--qwen-image-model-path`	Path to Qwen-Image-Layered model
`--config`	Path to TOML config file
`--profile`	Config profile to use (default: "default")
`--templates-dir`	Path to templates directory

# Deprecated invocation shown below. --config and --profile are server-side flags,
# so the recommended flow is to start the server with them and then call gen.py:
#   uv run web/server.py --config config.toml --profile rtx4090
#   uv run scripts/gen.py zimage --prompt "A cat"
uv run scripts/generate.py --config config.toml --profile rtx4090 "A cat"

device placement

Flag	Description
`--text-encoder-device`	cpu/cuda/mps/auto
`--dit-device`	cpu/cuda/mps/auto
`--vae-device`	cpu/cuda/mps/auto

api backend

Flag	Description
`--api-url`	URL for heylookitsanllm API
`--api-model`	Model ID for API backend
`--local-encoder`	Force local encoder even when --api-url is set (for A/B testing)

optimization

Flag	Description
`--cpu-offload`	Enable CPU offload for transformer
`--flash-attn`	Enable Flash Attention
`--compile`	Compile transformer with torch.compile
`--debug`	Enable debug logging (embedding stats, token IDs)

pytorch native

Flag	Description
`--attention-backend`	auto/flash_attn_2/flash_attn_3/sage/xformers/sdpa
`--use-custom-scheduler`	Use pure PyTorch FlowMatchScheduler
`--tiled-vae`	Enable tiled VAE decode for 2K+ images
`--tile-size`	Tile size in pixels (default: 512)
`--tile-overlap`	Overlap between tiles (default: 64)
`--embedding-cache`	Enable embedding cache for repeated prompts
`--cache-size`	Max cached embeddings (default: 100)
`--long-prompt-mode`	How to handle prompts >1504 tokens: truncate/interpolate/pool/attention_pool
`--hidden-layer`	Which hidden layer to extract embeddings from (default: -2, penultimate)
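
To make the interaction between `--tile-size` and `--tile-overlap` concrete, the sketch below computes where tiles would start along one axis. It assumes a simple sliding-window layout with a stride of tile size minus overlap and a final tile aligned to the image edge; the project's actual tiling scheme may differ, and `tile_starts` is a hypothetical name.

```python
def tile_starts(length, tile_size=512, overlap=64):
    """Start offsets of tiles covering [0, length) along one axis (assumed layout)."""
    if overlap >= tile_size:
        raise ValueError("overlap must be smaller than tile_size")
    if length <= tile_size:
        return [0]  # a single tile already covers the whole axis
    stride = tile_size - overlap
    starts = list(range(0, length - tile_size, stride))
    starts.append(length - tile_size)  # last tile is aligned to the far edge
    return starts


print(tile_starts(2048))
```

Under these assumptions a 2048-pixel axis with the default settings needs five tiles per axis, so a 2048x2048 image would be decoded in 25 tiles.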

dype (high-resolution)

Flag	Description
`--dype`	Enable DyPE position extrapolation for high-res generation
`--dype-method`	Method: vision_yarn/yarn/ntk (default: vision_yarn)
`--dype-scale`	DyPE magnitude lambda_s (default: 2.0)
`--dype-exponent`	DyPE decay speed lambda_t (default: 2.0 = quadratic)
`--dype-start-sigma`	When to start DyPE decay (0-1, 1.0 = from start)
`--dype-base-shift`	Noise schedule shift at base resolution (default: 0.5)
`--dype-max-shift`	Noise schedule shift at max resolution (default: 1.15)
`--dype-base-resolution`	Training resolution (Z-Image: 1024, Qwen: 1328)
`--dype-anisotropic`	Use per-axis scaling for extreme aspect ratios (16:9, 9:16)
`--dype-multipass`	Generation mode: single/twopass/threepass (default: single)
`--dype-pass2-strength`	img2img strength for second pass (default: 0.5)
`--dype-pass3-strength`	img2img strength for third pass (default: 0.4)
`--dype-frequency-modulation`	Enable timestep-based RoPE frequency scaling (experimental)

generation

Flag	Description
`--width`	Image width in pixels (default: 1024, must be divisible by 16)
`--height`	Image height in pixels (default: 1024, must be divisible by 16)
`--steps`	Inference steps (default: 9)
`--guidance-scale`	CFG scale (default: 0.0)
`--cfg-normalization`	CFG norm clamping (0.0 = disabled, 1.0-2.0 typical). Prevents over-amplification.
`--cfg-truncation`	CFG truncation threshold (1.0 = never, 0.5-0.8 typical). Stops CFG at this progress.
`--shift`	Scheduler shift/mu (default: 3.0)
`--dynamic-shift`	Calculate shift based on resolution (overrides --shift). Uses linear interpolation: base_shift=0.5 at 512x512, max_shift=1.15 at 2048x2048.
`--d-noise`	Sigma schedule scaling factor. <1.0 = sharper/more detail (try 0.95-0.98), >1.0 = softer/deeper colors (try 1.02-1.05). Default: 1.0 (no scaling).
`--seed`	Random seed
`--img2img`	Input image path for img2img generation
`--strength`	img2img strength: 0.0 (no change) to 1.0 (full regeneration) (default: 0.7)
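
The `--dynamic-shift` description gives two endpoints for a linear interpolation. The sketch below shows one way that could work, assuming the interpolation variable is the pixel count (width times height) and that values outside the two endpoints are extrapolated rather than clamped. Neither detail is stated above, and `dynamic_shift` is a hypothetical name.

```python
def dynamic_shift(width, height, base_shift=0.5, max_shift=1.15,
                  base_res=512, max_res=2048):
    """Linearly interpolate the scheduler shift by pixel count (assumed variable)."""
    if width % 16 or height % 16:
        raise ValueError("width and height must be divisible by 16")
    base_pixels = base_res * base_res
    max_pixels = max_res * max_res
    # t is 0.0 at 512x512 and 1.0 at 2048x2048; no clamping is applied here.
    t = (width * height - base_pixels) / (max_pixels - base_pixels)
    return base_shift + t * (max_shift - base_shift)


for size in (512, 1024, 2048):
    print(size, round(dynamic_shift(size, size), 3))
```

Under this assumption the default 1024x1024 resolution lands one fifth of the way between the endpoints, since it has four times the pixels of 512x512 and a quarter of those of 2048x2048.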

prompt control

Flag	Description
`--system-prompt`	System message
`--thinking-content`	Content inside `<think>...</think>` (triggers think block)
`--assistant-content`	Content after `</think>`
`--enable-thinking`	Add `<think></think>` structure to prompt
`--template`	Template name to use

lora

Flag	Description
`--lora`	LoRA path with optional scale (path:scale). Repeatable.
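
A `path:scale` spec can be split as shown in the sketch below. It assumes that a missing scale falls back to 1.0, which the table above does not state, and it splits on the last colon so that paths containing colons (for example Windows drive letters) survive. `parse_lora_spec` is a hypothetical helper, not the project's parser.

```python
def parse_lora_spec(spec, default_scale=1.0):
    """Split 'path:scale' into (path, scale); default_scale is an assumption."""
    path, sep, scale_text = spec.rpartition(":")
    if sep:
        try:
            return path, float(scale_text)
        except ValueError:
            pass  # text after the last colon is not a number, so it belongs to the path
    return spec, default_scale


print(parse_lora_spec("style.safetensors:0.8"))
print(parse_lora_spec("style.safetensors"))
```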

skip layer guidance (slg)

Flag	Description
`--slg-scale`	SLG scale (default: 0.0, recommended: 2.8)
`--slg-layers`	Layers to skip (default: 15,16,17,18,19)
`--slg-start`	Start SLG at this fraction of steps (default: 0.01)
`--slg-stop`	Stop SLG at this fraction of steps (default: 0.20)
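
The start and stop fractions define a window over the denoising steps. As a rough illustration, the sketch below lists the steps that fall inside that window, assuming progress is measured as the zero-based step index divided by the total step count and that the stop bound is exclusive. The real implementation may measure progress differently, and `slg_active_steps` is a hypothetical name.

```python
def slg_active_steps(num_steps, start=0.01, stop=0.20):
    """Zero-based step indices whose progress i / num_steps lies in [start, stop)."""
    return [i for i in range(num_steps) if start <= i / num_steps < stop]


print(slg_active_steps(9))
print(slg_active_steps(50))
```

Under these assumptions the default window is narrow: with very few steps it may cover only a single step, so the fractions are worth revisiting when the step count changes.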

rewriter

Flag	Description
`--rewriter-use-api`	Use API backend for prompt rewriting
`--rewriter-api-url`	API URL for rewriter (defaults to --api-url)
`--rewriter-api-model`	Model ID for rewriter API (default: Qwen3-4B)
`--rewriter-vl-api-model`	Model ID for VL rewriting via API (e.g., qwen2.5-vl-72b-mlx)
`--rewriter-temperature`	Sampling temperature (default: 0.6)
`--rewriter-top-p`	Nucleus sampling threshold (default: 0.95)
`--rewriter-min-p`	Minimum probability threshold (default: 0.0, disabled)
`--rewriter-max-tokens`	Maximum tokens to generate (default: 512)
`--rewriter-timeout`	API request timeout in seconds (default: 120.0, VL models may need longer)
`--rewriter-no-vl`	Disable VL model selection in rewriter UI
`--rewriter-preload-vl`	Preload Qwen3-VL at startup for rewriting

vision conditioning (qwen3-vl)

Flag	Description
`--vl-model-path`	Path to Qwen3-VL model (enables vision conditioning)
`--vl-device`	Device for VL model: cpu/cuda/auto (cpu recommended to save VRAM)
`--vl-alpha`	VL influence ratio (0.0=pure text, 1.0=pure VL, default: 0.3)
`--vl-hidden-layer`	Hidden layer to extract from VL model (default: -2)
`--vl-no-auto-unload`	Keep VL model loaded after extraction (uses more VRAM)
`--vl-blend-mode`	Blend strategy: interpolate/adain_per_dim/adain/linear/style_only/graduated/attention_weighted
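
For the `interpolate` blend mode, the `--vl-alpha` endpoints (0.0 = pure text, 1.0 = pure VL) suggest a plain linear interpolation between the two embeddings. The sketch below shows that reading on small Python lists; real embeddings would be tensors, the other blend modes are not covered, and `blend_embeddings` is a hypothetical name.

```python
def blend_embeddings(text_emb, vl_emb, alpha=0.3):
    """Linear interpolation: alpha=0.0 returns text_emb, alpha=1.0 returns vl_emb."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    if len(text_emb) != len(vl_emb):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - alpha) * t + alpha * v for t, v in zip(text_emb, vl_emb)]


text = [1.0, 2.0]
vision = [3.0, 4.0]
print(blend_embeddings(text, vision, alpha=0.5))
```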

FLUX.2 Klein

Use --model-type flux2 to select.

Flag	Description
`--flux2-model-name`	Model variant name (e.g., klein-4b, klein-9b)
`--flux2-model-path`	Path to FLUX.2 model directory
`--flux2-encoder-path`	Path to Qwen3 encoder
`--flux2-vae-path`	Path to VAE decoder
`--flux2-num-steps`	Number of inference steps
`--flux2-guidance`	Guidance scale
`--flux2-seed`	Random seed
`--flux2-output`	Output file path
`--flux2-input-image`	Input image path(s) for image editing (multiple allowed)
`--flux2-offload`	Enable CPU offload
`--flux2-no-offload`	Disable CPU offload
`--flux2-block-offload`	Enable block-level offload (incompatible with torch.compile and torchao)

# Deprecated -- use gen.py instead:
# uv run scripts/gen.py flux2 --prompt "A cat sitting on a windowsill" --seed 42
uv run scripts/generate.py --model-type flux2 \
  --flux2-model-name klein-4b \
  --flux2-model-path /path/to/flux2-klein \
  "A cat sitting on a windowsill"

LTX-2

Use --model-type ltx2 to select.

Flag	Description
`--ltx2-model-path`	Path to LTX-2 model directory
`--ltx2-encoder-model-id`	Gemma3 encoder model ID or path
`--ltx2-num-frames`	Number of frames to generate (must be 8n+1)
`--ltx2-fps`	Frames per second for output video
`--ltx2-guidance-scale`	Guidance scale
`--ltx2-steps`	Number of inference steps
`--ltx2-lora-path`	Path to LoRA weights
`--ltx2-lora-scale`	LoRA scale factor
`--ltx2-audio`	Enable audio generation
`--ltx2-output`	Output file path
`--ltx2-save-embeddings`	Save precomputed embeddings to file
`--ltx2-load-embeddings`	Load precomputed embeddings from file

LTX-2 optimization flags

Flag	Description
`--ltx2-text-encoder-device`	Device for Gemma3 encoder: cpu/cuda
`--ltx2-transformer-device`	Device for DiT: cpu/cuda
`--ltx2-vae-device`	Device for VAE: cpu/cuda
`--ltx2-quantize`	Transformer quantization: none/fp8
`--ltx2-skip-cleanup`	Skip model cleanup after generation (keep in memory)
`--ltx2-gemma-variant`	Gemma3 encoder variant: bf16/8bit/q4-qat

# Deprecated -- use gen.py instead:
# uv run scripts/gen.py ltx2 --prompt "A golden retriever playing fetch in a park" --num-frames 41 --seed 42
uv run scripts/generate.py --model-type ltx2 \
  --ltx2-model-path /path/to/ltx-video-2 \
  --ltx2-num-frames 41 --ltx2-fps 24 \
  --ltx2-gemma-variant q4-qat \
  "A golden retriever playing fetch in a park"

qwen-image (all variants)

Unified configuration for all Qwen-Image variants. Use --model-type to select:

qwenimage-t2i - Text-to-image generation (60-layer DiT)
qwenimage-edit - Instruction-based image editing (8B DiT)
qwenimage-layered - Multi-layer decomposition (deprioritized)

Flag	Description
`--qwen-image-model-path`	Path to any Qwen-Image model
`--qwen-image-cpu-offload`	Enable CPU offload (required for RTX 4090)
`--qwen-image-layers`	Number of decomposition layers (layered variant only, default: 4)
`--qwen-image-steps`	Diffusion steps (variant default: t2i=40, edit=25, layered=50)
`--qwen-image-cfg-scale`	CFG scale (default: 4.0)
`--qwen-image-resolution`	Resolution (variant default: t2i=1024, edit=640, layered=640)
`--qwen-image-quantize-text-encoder`	Quantization for text encoder (Qwen2.5-VL-7B): none/4bit/8bit
`--qwen-image-quantize-transformer`	Quantization for DiT (variant default: t2i=fp8, edit/layered=diffsynth-fp8)

variant-aware defaults

When not specified, parameters use variant-specific defaults:

Variant	Steps	Resolution	Transformer Quantization
`qwenimage-t2i`	40	1024	fp8
`qwenimage-edit`	25	640	diffsynth-fp8
`qwenimage-layered`	50	640	diffsynth-fp8
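
The fallback behaviour in the table can be expressed as a small lookup. This is a sketch of the idea rather than the project's code: the values are copied from the table above, while `VARIANT_DEFAULTS` and `resolve_qwen_image_settings` are hypothetical names.

```python
VARIANT_DEFAULTS = {
    "qwenimage-t2i": {"steps": 40, "resolution": 1024, "quantize_transformer": "fp8"},
    "qwenimage-edit": {"steps": 25, "resolution": 640, "quantize_transformer": "diffsynth-fp8"},
    "qwenimage-layered": {"steps": 50, "resolution": 640, "quantize_transformer": "diffsynth-fp8"},
}


def resolve_qwen_image_settings(model_type, steps=None, resolution=None,
                                quantize_transformer=None):
    """Use explicit values when given, otherwise the variant's default."""
    defaults = VARIANT_DEFAULTS[model_type]
    overrides = {
        "steps": steps,
        "resolution": resolution,
        "quantize_transformer": quantize_transformer,
    }
    return {
        key: defaults[key] if overrides[key] is None else overrides[key]
        for key in defaults
    }


print(resolve_qwen_image_settings("qwenimage-edit", steps=30))
```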

quantization recommendations (rtx 4090)

For RTX 4090 (24GB VRAM):

Text encoder: Set quantization to `none` and enable CPU offload (best quality, 0 GPU VRAM)
Transformer: Use the variant default (`fp8` for T2I, `diffsynth-fp8` for edit/layered)

# Deprecated -- use gen.py instead:
# uv run scripts/gen.py qwen --prompt "A mountain landscape" --seed 42

# T2I example (uses variant defaults)
uv run scripts/generate.py --model-type qwenimage-t2i \
  --qwen-image-model-path /path/to/Qwen-Image-2512 \
  "A mountain landscape"

# Edit example
uv run scripts/generate.py --model-type qwenimage-edit \
  --qwen-image-model-path /path/to/Qwen-Image-Edit-2511 \
  --img2img input.png "Change the sky to sunset"
