Feat/render chain v1#261

Merged
jdilley merged 33 commits into main from feat/render-chain-v1
Apr 22, 2026

Conversation

@jdilley
Collaborator

@jdilley jdilley commented Apr 21, 2026

No description provided.

jdilley and others added 11 commits April 20, 2026 16:14
At `strength < 1.0` (the `--strength 0.75` LTX-2 i2v default),
`run_real_distilled_stage` was cloning `video_latents` *after*
`apply_stage_video_conditioning` had already soft-blended the first
latent frame positions with noise: the "clean reference" tensor that
the per-step denoise-mask blend pulls conditioned tokens toward became
`noise*(1-s) + source*s` at replacement positions. Used as the clean
target, that pre-blended tensor pinned the first latent to a noisy
ghost of the image at every step, so i2v runs produced a first frame
that was 25% noise + 75% image instead of the source image.

Introduce a `clean_latents_for_conditioning` helper that re-applies
the replacement-based conditioning with `strength = 1.0` on top of the
post-apply tensor, overwriting replacement positions with pure source
tokens while appended keyframe tokens and pure-noise regions pass
through unchanged. `strength = 1.0` and pure-T2V paths remain
bit-for-bit identical. Two regression tests cover the soft-blended
case and the no-replacements passthrough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce `mold_core::chain` with the `ChainStage` / `ChainRequest` /
`ChainResponse` types that will carry server-side chained LTX-2 video
generation. The wire format is stages-based from day one so the v2
movie-maker UI can author multi-prompt / multi-keyframe chains without
breaking callers: v1 only exposes a single-prompt auto-expand form
(`prompt` + `total_frames` + `clip_frames`), and `normalise()`
collapses it into a canonical `Vec<ChainStage>` before any engine work
runs.

Normalisation matches the stitch math that Phase 1.4 of the plan will
use:
  delivered_frames = clip_frames + (N - 1) * (clip_frames - motion_tail)

so auto-expand picks `N` large enough to cover `total_frames` with
tail-overlap trimming in mind; the over-production is discarded from
the final clip's tail per the 2026-04-20 sign-off. Guardrails cap
chains at 16 stages (≈1552 frames at 97-frame clips, ~64 s at 24 fps),
require `8k+1` frame counts for LTX-2, and forbid
`motion_tail_frames >= clip_frames` so every continuation emits at
least one new frame.
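The stitch math and guardrails above can be sketched as a pure helper. This is a minimal illustration, not the crate's actual `normalise()` code; the function name and `Option` return shape are assumptions.

```rust
/// Sketch of the auto-expand stage-count math: find the smallest N such
/// that delivered_frames = clip_frames + (N - 1) * (clip_frames - motion_tail)
/// covers total_frames, subject to the commit's guardrails.
fn auto_expand_stage_count(total_frames: u32, clip_frames: u32, motion_tail: u32) -> Option<u32> {
    // Guardrails: LTX-2 needs 8k+1 frame counts, and every continuation
    // must emit at least one new frame.
    if clip_frames % 8 != 1 || motion_tail >= clip_frames {
        return None;
    }
    let new_per_continuation = clip_frames - motion_tail;
    let mut n: u32 = 1;
    let mut delivered = clip_frames;
    while delivered < total_frames {
        n += 1;
        if n > 16 {
            return None; // 16-stage chain cap exceeded
        }
        delivered += new_per_continuation;
    }
    Some(n) // over-production beyond total_frames is trimmed from the tail
}
```

For example, a 241-frame request at 97-frame clips with a 4-frame tail expands to 3 stages (97 + 2 × 93 = 283 delivered, trimmed down to 241).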

Also lifts the existing `base64_opt` serde helper in `types.rs` from
private to `pub(crate)` so chain types can share the single source of
truth for base64 wire encoding.

Unit tests cover: split-into-stages, first-stage-image preservation,
empty-request rejection, non-8k+1 rejection, canonical-form
passthrough, single-stage short chains, >16-stage guardrails,
motion-tail >= clip rejection, missing auto-expand fields, and a
property test confirming the auto-expand stage count delivers the
requested total frames under every representative (total, clip, tail)
combo from the design.

tasks/render-chain-v1-plan.md adds the signed-off decisions block at
the top so the rationale travels with the code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `MoldClient::generate_chain` (POST /api/generate/chain, non-
streaming JSON request/response) and `MoldClient::generate_chain_stream`
(POST /api/generate/chain/stream, SSE) mirroring the existing
`generate` / `generate_stream` shape. The server routes land in
Phase 2; this commit ships the client surface so Phase 1's fake-engine
tests and Phase 2's route wiring have a settled wire contract to
implement against.

Chain-specific wire types (all new, under `mold_core::chain`):

- `ChainProgressEvent` — tagged enum streamed under `event: progress`.
  Variants: `chain_start { stage_count, estimated_total_frames }`,
  `stage_start { stage_idx }`, `denoise_step { stage_idx, step, total }`,
  `stage_done { stage_idx, frames_emitted }`,
  `stitching { total_frames }`. snake_case tagged JSON matches the
  existing `SseProgressEvent` style.
- `SseChainCompleteEvent` — kept as a sibling to
  `crate::types::SseCompleteEvent` rather than an extension, so chain
  completion shape can evolve independently (stage_count, stitched-
  video payload, optional thumb/GIF, audio metadata, elapsed time).

Error translation matches the single-clip methods:

| Status                 | generate_chain                                  | generate_chain_stream                           |
|------------------------|-------------------------------------------------|-------------------------------------------------|
| 200                    | parse ChainResponse JSON                        | parse SSE until `complete` event                |
| 404, empty body        | hard error "chain endpoint not found"           | `Ok(None)` — caller may fall back               |
| 404, non-empty body    | `MoldError::ModelNotFound`                      | `MoldError::ModelNotFound`                      |
| 422                    | `MoldError::Validation`                         | `MoldError::Validation`                         |
| 4xx/5xx else           | generic anyhow                                   | generic anyhow                                   |

The non-streaming empty-404 behaviour deliberately differs from SSE:
streaming clients can fall back to non-streaming, but non-streaming
callers have nowhere to go and should fail loudly.
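The table's branching can be sketched as a single match. The error-variant names follow the table; the helper signature and the `Result<Option<()>, _>` shape are illustrative only, not the real `MoldClient` API.

```rust
// Sketch of the status-code translation above. `Ok(None)` models the
// streaming-only "caller may fall back" case for an empty-body 404.
#[derive(Debug, PartialEq)]
enum MoldError {
    ModelNotFound,
    Validation,
    Other(String),
}

fn classify_chain_error(status: u16, body_empty: bool, streaming: bool)
    -> Result<Option<()>, MoldError>
{
    match (status, body_empty) {
        (200, _) => Ok(Some(())),             // parse ChainResponse / SSE stream
        (404, true) if streaming => Ok(None), // streaming caller may fall back
        (404, true) => Err(MoldError::Other("chain endpoint not found".into())),
        (404, false) => Err(MoldError::ModelNotFound),
        (422, _) => Err(MoldError::Validation),
        (s, _) => Err(MoldError::Other(format!("unexpected status {s}"))),
    }
}
```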

Integration coverage:
- `crates/mold-core/tests/chain_client.rs` (wiremock): endpoint/body
  shape assertion on non-streaming; 422 → Validation; 404-with-body →
  ModelNotFound; non-streaming empty 404 → hard error; SSE empty 404 →
  Ok(None); SSE progress + complete roundtrip reconstructs
  `ChainResponse` with thumb + gpu.
- Pure serde roundtrip test for every `ChainProgressEvent` variant
  asserting snake_case tag format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce the carryover primitive that render-chain stages hand to each
other. `ChainTail { frames, latents, last_rgb_frame }` bundles the
final VAE latents of a stage's motion tail so the next stage can patch
those tokens straight into its conditioning without a VAE decode → RGB
→ VAE encode round-trip. No engine wiring yet — the orchestrator and
the `generate_with_carryover` entry point land in sibling commits.

Helpers in the new `ltx2::chain` module:

- `tail_latent_frame_count(pixel_frames: u32) -> usize` — exposes the
  LTX-2 VAE's 8× causal-first-frame temporal ratio as the formula
  `((n - 1) / 8) + 1`. Matches `VideoLatentShape::from_pixel_shape`.
  Panics on `0`; callers must validate upstream.

- `extract_tail_latents(final_latents: &Tensor, pixel_frames: u32) ->
  Result<Tensor>` — narrows the time axis of a rank-5
  `[B, C, T, H, W]` latents tensor down to the last K latent frames
  corresponding to the requested pixel-frame tail. Errors (not panics)
  on rank mismatch or oversize tail request so orchestrator bugs
  surface as operational errors, not process aborts.
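The quoted formula is small enough to restate as a sketch. The LTX-2 VAE encodes the first pixel frame into its own causal latent slot, then each subsequent group of 8 pixel frames into one slot, which is exactly `((n - 1) / 8) + 1`:

```rust
// Sketch of the tail-latent formula described above; the real helper
// lives in the `ltx2::chain` module and panics on zero the same way.
fn tail_latent_frame_count(pixel_frames: u32) -> usize {
    assert!(pixel_frames > 0, "callers must validate upstream");
    (((pixel_frames - 1) / 8) + 1) as usize
}
```

This reproduces the test table in the commit: 4 → 1, 9 → 2, 16 → 2, 17 → 3, 97 → 13.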

Unit tests cover: the VAE formula across representative tail sizes
(4→1, 9→2, 16→2, 17→3, 97→13), rejection of a zero pixel-frame
tail, correct narrowing on a synthetic [1, 2, 3, 1, 1] tensor with
sentinel values proving the last latent frame is returned across all
channels, narrowing on a larger rank-5 tensor, rank-4 rejection, and
oversize-tail rejection. All tests are weight-free and run under
`cargo test -p mold-ai-inference --lib`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`StagedConditioning` now carries both disk-backed images (existing
single-clip path) and in-memory latent blocks (new, empty for every
non-chain caller). The render-chain orchestrator will populate the new
`latents: Vec<StagedLatent>` field with a prior stage's motion-tail
latents so the receiving stage can patchify those tokens straight into
its `StageVideoConditioning::replacements` without the VAE decode → RGB
→ VAE encode round-trip — that's the point of latent carryover.

Changes:

- `StagedLatent { latents, frame, strength }` in
  `ltx2::conditioning` — mirrors `StagedImage`'s semantics but with a
  pre-encoded `candle_core::Tensor` instead of a disk path. `frame = 0`
  routes tokens through `replacements` (chain v1 motion tail);
  non-zero `frame` builds a `VideoTokenAppendCondition` so the movie-
  maker in v2 can thread latents into arbitrary positions.

- `StagedConditioning` drops `PartialEq` since `Tensor` doesn't
  implement structural equality. Grepped for comparison usages — none.
  Existing callers of `stage_conditioning()` get `latents: Vec::new()`.

- `maybe_load_stage_video_conditioning` in `runtime.rs`:
  - Early-return gate now also considers `plan.conditioning.latents`.
  - VAE is loaded conditionally: only when images or reference video
    need encoding. Pure-latent chain handoffs skip VAE load entirely.
  - New loop iterates staged latents, patchifies each block, routes
    frame-0 tokens to `replacements` (keyframe pipelines aside) and
    other frames to `appended` — symmetrical with the image path.

Tests (weight-free):

- `stage_conditioning_leaves_latents_empty_for_non_chain_callers` —
  pins the back-compat invariant: every non-chain generate path
  continues to receive an empty latents vec.
- `staged_latent_patchifies_to_same_token_shape_as_image_at_single_latent_frame`
  — verifies a `[1, 128, 1, 22, 38]` chain-tail latent block patchifies
  to `[1, 836, 128]` tokens, the same shape the image-conditioning
  path produces after VAE encode + patchify for the equivalent latent
  geometry.

Chain orchestrator + `Ltx2Engine::generate_with_carryover` land in the
sibling Phase 1c commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `Ltx2ChainOrchestrator<R: ChainStageRenderer>` that drives the
per-stage render loop for chained video generation: builds each
stage's `GenerateRequest`, threads the prior stage's `ChainTail`
through the renderer, drops the leading motion-tail frames on every
continuation, accumulates frames, and returns a `ChainRunOutput`.

The `ChainStageRenderer` trait is the seam between the orchestrator
(pure control flow) and the engine (tensor work). The LTX-2 engine
implementation lands in Phase 1d — this commit ships the orchestrator
fully tested against a fake renderer so the engine plumbing can be
reviewed in isolation.

Behaviour nailed down (from the 2026-04-20 sign-off):

- **Per-stage seeds**: `base_seed ^ ((stage_idx as u64) << 32)`. A
  stage's `seed_offset` overrides the default when set — reserved for
  the v2 movie-maker's "regen just this stage" affordance.

- **Motion-tail trim**: stage 0 emits all its frames; continuations
  drop the leading `req.motion_tail_frames` pixel frames because those
  duplicate the previous clip's tail that was threaded back as latent
  conditioning. `motion_tail_frames = 0` is a legitimate configuration
  (simple concat).

- **Fail closed**: a mid-chain renderer error bubbles up immediately.
  All frames accumulated so far are discarded — no partial stitch is
  ever written to the gallery. Partial resume is a v2 feature.

- **No audio or target-total-frame trim in v1**: the orchestrator
  delivers whatever frame count the stages produce (with tail drops
  applied). Target-total trimming is the caller's responsibility
  (server / CLI). Audio-video chains are out of scope for v1.
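The first two sign-off rules can be sketched as pure helpers (later in this PR the per-stage XOR is dropped in favour of a shared seed, so this reflects the behaviour as of this commit). Treating `seed_offset` as a full replacement seed is an assumption; the commit only says it "overrides the default".

```rust
// Per-stage seed derivation: base_seed ^ (stage_idx << 32), unless the
// stage pins its own seed via seed_offset (assumed to replace outright).
fn stage_seed(base_seed: u64, stage_idx: usize, seed_offset: Option<u64>) -> u64 {
    seed_offset.unwrap_or(base_seed ^ ((stage_idx as u64) << 32))
}

// Motion-tail trim accounting: stage 0 keeps all frames, every
// continuation drops the leading motion_tail duplicate frames.
fn accumulated_frames(stages: usize, clip_frames: usize, motion_tail: usize) -> usize {
    clip_frames + stages.saturating_sub(1) * (clip_frames - motion_tail)
}
```

For the 3 × 97-frame, 4-frame-tail test case below, `accumulated_frames(3, 97, 4)` is 97 + 2 × 93 = 283.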

Progress events forwarded through `Option<&mut dyn FnMut(ChainProgressEvent)>`:
`ChainStart` → `StageStart` → `DenoiseStep` (wrapping the renderer's
`StageProgressEvent`s with `stage_idx`) → `StageDone` → (next stage)
→ `Stitching`. Chain-level subscribers can render a stacked
overall+per-stage progress bar without coordinating with the engine.

Per-stage `GenerateRequest` is constructed to ensure only stage 0
carries the optional starting image — even if the caller forgot to
clear it on later stages, the orchestrator suppresses it because
continuations must condition on motion-tail latents only. `strength`
becomes `1.0` on continuations regardless of the chain default since
the tail carryover is always a hard replacement.

Tests (weight-free, injecting a `FakeRenderer`):

- `chain_runs_all_stages_and_drops_tail_prefix_from_continuations` —
  3×97-frame clips with 4-frame tail produce exactly 97 + 2×93 = 283
  accumulated frames.
- `chain_with_zero_tail_concats_full_clips_without_drop` — `tail=0`
  keeps every frame on continuations.
- `chain_empty_stages_errors_without_calling_renderer` — zero-stage
  requests fail before touching the renderer.
- `chain_fails_closed_mid_chain_discarding_accumulated_frames` —
  simulated stage-1 failure bubbles up; stage 2 never runs.
- `chain_derives_per_stage_seed_from_base_seed` — three stages from
  base seed 42 land on 42, 42^(1<<32), 42^(2<<32).
- `chain_only_stage0_carries_source_image` — a source image set on
  stages[1] is suppressed, so continuations can't accidentally
  condition on a still image instead of the motion tail.
- `chain_forwards_engine_events_with_stage_idx_wrapping` — checks the
  full expected event order for a 2-stage chain with per-stage
  progress emission.
- `chain_rejects_motion_tail_ge_stage_frames_before_running` —
  up-front validation catches `motion_tail >= frames` so the renderer
  is never invoked with a degenerate configuration.
- `chain_respects_seed_offset_override_when_stage_provides_one` —
  pins `ChainStage::seed_offset` semantics for the v2 movie-maker
  hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture the state of the branch (6 commits on local main, not pushed),
the five signed-off design decisions, the Phase 1d → 2 → 3 → 4 remaining
work with specific file:line surgery points, and a ready-to-paste prompt
for a fresh Claude Code session. Gotchas documented: stale
`test = false` claim in CLAUDE.md, pre-existing clippy warnings unrelated
to this branch, VAE 8× causal temporal ratio already encoded by
`extract_tail_latents`, and the existing-parameter-reuse opportunity on
`run_real_distilled_stage` (no new params needed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…capture

Add a pre-VAE-decode tail-capture slot on Ltx2RuntimeSession threaded
into render_real_distilled_av, implement Ltx2Engine::render_chain_stage
that injects a carryover ChainTail as a StagedLatent and extracts the
post-denoise tail, and wire it through impl ChainStageRenderer for
Ltx2Engine. Distilled-only in v1; other pipeline families error up-front.
Amend ChainStageRenderer::render_stage to carry motion_tail_pixel_frames
so the engine knows how many frames to narrow off the emitted latents.

Part of render-chain v1 (Phase 1d). Weight-free tests added; full
mold-inference and mold-core lib test suites stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add POST /api/generate/chain and POST /api/generate/chain/stream for
server-side chained LTX-2 video generation. The handler takes the engine
out of the model cache (restoring it afterwards) and runs the full chain
in a spawn_blocking so the sync orchestrator never blocks the async runtime.
Drives Ltx2ChainOrchestrator through the engine's ChainStageRenderer
view, trims accumulated frames to target total from the tail per
sign-off, encodes the stitched output (MP4 when the mp4 feature is on,
APNG fallback otherwise), and saves to the gallery with a synthesised
OutputMetadata.

Expose as_chain_renderer() on InferenceEngine (default None), overridden
by Ltx2Engine. Relax Ltx2ChainOrchestrator's renderer bound to ?Sized so
trait objects compose cleanly. Promote ltx_video::video_enc from
pub(crate) to pub so mold-server can reuse encode_mp4/encode_apng/
encode_gif/first_frame_png for chain stitching.

Weight-free route tests cover the happy path, the mid-chain failure
(502 Bad Gateway), the unsupported-model rejection (422), progress
event ordering through the SSE helper, and tail-trim behaviour.

Part of render-chain v1 (Phase 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When --frames exceeds the model's per-clip cap (97 for LTX-2 distilled),
`mold run` now auto-builds a ChainRequest and routes to
POST /api/generate/chain/stream (server mode) or runs the
Ltx2ChainOrchestrator in-process (--local mode). New flags --clip-frames
and --motion-tail let users tune the per-clip length and the motion-tail
overlap (default 4 frames of latent carryover between clips).

Stacked progress bars render a parent "Chain" bar (total frames) and a
wiping per-stage bar (denoise step / total). Both server and local paths
share a single encode+save+preview epilogue so output formatting, stdout
piping, and gallery save are identical.

Models outside LTX-2 distilled families error fast when --frames exceeds
the single-clip cap rather than silently dropping frames or hitting the
server's chain route with a non-chainable model. A pure
`decide_chain_routing` helper captures the branching logic so auto-
routing is unit-testable without async or network.
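The routing decision described above can be sketched as a three-way branch. The enum and signature are illustrative, not the crate's real `decide_chain_routing` shape:

```rust
// Sketch of the auto-routing rule: at-or-below the per-clip cap stays
// single-clip; above it, only chainable (LTX-2 distilled) models get a
// ChainRequest, everything else fails fast.
#[derive(Debug, PartialEq)]
enum ChainRouting {
    Single, // within one clip: normal generate path
    Chain,  // auto-build a ChainRequest
    Reject, // error fast instead of silently dropping frames
}

fn decide_chain_routing(frames: u32, per_clip_cap: u32, model_chainable: bool) -> ChainRouting {
    if frames <= per_clip_cap {
        ChainRouting::Single
    } else if model_chainable {
        ChainRouting::Chain
    } else {
        ChainRouting::Reject
    }
}
```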

Part of render-chain v1 (Phase 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document render-chain v1 across the four surfaces: a new "Chained video
output" section in website/models/ltx2.md explaining the per-clip cap,
motion-tail carryover, and the --frames / --clip-frames / --motion-tail
CLI contract; request/response/SSE schemas for the new
POST /api/generate/chain[/stream] endpoints in website/api/index.md;
an Unreleased/Added bullet in CHANGELOG.md covering the feature
end-to-end; and the new flags + endpoint in .claude/skills/mold/SKILL.md
so OpenClaw and the other AI agents surface chained video correctly.

Part of render-chain v1 (Phase 4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 64.26584% with 1156 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.45%. Comparing base (1410d08) to head (ecae6d8).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
crates/mold-cli/src/commands/chain.rs 27.37% 260 Missing ⚠️
crates/mold-server/src/routes.rs 11.44% 209 Missing ⚠️
crates/mold-server/src/routes_chain.rs 69.55% 144 Missing ⚠️
crates/mold-cli/src/commands/generate.rs 0.00% 89 Missing ⚠️
crates/mold-inference/src/ltx2/pipeline.rs 41.98% 76 Missing ⚠️
crates/mold-inference/src/ltx2/runtime.rs 69.60% 69 Missing ⚠️
crates/mold-server/src/lib.rs 0.00% 57 Missing ⚠️
crates/mold-server/src/queue.rs 72.06% 50 Missing ⚠️
crates/mold-server/src/resources.rs 16.00% 42 Missing ⚠️
crates/mold-core/src/client.rs 73.72% 31 Missing ⚠️
... and 15 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #261      +/-   ##
==========================================
+ Coverage   59.11%   59.45%   +0.34%     
==========================================
  Files         189      193       +4     
  Lines       89525    92582    +3057     
==========================================
+ Hits        52921    55046    +2125     
- Misses      36604    37536     +932     


jdilley and others added 17 commits April 20, 2026 21:43
The `cuda`/`metal` feature-gated local orchestrator branch in
`run_chain_local` passed `&model_name` to `mold_inference::create_engine`,
which takes `model_name: String`. Phase 3's verification only ran
`cargo check --features preview,discord,expand,tui,webp,mp4` — the
feature-matrix omitted `cuda`/`metal`, so CI and the local-default check
both missed the mismatch. Caught at rebuild time on killswitch
(sm_86 / RTX 3090 dual-GPU build). `cargo check -p mold-ai
--features metal,expand` now clean locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ClipWithTokenizer::encode_text_to_embedding` padded up to
`max_position_embeddings` (77) but never truncated down. Prompts that
tokenised to more than 77 CLIP tokens fed an `[1, N, 768]` tensor into
`ClipTextTransformer`, where the 77-slot position-embedding broadcast-add
blew up with `shape mismatch in broadcast_add, lhs: [1, N, 768], rhs: [1, 77, 768]`.
The pooled-output slice at `eos_position = tokens.len() - 1` was also
out-of-bounds on the same path.

Extract the token preparation into a pure `prepare_clip_tokens` helper
that truncates to `max_len` (copying the trailing EOS token into the
final slot so the pooled branch still reads an EOS-position hidden state)
and then pads up to `max_len`. Wire it into both CLIP-L and CLIP-G via
the shared `ClipWithTokenizer` path, so every `sd3*` model benefits.

Unit-tested weight-free with four cases: short prompt, exact-77,
132-token overlong (matches the observed failure shape), and an empty
tokenisation. All four pass; the 132-token test was red before the
fix and is green after.
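A weight-free sketch of the truncate-then-pad logic described above, operating on plain token IDs (the real helper is wired into `ClipWithTokenizer`; the signature and pad/EOS ID handling here are assumptions for illustration):

```rust
// Sketch of `prepare_clip_tokens`: truncate overlong tokenisations to
// max_len, copying the trailing EOS into the final slot so the pooled
// branch still reads an EOS-position hidden state, then pad up to max_len.
fn prepare_clip_tokens(mut tokens: Vec<u32>, max_len: usize, pad_id: u32) -> Vec<u32> {
    if tokens.len() > max_len {
        let eos = *tokens.last().unwrap(); // safe: len > max_len >= 1
        tokens.truncate(max_len);
        tokens[max_len - 1] = eos; // keep EOS at the pooled-output position
    }
    tokens.resize(max_len, pad_id); // short and empty inputs pad up
    tokens
}
```

A 132-token input (the observed failure shape) comes back as exactly 77 tokens ending in the original EOS, so the `[1, 77, 768]` position-embedding broadcast and the pooled slice both stay in bounds.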

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LTX-2 and the upscaler hardcoded `Device::new_cuda(0)` and
`reclaim_gpu_memory(0)` in their engine bodies, ignoring the
`gpu_ordinal` they were dispatched with. On a multi-GPU host that
meant dispatching LTX-2 to GPU 1 still destroyed GPU 0's primary
CUDA context mid-denoise, which surfaced as a misleading
CUDA_ERROR_OUT_OF_MEMORY on the sibling job and then segfaulted
inside `cuEventDestroy_v2` when candle's Drop chain unwound.

- Thread `gpu_ordinal` through `Ltx2Engine` → `Ltx2RuntimeSession`
  and `UpscalerEngine` / `create_upscale_engine`; replace all four
  hardcoded-0 call sites.
- Add a thread-local GPU binding (`init_thread_gpu_ordinal`) set by
  each GPU worker thread; `create_device` and `reclaim_gpu_memory`
  `debug_assert` the caller's ordinal matches, so any future
  hardcoded-0 regression panics in debug builds instead of silently
  corrupting a sibling GPU's context.
- Update all 4 `create_upscale_engine` callers (CLI, TUI, two in
  server routes) to pass ordinal 0 explicitly. Server upscaler cache
  stays pinned to GPU 0 with a comment noting the per-worker cache
  migration path if multi-GPU upscale becomes interesting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two bulk-UX affordances to the web gallery SPA. Hide-mode toggle blurs
every tile behind a dark shroud with a per-tile "Reveal" for single peeks;
the global preference persists in localStorage, peeks don't. Select mode
enables click-to-toggle, shift-click range, and drag-marquee selection with
a floating action bar for Select all / Clear / Delete selected / Delete all.
Bulk deletes parallelize via Promise.allSettled and partial failures surface
a rollup. Select button is gated on capabilities.gallery.can_delete so
servers without MOLD_GALLERY_ALLOW_DELETE=1 don't expose dead UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapse a multi-line function signature onto the single line CI rustfmt wanted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
find_flux_reference_gguf hardcoded the candidate list to flux-dev:q{8,6,4}
(plus schnell for schnell targets), so a host with flux-krea:q8 on disk —
a dev-family QuantStack GGUF with the full embedding set including
guidance_in — still forced users to download a redundant ~12 GB
flux-dev:q8 reference before they could load ultrareal-v4:q8 or any other
city96-format fine-tune.

Probe flux-krea:q{8,6,4} after base flux-dev. The existing
gguf_has_guidance verification still gates acceptance, so nothing is
assumed about candidate completeness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…endpoint

The SPA previously POSTed every generate to /api/generate/stream regardless
of frame count. Requesting frames > 97 (the LTX-2 19B/22B distilled per-clip
cap) then OOMed at VAE decode after a full denoise run — three identical
failures on a 241-frame 512x512 img2v request before the missing routing
was traced.

Mirror the CLI's decide_chain_routing in a new pure helper
(web/src/lib/chainRouting.ts) so the Composer and the submit path share
the same decision. When the decision is `chain`, useGenerateStream
dispatches to /api/generate/chain/stream with an auto-expand
ChainRequest, folds ChainProgressEvent into the existing JobProgress
(so RunningJobCard renders "Denoising clip K/N · step X/Y" with no per-
event UI changes), and shape-shifts SseChainCompleteEvent into
SseCompleteEvent on completion. A `reject` decision hard-blocks submit
with an alert() and surfaces a red error pill in the Composer; a `chain`
decision surfaces a brand-tinted "Will render as N chained clips" pill
so users understand the expected latency.

Non-chainable families below the per-clip budget stay on the single-clip
path unchanged. LTX-2 distilled requests at-or-below 97 frames also stay
single.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n requests

Two bugs that surfaced once the web UI started auto-promoting long LTX-2
distilled video requests to the chain endpoint.

1) Stage 2 of a chain errored with "native LTX-2 prompt encoder is
   unavailable" because Ltx2RuntimeSession::prepare() consumes the encoder
   on first call (intentional VRAM-free pattern) and render_chain_stage
   restores the session between stages. Fix: cache the NativePromptEncoding
   output on the session keyed by the EncodedPromptPair + unconditional
   flag so same-prompt follow-ups skip the encoder entirely. A new
   can_reuse_for(&plan) helper lets Ltx2Engine detect when a persisted
   session carries a consumed encoder AND a different prompt arrived, in
   which case the engine drops it and builds a fresh session.

2) Concurrent chain requests raced with "engine '...' vanished from cache
   after ensure_model_ready" because routes_chain deliberately takes the
   engine out of model_cache for the full chain duration without any
   serialization across chain requests. Fix: add AppState.chain_lock held
   for the whole run_chain; single-clip requests still flow through the
   normal generation queue unchanged.

Test updates: runtime_session_prepare_consumes_prompt_encoder now
documents the same-prompt cache-hit semantic; a new
runtime_session_prepare_rejects_encoder_reuse_with_different_prompt
locks in the fresh-session-required branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, video-length UX

Reworks the /generate surface per user feedback:

- Move output-format dropdown onto the Composer next to the icon buttons;
  auto-pick the first valid format for the selected model family and reset
  when switching between image/video families (png → mp4 on FLUX → LTX-2).
- Promote resource telemetry to a global collapsible bottom tray mounted
  in App.vue (defaults collapsed, state persisted, `r` to toggle). Adds
  GPU core-utilization (NVML) and CPU utilization (sysinfo, sampled from
  a persistent System threaded through the 1 Hz aggregator) alongside the
  existing VRAM/RAM bars.
- Persist the generate queue to localStorage so cards survive refresh;
  running jobs rehydrate as "Disconnected — may still be running on the
  server" with a per-card dismiss + "Clear finished" affordance.
- Poll /api/gallery every 10 s and /api/models every 15 s from the
  Generate page; bump model polling to 3 s while the settings modal is
  open so freshly-downloaded variants show up without a manual refresh.
- Collapse the device-placement panel by default (state persisted).
- Disable the prompt-expansion Preview button while a generation is
  running in the queue; expansion already defaulted off.
- Add a Video Length (s) field in the settings video row; editing any of
  Frames / Length / FPS recomputes the other two with the backend only
  consuming the 8n+1-clamped frame count.

Backend:

- GpuSnapshot.gpu_utilization: Option<u8> populated via NVML
  utilization_rates(); None on nvidia-smi fallback and on Metal.
- ResourceSnapshot.cpu: Option<CpuSnapshot { cores, usage_percent }>
  driven by a CpuSampler with a long-lived sysinfo::System, threaded
  through spawn_blocking. First tick has cpu = None (no delta yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lt to 9

The previous revision XORed `(idx as u64) << 32` into each stage's seed so
the initial noise tensor differed per clip. With the motion-tail pin now
grounded on a proper causal-first latent, per-stage noise diversity just
amplifies drift at the stitch point — same-seed noise stays frozen in the
pinned region and settles on a consistent motion profile in the free region.
Callers that want variation can still supply `stage.seed_offset` explicitly.

Also bump --motion-tail default from 4 → 9 pixel frames (two LTX-2 latent
frames under the VAE's 8× causal temporal compression: causal-first slot +
one continuation slot). Four only pinned the causal slot, which the decoder
reconstructs as a single pixel frame and leaves the inter-clip stitch
visibly jumpy. `DEFAULT_MOTION_TAIL` in web/src/lib/chainRouting.ts is
kept in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the repro, the failing code paths, and the already-landed fix in
the SD1.5/SDXL shared encoder so a future session can pick up the SD3-
specific wrapper regression without re-deriving context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coupled fixes so chained LTX-2 video generation stops drifting into
"really strange" territory after the first clip:

1. Persistent source-image anchor on every continuation. Stop zeroing
   ChainStage.source_image in build_auto_expand_stages and
   build_stage_generate_request so every stage receives the starting image.
   render_chain_stage re-routes the frame-0 staged image into the append
   path at frame = motion_tail_pixel_frames with CHAIN_SOFT_ANCHOR_STRENGTH
   = 0.4, giving the free-region denoise a durable cross-attention
   reference for identity without freezing any pixels. Frame-0 slot stays
   owned by the motion-tail pin.

2. Decoded-pixel carryover instead of raw latent carryover. ChainTail now
   carries tail_rgb_frames: Vec<RgbImage> (the last K decoded frames);
   StagedLatent drops its latents: Tensor + causal_first_frame_rgb fields
   in favour of the same. The receiving side VAE-encodes the RGB window
   fresh on the receiving clip's own time axis, so slot 0 is a proper
   causal 1-pixel encoding and slots 1+ are proper 8-pixel continuation
   encodings — no slot-semantics mismatch against the LTX-2 VAE's
   causal-first-frame convention and no backward-pointing jump at the
   stitch boundary. Pre-VAE-decode tail_capture plumbing in
   Ltx2RuntimeSession is kept (marked dead_code) for future diagnostic
   tooling.

3. Bump DEFAULT_MOTION_TAIL and the --motion-tail CLI default 9 → 17 pixel
   frames, i.e. three latent frames (causal + two continuation, ≈0.7 s at
   24 fps). The prior 9-frame window was too little context for the
   denoiser to reconstruct scene / lighting / subject after the pin ran
   out.

Drive-by: fix two pre-existing rust-1.94 clippy lints that were blocking
--all-targets clean builds (field_reassign_with_default in
placement_test.rs, manual repeat_n in download.rs) and update a stale
test-mock in routes_chain.rs to the new ChainTail shape.

Test renames reflect the new invariants:
- normalise_preserves_first_stage_image → normalise_preserves_starting_image_across_all_stages
- chain_only_stage0_carries_source_image → chain_propagates_source_image_to_every_stage

Full workspace test suite and web chainRouting tests green locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jdilley and others added 5 commits April 21, 2026 16:54
Make the TopBar sticky only on the gallery route so the Generate view's
header scrolls with the page. Flatten the ResourceTray so it hugs the
bottom and side edges with an opaque background instead of a floating
rounded card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probe sccache with a trivial compile after setup. If it fails (e.g. GHA
artifact cache returning 400), unset RUSTC_WRAPPER for the remaining
steps so cargo can proceed without a wrapper. Swatinem/rust-cache still
handles cross-run caching.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move `mod tests;` include to the bottom of downloads.rs so
  clippy::items_after_test_module stops firing.
- Scope-allow clippy::await_holding_lock in downloads_test.rs and
  routes_test.rs. Those tests use std::sync::Mutex<()> to serialize
  process-global env-var mutation; holding the guard across `.await`
  is intentional under the current-thread tokio test runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jdilley jdilley enabled auto-merge (squash) April 22, 2026 00:39
@jdilley jdilley disabled auto-merge April 22, 2026 00:39
@jdilley jdilley merged commit 49ef35e into main Apr 22, 2026
6 checks passed
@jdilley jdilley deleted the feat/render-chain-v1 branch April 22, 2026 00:47