Feat/render chain v1#261

Merged
jdilley merged 33 commits into main from feat/render-chain-v1
Apr 22, 2026

Conversation

@jdilley
Collaborator

@jdilley jdilley commented Apr 21, 2026

No description provided.

jdilley and others added 11 commits April 20, 2026 16:14
At `strength < 1.0` (the `--strength 0.75` LTX-2 i2v default),
`run_real_distilled_stage` was cloning `video_latents` *after*
`apply_stage_video_conditioning` had already soft-blended the first
latent frame positions with noise: the "clean reference" tensor that
the per-step denoise-mask blend pulls conditioned tokens toward became
`noise*(1-s) + source*s` at replacement positions. Used as the clean
target, that pre-blended tensor pinned the first latent to a noisy
ghost of the image at every step, so i2v runs produced a first frame
that was 25% noise + 75% image instead of the source image.

Introduce a `clean_latents_for_conditioning` helper that re-applies
the replacement-based conditioning with `strength = 1.0` on top of the
post-apply tensor, overwriting replacement positions with pure source
tokens while appended keyframe tokens and pure-noise regions pass
through unchanged. `strength = 1.0` and pure-T2V paths remain
bit-for-bit identical. Two regression tests cover the soft-blended
case and the no-replacements passthrough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce `mold_core::chain` with the `ChainStage` / `ChainRequest` /
`ChainResponse` types that will carry server-side chained LTX-2 video
generation. The wire format is stages-based from day one so the v2
movie-maker UI can author multi-prompt / multi-keyframe chains without
breaking callers: v1 only exposes a single-prompt auto-expand form
(`prompt` + `total_frames` + `clip_frames`), and `normalise()`
collapses it into a canonical `Vec<ChainStage>` before any engine work
runs.

Normalisation matches the stitch math that Phase 1.4 of the plan will
use:
  delivered_frames = clip_frames + (N - 1) * (clip_frames - motion_tail)

so auto-expand picks `N` large enough to cover `total_frames` with
tail-overlap trimming in mind; the over-production is discarded from
the final clip's tail per the 2026-04-20 sign-off. Guardrails cap
chains at 16 stages (≈1552 frames at 97-frame clips, ~64 s at 24 fps),
require `8k+1` frame counts for LTX-2, and forbid
`motion_tail_frames >= clip_frames` so every continuation emits at
least one new frame.
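The stitch math and guardrails above can be sketched as a pure helper. This is a minimal illustration, not the crate's actual `normalise()` code; the function name and `Option` return shape are assumptions.

```rust
/// Sketch of the auto-expand stage-count math: find the smallest N such
/// that delivered_frames = clip_frames + (N - 1) * (clip_frames - motion_tail)
/// covers total_frames, subject to the commit's guardrails.
fn auto_expand_stage_count(total_frames: u32, clip_frames: u32, motion_tail: u32) -> Option<u32> {
    // Guardrails: LTX-2 needs 8k+1 frame counts, and every continuation
    // must emit at least one new frame.
    if clip_frames % 8 != 1 || motion_tail >= clip_frames {
        return None;
    }
    let new_per_continuation = clip_frames - motion_tail;
    let mut n: u32 = 1;
    let mut delivered = clip_frames;
    while delivered < total_frames {
        n += 1;
        if n > 16 {
            return None; // 16-stage chain cap exceeded
        }
        delivered += new_per_continuation;
    }
    Some(n) // over-production beyond total_frames is trimmed from the tail
}
```

For example, a 241-frame request at 97-frame clips with a 4-frame tail expands to 3 stages (97 + 2 × 93 = 283 delivered, trimmed down to 241).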

Also lifts the existing `base64_opt` serde helper in `types.rs` from
private to `pub(crate)` so chain types can share the single source of
truth for base64 wire encoding.

Unit tests cover: split-into-stages, first-stage-image preservation,
empty-request rejection, non-8k+1 rejection, canonical-form
passthrough, single-stage short chains, >16-stage guardrails,
motion-tail >= clip rejection, missing auto-expand fields, and a
property test confirming the auto-expand stage count delivers the
requested total frames under every representative (total, clip, tail)
combo from the design.

tasks/render-chain-v1-plan.md adds the signed-off decisions block at
the top so the rationale travels with the code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `MoldClient::generate_chain` (POST /api/generate/chain, non-
streaming JSON request/response) and `MoldClient::generate_chain_stream`
(POST /api/generate/chain/stream, SSE) mirroring the existing
`generate` / `generate_stream` shape. The server routes land in
Phase 2; this commit ships the client surface so Phase 1's fake-engine
tests and Phase 2's route wiring have a settled wire contract to
implement against.

Chain-specific wire types (all new, under `mold_core::chain`):

- `ChainProgressEvent` — tagged enum streamed under `event: progress`.
  Variants: `chain_start { stage_count, estimated_total_frames }`,
  `stage_start { stage_idx }`, `denoise_step { stage_idx, step, total }`,
  `stage_done { stage_idx, frames_emitted }`,
  `stitching { total_frames }`. snake_case tagged JSON matches the
  existing `SseProgressEvent` style.
- `SseChainCompleteEvent` — kept as a sibling to
  `crate::types::SseCompleteEvent` rather than an extension, so chain
  completion shape can evolve independently (stage_count, stitched-
  video payload, optional thumb/GIF, audio metadata, elapsed time).

Error translation matches the single-clip methods:

| Status                 | generate_chain                                  | generate_chain_stream                           |
|------------------------|-------------------------------------------------|-------------------------------------------------|
| 200                    | parse ChainResponse JSON                        | parse SSE until `complete` event                |
| 404, empty body        | hard error "chain endpoint not found"           | `Ok(None)` — caller may fall back               |
| 404, non-empty body    | `MoldError::ModelNotFound`                      | `MoldError::ModelNotFound`                      |
| 422                    | `MoldError::Validation`                         | `MoldError::Validation`                         |
| 4xx/5xx else           | generic anyhow                                   | generic anyhow                                   |

The non-streaming empty-404 behaviour deliberately differs from SSE:
streaming clients can fall back to non-streaming, but non-streaming
callers have nowhere to go and should fail loudly.
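The table's branching can be sketched as a single match. The error-variant names follow the table; the helper signature and the `Result<Option<()>, _>` shape are illustrative only, not the real `MoldClient` API.

```rust
// Sketch of the status-code translation above. `Ok(None)` models the
// streaming-only "caller may fall back" case for an empty-body 404.
#[derive(Debug, PartialEq)]
enum MoldError {
    ModelNotFound,
    Validation,
    Other(String),
}

fn classify_chain_error(status: u16, body_empty: bool, streaming: bool)
    -> Result<Option<()>, MoldError>
{
    match (status, body_empty) {
        (200, _) => Ok(Some(())),             // parse ChainResponse / SSE stream
        (404, true) if streaming => Ok(None), // streaming caller may fall back
        (404, true) => Err(MoldError::Other("chain endpoint not found".into())),
        (404, false) => Err(MoldError::ModelNotFound),
        (422, _) => Err(MoldError::Validation),
        (s, _) => Err(MoldError::Other(format!("unexpected status {s}"))),
    }
}
```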

Integration coverage:
- `crates/mold-core/tests/chain_client.rs` (wiremock): endpoint/body
  shape assertion on non-streaming; 422 → Validation; 404-with-body →
  ModelNotFound; non-streaming empty 404 → hard error; SSE empty 404 →
  Ok(None); SSE progress + complete roundtrip reconstructs
  `ChainResponse` with thumb + gpu.
- Pure serde roundtrip test for every `ChainProgressEvent` variant
  asserting snake_case tag format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce the carryover primitive that render-chain stages hand to each
other. `ChainTail { frames, latents, last_rgb_frame }` bundles the
final VAE latents of a stage's motion tail so the next stage can patch
those tokens straight into its conditioning without a VAE decode → RGB
→ VAE encode round-trip. No engine wiring yet — the orchestrator and
the `generate_with_carryover` entry point land in sibling commits.

Helpers in the new `ltx2::chain` module:

- `tail_latent_frame_count(pixel_frames: u32) -> usize` — exposes the
  LTX-2 VAE's 8× causal-first-frame temporal ratio as the formula
  `((n - 1) / 8) + 1`. Matches `VideoLatentShape::from_pixel_shape`.
  Panics on `0`; callers must validate upstream.

- `extract_tail_latents(final_latents: &Tensor, pixel_frames: u32) ->
  Result<Tensor>` — narrows the time axis of a rank-5
  `[B, C, T, H, W]` latents tensor down to the last K latent frames
  corresponding to the requested pixel-frame tail. Errors (not panics)
  on rank mismatch or oversize tail request so orchestrator bugs
  surface as operational errors, not process aborts.
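The quoted formula is small enough to restate as a sketch. The LTX-2 VAE encodes the first pixel frame into its own causal latent slot, then each subsequent group of 8 pixel frames into one slot, which is exactly `((n - 1) / 8) + 1`:

```rust
// Sketch of the tail-latent formula described above; the real helper
// lives in the `ltx2::chain` module and panics on zero the same way.
fn tail_latent_frame_count(pixel_frames: u32) -> usize {
    assert!(pixel_frames > 0, "callers must validate upstream");
    (((pixel_frames - 1) / 8) + 1) as usize
}
```

This reproduces the test table in the commit: 4 → 1, 9 → 2, 16 → 2, 17 → 3, 97 → 13.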

Unit tests cover: the VAE formula across representative tail sizes
(4→1, 9→2, 16→2, 17→3, 97→13), rejection of a zero pixel-frame
tail, correct narrowing on a synthetic [1, 2, 3, 1, 1] tensor with
sentinel values proving the last latent frame is returned across all
channels, narrowing on a larger rank-5 tensor, rank-4 rejection, and
oversize-tail rejection. All tests are weight-free and run under
`cargo test -p mold-ai-inference --lib`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`StagedConditioning` now carries both disk-backed images (existing
single-clip path) and in-memory latent blocks (new, empty for every
non-chain caller). The render-chain orchestrator will populate the new
`latents: Vec<StagedLatent>` field with a prior stage's motion-tail
latents so the receiving stage can patchify those tokens straight into
its `StageVideoConditioning::replacements` without the VAE decode → RGB
→ VAE encode round-trip — that's the point of latent carryover.

Changes:

- `StagedLatent { latents, frame, strength }` in
  `ltx2::conditioning` — mirrors `StagedImage`'s semantics but with a
  pre-encoded `candle_core::Tensor` instead of a disk path. `frame = 0`
  routes tokens through `replacements` (chain v1 motion tail);
  non-zero `frame` builds a `VideoTokenAppendCondition` so the movie-
  maker in v2 can thread latents into arbitrary positions.

- `StagedConditioning` drops `PartialEq` since `Tensor` doesn't
  implement structural equality. Grepped for comparison usages — none.
  Existing callers of `stage_conditioning()` get `latents: Vec::new()`.

- `maybe_load_stage_video_conditioning` in `runtime.rs`:
  - Early-return gate now also considers `plan.conditioning.latents`.
  - VAE is loaded conditionally: only when images or reference video
    need encoding. Pure-latent chain handoffs skip VAE load entirely.
  - New loop iterates staged latents, patchifies each block, routes
    frame-0 tokens to `replacements` (keyframe pipelines aside) and
    other frames to `appended` — symmetrical with the image path.

Tests (weight-free):

- `stage_conditioning_leaves_latents_empty_for_non_chain_callers` —
  pins the back-compat invariant: every non-chain generate path
  continues to receive an empty latents vec.
- `staged_latent_patchifies_to_same_token_shape_as_image_at_single_latent_frame`
  — verifies a `[1, 128, 1, 22, 38]` chain-tail latent block patchifies
  to `[1, 836, 128]` tokens, the same shape the image-conditioning
  path produces after VAE encode + patchify for the equivalent latent
  geometry.

Chain orchestrator + `Ltx2Engine::generate_with_carryover` land in the
sibling Phase 1c commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `Ltx2ChainOrchestrator<R: ChainStageRenderer>` that drives the
per-stage render loop for chained video generation: builds each
stage's `GenerateRequest`, threads the prior stage's `ChainTail`
through the renderer, drops the leading motion-tail frames on every
continuation, accumulates frames, and returns a `ChainRunOutput`.

The `ChainStageRenderer` trait is the seam between the orchestrator
(pure control flow) and the engine (tensor work). The LTX-2 engine
implementation lands in Phase 1d — this commit ships the orchestrator
fully tested against a fake renderer so the engine plumbing can be
reviewed in isolation.

Behaviour nailed down (from the 2026-04-20 sign-off):

- **Per-stage seeds**: `base_seed ^ ((stage_idx as u64) << 32)`. A
  stage's `seed_offset` overrides the default when set — reserved for
  the v2 movie-maker's "regen just this stage" affordance.

- **Motion-tail trim**: stage 0 emits all its frames; continuations
  drop the leading `req.motion_tail_frames` pixel frames because those
  duplicate the previous clip's tail that was threaded back as latent
  conditioning. `motion_tail_frames = 0` is a legitimate configuration
  (simple concat).

- **Fail closed**: a mid-chain renderer error bubbles up immediately.
  All frames accumulated so far are discarded — no partial stitch is
  ever written to the gallery. Partial resume is a v2 feature.

- **No audio or target-total-frame trim in v1**: the orchestrator
  delivers whatever frame count the stages produce (with tail drops
  applied). Target-total trimming is the caller's responsibility
  (server / CLI). Audio-video chains are out of scope for v1.
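The first two sign-off rules can be sketched as pure helpers (later in this PR the per-stage XOR is dropped in favour of a shared seed, so this reflects the behaviour as of this commit). Treating `seed_offset` as a full replacement seed is an assumption; the commit only says it "overrides the default".

```rust
// Per-stage seed derivation: base_seed ^ (stage_idx << 32), unless the
// stage pins its own seed via seed_offset (assumed to replace outright).
fn stage_seed(base_seed: u64, stage_idx: usize, seed_offset: Option<u64>) -> u64 {
    seed_offset.unwrap_or(base_seed ^ ((stage_idx as u64) << 32))
}

// Motion-tail trim accounting: stage 0 keeps all frames, every
// continuation drops the leading motion_tail duplicate frames.
fn accumulated_frames(stages: usize, clip_frames: usize, motion_tail: usize) -> usize {
    clip_frames + stages.saturating_sub(1) * (clip_frames - motion_tail)
}
```

For the 3 × 97-frame, 4-frame-tail test case below, `accumulated_frames(3, 97, 4)` is 97 + 2 × 93 = 283.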

Progress events forwarded through `Option<&mut dyn FnMut(ChainProgressEvent)>`:
`ChainStart` → `StageStart` → `DenoiseStep` (wrapping the renderer's
`StageProgressEvent`s with `stage_idx`) → `StageDone` → (next stage)
→ `Stitching`. Chain-level subscribers can render a stacked
overall+per-stage progress bar without coordinating with the engine.

Per-stage `GenerateRequest` is constructed to ensure only stage 0
carries the optional starting image — even if the caller forgot to
clear it on later stages, the orchestrator suppresses it because
continuations must condition on motion-tail latents only. `strength`
becomes `1.0` on continuations regardless of the chain default since
the tail carryover is always a hard replacement.

Tests (weight-free, injecting a `FakeRenderer`):

- `chain_runs_all_stages_and_drops_tail_prefix_from_continuations` —
  3×97-frame clips with 4-frame tail produce exactly 97 + 2×93 = 283
  accumulated frames.
- `chain_with_zero_tail_concats_full_clips_without_drop` — `tail=0`
  keeps every frame on continuations.
- `chain_empty_stages_errors_without_calling_renderer` — zero-stage
  requests fail before touching the renderer.
- `chain_fails_closed_mid_chain_discarding_accumulated_frames` —
  simulated stage-1 failure bubbles up; stage 2 never runs.
- `chain_derives_per_stage_seed_from_base_seed` — three stages from
  base seed 42 land on 42, 42^(1<<32), 42^(2<<32).
- `chain_only_stage0_carries_source_image` — a source image set on
  stages[1] is suppressed, so continuations can't accidentally
  condition on a still image instead of the motion tail.
- `chain_forwards_engine_events_with_stage_idx_wrapping` — checks the
  full expected event order for a 2-stage chain with per-stage
  progress emission.
- `chain_rejects_motion_tail_ge_stage_frames_before_running` —
  up-front validation catches `motion_tail >= frames` so the renderer
  is never invoked with a degenerate configuration.
- `chain_respects_seed_offset_override_when_stage_provides_one` —
  pins `ChainStage::seed_offset` semantics for the v2 movie-maker
  hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture the state of the branch (6 commits on local main, not pushed),
the five signed-off design decisions, the Phase 1d → 2 → 3 → 4 remaining
work with specific file:line surgery points, and a ready-to-paste prompt
for a fresh Claude Code session. Gotchas documented: stale
`test = false` claim in CLAUDE.md, pre-existing clippy warnings unrelated
to this branch, VAE 8× causal temporal ratio already encoded by
`extract_tail_latents`, and the existing-parameter-reuse opportunity on
`run_real_distilled_stage` (no new params needed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…capture

Add a pre-VAE-decode tail-capture slot on Ltx2RuntimeSession threaded
into render_real_distilled_av, implement Ltx2Engine::render_chain_stage
that injects a carryover ChainTail as a StagedLatent and extracts the
post-denoise tail, and wire it through impl ChainStageRenderer for
Ltx2Engine. Distilled-only in v1; other pipeline families error up-front.
Amend ChainStageRenderer::render_stage to carry motion_tail_pixel_frames
so the engine knows how many frames to narrow off the emitted latents.

Part of render-chain v1 (Phase 1d). Weight-free tests added; full
mold-inference and mold-core lib test suites stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add POST /api/generate/chain and POST /api/generate/chain/stream for
server-side chained LTX-2 video generation. The handler takes the engine
out of the model cache (restoring it afterwards) and runs the full chain
in a spawn_blocking so the sync orchestrator never blocks the async runtime.
Drives Ltx2ChainOrchestrator through the engine's ChainStageRenderer
view, trims accumulated frames to target total from the tail per
sign-off, encodes the stitched output (MP4 when the mp4 feature is on,
APNG fallback otherwise), and saves to the gallery with a synthesised
OutputMetadata.

Expose as_chain_renderer() on InferenceEngine (default None), overridden
by Ltx2Engine. Relax Ltx2ChainOrchestrator's renderer bound to ?Sized so
trait objects compose cleanly. Promote ltx_video::video_enc from
pub(crate) to pub so mold-server can reuse encode_mp4/encode_apng/
encode_gif/first_frame_png for chain stitching.

Weight-free route tests cover the happy path, the mid-chain failure
(502 Bad Gateway), the unsupported-model rejection (422), progress
event ordering through the SSE helper, and tail-trim behaviour.

Part of render-chain v1 (Phase 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When --frames exceeds the model's per-clip cap (97 for LTX-2 distilled),
`mold run` now auto-builds a ChainRequest and routes to
POST /api/generate/chain/stream (server mode) or runs the
Ltx2ChainOrchestrator in-process (--local mode). New flags --clip-frames
and --motion-tail let users tune the per-clip length and the motion-tail
overlap (default 4 frames of latent carryover between clips).

Stacked progress bars render a parent "Chain" bar (total frames) and a
wiping per-stage bar (denoise step / total). Both server and local paths
share a single encode+save+preview epilogue so output formatting, stdout
piping, and gallery save are identical.

Models outside LTX-2 distilled families error fast when --frames exceeds
the single-clip cap rather than silently dropping frames or hitting the
server's chain route with a non-chainable model. A pure
`decide_chain_routing` helper captures the branching logic so auto-
routing is unit-testable without async or network.
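The routing decision described above can be sketched as a three-way branch. The enum and signature are illustrative, not the crate's real `decide_chain_routing` shape:

```rust
// Sketch of the auto-routing rule: at-or-below the per-clip cap stays
// single-clip; above it, only chainable (LTX-2 distilled) models get a
// ChainRequest, everything else fails fast.
#[derive(Debug, PartialEq)]
enum ChainRouting {
    Single, // within one clip: normal generate path
    Chain,  // auto-build a ChainRequest
    Reject, // error fast instead of silently dropping frames
}

fn decide_chain_routing(frames: u32, per_clip_cap: u32, model_chainable: bool) -> ChainRouting {
    if frames <= per_clip_cap {
        ChainRouting::Single
    } else if model_chainable {
        ChainRouting::Chain
    } else {
        ChainRouting::Reject
    }
}
```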

Part of render-chain v1 (Phase 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document render-chain v1 across the four surfaces: a new "Chained video
output" section in website/models/ltx2.md explaining the per-clip cap,
motion-tail carryover, and the --frames / --clip-frames / --motion-tail
CLI contract; request/response/SSE schemas for the new
POST /api/generate/chain[/stream] endpoints in website/api/index.md;
an Unreleased/Added bullet in CHANGELOG.md covering the feature
end-to-end; and the new flags + endpoint in .claude/skills/mold/SKILL.md
so OpenClaw and the other AI agents surface chained video correctly.

Part of render-chain v1 (Phase 4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 64.26584% with 1156 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.45%. Comparing base (1410d08) to head (ecae6d8).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
crates/mold-cli/src/commands/chain.rs 27.37% 260 Missing ⚠️
crates/mold-server/src/routes.rs 11.44% 209 Missing ⚠️
crates/mold-server/src/routes_chain.rs 69.55% 144 Missing ⚠️
crates/mold-cli/src/commands/generate.rs 0.00% 89 Missing ⚠️
crates/mold-inference/src/ltx2/pipeline.rs 41.98% 76 Missing ⚠️
crates/mold-inference/src/ltx2/runtime.rs 69.60% 69 Missing ⚠️
crates/mold-server/src/lib.rs 0.00% 57 Missing ⚠️
crates/mold-server/src/queue.rs 72.06% 50 Missing ⚠️
crates/mold-server/src/resources.rs 16.00% 42 Missing ⚠️
crates/mold-core/src/client.rs 73.72% 31 Missing ⚠️
... and 15 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #261      +/-   ##
==========================================
+ Coverage   59.11%   59.45%   +0.34%     
==========================================
  Files         189      193       +4     
  Lines       89525    92582    +3057     
==========================================
+ Hits        52921    55046    +2125     
- Misses      36604    37536     +932     


jdilley and others added 17 commits April 20, 2026 21:43
The `cuda`/`metal` feature-gated local orchestrator branch in
`run_chain_local` passed `&model_name` to `mold_inference::create_engine`,
which takes `model_name: String`. Phase 3's verification only ran
`cargo check --features preview,discord,expand,tui,webp,mp4` — the
feature-matrix omitted `cuda`/`metal`, so CI and the local-default check
both missed the mismatch. Caught at rebuild time on killswitch
(sm_86 / RTX 3090 dual-GPU build). `cargo check -p mold-ai
--features metal,expand` now clean locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ClipWithTokenizer::encode_text_to_embedding` padded up to
`max_position_embeddings` (77) but never truncated down. Prompts that
tokenised to more than 77 CLIP tokens fed an `[1, N, 768]` tensor into
`ClipTextTransformer`, where the 77-slot position-embedding broadcast-add
blew up with `shape mismatch in broadcast_add, lhs: [1, N, 768], rhs: [1, 77, 768]`.
The pooled-output slice at `eos_position = tokens.len() - 1` was also
out-of-bounds on the same path.

Extract the token preparation into a pure `prepare_clip_tokens` helper
that truncates to `max_len` (copying the trailing EOS token into the
final slot so the pooled branch still reads an EOS-position hidden state)
and then pads up to `max_len`. Wire it into both CLIP-L and CLIP-G via
the shared `ClipWithTokenizer` path, so every `sd3*` model benefits.

Unit-tested weight-free with four cases: short prompt, exact-77,
132-token overlong (matches the observed failure shape), and an empty
tokenisation. All four pass; the 132-token test was red before the
fix and is green after.
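A weight-free sketch of the truncate-then-pad logic described above, operating on plain token IDs (the real helper is wired into `ClipWithTokenizer`; the signature and pad/EOS ID handling here are assumptions for illustration):

```rust
// Sketch of `prepare_clip_tokens`: truncate overlong tokenisations to
// max_len, copying the trailing EOS into the final slot so the pooled
// branch still reads an EOS-position hidden state, then pad up to max_len.
fn prepare_clip_tokens(mut tokens: Vec<u32>, max_len: usize, pad_id: u32) -> Vec<u32> {
    if tokens.len() > max_len {
        let eos = *tokens.last().unwrap(); // safe: len > max_len >= 1
        tokens.truncate(max_len);
        tokens[max_len - 1] = eos; // keep EOS at the pooled-output position
    }
    tokens.resize(max_len, pad_id); // short and empty inputs pad up
    tokens
}
```

A 132-token input (the observed failure shape) comes back as exactly 77 tokens ending in the original EOS, so the `[1, 77, 768]` position-embedding broadcast and the pooled slice both stay in bounds.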

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LTX-2 and the upscaler hardcoded `Device::new_cuda(0)` and
`reclaim_gpu_memory(0)` in their engine bodies, ignoring the
`gpu_ordinal` they were dispatched with. On a multi-GPU host that
meant dispatching LTX-2 to GPU 1 still destroyed GPU 0's primary
CUDA context mid-denoise, which surfaced as a misleading
CUDA_ERROR_OUT_OF_MEMORY on the sibling job and then segfaulted
inside `cuEventDestroy_v2` when candle's Drop chain unwound.

- Thread `gpu_ordinal` through `Ltx2Engine` → `Ltx2RuntimeSession`
  and `UpscalerEngine` / `create_upscale_engine`; replace all four
  hardcoded-0 call sites.
- Add a thread-local GPU binding (`init_thread_gpu_ordinal`) set by
  each GPU worker thread; `create_device` and `reclaim_gpu_memory`
  `debug_assert` the caller's ordinal matches, so any future
  hardcoded-0 regression panics in debug builds instead of silently
  corrupting a sibling GPU's context.
- Update all 4 `create_upscale_engine` callers (CLI, TUI, two in
  server routes) to pass ordinal 0 explicitly. Server upscaler cache
  stays pinned to GPU 0 with a comment noting the per-worker cache
  migration path if multi-GPU upscale becomes interesting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two bulk-UX affordances to the web gallery SPA. Hide-mode toggle blurs
every tile behind a dark shroud with a per-tile "Reveal" for single peeks;
the global preference persists in localStorage, peeks don't. Select mode
enables click-to-toggle, shift-click range, and drag-marquee selection with
a floating action bar for Select all / Clear / Delete selected / Delete all.
Bulk deletes parallelize via Promise.allSettled and partial failures surface
a rollup. Select button is gated on capabilities.gallery.can_delete so
servers without MOLD_GALLERY_ALLOW_DELETE=1 don't expose dead UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapse a multi-line function signature onto the single line CI rustfmt wanted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
find_flux_reference_gguf hardcoded the candidate list to flux-dev:q{8,6,4}
(plus schnell for schnell targets), so a host with flux-krea:q8 on disk —
a dev-family QuantStack GGUF with the full embedding set including
guidance_in — still forced users to download a redundant ~12 GB
flux-dev:q8 reference before they could load ultrareal-v4:q8 or any other
city96-format fine-tune.

Probe flux-krea:q{8,6,4} after base flux-dev. The existing
gguf_has_guidance verification still gates acceptance, so nothing is
assumed about candidate completeness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…endpoint

The SPA previously POSTed every generate to /api/generate/stream regardless
of frame count. Requesting frames > 97 (the LTX-2 19B/22B distilled per-clip
cap) then OOMed at VAE decode after a full denoise run — three identical
failures on a 241-frame 512x512 img2v request before the missing routing
was traced.

Mirror the CLI's decide_chain_routing in a new pure helper
(web/src/lib/chainRouting.ts) so the Composer and the submit path share
the same decision. When the decision is `chain`, useGenerateStream
dispatches to /api/generate/chain/stream with an auto-expand
ChainRequest, folds ChainProgressEvent into the existing JobProgress
(so RunningJobCard renders "Denoising clip K/N · step X/Y" with no per-
event UI changes), and shape-shifts SseChainCompleteEvent into
SseCompleteEvent on completion. A `reject` decision hard-blocks submit
with an alert() and surfaces a red error pill in the Composer; a `chain`
decision surfaces a brand-tinted "Will render as N chained clips" pill
so users understand the expected latency.

Non-chainable families below the per-clip budget stay on the single-clip
path unchanged. LTX-2 distilled requests at-or-below 97 frames also stay
single.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n requests

Two bugs that surfaced once the web UI started auto-promoting long LTX-2
distilled video requests to the chain endpoint.

1) Stage 2 of a chain errored with "native LTX-2 prompt encoder is
   unavailable" because Ltx2RuntimeSession::prepare() consumes the encoder
   on first call (intentional VRAM-free pattern) and render_chain_stage
   restores the session between stages. Fix: cache the NativePromptEncoding
   output on the session keyed by the EncodedPromptPair + unconditional
   flag so same-prompt follow-ups skip the encoder entirely. A new
   can_reuse_for(&plan) helper lets Ltx2Engine detect when a persisted
   session carries a consumed encoder AND a different prompt arrived, in
   which case the engine drops it and builds a fresh session.

2) Concurrent chain requests raced with "engine '...' vanished from cache
   after ensure_model_ready" because routes_chain deliberately takes the
   engine out of model_cache for the full chain duration without any
   serialization across chain requests. Fix: add AppState.chain_lock held
   for the whole run_chain; single-clip requests still flow through the
   normal generation queue unchanged.

Test updates: runtime_session_prepare_consumes_prompt_encoder now
documents the same-prompt cache-hit semantic; a new
runtime_session_prepare_rejects_encoder_reuse_with_different_prompt
locks in the fresh-session-required branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, video-length UX

Reworks the /generate surface per user feedback:

- Move output-format dropdown onto the Composer next to the icon buttons;
  auto-pick the first valid format for the selected model family and reset
  when switching between image/video families (png → mp4 on FLUX → LTX-2).
- Promote resource telemetry to a global collapsible bottom tray mounted
  in App.vue (defaults collapsed, state persisted, `r` to toggle). Adds
  GPU core-utilization (NVML) and CPU utilization (sysinfo, sampled from
  a persistent System threaded through the 1 Hz aggregator) alongside the
  existing VRAM/RAM bars.
- Persist the generate queue to localStorage so cards survive refresh;
  running jobs rehydrate as "Disconnected — may still be running on the
  server" with a per-card dismiss + "Clear finished" affordance.
- Poll /api/gallery every 10 s and /api/models every 15 s from the
  Generate page; bump model polling to 3 s while the settings modal is
  open so freshly-downloaded variants show up without a manual refresh.
- Collapse the device-placement panel by default (state persisted).
- Disable the prompt-expansion Preview button while a generation is
  running in the queue; expansion already defaulted off.
- Add a Video Length (s) field in the settings video row; editing any of
  Frames / Length / FPS recomputes the other two with the backend only
  consuming the 8n+1-clamped frame count.

Backend:

- GpuSnapshot.gpu_utilization: Option<u8> populated via NVML
  utilization_rates(); None on nvidia-smi fallback and on Metal.
- ResourceSnapshot.cpu: Option<CpuSnapshot { cores, usage_percent }>
  driven by a CpuSampler with a long-lived sysinfo::System, threaded
  through spawn_blocking. First tick has cpu = None (no delta yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lt to 9

The previous revision XORed `(idx as u64) << 32` into each stage's seed so
the initial noise tensor differed per clip. With the motion-tail pin now
grounded on a proper causal-first latent, per-stage noise diversity just
amplifies drift at the stitch point — same-seed noise stays frozen in the
pinned region and settles on a consistent motion profile in the free region.
Callers that want variation can still supply `stage.seed_offset` explicitly.

Also bump --motion-tail default from 4 → 9 pixel frames (two LTX-2 latent
frames under the VAE's 8× causal temporal compression: causal-first slot +
one continuation slot). Four only pinned the causal slot, which the decoder
reconstructs as a single pixel frame and leaves the inter-clip stitch
visibly jumpy. `DEFAULT_MOTION_TAIL` in web/src/lib/chainRouting.ts is
kept in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the repro, the failing code paths, and the already-landed fix in
the SD1.5/SDXL shared encoder so a future session can pick up the SD3-
specific wrapper regression without re-deriving context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coupled fixes so chained LTX-2 video generation stops drifting into
"really strange" territory after the first clip:

1. Persistent source-image anchor on every continuation. Stop zeroing
   ChainStage.source_image in build_auto_expand_stages and
   build_stage_generate_request so every stage receives the starting image.
   render_chain_stage re-routes the frame-0 staged image into the append
   path at frame = motion_tail_pixel_frames with CHAIN_SOFT_ANCHOR_STRENGTH
   = 0.4, giving the free-region denoise a durable cross-attention
   reference for identity without freezing any pixels. Frame-0 slot stays
   owned by the motion-tail pin.

2. Decoded-pixel carryover instead of raw latent carryover. ChainTail now
   carries tail_rgb_frames: Vec<RgbImage> (the last K decoded frames);
   StagedLatent drops its latents: Tensor + causal_first_frame_rgb fields
   in favour of the same. The receiving side VAE-encodes the RGB window
   fresh on the receiving clip's own time axis, so slot 0 is a proper
   causal 1-pixel encoding and slots 1+ are proper 8-pixel continuation
   encodings — no slot-semantics mismatch against the LTX-2 VAE's
   causal-first-frame convention and no backward-pointing jump at the
   stitch boundary. Pre-VAE-decode tail_capture plumbing in
   Ltx2RuntimeSession is kept (marked dead_code) for future diagnostic
   tooling.

3. Bump DEFAULT_MOTION_TAIL and the --motion-tail CLI default 9 → 17 pixel
   frames, i.e. three latent frames (causal + two continuation, ≈0.7 s at
   24 fps). The prior 9-frame window was too little context for the
   denoiser to reconstruct scene / lighting / subject after the pin ran
   out.

Drive-by: fix two pre-existing rust-1.94 clippy lints that were blocking
--all-targets clean builds (field_reassign_with_default in
placement_test.rs, manual repeat_n in download.rs) and update a stale
test-mock in routes_chain.rs to the new ChainTail shape.

Test renames reflect the new invariants:
- normalise_preserves_first_stage_image → normalise_preserves_starting_image_across_all_stages
- chain_only_stage0_carries_source_image → chain_propagates_source_image_to_every_stage

Full workspace test suite and web chainRouting tests green locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jdilley and others added 5 commits April 21, 2026 16:54
Make the TopBar sticky only on the gallery route so the Generate view's
header scrolls with the page. Flatten the ResourceTray so it hugs the
bottom and side edges with an opaque background instead of a floating
rounded card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probe sccache with a trivial compile after setup. If it fails (e.g. GHA
artifact cache returning 400), unset RUSTC_WRAPPER for the remaining
steps so cargo can proceed without a wrapper. Swatinem/rust-cache still
handles cross-run caching.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move `mod tests;` include to the bottom of downloads.rs so
  clippy::items_after_test_module stops firing.
- Scope-allow clippy::await_holding_lock in downloads_test.rs and
  routes_test.rs. Those tests use std::sync::Mutex<()> to serialize
  process-global env-var mutation; holding the guard across `.await`
  is intentional under the current-thread tokio test runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jdilley jdilley enabled auto-merge (squash) April 22, 2026 00:39
@jdilley jdilley disabled auto-merge April 22, 2026 00:39
@jdilley jdilley merged commit 49ef35e into main Apr 22, 2026
6 checks passed
@jdilley jdilley deleted the feat/render-chain-v1 branch April 22, 2026 00:47