feat(audio): scene-fitted Kokoro narration — reframe narrate (0.6.44) #77
Merged
Make voiceover first-class and fit it to the scene. Narration is addressable
data (text + label anchor + voice), authored as a sibling <scene>-vo/script.json
that the scene imports into audio.narration. `reframe narrate` reads the label
clock, synthesizes each line with a Kokoro Python sidecar, and auto-fits each
line's speech rate to the slot between its anchor and the next line. The scene
graph drives the voice timing, and because narration is label-anchored, it
survives retiming/regen.
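As a rough illustration of the "slot between its anchor and the next line" idea, here is a minimal sketch. The `NarrationLine` and `Slot` shapes, the `computeSlots` helper, the `sceneEnd` parameter, and the treatment of `offset` as seconds added to the label time are all assumptions made for this example; the actual logic in narrate.ts may differ.

```typescript
// Hypothetical shapes for illustration; the real types live in ir.ts.
interface NarrationLine {
  at: string;      // label the line is anchored to
  text: string;
  offset?: number; // assumed: seconds added to the label time
}

interface Slot {
  start: number;  // seconds
  end: number;    // seconds
  length: number; // end - start, never negative
}

// Slot for line i runs from its own anchor time to the next line's anchor
// time; the last line is assumed to run to the end of the scene.
function computeSlots(
  lines: NarrationLine[],
  labelTimes: Record<string, number>,
  sceneEnd: number,
): Slot[] {
  const starts = lines.map((line) => {
    const t = labelTimes[line.at];
    if (t === undefined) {
      throw new Error(`unknown label: ${line.at}`);
    }
    return t + (line.offset ?? 0);
  });
  return starts.map((start, i) => {
    const end = i + 1 < starts.length ? starts[i + 1] : sceneEnd;
    return { start, end, length: Math.max(0, end - start) };
  });
}
```

Because the slots are derived from label times rather than hard-coded timestamps, retiming the scene only changes the `labelTimes` input and the slots follow.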
Core (golden-safe, additive):
- AudioIR.narration: NarrationLineIR[] (ir.ts) — { at, text, voice?, gain?,
offset? } + baked { file, speed, duration }.
- resolveAudioPlan expands narration → label-anchored file cues; the baked
`duration` sizes the bed's duck window; an un-synthesized line warns (not fatal).
- validate.ts: narration-* structural checks. No-narration scenes stay
byte-identical (404 tests green, goldens unchanged).
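The expansion described above could look roughly like the following sketch. The `NarrationLineIR` field names come from the list above; the `FileCue` shape, the `duckFor` field, the default gain of 1, and the choice to skip un-synthesized lines are assumptions for illustration, not the actual `resolveAudioPlan` implementation.

```typescript
// Field names follow the description above; optionality of the baked
// fields is an assumption.
interface NarrationLineIR {
  at: string;
  text: string;
  voice?: string;
  gain?: number;
  offset?: number;
  // Baked by `reframe narrate`; absent until the line is synthesized.
  file?: string;
  speed?: number;
  duration?: number;
}

// Hypothetical cue shape for this sketch.
interface FileCue {
  at: string;
  offset: number;
  file: string;
  gain: number;
  duckFor: number; // seconds the bed stays ducked under this line
}

function expandNarration(
  lines: NarrationLineIR[],
): { cues: FileCue[]; warnings: string[] } {
  const cues: FileCue[] = [];
  const warnings: string[] = [];
  lines.forEach((line, i) => {
    if (line.file === undefined || line.duration === undefined) {
      // Un-synthesized lines warn instead of failing the whole plan.
      warnings.push(`narration[${i}] at "${line.at}" has no baked file; run reframe narrate`);
      return;
    }
    cues.push({
      at: line.at,
      offset: line.offset ?? 0,
      file: line.file,
      gain: line.gain ?? 1,
      duckFor: line.duration, // baked duration sizes the duck window
    });
  });
  return { cues, warnings };
}
```

A scene with an empty narration array yields no cues and no warnings, which is consistent with no-narration scenes staying unchanged.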
CLI + sidecar:
- `reframe narrate <scene> [--voice] [--max-speed] [--script] [--dry-run]`
(narrate.ts): slot from compiled.labelTimes, ≤2-pass auto-fit (speed clamped
1.0–maxSpeed, warn on overrun), bakes file/voice/speed/duration into script.json,
prints a fit table. --dry-run estimates with no synthesis.
- narrate.py Kokoro sidecar (stdin JSON → wavs + durations); shipped beside
dist/narrate.js. python+kokoro preflighted like ffmpeg/chromium (optional dep).
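One way a two-pass auto-fit of this kind might work is sketched below. The `autoFit` function, the `FitResult` shape, and the `synth` callback (standing in for a call to the sidecar) are hypothetical, and the assumption that spoken duration scales roughly as 1/speed is a simplification; the real narrate.ts may choose the second-pass speed differently.

```typescript
interface FitResult {
  speed: number;    // final speech rate, within [1.0, maxSpeed]
  duration: number; // measured duration at that speed, in seconds
  overrun: boolean; // true if the line still exceeds its slot
  passes: number;   // number of synthesis passes used (1 or 2)
}

// `synth` is a stand-in for the sidecar: it returns the duration in
// seconds of the line synthesized at the given speed.
function autoFit(
  slot: number,
  maxSpeed: number,
  synth: (speed: number) => number,
): FitResult {
  // Pass 1: synthesize at natural speed and measure.
  let speed = 1.0;
  let duration = synth(speed);
  let passes = 1;

  if (duration > slot && maxSpeed > 1.0) {
    // Pass 2: assume duration scales as 1/speed and pick the rate that
    // would just fit, clamped to maxSpeed.
    const needed = slot > 0 ? duration / slot : Infinity;
    speed = Math.min(maxSpeed, needed);
    duration = synth(speed);
    passes = 2;
  }

  // If the clamped speed was not enough, report an overrun so the caller can warn.
  return { speed, duration, overrun: duration > slot, passes };
}
```

For example, a line measuring 6 s in a 4 s slot would be re-synthesized at 1.5× under this model, while a line that already fits stays at 1.0× after a single pass.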
Determinism: the .wav files are external assets (same-machine, not golden), like images;
only the AudioPlan is deterministic. Example: examples/scenes/narrated-demo.ts
(+ narrated-demo-vo/script.json), AGENTS.md / CHANGELOG / docs, scene count 67→68.
PATCH bump reframe-video → 0.6.44.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make voiceover first-class and fit it to the scene. Today Kokoro VO is generated by a standalone `generate.py` and the cues are hand-wired, with no awareness of how much time each line has. This adds a `narration` representation in the scene IR (text + label anchor + voice, authored as a sibling `<scene>-vo/script.json` the scene imports) and a `reframe narrate` generator that reads the label clock and auto-fits each line's speech rate to its slot. Being label-anchored, the VO survives retiming/regen.
Core (golden-safe, additive):
- `AudioIR.narration` (ir.ts): `{ at, text, voice?, gain?, offset? }` + baked `{ file, speed, duration }`.
- `resolveAudioPlan` expands narration → label-anchored `file` cues; the baked `duration` sizes the bed's duck window; an un-synthesized line warns (not fatal).
- `validate.ts`: `narration-*` checks. No-narration scenes stay byte-identical (404 tests green, goldens unchanged).
CLI + sidecar:
- `reframe narrate <scene> [--voice] [--max-speed] [--script] [--dry-run]`: slot from `compiled.labelTimes`, ≤2-pass auto-fit (speed clamped 1.0–maxSpeed, warns on overrun), bakes `file`/`voice`/`speed`/`duration` into `script.json` (like `assemble` bakes ffprobe numbers), prints a fit table. `--dry-run` estimates with no synthesis.
- `narrate.py` Kokoro sidecar (stdin JSON → wavs + durations), shipped beside `dist/narrate.js`. python + `kokoro` preflighted like ffmpeg/chromium (optional dep).
Determinism: The `.wav` are external assets (same-machine, not golden), like images; only the AudioPlan is deterministic. Synthesis is out-of-band; commit `script.json` + wavs together.
Verification:
- `pnpm test` (404) + `pnpm typecheck` green; goldens unchanged.
- duration-sized duck windows; validation; un-synthesized warning.
- `narrate --dry-run` fit table → real synth + auto-fit (point sped to fit, then tuned to 1.0×) → `render` produced an mp4 with 3 VO cues, bed ducking under each line.
- `dist/narrate.js` + `dist/narrate.py`; packaged `--dry-run` smoke-tested.
Example: `examples/scenes/narrated-demo.ts` (+ `narrated-demo-vo/script.json` + wavs). Docs: AGENTS.md, CHANGELOG, README + cli-reference + examples. PATCH bump `reframe-video` → 0.6.44.
🤖 Generated with Claude Code