feat(audio): scene-fitted Kokoro narration — reframe narrate (0.6.44)#77

Merged: kiyeonjeon21 merged 1 commit into main from feat/kokoro-narration on Jun 21, 2026
Conversation

@kiyeonjeon21

Owner

Make voiceover first-class and fit it to the scene. Today Kokoro VO is generated by a standalone generate.py and the cues are hand-wired, with no awareness of how much time each line has. This adds a narration representation in the scene IR (text + label anchor + voice, authored as a sibling <scene>-vo/script.json the scene imports) and a reframe narrate generator that reads the label clock and auto-fits each line's speech rate to its slot. Being label-anchored, the VO survives retiming/regen.

Core (golden-safe, additive)

  • AudioIR.narration (ir.ts) — { at, text, voice?, gain?, offset? } + baked { file, speed, duration }.
  • resolveAudioPlan expands narration → label-anchored file cues; the baked duration sizes the bed's duck window; an un-synthesized line warns (not fatal).
  • validate.ts narration-* checks. No-narration scenes stay byte-identical (404 tests green, goldens unchanged).
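The expansion described above can be sketched roughly as follows. This is a minimal illustration, not the code in ir.ts or resolveAudioPlan: the field names on the narration line come from this description, while `FileCue`, `expandNarration`, the seconds unit, and the handling of an unknown label are assumptions made for the example.

```typescript
// Hypothetical shapes: only the NarrationLine field names come from the PR text.
interface NarrationLine {
  at: string;        // label anchor
  text: string;
  voice?: string;
  gain?: number;
  offset?: number;   // assumed: seconds relative to the label
  // Baked by `reframe narrate` after synthesis:
  file?: string;
  speed?: number;
  duration?: number; // assumed: seconds
}

// Assumed cue shape: a file cue plus the window in which the bed is ducked.
interface FileCue {
  file: string;
  start: number;
  gain: number;
  duck: { from: number; to: number };
}

interface ExpandResult {
  cues: FileCue[];
  warnings: string[];
}

function expandNarration(
  lines: NarrationLine[],
  labelTimes: Record<string, number>,
): ExpandResult {
  const cues: FileCue[] = [];
  const warnings: string[] = [];
  for (const line of lines) {
    const labelTime = labelTimes[line.at];
    if (labelTime === undefined) {
      // Assumption: reported as a warning here; the real validator may reject it.
      warnings.push(`unknown label "${line.at}"`);
      continue;
    }
    if (line.file === undefined || line.duration === undefined) {
      // An un-synthesized line warns and is skipped rather than failing.
      warnings.push(`narration at "${line.at}" has not been synthesized`);
      continue;
    }
    const start = labelTime + (line.offset ?? 0);
    cues.push({
      file: line.file,
      start,
      gain: line.gain ?? 1,
      // The baked duration sizes the duck window.
      duck: { from: start, to: start + line.duration },
    });
  }
  return { cues, warnings };
}
```

Because the cue start is derived from the label time rather than stored as an absolute timestamp, retiming the scene only changes `labelTimes`; the narration data itself stays untouched.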

CLI + sidecar

  • reframe narrate <scene> [--voice] [--max-speed] [--script] [--dry-run] — slot from compiled.labelTimes, ≤2-pass auto-fit (speed clamped 1.0–maxSpeed, warns on overrun), bakes file/voice/speed/duration into script.json (like assemble bakes ffprobe numbers), prints a fit table. --dry-run estimates with no synthesis.
  • narrate.py Kokoro sidecar (stdin JSON → wavs + durations), shipped beside dist/narrate.js. python + kokoro preflighted like ffmpeg/chromium (optional dep).
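As a rough sketch of the auto-fit loop described above: synthesize at 1.0×, and if the line overruns its slot, retry once at a speed clamped to the 1.0–maxSpeed range. The names `slotsFor`, `fitLine`, and `Synth` are hypothetical, the `synth` callback stands in for the Kokoro sidecar call (which would be asynchronous in practice), and giving the last line a slot that runs to the scene end is an assumption of this example.

```typescript
interface FitResult {
  speed: number;
  duration: number;
  passes: number;
  overrun: boolean;
}

// Hypothetical stand-in for the sidecar: returns the synthesized duration in seconds.
type Synth = (text: string, speed: number) => number;

// Slot for each line: time from its anchor to the next line's anchor.
// Assumption: the last line's slot runs to the end of the scene.
function slotsFor(
  anchors: string[],
  labelTimes: Record<string, number>,
  sceneEnd: number,
): number[] {
  const times = anchors.map((a) => {
    const t = labelTimes[a];
    if (t === undefined) throw new Error(`unknown label: ${a}`);
    return t;
  });
  return times.map((t, i) => (times[i + 1] ?? sceneEnd) - t);
}

// At most two synthesis passes per line.
function fitLine(text: string, slot: number, maxSpeed: number, synth: Synth): FitResult {
  // Pass 1: natural rate.
  const natural = synth(text, 1.0);
  if (natural <= slot) {
    return { speed: 1.0, duration: natural, passes: 1, overrun: false };
  }
  // Pass 2: speed up by the overrun ratio, clamped to [1.0, maxSpeed].
  const speed = Math.min(maxSpeed, Math.max(1.0, natural / slot));
  const duration = synth(text, speed);
  // If the clamp was hit, the line may still overrun; the caller warns.
  return { speed, duration, passes: 2, overrun: duration > slot };
}
```

Clamping at 1.0 on the low end means a short line is never slowed down to fill its slot; it is only sped up when it would otherwise overrun.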

Determinism

The .wav files are external assets (same-machine, not golden), like images; only the AudioPlan is deterministic. Synthesis is out-of-band; commit script.json + wavs together.

Verification

  • pnpm test (404) + pnpm typecheck green; goldens unchanged.
  • New core tests: narration → file cues with duration-sized duck windows; validation; un-synthesized warning.
  • End-to-end (this machine, Kokoro installed): narrate --dry-run fit table → real synth + auto-fit (point sped to fit, then tuned to 1.0×) → render produced an mp4 with 3 VO cues, bed ducking under each line.
  • Packaged build ships dist/narrate.js + dist/narrate.py; packaged --dry-run smoke-tested.

Example: examples/scenes/narrated-demo.ts (+ narrated-demo-vo/script.json + wavs). Docs: AGENTS.md, CHANGELOG, README + cli-reference + examples. PATCH bump reframe-video → 0.6.44.

🤖 Generated with Claude Code

Make voiceover first-class and fit it to the scene. Narration is addressable
DATA (text + label anchor + voice) authored as a sibling <scene>-vo/script.json
the scene imports into audio.narration; `reframe narrate` reads the label clock,
synthesizes each line with a Kokoro python sidecar, and auto-fits its speech rate
to the slot between its anchor and the next line. The scene graph drives the voice
timing, and (being label-anchored) it survives retiming/regen.

Core (golden-safe, additive):
- AudioIR.narration: NarrationLineIR[] (ir.ts) — { at, text, voice?, gain?,
  offset? } + baked { file, speed, duration }.
- resolveAudioPlan expands narration → label-anchored file cues; the baked
  `duration` sizes the bed's duck window; an un-synthesized line warns (not fatal).
- validate.ts: narration-* structural checks. No-narration scenes stay
  byte-identical (404 tests green, goldens unchanged).

CLI + sidecar:
- `reframe narrate <scene> [--voice] [--max-speed] [--script] [--dry-run]`
  (narrate.ts): slot from compiled.labelTimes, ≤2-pass auto-fit (speed clamped
  1.0–maxSpeed, warn on overrun), bakes file/voice/speed/duration into script.json,
  prints a fit table. --dry-run estimates with no synthesis.
- narrate.py Kokoro sidecar (stdin JSON → wavs + durations); shipped beside
  dist/narrate.js. python+kokoro preflighted like ffmpeg/chromium (optional dep).

Determinism: the .wav files are external assets (same-machine, not golden), like images;
only the AudioPlan is deterministic. Example: examples/scenes/narrated-demo.ts
(+ narrated-demo-vo/script.json), AGENTS.md / CHANGELOG / docs, scene count 67→68.
PATCH bump reframe-video → 0.6.44.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mintlify

mintlify Bot commented Jun 21, 2026


Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| reframe | 🟢 Ready | View Preview | Jun 21, 2026, 11:54 AM |

@kiyeonjeon21 kiyeonjeon21 merged commit daf146e into main Jun 21, 2026
2 checks passed
@kiyeonjeon21 kiyeonjeon21 deleted the feat/kokoro-narration branch June 21, 2026 11:55