feat(audio): silence-aware chunk boundary splitting #51
Conversation
Cut audio chunks at natural pauses instead of fixed intervals to avoid truncating words at chunk seams. Each target boundary at N * chunk_duration is snapped to the midpoint of the closest silence detected by ffmpeg silencedetect within a ±15s window. Falls back to fixed-length cut when no silence is found, so the pipeline never blocks. Closes #49 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per /simplify review:

- Replace per-target `find_silence_near` with one full-file `find_all_silences` pass. Boundary computation becomes pure Python over a list of silences, eliminating subprocess plumbing duplication and N sequential ffmpeg invocations.
- Add `-c copy` and `-ss` before `-i` to chunk cut ffmpeg args. Fast seek + no re-encode is ~5-10x faster per cut on long files.
- Drop `find_silence_near` and its unused parameters.
- Tighten test assertions and remove narration comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Implements silence-aware audio chunk boundary selection so chunk seams land on natural pauses (with fixed-interval fallback), reducing mid-word truncation during downstream transcription.
Changes:
- Added ffmpeg `silencedetect` parsing + silence midpoint anchoring logic for chunk boundaries.
- Updated `split_audio()` to cut using computed boundaries (with a minimum-chunk-length safeguard).
- Added unit tests for silence parsing, boundary computation, and `split_audio()` behavior with mocked ffmpeg.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/services/audio.py` | Adds silence detection/parsing and uses silence-anchored boundaries when splitting audio into chunks. |
| `tests/test_audio_split.py` | Introduces unit tests for silence parsing, boundary computation, and `split_audio()` integration. |
```diff
 duration = await get_audio_duration(audio_path)
 if duration <= chunk_duration_sec:
     return [audio_path]

+silences = await find_all_silences(audio_path)
+boundaries = _compute_chunk_boundaries(duration, float(chunk_duration_sec), silences)

 chunks = []
-start = 0.0
-idx = 0
-while start < duration:
+for idx in range(len(boundaries) - 1):
+    chunk_start = boundaries[idx]
+    chunk_len = boundaries[idx + 1] - chunk_start
```
`split_audio()` can end up producing exactly one chunk file (e.g., when duration is only slightly above `chunk_duration_sec` and `_compute_chunk_boundaries()` returns `[0.0, total_duration]` to avoid a <1s tail). That breaks the implicit contract used by `src/services/transcribe.py` (`if len(chunks) == 1: ... transcribe(audio_path)`), resulting in an unnecessary ffmpeg run and a leftover `_chunk000` file that is never used. Consider short-circuiting after computing boundaries (e.g., if `len(boundaries) == 2`, return `[audio_path]`) and add a unit test for duration in `(chunk_duration_sec, chunk_duration_sec + _MIN_CHUNK_SEC)`.
```python
proc = await asyncio.create_subprocess_exec(
    "ffmpeg",
    "-i",
    str(audio_path),
    "-af",
    f"silencedetect=noise={_SILENCE_NOISE_DB}dB:d={_SILENCE_MIN_DUR}",
    "-f",
    "null",
    "-",
    stdout=asyncio.subprocess.DEVNULL,
    stderr=asyncio.subprocess.PIPE,
)
```
`find_all_silences()` runs ffmpeg without `-nostats` / `-hide_banner`, so stderr will include frequent progress updates for the entire file; because `communicate()` buffers all stderr in memory, long inputs can create unnecessarily large buffers. Consider adding `-nostats`/`-hide_banner` (while keeping a loglevel that still emits silencedetect lines), or streaming stderr line-by-line and extracting only the `silence_start`/`silence_end` lines to avoid unbounded growth.
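One way to apply the first suggestion is to keep the argument list as-is and add the two flags before `-i` (silencedetect appears to log at the default `info` level, so no `-loglevel` change should be needed). A sketch; `noise_db` / `min_dur` are placeholder defaults standing in for the module's `_SILENCE_NOISE_DB` / `_SILENCE_MIN_DUR`:

```python
# Hedged sketch: the same silencedetect invocation with progress output
# suppressed. The default values here are placeholders, not the module's
# actual constants.
def silencedetect_args(audio_path: str, noise_db: int = -30, min_dur: float = 0.5) -> list[str]:
    return [
        "ffmpeg",
        "-hide_banner",  # drop the build/configuration banner
        "-nostats",      # drop per-frame progress lines from stderr
        "-i", audio_path,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_dur}",
        "-f", "null", "-",
    ]
```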
```python
async def find_all_silences(audio_path: Path) -> list[tuple[float, float]]:
    """Run ffmpeg silencedetect once over the entire file.

    Returns list of (start, end) silence ranges, or empty list if detection
    fails — callers fall back to fixed-length cuts.
    """
    proc = await asyncio.create_subprocess_exec(
        "ffmpeg",
        "-i",
        str(audio_path),
        "-af",
        f"silencedetect=noise={_SILENCE_NOISE_DB}dB:d={_SILENCE_MIN_DUR}",
        "-f",
        "null",
        "-",
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.PIPE,
    )
    _, stderr = await proc.communicate()
    if proc.returncode != 0:
        logger.debug("silencedetect failed (returncode=%d) — falling back", proc.returncode)
        return []
    return _parse_silence_ranges(stderr.decode(errors="replace"))
```
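`_parse_silence_ranges` itself is not shown in this diff; a plausible sketch based on ffmpeg's silencedetect log format (lines like `silence_start: 12.345` and `silence_end: 15.6 | silence_duration: 3.255`):

```python
import re

# Hypothetical sketch of a parser like _parse_silence_ranges; the real
# implementation is not visible in this diff.
_START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END_RE = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")

def parse_silence_ranges(stderr_text: str) -> list[tuple[float, float]]:
    ranges: list[tuple[float, float]] = []
    start: float | None = None
    for line in stderr_text.splitlines():
        if (m := _START_RE.search(line)):
            start = float(m.group(1))
        elif (m := _END_RE.search(line)) and start is not None:
            ranges.append((start, float(m.group(1))))
            start = None  # an unmatched trailing silence_start is dropped
    return ranges
```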
```python
def _nearest_silence_midpoint(
    silences: list[tuple[float, float]],
    target: float,
    window: float,
) -> float | None:
    """Return midpoint of the silence closest to `target`, or None if none within ±window."""
    candidates = [(s + e) / 2 for s, e in silences if abs((s + e) / 2 - target) <= window]
    if not candidates:
        return None
    return min(candidates, key=lambda m: abs(m - target))
```
```python
def _compute_chunk_boundaries(
    total_duration: float,
    chunk_duration_sec: float,
    silences: list[tuple[float, float]],
) -> list[float]:
    """Compute chunk boundary timestamps using silence anchoring.

    Each target boundary at N * chunk_duration_sec is snapped to the nearest
    silence midpoint within ±_SILENCE_SEARCH_WINDOW_SEC, falling back to the
    fixed-length boundary when no silence is in range. Anchors that would
    produce a chunk shorter than _MIN_CHUNK_SEC on either side are rejected.
    """
    boundaries: list[float] = [0.0]
    target = chunk_duration_sec
    while target < total_duration:
        anchor = _nearest_silence_midpoint(silences, target, _SILENCE_SEARCH_WINDOW_SEC)
        if anchor is None or anchor <= boundaries[-1] + _MIN_CHUNK_SEC:
            anchor = target
        if anchor >= total_duration - _MIN_CHUNK_SEC:
            break
        boundaries.append(anchor)
        target = anchor + chunk_duration_sec
    boundaries.append(total_duration)
    return boundaries
```
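The boundary logic can be walked through standalone. The two helpers are restated here without the module constants so the example runs on its own; the window (15.0s) matches the PR description, while the minimum chunk length (1.0s) is an assumption (the constant's real value is not visible in this diff):

```python
# Executable walk-through of the boundary logic above. WINDOW matches the
# ±15s in the PR description; MIN_CHUNK = 1.0 is an assumed value for
# _MIN_CHUNK_SEC, which this diff does not show.
WINDOW, MIN_CHUNK = 15.0, 1.0

def nearest_silence_midpoint(silences, target, window=WINDOW):
    mids = [(s + e) / 2 for s, e in silences if abs((s + e) / 2 - target) <= window]
    return min(mids, key=lambda m: abs(m - target)) if mids else None

def compute_chunk_boundaries(total, chunk, silences):
    boundaries = [0.0]
    target = chunk
    while target < total:
        anchor = nearest_silence_midpoint(silences, target)
        if anchor is None or anchor <= boundaries[-1] + MIN_CHUNK:
            anchor = target  # fixed-length fallback
        if anchor >= total - MIN_CHUNK:
            break  # don't leave a sub-minimum tail chunk
        boundaries.append(anchor)
        target = anchor + chunk
    boundaries.append(total)
    return boundaries

# 25-minute file, 10-minute target chunks, a silence near each target:
silences = [(595.0, 597.0), (1190.0, 1194.0)]  # midpoints 596.0, 1192.0
print(compute_chunk_boundaries(1500.0, 600.0, silences))
# → [0.0, 596.0, 1192.0, 1500.0]: target 600 snaps to 596, the next target
#   (596 + 600 = 1196) snaps to 1192, and the file end closes the list.
```

With no silences detected the same call degrades to fixed intervals, `[0.0, 600.0, 1200.0, 1500.0]`, which is the documented fallback.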
PR description mentions a `find_silence_near()` helper that searches a ±15s window per boundary, but the implementation introduces `find_all_silences()` (single full-file pass) and `_compute_chunk_boundaries()` always uses the global `_SILENCE_SEARCH_WINDOW_SEC`. Either update the PR description to match the chosen approach, or adjust the implementation to match the described per-boundary window helper/signature.
- Short-circuit `split_audio` when boundary computation collapses to `[0.0, total_duration]`. Previously this produced a single redundant chunk file via ffmpeg even though the caller's `len(chunks) == 1` optimization would ignore it.
- Add `-nostats -hide_banner` to `find_all_silences` so the silencedetect ffmpeg run doesn't buffer megabytes of progress text in stderr for long inputs.
- Add a regression test for the borderline (`chunk_duration_sec < duration < chunk_duration_sec + _MIN_CHUNK_SEC`) case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thanks @copilot — all 3 points addressed in `7113ccc`:
Summary

- Run ffmpeg `silencedetect` once over the full file via `find_all_silences()`, then compute boundaries in pure Python — no per-target subprocess calls
- Each target boundary at `N * chunk_duration_sec` is snapped to the nearest silence midpoint within ±15s, falling back to the fixed-length boundary when no silence is in range
- Anchors that would produce a chunk shorter than `_MIN_CHUNK_SEC` on either side are rejected; short-circuit to the original path when boundary computation collapses to `[0.0, total_duration]`
- `-ss` before `-i` and `-c copy` for chunk cuts (~5-10x faster, no re-encoding since chunks come from already-decoded WAV/MP3)
- Unit tests for silence parsing, boundary computation, and `split_audio` integration with mocked ffmpeg
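The fast-seek chunk cut mentioned above (`-ss` before `-i`, plus `-c copy`) can be sketched as an argument builder. The function and output naming are illustrative, not from the diff; the flag ordering is the point — an input-side `-ss` seeks before decoding, and `-c copy` stream-copies instead of re-encoding:

```python
# Illustrative sketch of the chunk-cut argument order described above.
# chunk_cut_args and its parameters are hypothetical names for this example.
def chunk_cut_args(audio_path: str, start: float, length: float, out_path: str) -> list[str]:
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",  # input-side seek: placed before -i for fast seeking
        "-i", audio_path,
        "-t", f"{length:.3f}",  # chunk length
        "-c", "copy",           # stream copy, no re-encode
        out_path,
    ]
```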
Why

Current `split_audio` (`src/services/audio.py`) cuts audio at fixed `whisper_chunk_duration_sec` intervals with no silence detection. Words spanning a boundary can be truncated in both adjacent chunks. This becomes more visible once chunks stream to the editor (#50), so silence anchoring lands first.

Test plan

- `pytest tests/test_audio_split.py` — 11 passing (parser, boundary computation, integration, tail-edge case)
- `pytest --ignore=tests/e2e` — no regressions
- `ruff check` / `ruff format --check` clean

Commits
- `b756124` initial implementation
- `0026fce` /simplify refactor: single-pass silence detection + faster chunk cuts
- `7113ccc` Copilot review: short-circuit collapsed boundaries, suppress ffmpeg progress noise, add tail-edge regression test

Closes #49
🤖 Generated with Claude Code