feat: Add video/podcast transcription to fetch phase #30

Open
StartupBros wants to merge 1 commit into alexknowshtml:main from StartupBros:feat/video-podcast-transcription


StartupBros commented Mar 25, 2026

Summary

Add tiered transcript extraction to the fetch phase for video and podcast bookmarks:

  1. yt-dlp captions: downloads existing subtitles; no audio processing needed
  2. Whisper fallback: extracts audio and transcribes it locally when no captions exist
  3. Graceful placeholder: sets status: needs_transcript when the tools aren't installed

Both yt-dlp and Whisper are optional — zero new required dependencies.
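The three tiers can be sketched roughly as follows. This is not the PR's code: findExecutable, captionArgs, whisperArgs, and fetchTranscriptTiered are hypothetical names, the steps that actually spawn the tools are injected as functions, and the CLI flags assume current yt-dlp and the openai-whisper command-line tool.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: return the first executable found at an explicitly
// configured path or in a PATH directory, else null.
// (Windows PATHEXT handling is omitted for brevity.)
function findExecutable(name, configuredPath = null) {
  const candidates = configuredPath ? [configuredPath] : [];
  for (const dir of (process.env.PATH || "").split(path.delimiter)) {
    if (dir) candidates.push(path.join(dir, name));
  }
  for (const candidate of candidates) {
    try {
      if (!fs.statSync(candidate).isFile()) continue;
      fs.accessSync(candidate, fs.constants.X_OK);
      return candidate;
    } catch {
      // Missing or not executable: try the next candidate.
    }
  }
  return null;
}

// Tier 1 arguments for yt-dlp: fetch uploaded or auto-generated subtitles only.
function captionArgs(url, outTemplate) {
  return ["--skip-download", "--write-subs", "--write-auto-subs",
          "--sub-langs", "en.*", "--sub-format", "json3/vtt",
          "-o", outTemplate, url];
}

// Tier 2 arguments for the openai-whisper CLI (audio extracted beforehand).
function whisperArgs(audioFile, model, outDir) {
  return [audioFile, "--model", model, "--output_format", "txt", "--output_dir", outDir];
}

// Tiered runner. fromCaptions/fromWhisper are injected (hypothetical) async
// functions that would spawn the tools; each resolves to text or null.
async function fetchTranscriptTiered({ ytdlp, whisper, fromCaptions, fromWhisper }) {
  if (ytdlp) {
    const text = await fromCaptions(ytdlp);
    if (text) return { status: "ok", source: "captions", text };
  }
  if (whisper) {
    const text = await fromWhisper(whisper);
    if (text) return { status: "ok", source: "whisper", text };
  }
  // No tool available (or no tool produced text): leave a placeholder.
  return { status: "needs_transcript", source: null, text: null };
}
```

Injecting the tier functions keeps the fallback order testable without yt-dlp or Whisper installed, which matches the PR's goal of treating both tools as optional.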

Architecture

Full transcripts are stored as separate files (.state/transcripts/{id}.txt) instead of being inlined in the pending JSON, so a 215K-character transcript (a 1-hour talk) takes only 89 bytes in the JSON. During processing, Claude reads only the first ~20K characters for summarization. Transcript files are cleaned up after processing.
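A minimal sketch of the separate-file storage, assuming Node's fs module and a hypothetical reference shape ({ transcriptFile, transcriptChars }); the function and field names are illustrative, not the PR's schema.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Roughly the ~20K characters read for summarization.
const PREVIEW_CHARS = 20_000;

// Hypothetical: write the full transcript to <stateDir>/transcripts/<id>.txt
// and return the small reference stored in the pending JSON instead.
function storeTranscript(stateDir, id, text) {
  const dir = path.join(stateDir, "transcripts");
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${id}.txt`);
  fs.writeFileSync(file, text, "utf8");
  return { transcriptFile: path.relative(stateDir, file), transcriptChars: text.length };
}

// Load only the leading slice needed for the summary.
function readTranscriptHead(stateDir, ref, maxChars = PREVIEW_CHARS) {
  const text = fs.readFileSync(path.join(stateDir, ref.transcriptFile), "utf8");
  return text.slice(0, maxChars);
}

// Delete the file once the bookmark has been processed.
function removeTranscript(stateDir, ref) {
  fs.rmSync(path.join(stateDir, ref.transcriptFile), { force: true });
}
```

Reading the whole file and slicing keeps the sketch simple; at a few hundred kilobytes that is cheap, and slicing the decoded string avoids cutting a multi-byte UTF-8 character in half, which a raw byte-limited read could do.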

Changes

  • src/processor.js: new exports findYtDlp, findWhisper, parseJson3Transcript, parseVttTranscript, and fetchTranscriptContent, plus a podcast URL classification fix and transcript file storage
  • src/config.js: new config keys ytdlpPath, whisperPath, whisperModel, and transcribeTimeouts
  • .claude/commands/process-bookmarks.md: the transcribe action reads transcript files and creates rich knowledge files with key takeaways
  • README.md: transcription docs, install instructions, and config reference
  • test/: 23 new tests and 2 new fixtures
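The two subtitle formats reduce to plain text along these lines. This sketch assumes YouTube's json3 layout (events[].segs[].utf8) and ordinary WebVTT cues; json3ToText and vttToText are hypothetical stand-ins for parseJson3Transcript and parseVttTranscript, whose real behavior may differ.

```javascript
// Sketch assuming YouTube's json3 layout: { events: [{ segs: [{ utf8 }] }] }.
// Accepts the raw JSON string or an already-parsed object.
function json3ToText(raw) {
  const data = typeof raw === "string" ? JSON.parse(raw) : raw;
  const events = Array.isArray(data.events) ? data.events : [];
  // Segments within one event are word fragments; events are joined by spaces.
  const pieces = events.map((event) =>
    (event.segs || []).map((seg) => seg.utf8 || "").join("")
  );
  return pieces.join(" ").replace(/\s+/g, " ").trim();
}

// Decode the few entities that commonly appear in caption text.
function decodeEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

// Plain text from WebVTT: drop the header, NOTE/STYLE/REGION blocks, cue
// timings, and inline tags, and skip lines repeated by rolling captions.
function vttToText(raw) {
  const lines = [];
  const normalized = raw.replace(/^\uFEFF/, "").replace(/\r\n?/g, "\n");
  for (const block of normalized.split(/\n{2,}/)) {
    const rows = block.split("\n");
    if (/^(WEBVTT|NOTE|STYLE|REGION)\b/.test(rows[0])) continue;
    const timing = rows.findIndex((row) => row.includes("-->"));
    if (timing === -1) continue;
    for (const row of rows.slice(timing + 1)) {
      const text = decodeEntities(row.replace(/<[^>]+>/g, "")).trim();
      if (text && text !== lines[lines.length - 1]) lines.push(text);
    }
  }
  return lines.join(" ");
}
```

Skipping consecutive duplicate lines matters for auto-generated captions, where each cue typically repeats the previous line before adding a new one.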

Testing

76 tests pass, 0 failures. Covers tool detection, subtitle parsing (json3 + VTT), transcript extraction, podcast URL classification, and config defaults/overrides. Integration verified with YouTube, Vimeo, and SoundCloud content.
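One way to apply config defaults and overrides for the new keys is a shallow merge with a nested merge for the timeouts. The default values and the transcribeTimeouts sub-keys (captionsMs, whisperMs) below are invented for illustration; only the four key names come from the PR.

```javascript
// Hypothetical defaults: the key names are from the PR, the values are not.
const TRANSCRIBE_DEFAULTS = Object.freeze({
  ytdlpPath: null, // null => auto-detect on PATH
  whisperPath: null, // null => auto-detect on PATH
  whisperModel: "base", // one of Whisper's standard model sizes
  transcribeTimeouts: Object.freeze({ captionsMs: 60_000, whisperMs: 30 * 60_000 }),
});

// Merge user settings over the defaults; undefined keys keep the default.
function resolveTranscribeConfig(userConfig = {}) {
  const pick = (key) =>
    userConfig[key] !== undefined ? userConfig[key] : TRANSCRIBE_DEFAULTS[key];
  return {
    ytdlpPath: pick("ytdlpPath"),
    whisperPath: pick("whisperPath"),
    whisperModel: pick("whisperModel"),
    // Merge per-phase timeouts so a user can override just one of them.
    transcribeTimeouts: {
      ...TRANSCRIBE_DEFAULTS.transcribeTimeouts,
      ...(userConfig.transcribeTimeouts || {}),
    },
  };
}
```

For example, resolveTranscribeConfig({ whisperModel: "small" }) changes only the model and keeps every other default.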

Add tiered transcript extraction to the fetch phase:
1. yt-dlp captions (fast, no audio processing)
2. Whisper audio transcription (fallback when no captions)
3. Graceful placeholder (when tools not installed)

Full transcripts are stored as separate files in .state/transcripts/
to keep the pending JSON small — a 215K char transcript takes 89
bytes in JSON. Processing reads only the first 20K chars needed
for summarization.

New exports: findYtDlp, findWhisper, parseJson3Transcript,
parseVttTranscript, fetchTranscriptContent

Also fixes podcast URL classification (was falling through to
article type, fetching useless SPA HTML).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
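The podcast classification fix amounts to matching known hosts before the article fallback. The host lists and the Spotify path rule below are hypothetical; the PR's actual rules aren't shown here.

```javascript
// Hypothetical host lists for illustration; extend to match real bookmarks.
const VIDEO_HOSTS = ["youtube.com", "youtu.be", "vimeo.com"];
const PODCAST_HOSTS = ["podcasts.apple.com", "soundcloud.com"];

// True for the host itself or any subdomain (www.youtube.com, m.youtube.com).
function hostMatches(hostname, host) {
  return hostname === host || hostname.endsWith("." + host);
}

// Classify before falling back to "article", so podcast pages are not fetched
// as (often client-rendered) HTML. Throws on unparseable URLs.
function classifyUrl(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.toLowerCase();
  if (VIDEO_HOSTS.some((h) => hostMatches(host, h))) return "video";
  if (PODCAST_HOSTS.some((h) => hostMatches(host, h))) return "podcast";
  // Spotify also hosts music, so only episode and show pages count as podcasts.
  if (host === "open.spotify.com" && /^\/(episode|show)\//.test(url.pathname)) {
    return "podcast";
  }
  return "article";
}
```

Matching on the parsed hostname rather than a substring of the URL avoids false positives such as notyoutube.com.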

StartupBros force-pushed the feat/video-podcast-transcription branch from 84cc82e to 291d70f on March 26, 2026 00:48