Podcast automation pipeline that turns raw recordings into publish-ready shorts, longform video, and an RSS podcast feed. Supports single-camera or multi-camera setups with external multi-track audio (Zoom H6E or similar).
Cascade runs a 14-agent pipeline:
- Ingest — Copy media from SD card(s) to SSD, validate with ffprobe, sync external audio
- Stitch — Concatenate clips via ffmpeg stream-copy
- Audio Analysis — Detect true stereo vs identical/mono channels
- Speaker Cut — Segment speakers via per-channel RMS energy (supports N-speaker multi-track)
- Transcribe — Deepgram Nova-3 with diarization + SRT generation
- Clip Miner — Claude identifies top 10 short-form candidates
- Longform Render — 16:9 speaker-cropped video with hardware encoding
- Shorts Render — 9:16 shorts with burned-in subtitles
- Metadata Gen — Per-platform titles, descriptions, hashtags, schedule
- Thumbnail Gen — AI-generated caricature artwork via OpenAI
- QA — Validate all outputs (durations, file sizes, formats)
- Podcast Feed — Extract audio, generate RSS, upload to Cloudflare R2
- Publish — Distribute to YouTube, TikTok, Instagram, and more
- Backup — rsync episode to external HDD
Agents run in parallel where possible (transcribe runs alongside audio analysis + speaker cut).
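For intuition, the speaker-cut idea above can be sketched in a few lines: window each speaker's track, pick the loudest channel per window by RMS energy, and merge consecutive wins into segments. This is a minimal illustration, not Cascade's actual `speaker_cut.py` implementation, and it assumes one clean audio track per speaker.

```python
import math

def rms(window):
    """Root-mean-square energy of one window of samples."""
    return math.sqrt(sum(s * s for s in window) / len(window))

def segment_speakers(channels, window_size=4800):
    """Assign each window to the loudest channel, then merge runs.

    channels: list of per-speaker sample lists (one track per speaker).
    Returns (speaker_index, start_window, end_window) tuples.
    """
    n_windows = min(len(ch) for ch in channels) // window_size
    winners = []
    for w in range(n_windows):
        energies = [rms(ch[w * window_size:(w + 1) * window_size])
                    for ch in channels]
        winners.append(max(range(len(channels)), key=lambda i: energies[i]))
    # Merge consecutive windows won by the same channel into segments
    segments = []
    for w, spk in enumerate(winners):
        if segments and segments[-1][0] == spk:
            segments[-1] = (spk, segments[-1][1], w + 1)
        else:
            segments.append((spk, w, w + 1))
    return segments
```

A production version would also smooth out very short segments so brief interjections don't cause rapid camera cuts.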
- Python 3.11+
- ffmpeg with libass (for subtitle burning) — `brew install ffmpeg` or `brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-libass`
- uv (recommended) — `brew install uv`
git clone https://github.com/saml212/cascade.git && cd cascade
cp config/config.example.toml config/config.toml # Edit paths & podcast info
cp .env.example .env # Fill in your API keys (see below)
./start.sh # Creates venv, installs deps, opens UI
Or manually:
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp config/config.example.toml config/config.toml # Edit paths & podcast info
cp .env.example .env # Fill in API keys
| Key | Required | Purpose |
|---|---|---|
| ANTHROPIC_API_KEY | Yes | Claude — clip mining, metadata generation, chat |
| DEEPGRAM_API_KEY | Yes | Nova-3 transcription + speaker diarization |
| OPENAI_API_KEY | No | Thumbnail generation (caricature artwork) |
| YOUTUBE_CLIENT_ID | No | YouTube publishing |
| YOUTUBE_CLIENT_SECRET | No | YouTube publishing |
| TIKTOK_CLIENT_KEY | No | TikTok publishing |
| TIKTOK_CLIENT_SECRET | No | TikTok publishing |
| INSTAGRAM_ACCESS_TOKEN | No | Instagram publishing |
| FACEBOOK_PAGE_ID | No | Instagram publishing |
| CLOUDFLARE_ACCOUNT_ID | No | Podcast RSS feed (R2 storage) |
| CLOUDFLARE_API_TOKEN | No | Podcast RSS feed (R2 storage) |
| UPLOAD_POST_API_KEY | No | Upload-Post publishing |
| UPLOAD_POST_USER | No | Upload-Post publishing |
Only ANTHROPIC_API_KEY and DEEPGRAM_API_KEY are required for the core pipeline (ingest through QA). Publishing and RSS keys are only needed for those specific agents.
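A fail-fast check for the two required keys might look like this (a sketch, not part of Cascade's codebase — the helper name is hypothetical):

```python
import os

REQUIRED = ("ANTHROPIC_API_KEY", "DEEPGRAM_API_KEY")

def missing_required_keys(env=None):
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED if not env.get(key)]

# Example: abort before starting a pipeline run
# if missing_required_keys():
#     raise SystemExit("Set ANTHROPIC_API_KEY and DEEPGRAM_API_KEY in .env")
```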
# Full pipeline from SD card
python -m agents --source-path "/path/to/media/"
# Specific agents only
python -m agents --source-path "/path/to/media/" --agents ingest stitch audio_analysis
# With a custom episode ID
python -m agents --source-path "/path/to/media/" --episode-id ep_2026-02-19_120000
./start.sh
# Opens http://localhost:8420 automatically
The web UI lets you review clips, approve/reject them, trim boundaries, chat with the AI about your episode, and trigger pipeline runs.
cascade/
├── agents/ # 14 pipeline agents (DAG-parallel execution)
│ ├── base.py # BaseAgent ABC (timing, logging, JSON I/O, config helpers)
│ ├── pipeline.py # DAG orchestrator with dependency-aware parallelism
│ ├── ingest.py → stitch.py → audio_analysis.py → speaker_cut.py
│ ├── transcribe.py (runs parallel to audio_analysis + speaker_cut)
│ ├── clip_miner.py → shorts_render.py + metadata_gen.py (parallel)
│ ├── longform_render.py (starts when speaker_cut + transcribe finish)
│ ├── thumbnail_gen.py → qa.py → podcast_feed.py → publish.py → backup.py
│ └── ...
├── lib/ # Shared utilities
│ ├── encoding.py # VideoToolbox / libx264 encoder selection + LUT support
│ ├── ffprobe.py # ffprobe wrapper
│ ├── audio_mix.py # Multi-track audio mixing with per-track volume control
│ ├── paths.py # Path resolution (external drive fallback)
│ ├── clips.py # Clip normalization
│ └── srt.py # SRT generation, parsing, and ffmpeg escaping
├── server/ # FastAPI app (port 8420)
│ ├── app.py # Entry point + static files
│ └── routes/ # API endpoints (episodes, clips, pipeline, chat, trim, etc.)
├── frontend/ # Vanilla JS SPA for clip review + chat + audio mix panel
├── config/ # config.toml — all settings
├── tests/ # pytest + Jest test suites
└── start.sh # One-command setup + launch
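The dependency-aware parallelism that `pipeline.py` provides can be sketched with `asyncio`: each agent waits on the completion events of its prerequisites, so independent agents (like transcribe and audio analysis) run concurrently. This is an illustrative sketch under assumed names, not the actual orchestrator.

```python
import asyncio

async def run_dag(deps, run):
    """Run each agent as soon as all of its dependencies have finished.

    deps: {agent_name: set of prerequisite agent names}
    run:  async callable invoked as run(agent_name)
    """
    done = {name: asyncio.Event() for name in deps}

    async def runner(name):
        for dep in deps[name]:
            await done[dep].wait()   # block until each prerequisite finishes
        await run(name)
        done[name].set()             # unblock agents that depend on us

    await asyncio.gather(*(runner(name) for name in deps))
```

With `deps = {"ingest": set(), "stitch": {"ingest"}, "transcribe": {"stitch"}, "audio_analysis": {"stitch"}}`, transcribe and audio_analysis start together as soon as stitch completes.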
By default, Cascade stores everything locally in ./episodes/ and ./work/. This works out of the box with no external drives.
For large episodes (multi-GB source files), you can point to an external SSD by editing config/config.toml:
[paths]
output_dir = "~/cascade/episodes"
work_dir = "~/cascade/work"
backup_dir = "~/cascade/backup"
If an external drive path is configured but the volume isn't mounted, Cascade automatically falls back to local storage.
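The fallback logic can be as simple as checking whether the configured path's volume is reachable before using it — a minimal sketch (the function name is hypothetical; see `lib/paths.py` for the real behavior):

```python
from pathlib import Path

def resolve_output_dir(configured, local_fallback="./episodes"):
    """Use the configured path if its volume is reachable, else fall back locally."""
    configured = Path(configured).expanduser()
    # An unmounted external volume shows up as a missing parent directory
    if configured.exists() or configured.parent.exists():
        return configured
    return Path(local_fallback).resolve()
```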
All settings live in config/config.toml. Key sections:
- [paths] — Output directory, work directory, backup drive (local fallback if drive missing)
- [processing] — CRF, resolution, clip duration limits, hardware acceleration
- [transcription] — Deepgram model, language, diarization settings
- [clip_mining] — LLM model, temperature, clip count
- [schedule] — Shorts posting cadence, peak days, timezone
- [platforms.*] — Per-platform publishing settings
- [podcast] — RSS feed metadata (title, author, artwork)
- [podcast.links] — Link-in-bio page URLs (see below)
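As an illustration, a [processing] section could look like the following — the key names here are illustrative, not the exact schema; config.example.toml is authoritative:

```toml
[processing]
crf = 18                    # encoder quality (lower = larger files)
resolution = "1920x1080"
max_clip_seconds = 60       # upper bound for mined shorts
hardware_acceleration = true  # VideoToolbox on macOS, libx264 otherwise
```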
Cascade includes a built-in link-in-bio page generator. Fill in the [podcast.links] section of your config with your platform URLs, then generate the static HTML:
python -m links.generate
This produces links/index.html — a single-file, dark-themed page with your podcast artwork, platform links, and an embedded Spotify player. Deploy it to Cloudflare Pages, GitHub Pages, Netlify, or any static host.
Supported platforms: Spotify, Apple Podcasts, YouTube, Instagram, X, TikTok, iHeartRadio, GitHub. Empty URLs are automatically excluded.
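The empty-URL exclusion amounts to filtering before rendering — a sketch of the idea, not the actual `links.generate` code:

```python
def render_links(links):
    """Drop platforms with empty URLs and emit simple anchor tags."""
    return "\n".join(
        f'<a href="{url}">{name}</a>'
        for name, url in links.items()
        if url  # platforms with empty URLs never reach the page
    )
```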
| Service | Cost |
|---|---|
| Deepgram transcription | ~$0.50 |
| Claude clip mining | ~$0.10-0.30 |
| Claude metadata | ~$0.05-0.10 |
MIT — see LICENSE.