4 changes: 2 additions & 2 deletions .claude/commands/contribute.md
@@ -334,12 +334,12 @@ Use `/record-demo` command or Playwright scripts

### Voiceover
```bash
python tools/voiceover.py --script VOICEOVER-SCRIPT.md --output public/audio/voiceover.mp3
uv run tools/voiceover.py --script VOICEOVER-SCRIPT.md --output public/audio/voiceover.mp3
```

### Background music
```bash
python tools/music.py --prompt "subtle tech ambient" --duration 180 --output public/audio/background-music.mp3
uv run tools/music.py --prompt "subtle tech ambient" --duration 180 --output public/audio/background-music.mp3
```
```

2 changes: 1 addition & 1 deletion .claude/commands/generate-voiceover.md
@@ -317,7 +317,7 @@ Share these tips with the user:

**Qwen3-TTS:**
- If `RUNPOD_API_KEY` is missing, tell user to add it to `.env`
- If `RUNPOD_QWEN3_TTS_ENDPOINT_ID` is missing, tell user to run `python tools/qwen3_tts.py --setup`
- If `RUNPOD_QWEN3_TTS_ENDPOINT_ID` is missing, tell user to run `uv run tools/qwen3_tts.py --setup`

**Both:**
- If script file not found, offer to create a template
6 changes: 3 additions & 3 deletions .claude/commands/redub.md
@@ -85,7 +85,7 @@ Accept default or enter custom path:
Run the redub tool:

```bash
source .venv/bin/activate && python tools/redub.py \
source .venv/bin/activate && uv run tools/redub.py \
--input "INPUT_PATH" \
--voice-id "VOICE_ID" \
--output "OUTPUT_PATH" \
@@ -95,7 +95,7 @@ source .venv/bin/activate && python tools/redub.py \
For transcript review workflow:
```bash
# Step 1: Transcribe only
source .venv/bin/activate && python tools/redub.py \
source .venv/bin/activate && uv run tools/redub.py \
--input "INPUT_PATH" \
--voice-id "VOICE_ID" \
--output "OUTPUT_PATH" \
@@ -104,7 +104,7 @@ source .venv/bin/activate && python tools/redub.py \

# Show transcript to user, let them edit
# Step 2: After approval, run with edited transcript
source .venv/bin/activate && python tools/redub.py \
source .venv/bin/activate && uv run tools/redub.py \
--input "INPUT_PATH" \
--voice-id "VOICE_ID" \
--output "OUTPUT_PATH" \
70 changes: 35 additions & 35 deletions .claude/commands/setup.md
@@ -24,10 +24,10 @@ On invocation, assess current state and adapt:
```
1. Check .env exists — if not, create from .env.example
2. Read current .env values (which keys are set vs placeholder)
3. Check prerequisites: node --version, python3 --version, ffmpeg -version
4. Check pip packages: python3 -c "import dotenv; import requests"
5. Check Modal CLI: modal --version (if installed)
6. Check for existing Modal apps: modal app list (if authenticated)
3. Check prerequisites: node --version, uv --version, ffmpeg -version
4. Check Python deps: uv run python -c "import dotenv; import requests"
5. Check Modal CLI: uv run modal --version (if installed)
6. Check for existing Modal apps: uv run modal app list (if authenticated)
7. Summarize what's ready vs what needs setup
```

@@ -42,7 +42,7 @@ Prerequisites:
[check] Node.js 20.x
[check] Python 3.12
[check] FFmpeg 7.1
[check] pip packages installed
[check] Python deps installed (uv)

Cloud GPU: Not configured
File transfer: Not configured (using free fallback services)
@@ -79,8 +79,8 @@ Check and report. Don't install anything automatically — just tell the user wh

### Recommended

- **Python 3.9+**: `python3 --version`. If missing: "Install from https://python.org/ — needed for AI voiceover, image editing, and all cloud GPU tools"
- **pip packages**: `python3 -c "import dotenv; import requests"`. If missing: guide through `pip install -r tools/requirements.txt` (or venv setup)
- **uv**: `uv --version`. If missing: "Install with `curl -LsSf https://astral.sh/uv/install.sh | sh` (macOS/Linux) or `powershell -c \"irm https://astral.sh/uv/install.ps1 | iex\"` (Windows) — manages Python and all toolkit dependencies; needed for AI voiceover, image editing, and all cloud GPU tools"
- **Python deps**: `uv run python -c "import dotenv; import requests"`. If missing: run `uv sync` from the toolkit root (uv installs a compatible Python automatically if needed)
- **FFmpeg**: `ffmpeg -version`. If missing: "Install with `brew install ffmpeg` (macOS) or see https://ffmpeg.org/ — needed for media conversion"

### Output
@@ -119,15 +119,15 @@ Frame the pitch:
- All 7 toolkit tools typically cost $0.50-2.00/month with normal use
- Faster cold starts than RunPod
- Scale to zero — no charges when idle
- Simple deployment: `modal deploy docker/modal-xxx/app.py`
- Simple deployment: `uv run modal deploy docker/modal-xxx/app.py`

Setup flow:
```
1. pip install modal
2. python3 -m modal setup
1. uv sync --extra modal
2. uv run modal setup
→ Opens browser for authentication
→ Creates ~/.modal.toml with credentials
3. Verify: modal app list
3. Verify: uv run modal app list
```

### Option B: RunPod
@@ -208,7 +208,7 @@ R2_BUCKET_NAME=video-toolkit

Test R2 connectivity:
```bash
python3 -c "
uv run python -c "
import sys; sys.path.insert(0, 'tools')
from file_transfer import upload_to_r2, delete_from_r2
import tempfile, os
@@ -274,17 +274,17 @@ Recommend "all" — with Modal's free tier, there's no cost to having them deplo

### Modal Deployment Flow

For each selected tool, run `modal deploy` and capture the endpoint URL:
For each selected tool, run `uv run modal deploy` and capture the endpoint URL:

```bash
# Deploy each app and capture the URL from output
modal deploy docker/modal-qwen3-tts/app.py
modal deploy docker/modal-flux2/app.py
modal deploy docker/modal-image-edit/app.py
modal deploy docker/modal-upscale/app.py
modal deploy docker/modal-music-gen/app.py
modal deploy docker/modal-sadtalker/app.py
modal deploy docker/modal-propainter/app.py
uv run modal deploy docker/modal-qwen3-tts/app.py
uv run modal deploy docker/modal-flux2/app.py
uv run modal deploy docker/modal-image-edit/app.py
uv run modal deploy docker/modal-upscale/app.py
uv run modal deploy docker/modal-music-gen/app.py
uv run modal deploy docker/modal-sadtalker/app.py
uv run modal deploy docker/modal-propainter/app.py
```

After each deploy, Modal prints the endpoint URL. Parse it and save to .env:
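The "parse it" step could look roughly like the sketch below. It assumes only that the deploy output contains an HTTPS URL ending in `.modal.run`, as in the `MODAL_FLUX2_ENDPOINT_URL` example; the helper name `extract_endpoint_url` and the sample output line are hypothetical and not taken from the toolkit or from Modal's actual CLI output.

```python
import re
from typing import Optional

# Assumption: the endpoint is the first https://... URL ending in ".modal.run".
MODAL_URL_RE = re.compile(r"https://[^\s\"']+\.modal\.run\b")


def extract_endpoint_url(deploy_output: str) -> Optional[str]:
    """Return the first *.modal.run URL found in the text, or None."""
    match = MODAL_URL_RE.search(deploy_output)
    return match.group(0) if match else None


# Hypothetical sample text, not copied from a real `modal deploy` run.
sample = "Created web endpoint => https://example--demo-app-generate.modal.run\nApp deployed."
print(extract_endpoint_url(sample))
```

If no URL is found the helper returns `None`, which lets the caller mark the tool as failed and continue rather than writing an empty value to `.env`.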
@@ -301,13 +301,13 @@ MODAL_FLUX2_ENDPOINT_URL=https://username--video-toolkit-flux2-...modal.run
For each selected tool, run the `--setup` command:

```bash
python3 tools/qwen3_tts.py --setup
python3 tools/flux2.py --setup
python3 tools/image_edit.py --setup
python3 tools/upscale.py --setup
python3 tools/music_gen.py --setup
python3 tools/sadtalker.py --setup
python3 tools/dewatermark.py --setup
uv run tools/qwen3_tts.py --setup
uv run tools/flux2.py --setup
uv run tools/image_edit.py --setup
uv run tools/upscale.py --setup
uv run tools/music_gen.py --setup
uv run tools/sadtalker.py --setup
uv run tools/dewatermark.py --setup
```

Each `--setup` command creates a RunPod template + endpoint and saves the endpoint ID to .env automatically.
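For illustration, saving a value to `.env` without duplicating the key might be sketched as follows. This is not the code the `--setup` commands run; `upsert_env_var` is a hypothetical helper, and the endpoint ID `abc123` is a placeholder.

```python
import tempfile
from pathlib import Path


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Replace an existing KEY=... line, or append one if the key is absent."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    prefix = f"{key}="
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = f"{prefix}{value}"
            replaced = True
    if not replaced:
        lines.append(f"{prefix}{value}")
    env_path.write_text("\n".join(lines) + "\n")


# Demo against a throwaway file rather than a real .env.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("RUNPOD_API_KEY=your_key\n")
    upsert_env_var(env_file, "RUNPOD_QWEN3_TTS_ENDPOINT_ID", "abc123")  # placeholder ID
    upsert_env_var(env_file, "RUNPOD_QWEN3_TTS_ENDPOINT_ID", "def456")  # second call overwrites
    result = env_file.read_text()
    print(result)
```

Rewriting the matching line in place keeps the other keys and their order untouched, so repeated setup runs do not accumulate stale entries.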
@@ -318,7 +318,7 @@ After deployment, run a quick test for at least one tool to verify the pipeline

**If Qwen3-TTS was deployed (most common):**
```bash
python3 tools/qwen3_tts.py --text "Setup complete! Your video toolkit is ready." \
uv run tools/qwen3_tts.py --text "Setup complete! Your video toolkit is ready." \
--speaker Ryan --tone warm --output /tmp/setup-test.mp3 \
--cloud modal
```
@@ -327,7 +327,7 @@ Check that it produces an audio file. If it does, the full pipeline (upload →

**If FLUX.2 was deployed:**
```bash
python3 tools/flux2.py --prompt "A minimal geometric logo on dark background" \
uv run tools/flux2.py --prompt "A minimal geometric logo on dark background" \
--output /tmp/setup-test.png --cloud modal
```

@@ -356,7 +356,7 @@ Qwen3-TTS is ready! Available speakers:
Default speaker: Ryan (warm male voice)

You can change the speaker per-video or set a default in your brand's voice.json.
To preview voices: python3 tools/qwen3_tts.py --list-voices
To preview voices: uv run tools/qwen3_tts.py --list-voices
```

### ElevenLabs Setup (Optional)
@@ -405,7 +405,7 @@ Prerequisites:
[check] Node.js 20.x
[check] Python 3.12
[check] FFmpeg 7.1
[check] pip packages
[check] Python deps (uv)

Cloud GPU: Modal
[check] Speech (Qwen3-TTS) — deployed
@@ -473,7 +473,7 @@ lines = Path('.env').read_text().splitlines()

## Error Handling

- If `modal deploy` fails: show the error, suggest checking `modal app logs`, offer to retry
- If `modal deploy` fails: show the error, suggest checking `uv run modal app logs`, offer to retry
- If R2 test fails: re-check credentials, common issue is wrong bucket name or region
- If RunPod setup fails: check API key, check account has billing enabled
- If any step fails, don't block subsequent steps — mark as failed and continue
@@ -487,13 +487,13 @@ Use `tools/verify_setup.py` throughout and at the end of setup:

```bash
# Quick check (no cloud calls) — use at start to detect current state
python3 tools/verify_setup.py
uv run tools/verify_setup.py

# With smoke tests (makes cloud GPU calls, ~$0.01) — use at end to verify
python3 tools/verify_setup.py --test
uv run tools/verify_setup.py --test

# Machine-readable — use to programmatically check what's configured
python3 tools/verify_setup.py --json
uv run tools/verify_setup.py --json
```

Run `verify_setup.py --json` at the start of `/setup` to detect current state and skip already-configured phases. Run it with `--test` at the end for the Phase 6 verification.
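A rough sketch of the "skip already-configured phases" decision is shown below. The JSON shape of `verify_setup.py --json` is not shown in this diff, so the flat `{"phase": true/false}` layout and the phase names here are assumptions made purely for illustration; `phases_to_run` is a hypothetical helper.

```python
import json


def phases_to_run(report_json: str) -> list:
    """Return the phases whose status is not truthy, in report order."""
    report = json.loads(report_json)
    return [phase for phase, ready in report.items() if not ready]


# Hypothetical report; in practice the text would come from
# `uv run tools/verify_setup.py --json`, whose real schema may differ.
sample_report = '{"prerequisites": true, "cloud_gpu": false, "file_transfer": false}'
pending = phases_to_run(sample_report)
print(pending)
```

Only the phases returned here would need to be walked through; the rest can be reported as ready and skipped.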
4 changes: 2 additions & 2 deletions .claude/commands/video.md
@@ -371,8 +371,8 @@ npm run studio # Preview in browser
npm run render # Final render
```

(For concept-explainer-short use instead: `python3 gen_vo.py`,
`python3 gen_captions.py`, `python3 build.py` — output: `out/short.mp4`.)
(For concept-explainer-short use instead: `uv run gen_vo.py`,
`uv run gen_captions.py`, `uv run build.py` — output: `out/short.mp4`.)

## Session History

8 changes: 4 additions & 4 deletions .claude/commands/voice-clone.md
@@ -14,7 +14,7 @@ Verify the environment is ready:
1. Check .env for RUNPOD_API_KEY
- If missing: "Add `RUNPOD_API_KEY=your_key` to `.env`"
2. Check .env for RUNPOD_QWEN3_TTS_ENDPOINT_ID
- If missing and API key exists: offer to run `python3 tools/qwen3_tts.py --setup`
- If missing and API key exists: offer to run `uv run tools/qwen3_tts.py --setup`
- If API key also missing: guide user to add it first
3. Only proceed once both are confirmed
```
@@ -103,7 +103,7 @@ The transcript must match what was actually said — this is critical for clone
Generate a test clip using the reference audio:

```bash
python3 tools/qwen3_tts.py \
uv run tools/qwen3_tts.py \
--text "This is a test of the cloned voice. It should sound natural and similar to the original recording." \
--ref-audio brands/{name}/assets/voice-reference.{ext} \
--ref-text "TRANSCRIPT_HERE" \
@@ -174,10 +174,10 @@ Voice clone saved to: brands/{name}/voice.json
Usage:

# Per-scene voiceover with cloned voice
python3 tools/voiceover.py --provider qwen3 --brand {name} --scene-dir public/audio/scenes --json
uv run tools/voiceover.py --provider qwen3 --brand {name} --scene-dir public/audio/scenes --json

# Single file
python3 tools/voiceover.py --provider qwen3 --brand {name} --script script.txt --output out.mp3
uv run tools/voiceover.py --provider qwen3 --brand {name} --script script.txt --output out.mp3

# In /generate-voiceover, select Qwen3-TTS — the clone profile will be detected automatically.

42 changes: 21 additions & 21 deletions .claude/skills/acestep/SKILL.md
@@ -20,47 +20,47 @@ echo "ACEMUSIC_API_KEY=your_key" >> .env
# Get key at https://acemusic.ai/api-key

# Self-hosted (optional fallback)
python tools/music_gen.py --setup # RunPod
modal deploy docker/modal-music-gen/app.py # Modal
uv run tools/music_gen.py --setup # RunPod
uv run modal deploy docker/modal-music-gen/app.py # Modal
```

## Quick Reference

```bash
# Basic generation (uses acemusic XL Turbo by default)
python tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --output bg.mp3
uv run tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --output bg.mp3

# Generate 4 variations, pick the best
python tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --variations 4 --output ambient.mp3
uv run tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --variations 4 --output ambient.mp3

# Fast mode (disable thinking)
python tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3
uv run tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3

# With musical control
python tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --bpm 72 --key "D Major" --output ambient.mp3
uv run tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --bpm 72 --key "D Major" --output ambient.mp3

# Scene presets (video production)
python tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3
python tools/music_gen.py --preset tension --duration 20 --output problem.mp3
python tools/music_gen.py --preset cta --brand digital-samba --duration 15 --output cta.mp3
uv run tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3
uv run tools/music_gen.py --preset tension --duration 20 --output problem.mp3
uv run tools/music_gen.py --preset cta --brand digital-samba --duration 15 --output cta.mp3

# Vocals with lyrics
python tools/music_gen.py --prompt "Indie pop jingle" --lyrics "[verse]\nBuild it better\nShip it faster" --duration 30 --output jingle.mp3
uv run tools/music_gen.py --prompt "Indie pop jingle" --lyrics "[verse]\nBuild it better\nShip it faster" --duration 30 --output jingle.mp3

# Cover / style transfer
python tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --duration 60 --output jazz_cover.mp3
uv run tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --duration 60 --output jazz_cover.mp3

# Repaint a weak section
python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3
uv run tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3

# Continue from existing audio
python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3
uv run tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3

# Stem extraction
python tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3
uv run tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3

# Fall back to self-hosted
python tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3
uv run tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3
```

## Fixing "Samey" Output
@@ -79,7 +79,7 @@ If generated music sounds repetitive or lacks variety, try these in order:

### 1. Instrumental background track (simplest)
```bash
python tools/music_gen.py --prompt "Upbeat indie rock, driving drums, jangly guitar" --duration 60 --bpm 120 --key "G Major" --output track.mp3
uv run tools/music_gen.py --prompt "Upbeat indie rock, driving drums, jangly guitar" --duration 60 --bpm 120 --key "G Major" --output track.mp3
```

### 2. Song with vocals and lyrics
@@ -117,7 +117,7 @@ That's what it's about
LYRICS

# Generate the song
python tools/music_gen.py \
uv run tools/music_gen.py \
--prompt "Upbeat indie rock anthem, male vocal, driving drums, electric guitar, studio polish" \
--lyrics "$(cat /tmp/lyrics.txt)" \
--duration 60 \
@@ -129,12 +129,12 @@ python tools/music_gen.py \
### 3. Repaint a weak section
If the chorus sounds weak, regenerate just that section:
```bash
python tools/music_gen.py --repaint --input my_song.mp3 --repaint-start 20 --repaint-end 35 --prompt "Powerful anthemic chorus, big drums" --output fixed.mp3
uv run tools/music_gen.py --repaint --input my_song.mp3 --repaint-start 20 --repaint-end 35 --prompt "Powerful anthemic chorus, big drums" --output fixed.mp3
```

### 4. Continue/extend a track
```bash
python tools/music_gen.py --continuation --input my_song.mp3 --prompt "Continue with gentle acoustic outro" --output extended.mp3
uv run tools/music_gen.py --continuation --input my_song.mp3 --prompt "Continue with gentle acoustic outro" --output extended.mp3
```

### Key tips for good results
@@ -215,13 +215,13 @@ Tracks: `vocals`, `drums`, `bass`, `guitar`, `piano`, `keyboard`, `strings`, `br
### repainting (acemusic only)
Regenerate a specific time segment within existing audio while preserving the rest.
```bash
python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3
uv run tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3
```

### continuation (acemusic only)
Extend existing audio by continuing from where it ends.
```bash
python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3
uv run tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3
```

## Prompt Engineering
2 changes: 1 addition & 1 deletion .claude/skills/elevenlabs/SKILL.md
@@ -154,7 +154,7 @@ Use the toolkit's voiceover tool to generate audio for each scene:

```bash
# Generate voiceover files for each scene
python tools/voiceover.py --scene-dir public/audio/scenes --json
uv run tools/voiceover.py --scene-dir public/audio/scenes --json

# Output:
# public/audio/scenes/