diff --git a/.claude/commands/contribute.md b/.claude/commands/contribute.md index 6334a18..c59b1cc 100644 --- a/.claude/commands/contribute.md +++ b/.claude/commands/contribute.md @@ -334,12 +334,12 @@ Use `/record-demo` command or Playwright scripts ### Voiceover ```bash -python tools/voiceover.py --script VOICEOVER-SCRIPT.md --output public/audio/voiceover.mp3 +uv run tools/voiceover.py --script VOICEOVER-SCRIPT.md --output public/audio/voiceover.mp3 ``` ### Background music ```bash -python tools/music.py --prompt "subtle tech ambient" --duration 180 --output public/audio/background-music.mp3 +uv run tools/music.py --prompt "subtle tech ambient" --duration 180 --output public/audio/background-music.mp3 ``` ``` diff --git a/.claude/commands/generate-voiceover.md b/.claude/commands/generate-voiceover.md index 13b9159..3caace9 100644 --- a/.claude/commands/generate-voiceover.md +++ b/.claude/commands/generate-voiceover.md @@ -317,7 +317,7 @@ Share these tips with the user: **Qwen3-TTS:** - If `RUNPOD_API_KEY` is missing, tell user to add it to `.env` -- If `RUNPOD_QWEN3_TTS_ENDPOINT_ID` is missing, tell user to run `python tools/qwen3_tts.py --setup` +- If `RUNPOD_QWEN3_TTS_ENDPOINT_ID` is missing, tell user to run `uv run tools/qwen3_tts.py --setup` **Both:** - If script file not found, offer to create a template diff --git a/.claude/commands/redub.md b/.claude/commands/redub.md index b132e31..149e2ea 100644 --- a/.claude/commands/redub.md +++ b/.claude/commands/redub.md @@ -85,7 +85,7 @@ Accept default or enter custom path: Run the redub tool: ```bash -source .venv/bin/activate && python tools/redub.py \ +source .venv/bin/activate && uv run tools/redub.py \ --input "INPUT_PATH" \ --voice-id "VOICE_ID" \ --output "OUTPUT_PATH" \ @@ -95,7 +95,7 @@ source .venv/bin/activate && python tools/redub.py \ For transcript review workflow: ```bash # Step 1: Transcribe only -source .venv/bin/activate && python tools/redub.py \ +source .venv/bin/activate && uv run tools/redub.py \ --input "INPUT_PATH" \ --voice-id "VOICE_ID" \ --output "OUTPUT_PATH" \ @@ -104,7 +104,7 @@ source .venv/bin/activate && python tools/redub.py \ # Show transcript to user, let them edit # Step 2: After approval, run with edited transcript -source .venv/bin/activate && python tools/redub.py \ +source .venv/bin/activate && uv run tools/redub.py \ --input "INPUT_PATH" \ --voice-id "VOICE_ID" \ --output "OUTPUT_PATH" \ diff --git a/.claude/commands/setup.md b/.claude/commands/setup.md index 6bf926c..93e495c 100644 --- a/.claude/commands/setup.md +++ b/.claude/commands/setup.md @@ -24,10 +24,10 @@ On invocation, assess current state and adapt: ``` 1. Check .env exists — if not, create from .env.example 2. Read current .env values (which keys are set vs placeholder) -3. Check prerequisites: node --version, python3 --version, ffmpeg -version -4. Check pip packages: python3 -c "import dotenv; import requests" -5. Check Modal CLI: modal --version (if installed) -6. Check for existing Modal apps: modal app list (if authenticated) +3. Check prerequisites: node --version, uv --version, ffmpeg -version +4. Check Python deps: uv run python -c "import dotenv; import requests" +5. Check Modal CLI: uv run modal --version (if installed) +6. Check for existing Modal apps: uv run modal app list (if authenticated) 7. Summarize what's ready vs what needs setup ``` @@ -42,7 +42,7 @@ Prerequisites: [check] Node.js 20.x [check] Python 3.12 [check] FFmpeg 7.1 - [check] pip packages installed + [check] Python deps installed (uv) Cloud GPU: Not configured File transfer: Not configured (using free fallback services) @@ -79,8 +79,8 @@ Check and report. Don't install anything automatically — just tell the user wh ### Recommended -- **Python 3.9+**: `python3 --version`. If missing: "Install from https://python.org/ — needed for AI voiceover, image editing, and all cloud GPU tools" -- **pip packages**: `python3 -c "import dotenv; import requests"`. If missing: guide through `pip install -r tools/requirements.txt` (or venv setup) +- **uv**: `uv --version`. If missing: "Install with `curl -LsSf https://astral.sh/uv/install.sh | sh` (macOS/Linux) or `powershell -c \"irm https://astral.sh/uv/install.ps1 | iex\"` (Windows) — manages Python and all toolkit dependencies; needed for AI voiceover, image editing, and all cloud GPU tools" +- **Python deps**: `uv run python -c "import dotenv; import requests"`. If missing: run `uv sync` from the toolkit root (uv installs a compatible Python automatically if needed) - **FFmpeg**: `ffmpeg -version`. If missing: "Install with `brew install ffmpeg` (macOS) or see https://ffmpeg.org/ — needed for media conversion" ### Output @@ -119,15 +119,15 @@ Frame the pitch: - All 7 toolkit tools typically cost $0.50-2.00/month with normal use - Faster cold starts than RunPod - Scale to zero — no charges when idle -- Simple deployment: `modal deploy docker/modal-xxx/app.py` +- Simple deployment: `uv run modal deploy docker/modal-xxx/app.py` Setup flow: ``` -1. pip install modal -2. python3 -m modal setup +1. uv sync --extra modal +2. uv run modal setup → Opens browser for authentication → Creates ~/.modal.toml with credentials -3. Verify: modal app list +3. Verify: uv run modal app list ``` ### Option B: RunPod @@ -208,7 +208,7 @@ R2_BUCKET_NAME=video-toolkit Test R2 connectivity: ```bash -python3 -c " +uv run python -c " import sys; sys.path.insert(0, 'tools') from file_transfer import upload_to_r2, delete_from_r2 import tempfile, os @@ -274,17 +274,17 @@ Recommend "all" — with Modal's free tier, there's no cost to having them deplo ### Modal Deployment Flow -For each selected tool, run `modal deploy` and capture the endpoint URL: +For each selected tool, run `uv run modal deploy` and capture the endpoint URL: ```bash # Deploy each app and capture the URL from output -modal deploy docker/modal-qwen3-tts/app.py -modal deploy docker/modal-flux2/app.py -modal deploy docker/modal-image-edit/app.py -modal deploy docker/modal-upscale/app.py -modal deploy docker/modal-music-gen/app.py -modal deploy docker/modal-sadtalker/app.py -modal deploy docker/modal-propainter/app.py +uv run modal deploy docker/modal-qwen3-tts/app.py +uv run modal deploy docker/modal-flux2/app.py +uv run modal deploy docker/modal-image-edit/app.py +uv run modal deploy docker/modal-upscale/app.py +uv run modal deploy docker/modal-music-gen/app.py +uv run modal deploy docker/modal-sadtalker/app.py +uv run modal deploy docker/modal-propainter/app.py ``` After each deploy, Modal prints the endpoint URL. Parse it and save to .env: @@ -301,13 +301,13 @@ MODAL_FLUX2_ENDPOINT_URL=https://username--video-toolkit-flux2-...modal.run For each selected tool, run the `--setup` command: ```bash -python3 tools/qwen3_tts.py --setup -python3 tools/flux2.py --setup -python3 tools/image_edit.py --setup -python3 tools/upscale.py --setup -python3 tools/music_gen.py --setup -python3 tools/sadtalker.py --setup -python3 tools/dewatermark.py --setup +uv run tools/qwen3_tts.py --setup +uv run tools/flux2.py --setup +uv run tools/image_edit.py --setup +uv run tools/upscale.py --setup +uv run tools/music_gen.py --setup +uv run tools/sadtalker.py --setup +uv run tools/dewatermark.py --setup ``` Each `--setup` command creates a RunPod template + endpoint and saves the endpoint ID to .env automatically. @@ -318,7 +318,7 @@ After deployment, run a quick test for at least one tool to verify the pipeline **If Qwen3-TTS was deployed (most common):** ```bash -python3 tools/qwen3_tts.py --text "Setup complete! Your video toolkit is ready." \ +uv run tools/qwen3_tts.py --text "Setup complete! Your video toolkit is ready." \ --speaker Ryan --tone warm --output /tmp/setup-test.mp3 \ --cloud modal ``` @@ -327,7 +327,7 @@ Check that it produces an audio file. If it does, the full pipeline (upload → **If FLUX.2 was deployed:** ```bash -python3 tools/flux2.py --prompt "A minimal geometric logo on dark background" \ +uv run tools/flux2.py --prompt "A minimal geometric logo on dark background" \ --output /tmp/setup-test.png --cloud modal ``` @@ -356,7 +356,7 @@ Qwen3-TTS is ready! Available speakers: Default speaker: Ryan (warm male voice) You can change the speaker per-video or set a default in your brand's voice.json. -To preview voices: python3 tools/qwen3_tts.py --list-voices +To preview voices: uv run tools/qwen3_tts.py --list-voices ``` ### ElevenLabs Setup (Optional) @@ -405,7 +405,7 @@ Prerequisites: [check] Node.js 20.x [check] Python 3.12 [check] FFmpeg 7.1 - [check] pip packages + [check] Python deps (uv) Cloud GPU: Modal [check] Speech (Qwen3-TTS) — deployed @@ -473,7 +473,7 @@ lines = Path('.env').read_text().splitlines() ## Error Handling -- If `modal deploy` fails: show the error, suggest checking `modal app logs`, offer to retry +- If `modal deploy` fails: show the error, suggest checking `uv run modal app logs`, offer to retry - If R2 test fails: re-check credentials, common issue is wrong bucket name or region - If RunPod setup fails: check API key, check account has billing enabled - If any step fails, don't block subsequent steps — mark as failed and continue @@ -487,13 +487,13 @@ Use `tools/verify_setup.py` throughout and at the end of setup: ```bash # Quick check (no cloud calls) — use at start to detect current state -python3 tools/verify_setup.py +uv run tools/verify_setup.py # With smoke tests (makes cloud GPU calls, ~$0.01) — use at end to verify -python3 tools/verify_setup.py --test +uv run tools/verify_setup.py --test # Machine-readable — use to programmatically check what's configured -python3 tools/verify_setup.py --json +uv run tools/verify_setup.py --json ``` Run `verify_setup.py --json` at the start of `/setup` to detect current state and skip already-configured phases. Run it with `--test` at the end for the Phase 6 verification. diff --git a/.claude/commands/video.md b/.claude/commands/video.md index 6dcddc1..e7e5de0 100644 --- a/.claude/commands/video.md +++ b/.claude/commands/video.md @@ -371,8 +371,8 @@ npm run studio # Preview in browser npm run render # Final render ``` -(For concept-explainer-short use instead: `python3 gen_vo.py`, -`python3 gen_captions.py`, `python3 build.py` — output: `out/short.mp4`.) +(For concept-explainer-short use instead: `uv run gen_vo.py`, +`uv run gen_captions.py`, `uv run build.py` — output: `out/short.mp4`.) ## Session History diff --git a/.claude/commands/voice-clone.md b/.claude/commands/voice-clone.md index 423c532..7a5da50 100644 --- a/.claude/commands/voice-clone.md +++ b/.claude/commands/voice-clone.md @@ -14,7 +14,7 @@ Verify the environment is ready: 1. Check .env for RUNPOD_API_KEY - If missing: "Add `RUNPOD_API_KEY=your_key` to `.env`" 2. Check .env for RUNPOD_QWEN3_TTS_ENDPOINT_ID - - If missing and API key exists: offer to run `python3 tools/qwen3_tts.py --setup` + - If missing and API key exists: offer to run `uv run tools/qwen3_tts.py --setup` - If API key also missing: guide user to add it first 3. Only proceed once both are confirmed ``` @@ -103,7 +103,7 @@ The transcript must match what was actually said — this is critical for clone Generate a test clip using the reference audio: ```bash -python3 tools/qwen3_tts.py \ +uv run tools/qwen3_tts.py \ --text "This is a test of the cloned voice. It should sound natural and similar to the original recording." \ --ref-audio brands/{name}/assets/voice-reference.{ext} \ --ref-text "TRANSCRIPT_HERE" \ @@ -174,10 +174,10 @@ Voice clone saved to: brands/{name}/voice.json Usage: # Per-scene voiceover with cloned voice - python3 tools/voiceover.py --provider qwen3 --brand {name} --scene-dir public/audio/scenes --json + uv run tools/voiceover.py --provider qwen3 --brand {name} --scene-dir public/audio/scenes --json # Single file - python3 tools/voiceover.py --provider qwen3 --brand {name} --script script.txt --output out.mp3 + uv run tools/voiceover.py --provider qwen3 --brand {name} --script script.txt --output out.mp3 # In /generate-voiceover, select Qwen3-TTS — the clone profile will be detected automatically. diff --git a/.claude/skills/acestep/SKILL.md b/.claude/skills/acestep/SKILL.md index 27308d2..387553c 100644 --- a/.claude/skills/acestep/SKILL.md +++ b/.claude/skills/acestep/SKILL.md @@ -20,47 +20,47 @@ echo "ACEMUSIC_API_KEY=your_key" >> .env # Get key at https://acemusic.ai/api-key # Self-hosted (optional fallback) -python tools/music_gen.py --setup # RunPod -modal deploy docker/modal-music-gen/app.py # Modal +uv run tools/music_gen.py --setup # RunPod +uv run modal deploy docker/modal-music-gen/app.py # Modal ``` ## Quick Reference ```bash # Basic generation (uses acemusic XL Turbo by default) -python tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --output bg.mp3 +uv run tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --output bg.mp3 # Generate 4 variations, pick the best -python tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --variations 4 --output ambient.mp3 +uv run tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --variations 4 --output ambient.mp3 # Fast mode (disable thinking) -python tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3 +uv run tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3 # With musical control -python tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --bpm 72 --key "D Major" --output ambient.mp3 +uv run tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --bpm 72 --key "D Major" --output ambient.mp3 # Scene presets (video production) -python tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3 -python tools/music_gen.py --preset tension --duration 20 --output problem.mp3 -python tools/music_gen.py --preset cta --brand digital-samba --duration 15 --output cta.mp3 +uv run tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3 +uv run tools/music_gen.py --preset tension --duration 20 --output problem.mp3 +uv run tools/music_gen.py --preset cta --brand digital-samba --duration 15 --output cta.mp3 # Vocals with lyrics -python tools/music_gen.py --prompt "Indie pop jingle" --lyrics "[verse]\nBuild it better\nShip it faster" --duration 30 --output jingle.mp3 +uv run tools/music_gen.py --prompt "Indie pop jingle" --lyrics "[verse]\nBuild it better\nShip it faster" --duration 30 --output jingle.mp3 # Cover / style transfer -python tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --duration 60 --output jazz_cover.mp3 +uv run tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --duration 60 --output jazz_cover.mp3 # Repaint a weak section -python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3 +uv run tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3 # Continue from existing audio -python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3 +uv run tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3 # Stem extraction -python tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3 +uv run tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3 # Fall back to self-hosted -python tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3 +uv run tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3 ``` ## Fixing "Samey" Output @@ -79,7 +79,7 @@ If generated music sounds repetitive or lacks variety, try these in order: ### 1. Instrumental background track (simplest) ```bash -python tools/music_gen.py --prompt "Upbeat indie rock, driving drums, jangly guitar" --duration 60 --bpm 120 --key "G Major" --output track.mp3 +uv run tools/music_gen.py --prompt "Upbeat indie rock, driving drums, jangly guitar" --duration 60 --bpm 120 --key "G Major" --output track.mp3 ``` ### 2. Song with vocals and lyrics @@ -117,7 +117,7 @@ That's what it's about LYRICS # Generate the song -python tools/music_gen.py \ +uv run tools/music_gen.py \ --prompt "Upbeat indie rock anthem, male vocal, driving drums, electric guitar, studio polish" \ --lyrics "$(cat /tmp/lyrics.txt)" \ --duration 60 \ @@ -129,12 +129,12 @@ python tools/music_gen.py \ ### 3. Repaint a weak section If the chorus sounds weak, regenerate just that section: ```bash -python tools/music_gen.py --repaint --input my_song.mp3 --repaint-start 20 --repaint-end 35 --prompt "Powerful anthemic chorus, big drums" --output fixed.mp3 +uv run tools/music_gen.py --repaint --input my_song.mp3 --repaint-start 20 --repaint-end 35 --prompt "Powerful anthemic chorus, big drums" --output fixed.mp3 ``` ### 4. Continue/extend a track ```bash -python tools/music_gen.py --continuation --input my_song.mp3 --prompt "Continue with gentle acoustic outro" --output extended.mp3 +uv run tools/music_gen.py --continuation --input my_song.mp3 --prompt "Continue with gentle acoustic outro" --output extended.mp3 ``` ### Key tips for good results @@ -215,13 +215,13 @@ Tracks: `vocals`, `drums`, `bass`, `guitar`, `piano`, `keyboard`, `strings`, `br ### repainting (acemusic only) Regenerate a specific time segment within existing audio while preserving the rest. ```bash -python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3 +uv run tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3 ``` ### continuation (acemusic only) Extend existing audio by continuing from where it ends. ```bash -python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3 +uv run tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3 ``` ## Prompt Engineering diff --git a/.claude/skills/elevenlabs/SKILL.md b/.claude/skills/elevenlabs/SKILL.md index cc9ddce..a356255 100644 --- a/.claude/skills/elevenlabs/SKILL.md +++ b/.claude/skills/elevenlabs/SKILL.md @@ -154,7 +154,7 @@ Use the toolkit's voiceover tool to generate audio for each scene: ```bash # Generate voiceover files for each scene -python tools/voiceover.py --scene-dir public/audio/scenes --json +uv run tools/voiceover.py --scene-dir public/audio/scenes --json # Output: # public/audio/scenes/ diff --git a/.claude/skills/ideogram4/SKILL.md b/.claude/skills/ideogram4/SKILL.md index 97a9243..e84454e 100644 --- a/.claude/skills/ideogram4/SKILL.md +++ b/.claude/skills/ideogram4/SKILL.md @@ -58,19 +58,19 @@ Worked title-card / thumbnail / quote-card examples are in **`examples.md`**. ```bash # Hand-authored JSON caption (the recommended path for text/layout) — Claude writes caption.json -python3 tools/ideogram4.py --json caption.json --output title.png +uv run tools/ideogram4.py --json caption.json --output title.png # Caption from stdin (Claude can pipe it directly) -cat caption.json | python3 tools/ideogram4.py --json - --output title.png +cat caption.json | uv run tools/ideogram4.py --json - --output title.png # Plain prompt — Ideogram's server-side magic prompt expands it (weaker; prefer --json) -python3 tools/ideogram4.py --prompt "Title card: 'AI ENGINEERING REVIEW' bold white on dark" --output title.png +uv run tools/ideogram4.py --prompt "Title card: 'AI ENGINEERING REVIEW' bold white on dark" --output title.png # Inject brand hex colors into the caption's palette (JSON mode) -python3 tools/ideogram4.py --json caption.json --brand digital-samba --output cta.png +uv run tools/ideogram4.py --json caption.json --brand digital-samba --output cta.png # Quality tier + resolution -python3 tools/ideogram4.py --json caption.json --speed QUALITY --resolution 2048x2048 --output slide.png +uv run tools/ideogram4.py --json caption.json --speed QUALITY --resolution 2048x2048 --output slide.png ``` ## Key Files diff --git a/.claude/skills/ltx2/SKILL.md b/.claude/skills/ltx2/SKILL.md index 08172f2..c9c00f4 100644 --- a/.claude/skills/ltx2/SKILL.md +++ b/.claude/skills/ltx2/SKILL.md @@ -12,19 +12,19 @@ Runs on Modal (A100-80GB). Requires `MODAL_LTX2_ENDPOINT_URL` in `.env`. ```bash # Text-to-video -python3 tools/ltx2.py --prompt "A sunset over the ocean, golden light on waves, cinematic" --output sunset.mp4 +uv run tools/ltx2.py --prompt "A sunset over the ocean, golden light on waves, cinematic" --output sunset.mp4 # Image-to-video (animate a still image) -python3 tools/ltx2.py --prompt "Gentle camera drift, soft ambient motion" --input photo.jpg --output animated.mp4 +uv run tools/ltx2.py --prompt "Gentle camera drift, soft ambient motion" --input photo.jpg --output animated.mp4 # Custom resolution and duration -python3 tools/ltx2.py --prompt "..." --width 1024 --height 576 --num-frames 161 --output wide.mp4 +uv run tools/ltx2.py --prompt "..." --width 1024 --height 576 --num-frames 161 --output wide.mp4 # Fast mode (fewer steps, quicker) -python3 tools/ltx2.py --prompt "..." --quality fast --output quick.mp4 +uv run tools/ltx2.py --prompt "..." --quality fast --output quick.mp4 # Reproducible output -python3 tools/ltx2.py --prompt "..." --seed 42 --output reproducible.mp4 +uv run tools/ltx2.py --prompt "..." --seed 42 --output reproducible.mp4 ``` ## Parameters @@ -54,7 +54,7 @@ Base: LTX-2.3 22B, trained by [@lovis93](https://huggingface.co/lovis93/crt-anim ```bash # Trigger word is auto-prepended — write the prompt normally -python3 tools/ltx2.py --lora crt-terminal \ +uv run tools/ltx2.py --lora crt-terminal \ --prompt "a terminal typing out \"\\$ claude --continue\" character by character in glowing green pixel font, scanlines, phosphor glow, low choppy frame rate, hacker mood" \ --output crt_claude.mp4 ``` @@ -128,27 +128,27 @@ Keep prompts under 200 words. Be specific about the scene. ### B-Roll Clips Generate atmospheric 5s shots for cutaways between narrated scenes: ```bash -python3 tools/ltx2.py --prompt "Futuristic holographic interface, glowing data visualizations, clean workspace, cinematic" --output broll_tech.mp4 -python3 tools/ltx2.py --prompt "Aerial view of European city at golden hour, modern architecture" --output broll_europe.mp4 +uv run tools/ltx2.py --prompt "Futuristic holographic interface, glowing data visualizations, clean workspace, cinematic" --output broll_tech.mp4 +uv run tools/ltx2.py --prompt "Aerial view of European city at golden hour, modern architecture" --output broll_europe.mp4 ``` ### Animated Slide Backgrounds Feed a slide screenshot and add subtle motion: ```bash -python3 tools/ltx2.py --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" --input slide.png --output animated_slide.mp4 +uv run tools/ltx2.py --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" --input slide.png --output animated_slide.mp4 ``` ### Animated Portraits Bring still headshots to life: ```bash -python3 tools/ltx2.py --prompt "Subtle natural head movement, warm expression, professional lighting" --input headshot.png --output animated_portrait.mp4 +uv run tools/ltx2.py --prompt "Subtle natural head movement, warm expression, professional lighting" --input headshot.png --output animated_portrait.mp4 ``` ### Stylized Character Cameo (SadTalker Alternative) For non-realistic faces — fantasy characters, masked figures, heavy beards, helmets, illustrations — SadTalker often produces uncanny or broken lip sync because it's trained on photoreal humans. LTX-2 image-to-video is frequently a better choice when **lip-sync precision isn't critical** (the viewer's brain fills in the gap as long as something is moving). Prompt for *motion + atmosphere*, not phonemes: ```bash -python3 tools/ltx2.py \ +uv run tools/ltx2.py \ --input character_portrait.png \ --prompt "Ancient warrior speaks slowly with gravitas, beard shifts subtly, glowing aura pulses, embers drift past, slow head movement, cinematic close-up, mystical atmosphere" \ --width 768 --height 768 \ @@ -170,7 +170,7 @@ python3 tools/ltx2.py \ ### Branded Intro/Outro Generate abstract motion backgrounds for title cards: ```bash -python3 tools/ltx2.py --prompt "Dark moody background with flowing blue and coral light streaks, bokeh particles, cinematic tech atmosphere, no text" --output intro_bg.mp4 +uv run tools/ltx2.py --prompt "Dark moody background with flowing blue and coral light streaks, bokeh particles, cinematic tech atmosphere, no text" --output intro_bg.mp4 ``` ### Combining with Other Tools @@ -207,16 +207,16 @@ LTX-2 generates raw clips. Combine with the rest of the toolkit: ```bash # 1. Create Modal secret for HuggingFace (one-time) -modal secret create huggingface-token HF_TOKEN=hf_your_token +uv run modal secret create huggingface-token HF_TOKEN=hf_your_token # 2. Deploy (downloads ~55GB of weights, takes ~10 min) -modal deploy docker/modal-ltx2/app.py +uv run modal deploy docker/modal-ltx2/app.py # 3. Save endpoint URL to .env echo "MODAL_LTX2_ENDPOINT_URL=https://yourname--video-toolkit-ltx2-ltx2-generate.modal.run" >> .env # 4. Test -python3 tools/ltx2.py --prompt "A candle flickering on a dark table, cinematic" --output test.mp4 +uv run tools/ltx2.py --prompt "A candle flickering on a dark table, cinematic" --output test.mp4 ``` **Important:** HuggingFace token needs read-access scope. Accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized) before deploying. Unauthenticated downloads are severely rate-limited. diff --git a/.claude/skills/moviepy/SKILL.md b/.claude/skills/moviepy/SKILL.md index 5e42897..f2b65c7 100644 --- a/.claude/skills/moviepy/SKILL.md +++ b/.claude/skills/moviepy/SKILL.md @@ -24,9 +24,9 @@ Two runnable references for everything in this skill live in `examples/`: - **`examples/quick-spot/build.py`** — 15-second ad-style spot. Audio-anchored timeline, text overlay, optional VO + ducked music. Renders silent out of the box with zero external assets. - **`examples/data-viz-chart/build.py`** — animated time-series chart with deterministic title and source attribution. Demonstrates the matplotlib (data) + moviepy (trustworthy text) split. -Both run with `python3 build.py` and produce a real `out.mp4` immediately. Read them alongside this skill — every pattern below is shown working there. +Both run with `uv run build.py` and produce a real `out.mp4` immediately. Read them alongside this skill — every pattern below is shown working there. -**Dependencies.** `moviepy`, `Pillow`, and `matplotlib` are declared in `tools/requirements.txt` and installed with the toolkit's one-line Python setup: `python3 -m pip install -r tools/requirements.txt`. If you hit `Missing dependency` when running an example, run that command from the repo root — the examples' `build.py` files will tell you the same thing in their error message and exit cleanly rather than printing a bare traceback. +**Dependencies.** `moviepy`, `Pillow`, and `matplotlib` are declared in the root `pyproject.toml` and installed with the toolkit's one-line Python setup: `uv sync`. If you hit `Missing dependency` when running an example, run that command from the repo root — the examples' `build.py` files will tell you the same thing in their error message and exit cleanly rather than printing a bare traceback. ## The main use case: text on AI-generated video diff --git a/.claude/skills/qwen-edit/SKILL.md b/.claude/skills/qwen-edit/SKILL.md index 078a463..b75bde0 100644 --- a/.claude/skills/qwen-edit/SKILL.md +++ b/.claude/skills/qwen-edit/SKILL.md @@ -36,24 +36,24 @@ Use when the user wants to: ```bash # Basic edit -python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" +uv run tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" # With negative prompt (recommended) -python tools/image_edit.py --input photo.jpg \ +uv run tools/image_edit.py --input photo.jpg \ --prompt "Reframe as portrait with full head visible" \ --negative "blur, distortion, artifacts" # Style transfer -python tools/image_edit.py --input photo.jpg --style cyberpunk +uv run tools/image_edit.py --input photo.jpg --style cyberpunk # Background (use cautiously - often fails) -python tools/image_edit.py --input photo.jpg --background office +uv run tools/image_edit.py --input photo.jpg --background office # Higher quality -python tools/image_edit.py --input photo.jpg --prompt "..." --steps 16 --guidance 3.0 +uv run tools/image_edit.py --input photo.jpg --prompt "..." --steps 16 --guidance 3.0 # Multi-image composite (identity-preserving) -python tools/image_edit.py --input person.jpg background.jpg \ +uv run tools/image_edit.py --input person.jpg background.jpg \ --prompt "The [ethnicity] [gender] with [hair description] from first image is now in [scene] from second image. Same [features], [outfit]." \ --negative "different ethnicity, different hair color, different face shape, generic stock photo" \ --steps 16 --guidance 2.0 diff --git a/.claude/skills/qwen-edit/parameters.md b/.claude/skills/qwen-edit/parameters.md index 40552cd..32831a4 100644 --- a/.claude/skills/qwen-edit/parameters.md +++ b/.claude/skills/qwen-edit/parameters.md @@ -69,10 +69,10 @@ Use for reproducibility when iterating on prompts. ```bash # First attempt -python tools/image_edit.py --input photo.jpg --prompt "..." --seed 12345 +uv run tools/image_edit.py --input photo.jpg --prompt "..." --seed 12345 # Same seed, different prompt - compare results -python tools/image_edit.py --input photo.jpg --prompt "..." --seed 12345 +uv run tools/image_edit.py --input photo.jpg --prompt "..." --seed 12345 ``` ## Cost vs Quality Tradeoffs diff --git a/.claude/skills/runpod/SKILL.md b/.claude/skills/runpod/SKILL.md index 2b136e2..837fb51 100644 --- a/.claude/skills/runpod/SKILL.md +++ b/.claude/skills/runpod/SKILL.md @@ -15,11 +15,11 @@ Run open-source AI models on cloud GPUs via RunPod serverless. Pay-per-second, n echo "RUNPOD_API_KEY=your_key_here" >> .env # 3. Deploy any tool with --setup -python tools/image_edit.py --setup -python tools/upscale.py --setup -python tools/dewatermark.py --setup -python tools/sadtalker.py --setup -python tools/qwen3_tts.py --setup +uv run tools/image_edit.py --setup +uv run tools/upscale.py --setup +uv run tools/dewatermark.py --setup +uv run tools/sadtalker.py --setup +uv run tools/qwen3_tts.py --setup ``` Each `--setup` command: diff --git a/.env.example b/.env.example index e1269f6..4e0f1f0 100644 --- a/.env.example +++ b/.env.example @@ -21,8 +21,8 @@ # --- Cloud GPU: Modal (recommended) --- # $30/month free compute on Starter plan (just add a payment method) -# Setup: pip install modal && python3 -m modal setup -# Deploy: modal deploy docker/modal-{tool}/app.py +# Setup: uv sync --extra modal && uv run modal setup +# Deploy: uv run modal deploy docker/modal-{tool}/app.py # Endpoint URLs are printed after each deploy — paste them here. # MODAL_QWEN3_TTS_ENDPOINT_URL=https://yourname--video-toolkit-qwen3-tts-....modal.run # MODAL_FLUX2_ENDPOINT_URL=https://yourname--video-toolkit-flux2-....modal.run diff --git a/.python-version b/.python-version new file mode 100644 index 0000000..e4fba21 --- /dev/null +++ b/.python-version @@ -0,0 +1 @@ +3.12 diff --git a/AGENTS.md b/AGENTS.md index 18214d2..7997b14 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -9,7 +9,7 @@ For full toolkit guidance — workflow, conventions, timing rules, design patter If you're using this repo from Codex, run the migration script once to install the toolkit's skills under `~/.codex/skills/`: ```bash -python3 scripts/migrate_to_codex.py --force +uv run scripts/migrate_to_codex.py --force ``` That installs 25 entries (11 toolkit skills + 13 command wrappers + 1 overview). The script can also sync the full `CLAUDE.md` content into a generated block at the end of this file — re-run `--force` after editing `CLAUDE.md` to keep the synced block fresh. Manual content above the generated block (i.e. everything you're reading now) is preserved. @@ -17,7 +17,7 @@ That installs 25 entries (11 toolkit skills + 13 command wrappers + 1 overview). To uninstall: ```bash -python3 scripts/migrate_to_codex.py --reset +uv run scripts/migrate_to_codex.py --reset ``` See [README.md § Using with Codex](./README.md#using-with-codex) for details. diff --git a/CLAUDE.md b/CLAUDE.md index 567d2c8..74a5368 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -82,7 +82,7 @@ Config-driven sprint review videos with theme system, config-driven content (`sp Marketing/product demo videos with dark tech aesthetic, scene-based composition (title, problem, solution, demo, stats, CTA), animated background, Narrator PiP, browser/terminal chrome, and stats cards with spring animations. ### concept-explainer-short -9:16 vertical concept-explainer shorts (TikTok/Reels/YouTube Shorts). **Python/moviepy, not Remotion** — the whole video derives from `scenes.json` (per-scene narration + visual asset). Pipeline: `gen_vo.py` (per-scene TTS via voiceover.py, clone or built-in, `--max-wpm` pacing clamp) → `gen_captions.py` (whisper word timing force-aligned to script text, needs `pip install openai-whisper`) → `build.py` (audio-anchored composite: Ken Burns on stills, boomerang-looped clips, burned karaoke caption pills, ducked music). Renders at every stage — placeholder cards before assets, silent before audio. Visuals follow the FLUX/Ideogram/LTX split: Ideogram cards at `1440x2560` for anything text-bearing, LTX b-roll at `576x1024` for motion. +9:16 vertical concept-explainer shorts (TikTok/Reels/YouTube Shorts). **Python/moviepy, not Remotion** — the whole video derives from `scenes.json` (per-scene narration + visual asset). Pipeline: `gen_vo.py` (per-scene TTS via voiceover.py, clone or built-in, `--max-wpm` pacing clamp) → `gen_captions.py` (whisper word timing force-aligned to script text, needs `uv sync --extra whisper`) → `build.py` (audio-anchored composite: Ken Burns on stills, boomerang-looped clips, burned karaoke caption pills, ducked music). Renders at every stage — placeholder cards before assets, silent before audio. Visuals follow the FLUX/Ideogram/LTX split: Ideogram cards at `1440x2560` for anything text-bearing, LTX b-roll at `576x1024` for motion. ## Brand Profiles @@ -110,13 +110,17 @@ import { AnimatedBackground, SlideTransition, Label } from '../../../../lib/comp Audio, video, and image tools in `tools/`. See registry `tools` section for the full catalog with descriptions, options, presets, and env vars. Every tool supports `--help`. ```bash -# Setup -pip install -r tools/requirements.txt +# Setup (one-time) — uv creates .venv/ and installs all locked dependencies +uv sync +uv sync --extra whisper # optional: karaoke captions (heavy, pulls in torch) +uv sync --extra modal # optional: Modal CLI for self-hosted cloud GPU ``` -**Important: always invoke tools from the toolkit root directory.** When working inside a project (`projects/my-video/`), tool paths like `python3 tools/upscale.py` will fail because `tools/` is relative. Always use: +Run every Python tool through the project environment with `uv run` — it auto-syncs the venv, so no activation is needed. + +**Important: always invoke tools from the toolkit root directory.** When working inside a project (`projects/my-video/`), tool paths like `uv run tools/upscale.py` will fail because `tools/` is relative. Always use: ```bash -cd /path/to/claude-code-video-toolkit && python3 tools/upscale.py ... +cd /path/to/claude-code-video-toolkit && uv run tools/upscale.py ... ``` This is especially critical for background commands where the working directory may not be obvious. @@ -134,34 +138,34 @@ Utility tools work on any video file without requiring a project structure. ```bash # Per-scene generation (recommended) -python tools/voiceover.py --scene-dir public/audio/scenes --json +uv run tools/voiceover.py --scene-dir public/audio/scenes --json # Using Qwen3-TTS (self-hosted, free alternative to ElevenLabs) -python tools/voiceover.py --provider qwen3 --tone warm --scene-dir public/audio/scenes --json +uv run tools/voiceover.py --provider qwen3 --tone warm --scene-dir public/audio/scenes --json # Single file (legacy) -python tools/voiceover.py --script SCRIPT.md --output out.mp3 +uv run tools/voiceover.py --script SCRIPT.md --output out.mp3 ``` ### Timing Sync (after voiceover) ```bash -python3 tools/sync_timing.py # Dry run comparison -python3 tools/sync_timing.py --apply # Update config (1s default padding) -python3 tools/sync_timing.py --apply --padding 1.5 # Custom padding -python3 tools/sync_timing.py --voiceover-json vo.json # Use voiceover.py output -python3 tools/sync_timing.py --json # Machine-readable output +uv run tools/sync_timing.py # Dry run comparison +uv run tools/sync_timing.py --apply # Update config (1s default padding) +uv run tools/sync_timing.py --apply --padding 1.5 # Custom padding +uv run tools/sync_timing.py --voiceover-json vo.json # Use voiceover.py output +uv run tools/sync_timing.py --json # Machine-readable output ``` ### Qwen3-TTS (Standalone) ```bash -python tools/qwen3_tts.py --text "Hello world" --speaker Ryan --output hello.mp3 -python tools/qwen3_tts.py --text "Hello world" --tone warm --output hello.mp3 -python tools/qwen3_tts.py --text "Hello" --instruct "Speak enthusiastically" --output excited.mp3 -python tools/qwen3_tts.py --text "Hello" --ref-audio sample.wav --ref-text "transcript" --output cloned.mp3 -python tools/qwen3_tts.py --list-voices # 9 speakers: Ryan, Aiden, Vivian, etc. -python tools/qwen3_tts.py --list-tones # neutral, warm, professional, excited, etc. +uv run tools/qwen3_tts.py --text "Hello world" --speaker Ryan --output hello.mp3 +uv run tools/qwen3_tts.py --text "Hello world" --tone warm --output hello.mp3 +uv run tools/qwen3_tts.py --text "Hello" --instruct "Speak enthusiastically" --output excited.mp3 +uv run tools/qwen3_tts.py --text "Hello" --ref-audio sample.wav --ref-text "transcript" --output cloned.mp3 +uv run tools/qwen3_tts.py --list-voices # 9 speakers: Ryan, Aiden, Vivian, etc. +uv run tools/qwen3_tts.py --list-tones # neutral, warm, professional, excited, etc. ``` Temperature controls expressiveness: `--temperature 1.2` (more expressive) or `--temperature 0.4` (more consistent). @@ -173,15 +177,15 @@ All cloud GPU tools support two providers via `--cloud runpod|modal`. RunPod is ```bash # --- RunPod setup (automated, one-time per tool) --- echo "RUNPOD_API_KEY=your_key_here" >> .env -python tools/image_edit.py --setup -python tools/upscale.py --setup -python tools/qwen3_tts.py --setup -python tools/music_gen.py --setup +uv run tools/image_edit.py --setup +uv run tools/upscale.py --setup +uv run tools/qwen3_tts.py --setup +uv run tools/music_gen.py --setup # --- Modal setup (deploy each app you need) --- -pip install modal && python3 -m modal setup -modal deploy docker/modal-upscale/app.py # Then save URL to .env -modal deploy docker/modal-image-edit/app.py +uv sync --extra modal && uv run modal setup +uv run modal deploy docker/modal-upscale/app.py # Then save URL to .env +uv run modal deploy docker/modal-image-edit/app.py # See docs/modal-setup.md for full guide ``` @@ -192,12 +196,12 @@ The toolkit has **two** text-to-image generators. They barely overlap — the de ```bash # FLUX.2 — text-FREE backgrounds + image editing (self-hosted, free, Apache-2.0/commercial-OK) -python tools/flux2.py --preset title-bg --brand digital-samba # background for Remotion text overlay -python tools/flux2.py --prompt "Abstract tech background, no text" +uv run tools/flux2.py --preset title-bg --brand digital-samba # background for Remotion text overlay +uv run tools/flux2.py --prompt "Abstract tech background, no text" # Ideogram 4 — legible IN-IMAGE text + exact color/layout (hosted API, ~$0.03-0.09/img, commercial-OK) -python3 tools/ideogram4.py --json caption.json --output title.png # text baked into the image -python3 tools/ideogram4.py --prompt "Thumbnail: 'SHIP FASTER' bold" --output thumb.png +uv run tools/ideogram4.py --json caption.json --output title.png # text baked into the image +uv run tools/ideogram4.py --prompt "Thumbnail: 'SHIP FASTER' bold" --output thumb.png ``` **The key distinction — baked-in text vs. background-for-overlay:** @@ -228,15 +232,15 @@ format — Claude authors the caption as the "magic prompt" expander; needs `IDE ```bash # Image editing (Qwen-Image-Edit) -python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" -python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal -python tools/image_edit.py --input photo.jpg --style cyberpunk -python tools/image_edit.py --input photo.jpg --background office -python tools/image_edit.py --list-presets # Full preset list +uv run tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" +uv run tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal +uv run tools/image_edit.py --input photo.jpg --style cyberpunk +uv run tools/image_edit.py --input photo.jpg --background office +uv run tools/image_edit.py --list-presets # Full preset list # Upscaling (RealESRGAN) -python tools/upscale.py --input photo.jpg --output photo_4x.png --cloud runpod -python tools/upscale.py --input photo.jpg --scale 2 --model anime --face-enhance --cloud runpod +uv run tools/upscale.py --input photo.jpg --output photo_4x.png --cloud runpod +uv run tools/upscale.py --input photo.jpg --scale 2 --model anime --face-enhance --cloud runpod ``` See `docs/qwen-edit-patterns.md` and `.claude/skills/qwen-edit/` for prompting guidance. @@ -247,42 +251,42 @@ Default provider is **acemusic** (official cloud API, free key from [acemusic.ai ```bash # Background music (acemusic cloud API by default) -python tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --bpm 128 --key "G Major" --output music.mp3 +uv run tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --bpm 128 --key "G Major" --output music.mp3 # Generate 4 variations, pick the best -python tools/music_gen.py --prompt "Subtle corporate tech" --duration 60 --variations 4 --output bg.mp3 +uv run tools/music_gen.py --prompt "Subtle corporate tech" --duration 60 --variations 4 --output bg.mp3 # Fast mode (disable thinking) -python tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3 +uv run tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3 # Scene presets for video production -python tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3 -python tools/music_gen.py --preset tension --duration 20 --output problem.mp3 -python tools/music_gen.py --preset cta --brand digital-samba --output cta.mp3 +uv run tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3 +uv run tools/music_gen.py --preset tension --duration 20 --output problem.mp3 +uv run tools/music_gen.py --preset cta --brand digital-samba --output cta.mp3 # Song with vocals and lyrics (use structure tags for sections) -python tools/music_gen.py \ +uv run tools/music_gen.py \ --prompt "Indie pop anthem, male vocal, bright guitar, studio polish" \ --lyrics "[Verse]\nWalking through the morning light\nCoffee in my hand feels right\n\n[Chorus - anthemic]\nWE KEEP MOVING FORWARD\nThrough the noise and doubt\n\n[Outro - fade]\n(Moving forward...)" \ --duration 60 --bpm 128 --key "G Major" --output song.mp3 # Cover / style transfer -python tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --output cover.mp3 +uv run tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --output cover.mp3 # Repaint a weak section (acemusic only) -python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3 +uv run tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3 # Continue from existing audio (acemusic only) -python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3 +uv run tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3 # Stem extraction -python tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3 +uv run tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3 # Fall back to self-hosted -python tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3 +uv run tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3 # List presets -python tools/music_gen.py --list-presets +uv run tools/music_gen.py --list-presets ``` 8 scene presets: `corporate-bg`, `upbeat-tech`, `ambient`, `dramatic`, `tension`, `hopeful`, `cta`, `lofi`. See `.claude/skills/acestep/` for prompt engineering patterns and video production integration guide. @@ -291,12 +295,12 @@ python tools/music_gen.py --list-presets ```bash # Locate watermark coordinates -python tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/ -python tools/locate_watermark.py --input video.mp4 --preset notebooklm --verify +uv run tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/ +uv run tools/locate_watermark.py --input video.mp4 --preset notebooklm --verify # Remove watermark (RunPod) -python tools/dewatermark.py --input video.mp4 --region 1080,660,195,40 --output clean.mp4 --runpod -python tools/dewatermark.py --setup # One-time setup +uv run tools/dewatermark.py --input video.mp4 --region 1080,660,195,40 --output clean.mp4 --runpod +uv run tools/dewatermark.py --setup # One-time setup ``` **Workflow:** grid overlay → note coordinates → verify with `--region` → remove with dewatermark. @@ -307,11 +311,11 @@ python tools/dewatermark.py --setup # One-time setup ```bash # Basic usage -python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 +uv run tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 # For NarratorPiP integration (recommended settings) # CRITICAL: --preprocess full preserves image dimensions (otherwise outputs square crop) -python tools/sadtalker.py \ +uv run tools/sadtalker.py \ --image presenter_16x9.png \ --audio voiceover.mp3 \ --preprocess full --still --expression-scale 0.8 \ @@ -330,7 +334,7 @@ See `docs/sadtalker.md` for detailed options and troubleshooting. ### Redub Sync Mode ```bash -python tools/redub.py --input video.mp4 --voice-id VOICE_ID --sync --output dubbed.mp4 +uv run tools/redub.py --input video.mp4 --voice-id VOICE_ID --sync --output dubbed.mp4 ``` The `--sync` flag enables word-level time remapping — essential when TTS voice pacing differs from original. Without it, audio can drift 3-4+ seconds by the end. @@ -342,7 +346,7 @@ The `--sync` flag enables word-level time remapping — essential when TTS voice Post-processes NotebookLM videos with custom branding. Solves the problem where redubbed TTS audio extends beyond the safe visual trim point. ```bash -python tools/notebooklm_brand.py \ +uv run tools/notebooklm_brand.py \ --input video_synced.mp4 \ --logo assets/logo.png \ --url "mysite.com" \ @@ -359,7 +363,7 @@ Trims NotebookLM visuals, keeps full audio, bridges with freeze frame, adds bran 4. **Scene review** - Run `/scene-review` to verify visuals in Remotion Studio 5. **Design refinement** - Use `/design` or the "Refine" option in scene-review to improve slide visuals 6. **Generate audio** - Use `/generate-voiceover` for AI narration -7. **Sync timing** - Run `python3 tools/sync_timing.py --apply` to update config durations +7. **Sync timing** - Run `uv run tools/sync_timing.py --apply` to update config durations 8. **Preview** - `npm run studio` in project directory 9. **Iterate** - Adjust timing, content, styling with Claude Code 10. **Render** - `npm run render` for final MP4 @@ -454,8 +458,8 @@ TTS engines do NOT consistently produce 150 WPM output. In practice: **The feedback loop after TTS generation:** 1. Generate per-scene audio files -2. Run `python3 tools/sync_timing.py` to compare actual vs config durations -3. Run `python3 tools/sync_timing.py --apply` to update config automatically +2. Run `uv run tools/sync_timing.py` to compare actual vs config durations +3. Run `uv run tools/sync_timing.py --apply` to update config automatically 4. For demo scenes: recalculate `playbackRate = rawDemoDuration / actualNarrationDuration` 5. Re-preview in Remotion Studio before rendering diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c94bbe8..9465ccc 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -18,11 +18,9 @@ Thank you for your interest in contributing! This toolkit is designed to help pe ## Development Setup 1. Fork and clone the repository -2. Set up your environment: +2. Set up your environment with [uv](https://docs.astral.sh/uv/) (creates `.venv/` and installs all locked dependencies): ```bash - python -m venv .venv - source .venv/bin/activate - pip install -r tools/requirements.txt + uv sync ``` 3. Add your ElevenLabs API key to `.env` @@ -80,7 +78,7 @@ If your change affects Codex compatibility, also update: | What Changed | Update These Files | |--------------|-------------------| | Codex migration flow | `README.md` ("Using with Codex"), `docs/getting-started.md`, `scripts/migrate_to_codex.py` | -| Claude guidance source | `CLAUDE.md` and then re-run `python3 scripts/migrate_to_codex.py --force` to regenerate the Codex block in `AGENTS.md` | +| Claude guidance source | `CLAUDE.md` and then re-run `uv run scripts/migrate_to_codex.py --force` to regenerate the Codex block in `AGENTS.md` | | Generated resource list or warnings | `README.md` and `docs/getting-started.md` | **Quick verification:** After adding a command, grep for it across docs: diff --git a/README.md b/README.md index 9a4e13e..e1da573 100644 --- a/README.md +++ b/README.md @@ -24,10 +24,12 @@ More in the [showcase table](#templates) below. ```bash git clone https://github.com/digitalsamba/claude-code-video-toolkit.git cd claude-code-video-toolkit -python3 -m pip install -r tools/requirements.txt # Optional: AI voiceover, image gen, music, moviepy examples -claude # Open Claude Code in the toolkit +uv sync # Optional: AI voiceover, image gen, music, moviepy examples +claude # Open Claude Code in the toolkit ``` +> Python dependencies are managed with [uv](https://docs.astral.sh/uv/) — `uv sync` creates `.venv/` and installs everything from the lockfile in seconds. No uv yet? `curl -LsSf https://astral.sh/uv/install.sh | sh` (macOS/Linux) or `powershell -c "irm https://astral.sh/uv/install.ps1 | iex"` (Windows). + Then in Claude Code: ``` @@ -39,7 +41,7 @@ Then in Claude Code: **What's free:** The toolkit leans heavily on open-source AI models — voiceovers (Qwen3-TTS), image generation (FLUX.2), music (ACE-Step), and more. You deploy them to your own cloud GPU account and run them at cost. Cloudflare R2 has a generous free tier (10GB, zero egress), and Modal gives $30/month free compute on the Starter plan — more than enough for a few 5-minute videos a month. -**Requirements:** [Node.js](https://nodejs.org/) 18+ and [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Python 3.9+ recommended for AI tools. FFmpeg optional. +**Requirements:** [Node.js](https://nodejs.org/) 18+ and [Claude Code](https://docs.anthropic.com/en/docs/claude-code). [uv](https://docs.astral.sh/uv/) recommended for the AI tools (it installs Python 3.10+ for you). FFmpeg optional. > **Want to skip setup and just render something?** > ```bash @@ -191,20 +193,20 @@ Audio, video, and image tools in `tools/`: ```bash # AI voiceover — ElevenLabs or self-hosted Qwen3-TTS (9 voices + cloning) -python tools/voiceover.py --provider qwen3 --speaker Ryan --scene-dir public/audio/scenes --json +uv run tools/voiceover.py --provider qwen3 --speaker Ryan --scene-dir public/audio/scenes --json # AI music (ACE-Step — free cloud API) -python tools/music_gen.py --preset corporate-bg --duration 120 --output music.mp3 +uv run tools/music_gen.py --preset corporate-bg --duration 120 --output music.mp3 # AI image generation (FLUX.2) and editing (Qwen-Image-Edit) -python tools/flux2.py --preset title-bg --brand digital-samba --cloud modal -python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal +uv run tools/flux2.py --preset title-bg --brand digital-samba --cloud modal +uv run tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal # AI video generation (LTX-2.3 — text-to-video, image-to-video) -python tools/ltx2.py --prompt "A sunset over the ocean, cinematic" --cloud modal +uv run tools/ltx2.py --prompt "A sunset over the ocean, cinematic" --cloud modal # Talking head from a portrait + audio (SadTalker) -python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 --cloud modal +uv run tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 --cloud modal ```
@@ -212,56 +214,56 @@ python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output ta ```bash # Generate voiceover (ElevenLabs) -python tools/voiceover.py --script script.md --output voiceover.mp3 +uv run tools/voiceover.py --script script.md --output voiceover.mp3 # Generate voiceover (Qwen3-TTS — self-hosted, cheaper alternative) -python tools/voiceover.py --provider qwen3 --speaker Ryan --scene-dir public/audio/scenes --json -python tools/qwen3_tts.py --text "Hello world" --tone warm --output hello.mp3 +uv run tools/voiceover.py --provider qwen3 --speaker Ryan --scene-dir public/audio/scenes --json +uv run tools/qwen3_tts.py --text "Hello world" --tone warm --output hello.mp3 # Generate background music (ElevenLabs) -python tools/music.py --prompt "Upbeat corporate" --duration 120 --output music.mp3 +uv run tools/music.py --prompt "Upbeat corporate" --duration 120 --output music.mp3 # Generate background music (ACE-Step — free cloud API, XL Turbo 4B model) -python tools/music_gen.py --preset corporate-bg --duration 120 --output music.mp3 -python tools/music_gen.py --prompt "Dramatic cinematic" --duration 30 --bpm 90 --key "D Minor" --output reveal.mp3 -python tools/music_gen.py --prompt "Upbeat indie rock" --duration 60 --variations 4 --output intro.mp3 +uv run tools/music_gen.py --preset corporate-bg --duration 120 --output music.mp3 +uv run tools/music_gen.py --prompt "Dramatic cinematic" --duration 30 --bpm 90 --key "D Minor" --output reveal.mp3 +uv run tools/music_gen.py --prompt "Upbeat indie rock" --duration 60 --variations 4 --output intro.mp3 # Generate sound effects -python tools/sfx.py --preset whoosh --output sfx.mp3 +uv run tools/sfx.py --preset whoosh --output sfx.mp3 # Redub video with different voice -python tools/redub.py --input video.mp4 --voice-id VOICE_ID --output dubbed.mp4 +uv run tools/redub.py --input video.mp4 --voice-id VOICE_ID --output dubbed.mp4 # Add background music to existing video -python tools/addmusic.py --input video.mp4 --prompt "Subtle ambient" --output output.mp4 +uv run tools/addmusic.py --input video.mp4 --prompt "Subtle ambient" --output output.mp4 # Rebrand NotebookLM videos (trim outro, add your logo/URL) -python tools/notebooklm_brand.py --input video.mp4 --logo logo.png --url "mysite.com" --output branded.mp4 +uv run tools/notebooklm_brand.py --input video.mp4 --logo logo.png --url "mysite.com" --output branded.mp4 # AI image editing (style transfer, backgrounds, custom prompts) -python tools/image_edit.py --input photo.jpg --style cyberpunk --cloud modal -python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal +uv run tools/image_edit.py --input photo.jpg --style cyberpunk --cloud modal +uv run tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal # AI image upscaling (2x/4x) -python tools/upscale.py --input photo.jpg --output photo_4x.png --cloud modal +uv run tools/upscale.py --input photo.jpg --output photo_4x.png --cloud modal # Remove watermarks (requires cloud GPU) -python tools/dewatermark.py --input video.mp4 --preset sora --output clean.mp4 --cloud modal +uv run tools/dewatermark.py --input video.mp4 --preset sora --output clean.mp4 --cloud modal # Locate watermark coordinates -python tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/ +uv run tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/ # Generate talking head video from image + audio (SadTalker) -python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 --cloud modal +uv run tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 --cloud modal # AI image generation (FLUX.2 Klein 4B — text-to-image + editing) -python tools/flux2.py --prompt "A sunset over mountains" --cloud modal -python tools/flux2.py --preset title-bg --brand digital-samba --cloud modal -python tools/flux2.py --list-presets +uv run tools/flux2.py --prompt "A sunset over mountains" --cloud modal +uv run tools/flux2.py --preset title-bg --brand digital-samba --cloud modal +uv run tools/flux2.py --list-presets # AI video generation (LTX-2.3 22B — text-to-video + image-to-video) -python tools/ltx2.py --prompt "A sunset over the ocean, cinematic" --cloud modal -python tools/ltx2.py --prompt "Gentle camera drift" --input photo.jpg --cloud modal +uv run tools/ltx2.py --prompt "A sunset over the ocean, cinematic" --cloud modal +uv run tools/ltx2.py --prompt "Gentle camera drift" --input photo.jpg --cloud modal ```
@@ -291,7 +293,7 @@ python tools/ltx2.py --prompt "Gentle camera drift" --input photo.jpg --cloud mo **Modal (recommended):** Each tool deploys from `docker/modal-*/app.py` — Modal builds and hosts the containers. $30/month free compute on the Starter plan, typical usage is $1-2/month. Run `/setup` to deploy all tools automatically. -**RunPod (alternative):** Uses pre-built Docker images from `ghcr.io/conalmullan/video-toolkit-*`. Pay-per-second, no minimums. Run `python3 tools/.py --setup` to create endpoints. +**RunPod (alternative):** Uses pre-built Docker images from `ghcr.io/conalmullan/video-toolkit-*`. Pay-per-second, no minimums. Run `uv run tools/.py --setup` to create endpoints. See [docs/modal-setup.md](docs/modal-setup.md) and [docs/runpod-setup.md](docs/runpod-setup.md) for details. @@ -351,7 +353,7 @@ claude-code-video-toolkit/ The toolkit is built for Claude Code, but an experimental migration script installs its skills and workflows for [Codex](https://openai.com/codex/) and generates an `AGENTS.md` block from `CLAUDE.md`: ```bash -python3 scripts/migrate_to_codex.py --force +uv run scripts/migrate_to_codex.py --force ``` See [docs/codex.md](docs/codex.md) for what it installs, how the `AGENTS.md` block is managed, and how to remove it. Contributed by [@kimhoontae-gogo](https://github.com/kimhoontae-gogo) in [#16](https://github.com/digitalsamba/claude-code-video-toolkit/pull/16). diff --git a/_internal/toolkit-registry.json b/_internal/toolkit-registry.json index 4e8a4d2..a8bda29 100644 --- a/_internal/toolkit-registry.json +++ b/_internal/toolkit-registry.json @@ -192,7 +192,7 @@ "voiceover": { "path": "tools/voiceover.py", "description": "Generate TTS voiceovers using ElevenLabs or Qwen3-TTS", - "usage": "python tools/voiceover.py --script SCRIPT.md --output out.mp3", + "usage": "uv run tools/voiceover.py --script SCRIPT.md --output out.mp3", "status": "stable", "created": "2025-12-08", "updated": "2026-06-09", @@ -203,7 +203,7 @@ "music": { "path": "tools/music.py", "description": "Generate background music using ElevenLabs", - "usage": "python tools/music.py --prompt 'description' --duration 120 --output music.mp3", + "usage": "uv run tools/music.py --prompt 'description' --duration 120 --output music.mp3", "status": "stable", "created": "2025-12-08", "updated": "2025-12-08" @@ -211,7 +211,7 @@ "music_gen": { "path": "tools/music_gen.py", "description": "AI music generation using ACE-Step 1.5 (text-to-music, cover, stem extraction, repainting, continuation)", - "usage": "python tools/music_gen.py --prompt 'Upbeat tech' --duration 60 --bpm 128 --output music.mp3", + "usage": "uv run tools/music_gen.py --prompt 'Upbeat tech' --duration 60 --bpm 128 --output music.mp3", "status": "stable", "options": { "prompt": "Music description", @@ -259,7 +259,7 @@ "sfx": { "path": "tools/sfx.py", "description": "Generate sound effects using ElevenLabs", - "usage": "python tools/sfx.py --preset whoosh --output sfx.mp3", + "usage": "uv run tools/sfx.py --preset whoosh --output sfx.mp3", "status": "stable", "created": "2025-12-08", "updated": "2025-12-08" @@ -267,7 +267,7 @@ "redub": { "path": "tools/redub.py", "description": "Redub video with different voice (extract audio, transcribe, TTS, replace)", - "usage": "python tools/redub.py --input video.mp4 --voice-id VOICE_ID --output dubbed.mp4", + "usage": "uv run tools/redub.py --input video.mp4 --voice-id VOICE_ID --output dubbed.mp4", "status": "beta", "created": "2025-12-28", "updated": "2025-12-28" @@ -275,7 +275,7 @@ "addmusic": { "path": "tools/addmusic.py", "description": "Add background music to video (generate via ElevenLabs or use existing file)", - "usage": "python tools/addmusic.py --input video.mp4 --prompt 'Subtle corporate' --output output.mp4", + "usage": "uv run tools/addmusic.py --input video.mp4 --prompt 'Subtle corporate' --output output.mp4", "status": "beta", "created": "2025-12-28", "updated": "2025-12-28" @@ -283,7 +283,7 @@ "dewatermark": { "path": "tools/dewatermark.py", "description": "Remove watermarks using AI inpainting (ProPainter) - supports local and RunPod cloud", - "usage": "python tools/dewatermark.py --input video.mp4 --preset sora --output clean.mp4 --runpod", + "usage": "uv run tools/dewatermark.py --input video.mp4 --preset sora --output clean.mp4 --runpod", "status": "beta", "optional": true, "requires": "RunPod account OR local NVIDIA GPU + ProPainter (~2GB)", @@ -301,7 +301,7 @@ "locate_watermark": { "path": "tools/locate_watermark.py", "description": "Helper tool to identify watermark coordinates using grid overlay and verification", - "usage": "python tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/", + "usage": "uv run tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/", "status": "beta", "requires": "ImageMagick (brew install imagemagick)", "presets": [ @@ -318,7 +318,7 @@ "verify_setup": { "path": "tools/verify_setup.py", "description": "Verify toolkit setup - check prerequisites, cloud GPU, R2, and voice configuration", - "usage": "python tools/verify_setup.py [--test] [--json]", + "usage": "uv run tools/verify_setup.py [--test] [--json]", "status": "beta", "created": "2026-03-23", "updated": "2026-03-23" @@ -326,7 +326,7 @@ "notebooklm_brand": { "path": "tools/notebooklm_brand.py", "description": "Post-process NotebookLM videos with custom branding (trim outro, add logo/URL)", - "usage": "python tools/notebooklm_brand.py --input video.mp4 --logo logo.png --url 'mysite.com' --output branded.mp4", + "usage": "uv run tools/notebooklm_brand.py --input video.mp4 --logo logo.png --url 'mysite.com' --output branded.mp4", "status": "beta", "created": "2025-12-29", "updated": "2025-12-29" @@ -334,7 +334,7 @@ "image_edit": { "path": "tools/image_edit.py", "description": "AI image editing using Qwen-Image-Edit - style transfer, backgrounds, prompts", - "usage": "python tools/image_edit.py --input photo.jpg --prompt 'Add sunglasses' --output edited.png", + "usage": "uv run tools/image_edit.py --input photo.jpg --prompt 'Add sunglasses' --output edited.png", "status": "stable", "category": "image-editing", "backend": "qwen-image-edit-2511", @@ -383,7 +383,7 @@ "upscale": { "path": "tools/upscale.py", "description": "AI image upscaling using RealESRGAN - 2x/4x with optional face enhancement", - "usage": "python tools/upscale.py --input photo.jpg --output photo_4x.png --runpod", + "usage": "uv run tools/upscale.py --input photo.jpg --output photo_4x.png --runpod", "status": "stable", "category": "image-editing", "backend": "realesrgan", @@ -416,7 +416,7 @@ "sadtalker": { "path": "tools/sadtalker.py", "description": "Generate talking head videos from portrait image + audio using SadTalker", - "usage": "python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --preprocess full --output talking.mp4", + "usage": "uv run tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --preprocess full --output talking.mp4", "status": "beta", "category": "video-generation", "backend": "sadtalker", @@ -453,7 +453,7 @@ "qwen3_tts": { "path": "tools/qwen3_tts.py", "description": "Generate speech using Qwen3-TTS - built-in voices, emotion control, voice cloning", - "usage": "python tools/qwen3_tts.py --text \"Hello\" --speaker Ryan --output hello.mp3", + "usage": "uv run tools/qwen3_tts.py --text \"Hello\" --speaker Ryan --output hello.mp3", "status": "beta", "category": "audio-generation", "backend": "qwen3-tts", @@ -504,7 +504,7 @@ "sync_timing": { "path": "tools/sync_timing.py", "description": "Sync scene durationSeconds in Remotion config with actual audio durations", - "usage": "python tools/sync_timing.py --apply", + "usage": "uv run tools/sync_timing.py --apply", "status": "stable", "category": "project", "requires": "ffprobe (from ffmpeg)", @@ -514,7 +514,7 @@ "flux2": { "path": "tools/flux2.py", "description": "AI image generation and editing using FLUX.2 Klein 4B - text-to-image, image editing, and scene presets", - "usage": "python tools/flux2.py --preset title-bg --brand digital-samba", + "usage": "uv run tools/flux2.py --preset title-bg --brand digital-samba", "status": "beta", "category": "image-generation", "backend": "flux2-klein-4b", @@ -565,7 +565,7 @@ "ideogram4": { "path": "tools/ideogram4.py", "description": "Ideogram 4 text-to-image via hosted v4 API — best-in-class in-image text rendering and exact color/layout control through structured JSON captions", - "usage": "python3 tools/ideogram4.py --json caption.json --output title.png", + "usage": "uv run tools/ideogram4.py --json caption.json --output title.png", "status": "beta", "category": "image-generation", "backend": "ideogram-4 hosted v4 API", @@ -588,7 +588,7 @@ "ltx2": { "path": "tools/ltx2.py", "description": "AI video generation using LTX-2.3 22B - text-to-video, image-to-video clips with synchronized audio", - "usage": "python tools/ltx2.py --prompt \"A sunset over the ocean\" --output sunset.mp4", + "usage": "uv run tools/ltx2.py --prompt \"A sunset over the ocean\" --output sunset.mp4", "status": "beta", "category": "video-generation", "backend": "ltx-2.3-22b", @@ -630,7 +630,7 @@ "chain_video": { "path": "tools/chain_video.py", "description": "Chain LTX-2 clips with visual continuity — each scene uses the last frame of the previous as input", - "usage": "python tools/chain_video.py --scenes-dir images/ --output-dir videos/ --prompt \"Cinematic flow\"", + "usage": "uv run tools/chain_video.py --scenes-dir images/ --output-dir videos/ --prompt \"Cinematic flow\"", "status": "beta", "category": "video-generation", "backend": "ltx-2.3-22b (via ltx2.py)", @@ -668,8 +668,8 @@ "notSupported": "Apple Silicon, CPU (use --runpod instead)", "memory": "8-10GB VRAM for 720p with fp16" }, - "installCommand": "python tools/dewatermark.py --install", - "statusCommand": "python tools/dewatermark.py --status", + "installCommand": "uv run tools/dewatermark.py --install", + "statusCommand": "uv run tools/dewatermark.py --status", "documentation": "docs/optional-components.md" } }, diff --git a/docs/codex.md b/docs/codex.md index 4de931e..1d03d64 100644 --- a/docs/codex.md +++ b/docs/codex.md @@ -6,7 +6,7 @@ contributed by [@kimhoontae-gogo](https://github.com/kimhoontae-gogo) in [#16](https://github.com/digitalsamba/claude-code-video-toolkit/pull/16). ```bash -python3 scripts/migrate_to_codex.py --force +uv run scripts/migrate_to_codex.py --force ``` This does two things: @@ -24,12 +24,12 @@ This does two things: - The script manages **only** its generated block inside `AGENTS.md`. - Manual `AGENTS.md` content outside that block is preserved. - The block is derived from `CLAUDE.md` — after `CLAUDE.md` changes, re-run - `python3 scripts/migrate_to_codex.py --force` to refresh it. + `uv run scripts/migrate_to_codex.py --force` to refresh it. ## Removing ```bash -python3 scripts/migrate_to_codex.py --reset +uv run scripts/migrate_to_codex.py --reset ``` `--reset` removes the toolkit skills previously installed under `~/.codex/skills` and diff --git a/docs/creating-brands.md b/docs/creating-brands.md index aae9c76..4b75370 100644 --- a/docs/creating-brands.md +++ b/docs/creating-brands.md @@ -169,5 +169,5 @@ Run `/voice-clone` for a guided workflow that: Once saved, use `--brand` to load it automatically: ```bash -python tools/voiceover.py --provider qwen3 --brand my-company --scene-dir public/audio/scenes --json +uv run tools/voiceover.py --provider qwen3 --brand my-company --scene-dir public/audio/scenes --json ``` diff --git a/docs/getting-started.md b/docs/getting-started.md index 6949bba..53dcef1 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -12,7 +12,7 @@ This guide will help you create your first video using the claude-code-video-too | Provider | Cost | Setup | |----------|------|-------| -| Qwen3-TTS | Free (self-hosted) | RunPod account + `python tools/qwen3_tts.py --setup` | +| Qwen3-TTS | Free (self-hosted) | RunPod account + `uv run tools/qwen3_tts.py --setup` | | ElevenLabs | Pay-per-use | API key in `.env` | ### Optional: Full Toolkit @@ -43,11 +43,19 @@ No API keys needed. Edit `src/config/sprint-config.ts` to customize content. cd claude-code-video-toolkit ``` -2. **Install Python dependencies** +2. **Install Python dependencies with [uv](https://docs.astral.sh/uv/)** ```bash - python -m venv .venv - source .venv/bin/activate # On Windows: .venv\Scripts\activate - pip install -r tools/requirements.txt + # Install uv if you don't have it: + # macOS/Linux: curl -LsSf https://astral.sh/uv/install.sh | sh + # Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex" + uv sync + ``` + This creates `.venv/` and installs every locked dependency in one step — no manual + virtualenv or pip required. Run any Python tool through the environment with + `uv run tools/.py` (no activation needed). Optional extras: + ```bash + uv sync --extra whisper # burned karaoke captions (heavy, pulls in torch) + uv sync --extra modal # Modal CLI for self-hosted cloud GPU ``` 3. **Start Claude Code and run the setup wizard** @@ -66,7 +74,7 @@ No API keys needed. Edit `src/config/sprint-config.ts` to customize content. If you use Codex instead of Claude Code, install the toolkit's Codex-compatible wrappers and regenerate `AGENTS.md` from `CLAUDE.md`: ```bash -python3 scripts/migrate_to_codex.py --force +uv run scripts/migrate_to_codex.py --force ``` This installs toolkit skills into `~/.codex/skills` and appends or updates a generated Codex block in the repository root `AGENTS.md`. @@ -82,12 +90,12 @@ Important: 1. The script manages only a generated block inside the repository root `AGENTS.md`. 2. Manual `AGENTS.md` content outside that block is preserved. 3. The generated block is derived from `CLAUDE.md`. -4. Re-run `python3 scripts/migrate_to_codex.py --force` after updating `CLAUDE.md`. +4. Re-run `uv run scripts/migrate_to_codex.py --force` after updating `CLAUDE.md`. To remove the installed toolkit skills later: ```bash -python3 scripts/migrate_to_codex.py --reset +uv run scripts/migrate_to_codex.py --reset ``` `--reset` removes the generated Codex block from `AGENTS.md`, but does not remove the rest of the file. diff --git a/docs/ltx2.md b/docs/ltx2.md index c355735..4326917 100644 --- a/docs/ltx2.md +++ b/docs/ltx2.md @@ -6,16 +6,16 @@ Generate ~5 second video clips from text prompts or images using the LTX-2.3 22B ```bash # Text-to-video -python3 tools/ltx2.py --prompt "A cat playing with yarn in a sunlit room" +uv run tools/ltx2.py --prompt "A cat playing with yarn in a sunlit room" # Image-to-video (animate a still image) -python3 tools/ltx2.py --prompt "Camera slowly pans right" --input photo.jpg +uv run tools/ltx2.py --prompt "Camera slowly pans right" --input photo.jpg # Higher resolution -python3 tools/ltx2.py --prompt "Ocean waves at sunset" --width 1024 --height 576 +uv run tools/ltx2.py --prompt "Ocean waves at sunset" --width 1024 --height 576 # Fast mode (fewer steps, quicker but lower quality) -python3 tools/ltx2.py --prompt "A rocket launch" --quality fast +uv run tools/ltx2.py --prompt "A rocket launch" --quality fast ``` ## Setup @@ -24,7 +24,7 @@ LTX-2 runs on Modal cloud GPU (A100-80GB). Setup takes about 15 minutes — most ### Prerequisites -- Modal account and CLI installed (`pip install modal && python3 -m modal setup`) +- Modal account and CLI installed (`uv sync --extra modal && uv run modal setup`) - HuggingFace account with a **read-access** token ([create one here](https://huggingface.co/settings/tokens) — "Read access to contents of all repos" scope is sufficient) - Accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized) (one-click "Agree" on the model page) @@ -33,7 +33,7 @@ LTX-2 runs on Modal cloud GPU (A100-80GB). Setup takes about 15 minutes — most 1. **Create a Modal secret** with your HuggingFace token: ```bash - modal secret create huggingface-token HF_TOKEN=hf_your_token_here + uv run modal secret create huggingface-token HF_TOKEN=hf_your_token_here ``` > **Important:** This token is used for both the LTX-2 weights (~55GB) and the Gemma text encoder (~7GB). While LTX-2 isn't a gated model, unauthenticated downloads from HuggingFace are severely rate-limited — a 46GB checkpoint that takes ~10 minutes with a token can take over an hour without one. The Gemma model is gated and will fail entirely without auth. @@ -41,7 +41,7 @@ LTX-2 runs on Modal cloud GPU (A100-80GB). Setup takes about 15 minutes — most 2. **Deploy the Modal app** (downloads and bakes all model weights — takes 10-15 min): ```bash - modal deploy docker/modal-ltx2/app.py + uv run modal deploy docker/modal-ltx2/app.py ``` 3. **Save the endpoint URL** printed by `modal deploy` to your `.env`: @@ -53,7 +53,7 @@ LTX-2 runs on Modal cloud GPU (A100-80GB). Setup takes about 15 minutes — most 4. **Test it:** ```bash - python3 tools/ltx2.py --prompt "A single lit candle flickering on a dark table, cinematic lighting" + uv run tools/ltx2.py --prompt "A single lit candle flickering on a dark table, cinematic lighting" ``` ## Parameters @@ -172,10 +172,10 @@ Total baked weight: ~55 GB. The pipeline manages memory by loading and freeing c Reduce dimensions or frame count: ```bash # Try smaller resolution -python3 tools/ltx2.py --prompt "..." --width 512 --height 512 +uv run tools/ltx2.py --prompt "..." --width 512 --height 512 # Or fewer frames -python3 tools/ltx2.py --prompt "..." --num-frames 73 +uv run tools/ltx2.py --prompt "..." --num-frames 73 ``` ### "Modal endpoint is scaling up" diff --git a/docs/modal-setup.md b/docs/modal-setup.md index d1d60de..c2e105e 100644 --- a/docs/modal-setup.md +++ b/docs/modal-setup.md @@ -14,9 +14,9 @@ Modal is the recommended cloud GPU provider for the toolkit's AI tools. It offer ## Install & Authenticate ```bash -pip install modal -python3 -m modal setup # Opens browser to authenticate, saves token to ~/.modal.toml -modal app list # Verify it works +uv sync --extra modal # Installs the Modal CLI into the toolkit's .venv +uv run modal setup # Opens browser to authenticate, saves token to ~/.modal.toml +uv run modal app list # Verify it works ``` ## Deploy Tools @@ -25,19 +25,19 @@ Each AI tool has its own Modal app. Deploy only what you need, or deploy all of ```bash # Speech generation (most commonly used) -modal deploy docker/modal-qwen3-tts/app.py +uv run modal deploy docker/modal-qwen3-tts/app.py # Image generation & editing -modal deploy docker/modal-flux2/app.py -modal deploy docker/modal-image-edit/app.py -modal deploy docker/modal-upscale/app.py +uv run modal deploy docker/modal-flux2/app.py +uv run modal deploy docker/modal-image-edit/app.py +uv run modal deploy docker/modal-upscale/app.py # Music generation -modal deploy docker/modal-music-gen/app.py +uv run modal deploy docker/modal-music-gen/app.py # Video processing -modal deploy docker/modal-sadtalker/app.py -modal deploy docker/modal-propainter/app.py +uv run modal deploy docker/modal-sadtalker/app.py +uv run modal deploy docker/modal-propainter/app.py ``` Each deploy prints an endpoint URL like: @@ -85,26 +85,26 @@ All cloud GPU tools accept `--cloud modal`: ```bash # AI voiceover -python3 tools/qwen3_tts.py --text "Hello world" --speaker Ryan --output hello.mp3 --cloud modal +uv run tools/qwen3_tts.py --text "Hello world" --speaker Ryan --output hello.mp3 --cloud modal # AI image generation -python3 tools/flux2.py --prompt "A sunset over mountains" --output sunset.png --cloud modal +uv run tools/flux2.py --prompt "A sunset over mountains" --output sunset.png --cloud modal # AI image editing -python3 tools/image_edit.py --input photo.jpg --style cyberpunk --cloud modal +uv run tools/image_edit.py --input photo.jpg --style cyberpunk --cloud modal # AI upscaling -python3 tools/upscale.py --input photo.jpg --output photo_4x.png --cloud modal +uv run tools/upscale.py --input photo.jpg --output photo_4x.png --cloud modal # AI music generation (acemusic cloud API is now default — no Modal needed) -python3 tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3 -# Or use Modal: python3 tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3 --cloud modal +uv run tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3 +# Or use Modal: uv run tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3 --cloud modal # Talking head from portrait + audio -python3 tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 --cloud modal +uv run tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 --cloud modal # Watermark removal -python3 tools/dewatermark.py --input video.mp4 --region 1080,660,195,40 --output clean.mp4 --cloud modal +uv run tools/dewatermark.py --input video.mp4 --region 1080,660,195,40 --output clean.mp4 --cloud modal ``` ## Tools & Costs @@ -141,16 +141,16 @@ After 60 seconds of no requests, containers scale back to zero. No charges while ```bash # Check what's running (Tasks column should be 0 when idle) -modal app list +uv run modal app list # Check today's spend -modal billing report --for today --json +uv run modal billing report --for today --json # View container logs -modal app logs video-toolkit-upscale +uv run modal app logs video-toolkit-upscale # Verify your setup -python3 tools/verify_setup.py +uv run tools/verify_setup.py ``` ## Architecture diff --git a/docs/optional-components.md b/docs/optional-components.md index 6d4f6d1..f1f80ed 100644 --- a/docs/optional-components.md +++ b/docs/optional-components.md @@ -39,10 +39,10 @@ This is a PyTorch/MPS limitation, not something we can fix in the tool. ```bash # Check current status -python tools/dewatermark.py --status +uv run tools/dewatermark.py --status # Install ProPainter -python tools/dewatermark.py --install +uv run tools/dewatermark.py --install ``` This will: @@ -55,7 +55,7 @@ This will: **Remove watermark by specifying region:** ```bash -python tools/dewatermark.py \ +uv run tools/dewatermark.py \ --input video.mp4 \ --region 1080,660,195,40 \ --output clean.mp4 @@ -63,7 +63,7 @@ python tools/dewatermark.py \ **Use a custom mask image:** ```bash -python tools/dewatermark.py \ +uv run tools/dewatermark.py \ --input video.mp4 \ --mask mask.png \ --output clean.mp4 @@ -75,10 +75,10 @@ Use the `locate_watermark.py` helper: ```bash # Extract frames with coordinate grid -python tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/ +uv run tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/ # Verify a region across multiple frames -python tools/locate_watermark.py --input video.mp4 --region 1100,650,150,50 --verify +uv run tools/locate_watermark.py --input video.mp4 --region 1100,650,150,50 --verify ``` ### Cloud GPU Alternative diff --git a/docs/qwen-edit-patterns.md b/docs/qwen-edit-patterns.md index 4d44d90..d90bdd7 100644 --- a/docs/qwen-edit-patterns.md +++ b/docs/qwen-edit-patterns.md @@ -124,10 +124,10 @@ pipe.to("cuda") ```bash # Basic test -python tools/test_qwen_edit.py --image photo.jpg --prompt "description" --steps 8 +uv run tools/test_qwen_edit.py --image photo.jpg --prompt "description" --steps 8 # With seed for reproducibility -python tools/test_qwen_edit.py --image photo.jpg --prompt "description" --seed 42 +uv run tools/test_qwen_edit.py --image photo.jpg --prompt "description" --seed 42 ``` ## Sample Results diff --git a/docs/runpod-setup.md b/docs/runpod-setup.md index 4fb8dcd..ae4ddaa 100644 --- a/docs/runpod-setup.md +++ b/docs/runpod-setup.md @@ -31,14 +31,14 @@ The fastest way to set up RunPod: echo "RUNPOD_API_KEY=your_key_here" >> .env # 2. Run automated setup for each tool you need -python tools/image_edit.py --setup # AI image editing -python tools/upscale.py --setup # AI upscaling -python tools/dewatermark.py --setup # AI watermark removal +uv run tools/image_edit.py --setup # AI image editing +uv run tools/upscale.py --setup # AI upscaling +uv run tools/dewatermark.py --setup # AI watermark removal # 3. Done! Now use them: -python tools/image_edit.py --input photo.jpg --style cyberpunk -python tools/upscale.py --input photo.jpg --output photo_4x.png --runpod -python tools/dewatermark.py --input video.mp4 --region x,y,w,h --output out.mp4 --runpod +uv run tools/image_edit.py --input photo.jpg --style cyberpunk +uv run tools/upscale.py --input photo.jpg --output photo_4x.png --runpod +uv run tools/dewatermark.py --input video.mp4 --region x,y,w,h --output out.mp4 --runpod ``` Each `--setup` command will: @@ -115,13 +115,13 @@ RUNPOD_ENDPOINT_ID=ghi789 # For dewatermark ```bash # Image editing -python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" +uv run tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" # Upscaling -python tools/upscale.py --input photo.jpg --output photo_4x.png --runpod +uv run tools/upscale.py --input photo.jpg --output photo_4x.png --runpod # Dewatermark (with dry run) -python tools/dewatermark.py \ +uv run tools/dewatermark.py \ --input video.mp4 \ --region 1080,660,195,40 \ --output clean.mp4 \ @@ -187,7 +187,7 @@ RUNPOD_ENDPOINT_ID=abc123xyz Default timeout is 30 minutes. For longer videos: ```bash -python tools/dewatermark.py ... --runpod --runpod-timeout 3600 +uv run tools/dewatermark.py ... --runpod --runpod-timeout 3600 ``` ### "Failed to upload video" @@ -253,7 +253,7 @@ By default, videos are uploaded via free file hosting services (litterbox.catbox 6. **Install boto3** (if not already): ```bash - pip install boto3 + uv sync ``` That's it! All RunPod tools will automatically use R2 for file transfer when configured. diff --git a/docs/sadtalker.md b/docs/sadtalker.md index 0bbc544..e8b8cfb 100644 --- a/docs/sadtalker.md +++ b/docs/sadtalker.md @@ -6,10 +6,10 @@ Generate realistic talking head videos from a portrait image and audio file. ```bash # Basic usage -python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 +uv run tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 # With preset -python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --preset natural --output talking.mp4 +uv run tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --preset natural --output talking.mp4 ``` ## When NOT to Use SadTalker @@ -32,7 +32,7 @@ For these cases, use **LTX-2 image-to-video** instead. It animates the whole ima 2. Run setup to create the endpoint: ```bash - python tools/sadtalker.py --setup + uv run tools/sadtalker.py --setup ``` ## Presets @@ -97,7 +97,7 @@ For these cases, use **LTX-2 image-to-video** instead. It animates the whole ima ### Natural Talking Head ```bash -python tools/sadtalker.py \ +uv run tools/sadtalker.py \ --image portrait.png \ --audio narration.mp3 \ --preset natural \ @@ -106,7 +106,7 @@ python tools/sadtalker.py \ ### Professional/Calm Style ```bash -python tools/sadtalker.py \ +uv run tools/sadtalker.py \ --image headshot.png \ --audio presentation.mp3 \ --preset professional \ @@ -116,7 +116,7 @@ python tools/sadtalker.py \ ### Expressive Animation ```bash -python tools/sadtalker.py \ +uv run tools/sadtalker.py \ --image avatar.png \ --audio excited_speech.mp3 \ --preset expressive \ @@ -125,7 +125,7 @@ python tools/sadtalker.py \ ### Full Body Shot ```bash -python tools/sadtalker.py \ +uv run tools/sadtalker.py \ --image fullbody.png \ --audio voiceover.mp3 \ --preset fullbody \ @@ -134,7 +134,7 @@ python tools/sadtalker.py \ ### Fine-Tuned Settings ```bash -python tools/sadtalker.py \ +uv run tools/sadtalker.py \ --image portrait.png \ --audio speech.mp3 \ --pose-style 45 \ @@ -160,10 +160,10 @@ If your client times out but the job completes on RunPod: ```bash # Get job ID from the error output, then retrieve results -python tools/sadtalker.py --retrieve JOB_ID --output result.mp4 +uv run tools/sadtalker.py --retrieve JOB_ID --output result.mp4 # Example -python tools/sadtalker.py --retrieve 7d69546a-3d31-4f74-bf12-b1c19a2f4d3c-u1 --output narrator.mp4 +uv run tools/sadtalker.py --retrieve 7d69546a-3d31-4f74-bf12-b1c19a2f4d3c-u1 --output narrator.mp4 ``` Jobs persist on RunPod for ~24 hours, so you can retrieve later. @@ -194,7 +194,7 @@ SadTalker doesn't resize or change backgrounds. To customize: ffmpeg -i portrait.png -vf scale=1280:720 portrait_720p.png # Or use image_edit.py for background changes - python tools/image_edit.py --input portrait.png --background studio --output portrait_studio.png + uv run tools/image_edit.py --input portrait.png --background studio --output portrait_studio.png ``` 2. **Post-process output video:** @@ -284,7 +284,7 @@ Humans frame faces better than automated cropping. Guide users to: ### 4. Recommended Command for NarratorPiP ```bash -python tools/sadtalker.py \ +uv run tools/sadtalker.py \ --image presenter_16x9.png \ --audio voiceover.mp3 \ --still --expression-scale 0.8 --preprocess full \ @@ -300,13 +300,13 @@ Key flags: ```bash # 1. Generate voiceover -python tools/voiceover.py --script script.md --output narration.mp3 +uv run tools/voiceover.py --script script.md --output narration.mp3 # 2. Prepare 16:9 image (user crops manually for best framing) # presenter_16x9.png should be ~640x360 or similar 16:9 ratio # 3. Create talking head with NarratorPiP-optimized settings -python tools/sadtalker.py \ +uv run tools/sadtalker.py \ --image presenter_16x9.png \ --audio narration.mp3 \ --still --expression-scale 0.8 --preprocess full \ diff --git a/examples/README.md b/examples/README.md index 9841b8d..0ab31ef 100644 --- a/examples/README.md +++ b/examples/README.md @@ -10,11 +10,11 @@ Curated showcase projects demonstrating toolkit capabilities. | quick-spot | moviepy + PIL | — | 15s ad-style spot with audio-anchored timeline. Runs with zero external assets. | Beginner | | data-viz-chart | moviepy + matplotlib | — | Animated time-series chart with deterministic title and source attribution. Runs with included data file. | Beginner | | ds-crt-stinger | LTX-2 + moviepy + PIL | — | 6s brand stinger — LTX-2 CRT LoRA footage + post-processed grunged logo | Intermediate | -| sky-blue-short | concept-explainer-short (moviepy) | — | 52s vertical 9:16 explainer short — Qwen3 VO + Ideogram cards + LTX b-roll + burned captions. All assets committed; re-renders with `python3 build.py`. | Intermediate | +| sky-blue-short | concept-explainer-short (moviepy) | — | 52s vertical 9:16 explainer short — Qwen3 VO + Ideogram cards + LTX b-roll + burned captions. All assets committed; re-renders with `uv run build.py`. | Intermediate | | digital-samba-skill-demo | Remotion product-demo | [Digital Samba](https://digitalsamba.com) | Marketing video for Claude Code skill | Intermediate | | sprint-review-cho-oyu | Remotion sprint-review | [Digital Samba](https://digitalsamba.com) | iOS sprint review for Digital Samba Mobile | Intermediate | -> **Note:** Remotion examples include configs and documentation but NOT large media files — see each example's `ASSETS-NEEDED.md` for what to create. The moviepy examples (`quick-spot`, `data-viz-chart`) are fully self-contained and run end-to-end with `python3 build.py`. +> **Note:** Remotion examples include configs and documentation but NOT large media files — see each example's `ASSETS-NEEDED.md` for what to create. The moviepy examples (`quick-spot`, `data-viz-chart`) are fully self-contained and run end-to-end with `uv run build.py`. ## Contributors @@ -41,7 +41,7 @@ npm run studio ```bash cd examples/quick-spot # or examples/data-viz-chart -python3 build.py # produces out.mp4 in the example directory +uv run build.py # produces out.mp4 in the example directory ``` These are fully self-contained references for the moviepy skill. Read the `build.py` and `README.md` in each. @@ -52,7 +52,7 @@ Examples don't include large media files (videos, audio). To run them: 1. **Record demos** - Use `/record-demo` to capture screen recordings 2. **Generate voiceover** - Use `/generate-voiceover` with the included script -3. **Add music** - Use `python tools/music.py` for background tracks +3. **Add music** - Use `uv run tools/music.py` for background tracks Each example includes a `ASSETS-NEEDED.md` documenting what to create. diff --git a/examples/data-viz-chart/README.md b/examples/data-viz-chart/README.md index 08771b6..c3945a7 100644 --- a/examples/data-viz-chart/README.md +++ b/examples/data-viz-chart/README.md @@ -9,11 +9,11 @@ Renders out of the box with the included `data/star_series.json` (real GitHub st ```bash # From the toolkit root (one-time — installs matplotlib, moviepy, and # Pillow alongside the rest of the toolkit's optional Python deps): -python3 -m pip install -r tools/requirements.txt +uv sync # Then: cd examples/data-viz-chart -python3 build.py +uv run build.py ``` First run takes ~30 seconds (matplotlib renders 450 frames, moviepy composites). Subsequent runs reuse the cached chart animation unless `data/star_series.json` is newer. @@ -55,7 +55,7 @@ To plot something else: 2. Update the title in `build.py` (`text_clip("github.com/...", ...)`). 3. Update the y-axis label in `render_chart_animation()` (`ax.set_ylabel("Stars", ...)`). 4. Update the unit label (`text_clip("stars", ...)`) and source attribution. -5. Run `python3 build.py`. +5. Run `uv run build.py`. The chart re-renders automatically because the data file mtime is newer than the cached animation. @@ -64,7 +64,7 @@ The chart re-renders automatically because the data file mtime is newer than the Same pattern as `examples/quick-spot`: ```bash -python3 ../../tools/voiceover.py \ +uv run ../../tools/voiceover.py \ --script VOICEOVER-SCRIPT.md \ --scene-dir public/audio/scenes ``` diff --git a/examples/data-viz-chart/build.py b/examples/data-viz-chart/build.py index e017181..398d212 100644 --- a/examples/data-viz-chart/build.py +++ b/examples/data-viz-chart/build.py @@ -4,7 +4,7 @@ Demonstrates the toolkit's "matplotlib for data, moviepy for trustworthy text" pattern. Run as: - python3 build.py + uv run build.py Produces a 15-second animated time-series chart from data/star_series.json with a title, axis labels, and source attribution rendered via PIL + @@ -58,8 +58,8 @@ except ImportError as e: print(f"Missing dependency: {e}") print("Install the toolkit's Python dependencies:") - print(" python3 -m pip install -r ../../tools/requirements.txt") - print("(run from this directory, or use an absolute path)") + print(" uv sync") + print("(run from the toolkit root, then re-run this script with `uv run build.py`)") sys.exit(1) HERE = Path(__file__).resolve().parent diff --git a/examples/digital-samba-skill-demo/ASSETS-NEEDED.md b/examples/digital-samba-skill-demo/ASSETS-NEEDED.md index 8388f82..6c54cc8 100644 --- a/examples/digital-samba-skill-demo/ASSETS-NEEDED.md +++ b/examples/digital-samba-skill-demo/ASSETS-NEEDED.md @@ -38,13 +38,13 @@ Recording specs: 1920x1080, 30fps | File | Duration | Description | How to Create | |------|----------|-------------|---------------| | `remotion/public/audio/voiceover.mp3` | ~2:30 | Narration from VOICEOVER-SCRIPT.md | `/generate-voiceover` | -| `remotion/public/audio/background-music.mp3` | ~3:00 | Subtle tech ambient | `python tools/music.py` | +| `remotion/public/audio/background-music.mp3` | ~3:00 | Subtle tech ambient | `uv run tools/music.py` | ### Voiceover Generation ```bash cd /path/to/toolkit -python tools/voiceover.py \ +uv run tools/voiceover.py \ --script examples/digital-samba-skill-demo/VOICEOVER-SCRIPT.md \ --output examples/digital-samba-skill-demo/remotion/public/audio/voiceover.mp3 ``` @@ -52,7 +52,7 @@ python tools/voiceover.py \ ### Background Music Generation ```bash -python tools/music.py \ +uv run tools/music.py \ --prompt "subtle tech ambient, modern, clean" \ --duration 180 \ --output examples/digital-samba-skill-demo/remotion/public/audio/background-music.mp3 diff --git a/examples/digital-samba-skill-demo/CLAUDE.md b/examples/digital-samba-skill-demo/CLAUDE.md index c09c842..1b380d4 100644 --- a/examples/digital-samba-skill-demo/CLAUDE.md +++ b/examples/digital-samba-skill-demo/CLAUDE.md @@ -73,16 +73,14 @@ Target: 2:30 (4500 frames @ 30fps) ## Python Environment -This project uses a Python virtual environment for audio generation scripts. +Audio generation scripts run in the toolkit's uv-managed environment. ```bash -# First time setup -python3 -m venv .venv -source .venv/bin/activate -pip install python-dotenv elevenlabs +# First time setup (from the toolkit root) +uv sync -# Subsequent runs - always activate venv first -source .venv/bin/activate +# Run scripts through the environment — no venv activation needed +uv run