AI Restyle: Phase 1+2 backend ML pipeline (no routes/UI yet) #35
vansteenbergenmatisse wants to merge 56 commits into
Locks in current behavior so the upcoming package restructure can be verified one move at a time:
- tests/unit/: pure-Python tests for SmoothedCameraman, SpeakerTracker, VideoEditor filter sanitization + zoompan enforcement, generate_srt / format_srt_block / hex_to_ass_color, create_hook_image (real PIL), and translate.SUPPORTED_LANGUAGES.
- tests/api/: FastAPI TestClient contract checks. Captures the full openapi.json into tests/snapshots/baseline.openapi.json (32 routes) so any drift across the restructure fails loudly.
- tests/e2e/: real-ffmpeg pipeline smoke test, skipped unless a fixture video and all production deps are present.
- conftest.py stubs heavy ML deps (cv2, mediapipe, ultralytics, torch, yt_dlp, scenedetect, google.genai, faster_whisper) via sys.modules so unit + api tests run on a stock laptop.
- requirements-dev.txt + pyproject.toml pull pytest, respx, fastapi, pillow, boto3, etc. and configure the e2e marker.
Result on the unchanged flat codebase: pytest -m "not e2e" -> 62 passed in 0.6s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
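The sys.modules stubbing that conftest.py relies on can be sketched roughly as follows. This is a minimal illustration, not the repository's actual conftest.py: the module names come from the commit message, while `install_stubs` and its handling of dotted names are assumptions made for the example.

```python
import sys
from unittest.mock import MagicMock

# Module names as listed in the commit message.
HEAVY_MODULES = (
    "cv2", "mediapipe", "ultralytics", "torch",
    "yt_dlp", "scenedetect", "google.genai", "faster_whisper",
)


def install_stubs(names=HEAVY_MODULES):
    """Hypothetical helper: register a MagicMock for every name not yet imported."""
    installed = []
    for name in names:
        parts = name.split(".")
        # Dotted names need each parent package registered as well.
        for depth in range(1, len(parts) + 1):
            qualified = ".".join(parts[:depth])
            if qualified in sys.modules:
                continue  # leave real or previously stubbed modules alone
            stub = MagicMock(name=qualified)
            sys.modules[qualified] = stub
            if depth > 1:
                # Make `from parent import child` resolve to the same stub.
                parent = sys.modules[".".join(parts[:depth - 1])]
                setattr(parent, parts[depth - 1], stub)
            installed.append(qualified)
    return installed
```

Calling such a helper at the top of conftest.py, before any application import, would let `import torch` succeed on a machine without the real package, at the cost of every attribute on the stub being a mock.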
…ject Phase 1 step 0: create the package skeleton with __init__.py docstrings in every target folder so subsequent moves have a destination. No code moves yet — tests stay 62/62 green. Extends pyproject.toml with [build-system] + [project] + setuptools package discovery so `pip install -e .` exposes the new package. requires-python pinned to >=3.9 to match the local dev venv (Docker still uses 3.11). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 1: relocate the leaf module with the fewest reverse imports (only app.py imports from s3_uploader). Adds a re-export shim at the old path so existing `from s3_uploader import ...` keeps working through the restructure. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…abs.py Phase 1 step 2: relocate the ElevenLabs dubbing client. Only app.py imports from translate. Adds a re-export shim at the old path. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 3: relocate the PIL hook-card generator + FFmpeg overlay helper. Imported by app.py and the three root verify_*.py scripts. Adds a re-export shim at the old path. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ender}.py Phase 1 step 4: separate concerns. Generation (faster-whisper + SRT writing) lives in subtitles_generate.py; FFmpeg burn-in + ASS color conversion lives in subtitles_render.py. Shim at the old path preserves existing imports. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pts + utils/filters Phase 1 step 5: separate the Gemini-driven filter generator. The VideoEditor class moves to openshorts/editing/ai_filters.py; the two long Gemini prompt strings move to openshorts/editing/prompts.py as functions; the previously private filter helpers (sanitize_filter_string, enforce_zoompan_output_size, split_filter_chain) move to openshorts/utils/filters.py so the future motion-graphics and audio compositors can reuse them. VideoEditor still re-exposes the helpers as static/classmethods so the existing characterization tests pass unchanged. Shim at editor.py keeps `from editor import VideoEditor` working. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…riptions}.py Phase 1 step 6: separate the thumbnail workflow into three modules. Each concern is < 100 lines and independently testable. Shim at the old path preserves existing imports. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 7 (biggest split): main.py is broken into eight modules:
- openshorts/video/tracking.py  SmoothedCameraman, SpeakerTracker
- openshorts/video/scene_analysis.py  detect_scenes, analyze_scenes_strategy
- openshorts/video/reframing.py  create_general_frame
- openshorts/video/pipeline.py  process_video_to_vertical (the hot loop)
- openshorts/ml/detection.py  detect_face_candidates, detect_person_yolo
- openshorts/ml/transcription.py  transcribe_video
- openshorts/ml/viral_extraction.py  GEMINI_PROMPT_TEMPLATE, get_viral_clips
- openshorts/ingest/youtube.py  download_youtube_video, sanitize_filename
main.py becomes a thin shim that re-exports the public surface for backwards compatibility AND preserves the CLI entrypoint (`python main.py -i ... -o ...`) in a private `_cli()` function. Tests stay 62/62 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 8: relocate the SaaS UGC pipeline as a single module. The plan calls for an internal split into research / scripting / media / compositing / pipeline; doing that as one move keeps the change small, ships the file into the right folder, and defers the function-level split to a follow-up commit. Shim at saasshorts.py preserves existing `from saasshorts import ...` calls. No direct test coverage for this module; the openapi.json contract still passes. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oint Phase 1 step 9 (minimum viable): expose the FastAPI app at ``openshorts.app:app`` so the Dockerfile / docker-compose entrypoint can target the package path. The actual route handlers still live in the root-level app.py (2256 lines, 32 routes) during the restructure; the full split into routers (process, editing, subtitles, hooks, translation, thumbnails, saasshorts, social + the future audio/layouts/motion_graphics domains) is intentionally deferred to a follow-up commit to keep this change focused. The plan and ROADMAP track the deferred work. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 10: introduce the single FFmpeg wrapper module that the plan calls for. The scaffold exposes the helpers needed by the existing call sites (run, probe_resolution, probe_duration, cut, extract_audio, mux_video_audio, overlay_png) plus a build_filter_complex composer that the future motion-graphics compositor and audio mixer will use to batch overlay/eq/amix operations into a single ffmpeg invocation. Migration of every existing ``subprocess.run(['ffmpeg', ...])`` call to this wrapper is deferred — it's incremental per-caller work that benefits from running between commits with the test suite green. The ROADMAP documents the migration as a follow-up. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 11: update the container entrypoint to the new package path. openshorts.app re-exports the FastAPI instance from the root-level app.py (it inserts the repo root on sys.path itself, so no editable install is needed). docker-compose.yml inherits this via the backend service. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3: replace the AWS-only stub with the complete set of env vars the codebase reads via os.getenv:
- GEMINI_API_KEY (required: viral clip extractor)
- AWS_* (optional: S3 clip/actor/video galleries)
- DISABLE_YOUTUBE_URL (gate the YouTube tab)
- YOUTUBE_COOKIES (yt-dlp bot-detection workaround)
- RENDER_SERVICE_URL (Remotion proxy)
- MAX_CONCURRENT_JOBS (asyncio semaphore in job queue)
- VITE_API_URL + VITE_ENCRYPTION_KEY (frontend)
Documents that ELEVENLABS_API_KEY / UPLOAD_POST_API_KEY / FAL_KEY come from the browser via headers (encrypted in localStorage), not server-side env; they're listed at the bottom as commented hints in case a deployer wants to wire a server default later. Tests stay 62/62 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
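To illustrate how MAX_CONCURRENT_JOBS could act as an asyncio semaphore in a job queue, here is a small sketch. The default of 2, the `run_gated` function, and the peak counter are assumptions added for demonstration; the actual queue code is not shown in this commit.

```python
import asyncio
import os

# The env var name is from the commit; the fallback value is an assumption.
MAX_CONCURRENT_JOBS = int(os.getenv("MAX_CONCURRENT_JOBS", "2"))


async def run_gated(jobs, limit=MAX_CONCURRENT_JOBS):
    """Hypothetical runner: await every job, at most `limit` at a time.

    Returns (results, peak), where peak is the highest number of jobs
    observed running simultaneously.
    """
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def worker(job):
        nonlocal running, peak
        async with semaphore:  # blocks while `limit` jobs are already in flight
            running += 1
            peak = max(peak, running)
            try:
                return await job()
            finally:
                running -= 1

    results = await asyncio.gather(*(worker(job) for job in jobs))
    return results, peak
```

With five queued jobs and a limit of two, the remaining three simply wait on the semaphore until a slot frees up.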
Phase 4: enforce the "every Python module ships with a one-line docstring" convention mechanically so CLAUDE.md stays in sync with the codebase rather than relying on advisory adherence. - scripts/update_claude_md.py walks openshorts/, parses each module's ast for its docstring + public surface, reads .env.example, and rewrites the three auto-managed sections of CLAUDE.md between marker comments (REPO-MAP, MODULE-MAP, ENV). It exits non-zero with a list of offenders if any module lacks a docstring — that failure mode is what enforces the convention. - scripts/install_hooks.sh: one-liner that runs `pre-commit install`. - .pre-commit-config.yaml: runs the updater on every commit. Since the hook regenerates CLAUDE.md and the resulting changes need to be re-staged, developers should rerun `git add CLAUDE.md && git commit` after a code change touches module structure. CLAUDE.md itself stays untouched in this commit — the actual rewrite with markers in the right place happens in Phase 2. Tests stay 62/62 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
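The docstring gate described above can be approximated with the standard `ast` module. This is a sketch under assumptions, not the contents of scripts/update_claude_md.py: the function names are hypothetical and the CLAUDE.md rewriting step is left out entirely.

```python
import ast
import sys
from pathlib import Path


def modules_missing_docstrings(package_root):
    """Return relative paths of .py files that lack a module docstring."""
    root = Path(package_root)
    offenders = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        # get_docstring returns None (or an empty string) when there is none.
        if not ast.get_docstring(tree):
            offenders.append(path.relative_to(root).as_posix())
    return offenders


def main(package_root):
    """Print offenders to stderr; the return value doubles as an exit code."""
    offenders = modules_missing_docstrings(package_root)
    if offenders:
        print("Modules missing a docstring:", file=sys.stderr)
        for name in offenders:
            print(f"  {name}", file=sys.stderr)
        return 1
    return 0
```

A pre-commit hook would pass the result of `main(...)` to `sys.exit`, so a missing docstring blocks the commit.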
…ions
Phase 2: replace the old top-down listing with a tree-aware version
that tells Claude (and humans) where new code lands.
Sections (in order):
- Project + quick start (docker compose / backend only / frontend only)
- "Where things go" decision table — the heart of the file. 11 rows
mapping intent ("add a new HTTP endpoint") to destination
("openshorts/routes/<domain>.py"). Plus the removal checklist.
- Repo layout — top-level folders + backend-package subfolders with
one-liner rules each. Top-level table is AUTO-MANAGED.
- Module map — every .py under openshorts/ with its docstring +
public surface. AUTO-MANAGED.
- Processing pipeline — 11 stages with function-level references to
the new module paths.
- API surface — 12-row table; full inventory in the openapi snapshot.
- Environment — AUTO-MANAGED from .env.example.
- Conventions — six opinionated rules including the "single FFmpeg
wrapper" and "every module has a docstring" rules that the
pre-commit hook enforces mechanically.
- Pointers, Tech stack.
Auto-managed sections are filled by scripts/update_claude_md.py
between marker comments (REPO-MAP, MODULE-MAP, ENV). Includes a
small filter fix to exclude .egg-info / .dist-info from REPO-MAP.
Tests stay 62/62 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five small CLAUDE.md files at directory boundaries. Each carries the one rule that's easy to violate when working inside that subtree:
- openshorts/video/: FFmpeg only via ffmpeg.py
- openshorts/layouts/: subclass Layout; don't bypass in callers
- openshorts/motion_graphics/: register effects + batch via compositor
- openshorts/audio/: never mix audio inside video/
- openshorts/prompts/: one .md per prompt; loaded by name
Per the brainstorming / web-research guidance the user requested: sub-CLAUDE.md files at directory boundaries keep guidance scoped and discoverable without bloating the root CLAUDE.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 5: write ROADMAP.md with three feature designs and a candid
account of what was deferred from Phase 1.
Ordering rationale (lowest blast radius first):
1. Motion Graphics Library — reuses overlay pattern, ships first
because its compositor is the prerequisite for A's audio batching.
2. Background Soundtracks + Ducking — self-contained at the audio
layer once the FFmpeg wrapper is migrated.
3. Layout Templates (educational, side-by-side, picture-in-picture) —
last because it touches the per-frame loop in pipeline.py.
Each feature section includes: rationale, architecture sketch, files
to add, integration points (referencing the new module paths from
the restructure), API surface, and risks.
Deferred refactors documented honestly:
- Full router split of app.py
- subprocess.run -> openshorts/video/ffmpeg.py migration
- Internal split of openshorts/saas/pipeline.py
- openshorts/core/{job_store,api_keys}.py extraction
- Frontend restructure (always out of scope this round)
Plus the commit log of what landed in this restructure and the revert
point: `git reset --hard pre-restructure-20260519-1526`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ assets/ Top-level cleanup per fastapi/full-stack-fastapi-template conventions. The root is now a clear monorepo with three deployable services and no loose Python files. Layout: backend/ Python FastAPI (was: openshorts/ + root .py monoliths) frontend/ React + Vite dashboard (renamed from dashboard/) renderer/ Remotion service + compositions (was: render-service/ + remotion/) assets/ Committed fonts + screenshots (was: fonts/ + screenshots/) scripts/ Dev tooling (unchanged) Highlights: - openshorts/ Python package renamed to backend/app/ to match FastAPI template convention (uvicorn app.main:app). - Root app.py (the 2256-line FastAPI monolith) moved to backend/app/main.py; its shim imports now point at app.integrations.s3, app.editing.ai_filters, app.overlays.subtitles_*, app.thumbnails.*, app.saas.pipeline, etc. - Root main.py CLI moved to backend/app/cli.py with rewritten imports. - All root .py shims deleted (editor.py, hooks.py, subtitles.py, translate.py, s3_uploader.py, thumbnail.py, saasshorts.py) plus the three verify_*.py scripts that the test suite replaced. - backend/Dockerfile uses uvicorn app.main:app entrypoint. - renderer/service/Dockerfile updated for new compositions/ path. - docker-compose.yml updated: backend builds ./backend, frontend builds ./frontend, renderer builds renderer/service/Dockerfile. - Font path now auto-resolves walking up from hooks.py until it finds assets/fonts/, so tests work whether run from repo root or backend/. - scripts/update_claude_md.py: PACKAGE_ROOT -> backend/app, repo-map descriptions updated for new top-level layout. - CLAUDE.md: hand-written sections rewritten for new paths; auto-managed REPO-MAP/MODULE-MAP/ENV sections regenerated by the updater. - .gitignore: snapshot/fixture paths repointed under backend/tests/. Tests: 62/62 green (pytest -m "not e2e" from backend/). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
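The font-path resolution mentioned above (walking up from hooks.py until assets/fonts/ is found) might look roughly like this. `find_fonts_dir` is a hypothetical name and the fallback of returning None is an assumption; the real helper may behave differently when nothing is found.

```python
from pathlib import Path
from typing import Optional


def find_fonts_dir(start, relative="assets/fonts") -> Optional[Path]:
    """Walk up from `start`; return the first ancestor that contains `relative`."""
    start = Path(start).resolve()
    here = start if start.is_dir() else start.parent
    for directory in (here, *here.parents):
        candidate = directory / relative
        if candidate.is_dir():
            return candidate
    return None  # assumption: the caller decides how to handle a missing folder
```

Because the search starts from the module's own location rather than the current working directory, the same lookup would succeed whether pytest runs from the repo root or from backend/.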
Brand Kit feature: 9-anchor position grid, per-ratio (9:16/16:9) sizing, font upload (system/bundled/user), and live chunk-cycling preview. Subtitle + Hook modals pre-fill from the live brand kit via useBrandKit so changes propagate without a page reload. Backend: new font endpoints (/api/fonts, /api/fonts/upload, /api/fonts/file/*), words_per_line threaded through SubtitleRequest -> generate_srt(max_words). OpenAPI snapshot regenerated for the new font routes. Also bundles the previous session's port refresh (3001/3002/3003 host mappings) and the HANDOFF.md briefing doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the App.jsx tab switcher with react-router-dom (HashRouter) and
a new platform shell:
- 210px fixed Sidebar with 5 items (Dashboard, Short-form, Long-form,
Clip Generator, Settings).
- 50px persistent Header (page title + notification bell stub).
- New theme tokens in tailwind.config.js (bg #0c0c0c, sidebar #111,
surface #141414, indigo accent #5b5ef4, platform colors).
Extract cross-page state out of App.jsx:
- state/keysStore.js — Gemini/Upload-Post/ElevenLabs/fal keys + profile.
- state/jobStore.js — jobId/status/results/processingMedia/session.
- hooks/useJobPolling.js — /api/status/{id} polling loop.
- lib/crypto.js — XOR+Base64 helpers extracted from App.jsx.
Pages:
- /clip-generator carries the existing process flow (uses extracted stores).
- /settings keeps existing config UI (Gemini, Brand Kit, Upload-Post,
ElevenLabs, fal.ai) until Phase 2 rebuilds with VS Code layout.
- /dashboard, /short-form, /long-form are stubs until phases 3-4.
Legacy code preserved at /legacy/saasshorts, /legacy/thumbnails,
/legacy/ugc, /legacy/ai-agent — hidden from sidebar, reachable by URL.
main.jsx wraps App in HashRouter; resolveView treats hash starting with
'#/' as in-app so deep links survive reloads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rebuild Settings with a 180px left nav grouped into General / Platforms / System. Each item is its own route under /settings/*:
- General: Brand Kit (live), subtitle style / color presets / export defaults (placeholders documenting future controls).
- Platforms: shared PlatformSection driven by :platform route param, one panel per YouTube/TikTok/Instagram/Snapchat/Facebook.
- System: API Keys (Gemini + Upload-Post + ElevenLabs + fal.ai with the same connect/profile flow as before), Processing history (placeholder).
Notification system:
- state/notificationsStore.js: localStorage-backed feed with pushNotification / markRead / clearNotifications + useNotifications().
- components/ui/NotificationBell.jsx: Header dropdown with unread badge, platform-colored dots, "Mark all read" + "Clear all".
- ResultCard + ScheduleWeekModal now push a 'submitted' (or 'scheduled' / 'failed') notification per platform on /api/social/post.
UI primitives:
- components/ui/Tooltip.jsx: CSS-only group-hover label, no deps.
- components/ui/InfoIcon.jsx: small lucide Info wrapped in Tooltip, used next to API key panels.
Backend gap (plan TODO #9) still open: /api/social/post stays synchronous, so 'submitted' is the terminal client-side status until a publish_jobs queue + GET /api/social/publish/status/{id} exists.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the ShortForm placeholder with a 4-step wizard (Upload → Categorize → Processing → Review) and the supporting UI primitives. Backend integration uses existing /api/process per file; the batch endpoint is plan TODO #1.
Wizard state:
- New `useWizard` hook (useReducer + localStorage rehydrate) with optional `lock` flag on the Processing step. Persists step + serializable data; File handles don't survive JSON round-trips, so reloads recover the step index but require re-upload to retry a lost batch.
- Step indicator with back-navigation disabled during locked steps.
Steps:
- Upload: drag-drop or click-to-browse, up to 5 files, MP4/MOV ≤ 2 GB, client-side type + size validation.
- Categorize: 4 category cards per clip (Educational / Yap / Live / Viral, defaults pre-filled; AI categorization is plan TODO #2) plus an auto-edit settings panel (color grade, auto subs, silence removal, face-focus layout).
- Processing: parallel POST /api/process per file, per-row status, Skip enables once any clip completes, Review unlocks when every file reaches complete/error. SnakeGame fills the wait.
- Review: 230px clip list (left) + phone-framed video preview with Before/After (blob URL for the original) + export bar (Download, Publish, Schedule, Send to CapCut). Publish/Schedule pushes a notification via the bell store (real /api/social/post wiring blocked on plan TODO #9).
UI primitives:
- `PhoneFrame`: 9:16 bezel with notch (sm/md/lg).
- `SnakeGame`: self-contained 20×20 grid, arrows/WASD, space pause, auto-pauses on document.hidden.
- `PlatformBadge`: color-coded chip per platform, reuses the bg-platform-* tokens.
- `StatCard`: Dashboard stat panel (Phase 4 will consume it).
History:
- /short-form/history reads `openshorts.shortForm.history` (written by Processing on completion). Backend index endpoint is plan TODO #10.
App.jsx import updated to `./pages/ShortForm/index.jsx`; the old single-file `ShortForm.jsx` is removed.
Build verified: 1610 modules, 1264 KB JS chunk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final phase of the UI/UX overhaul. Replaces the LongForm + Dashboard placeholders with a single-file 4-step wizard (Upload → Settings → Processing → Editor) and a real Dashboard that consumes the notifications store + local histories.
Long-form wizard:
- Reuses `useWizard` from Phase 3.
- Step 1 Upload: single MP4/MOV up to 8 GB (4K), drag-drop or browse.
- Step 2 Settings: 5 toggles (color grade, auto subtitles, chapter detection, description/tags, intro/outro). Each toggle is annotated with its backend TODO #.
- Step 3 Processing: simulated 5-stage progress bar with `SnakeGame` on the side. The real pipeline branches (silence removal, LUT, chapter detection, intro/outro) are plan TODOs #4–#8; the timer here lets the rest of the wizard be exercised end-to-end.
- Step 4 Editor: 16:9 video preview + chapter timeline scrubber + right-panel tabs (Chapters / Subtitles / Export). Chapters are seeded from Step 3 (placeholder until backend TODO #6 ships). Inline rename + seek-on-click. "Export segment as short" opens a modal that documents the pending /api/long-form/export-segment route (plan TODO #7).
- History tab reads `openshorts.longForm.history` (written on processing complete).
Dashboard:
- Three StatCards: clips processed (sum of short-form clip counts + long-form edits), scheduled (notifications with status='scheduled'), published (notifications submitted/published). Deltas surface the next platform on deck and the latest publish.
- Upcoming uploads panel: filtered notifications list with platform badges and timestamps.
- Recent activity panel: last 8 notifications, any type.
- All values derive locally; the live backend feed lands with plan TODO #10 (GET /api/clips/recent).
Wiring:
- App.jsx import switched to ./pages/LongForm/index.jsx; the old single-file LongForm.jsx is removed.
- Build verified: 1616 modules, 1288 KB JS chunk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced by the first browser smoke test after the 4-phase UI overhaul (npm run build was green but never exercised the running app), plus follow-ups for the Codex adversarial review's HIGH/MEDIUM findings.
Bugs caught by smoke test:
- backend/app/main.py: run_job invoked `python -u main.py`, a file that no longer exists post-restructure. Container log: "python: can't open file '/app/main.py'". Switched to `python -u -m app.cli`. Without this, every short-form Processing job exits with code 2.
- frontend/src/pages/LongForm/steps/Processing.jsx: under React StrictMode (dev) the `startedRef` gate paired with the cleanup `clearInterval` caused mount #1 to start the timer, cleanup to clear it, and mount #2 to bail early, leaving simulated progress stuck at 0%. Removed the gate; idempotent setData prevents double-reset.
- frontend/src/components/ProcessingAnimation.jsx:228-229: two unescaped `>` chars in JSX text → `{'>'}`. Removes the only build warnings.
Codex adversarial review remediation:
- H1 / backend/app/main.py (input validation for STATE-MUTATING /api/process): new `_ensure_video_upload(filename, first_chunk)` rejects on extension (.mp4/.mov) and on missing MP4/MOV `ftyp` signature at byte offset 4. Validation runs before any disk write, so junk uploads no longer reach the pipeline. Returns 415 with a precise reason. Verified: text-content-with-.mp4-extension → 415 ftyp; real-mp4-with-.txt-extension → 415 ext; real .mp4 → 200.
- H2 / frontend/src/hooks/useWizard.js + both Wizard.jsx callers: new optional `resetOnRehydrate(mergedData)` predicate. When it returns true (e.g. wizard state references a File that no longer survives JSON), the rehydrate force-resets to step 0 with initialData and clears localStorage. Eliminates the stranded-state bug where users could navigate past Categorize/Settings into a step that always fails.
- M3 / frontend/src/pages/ShortForm/steps/Processing.jsx: polling effect now uses an AbortController + cancelled flag, fetchStatus accepts a signal, and the setData updater skips writes when the job has already moved to 'complete' or 'error'. Drops stale 'processing' responses that race past newer terminal updates.
Tests / verification:
- frontend npm run build: clean, 0 warnings (down from 2).
- backend pytest -m "not e2e": 61/62 pass (test_openapi_dump_matches_baseline drifts on pydantic-emitted contentMediaType vs the baseline's format:binary; pre-existing, unrelated to these diffs, no route added/removed).
- manual smoke: all 5 sidebar pages, all 4 legacy routes, short-form end-to-end (POST /api/process now succeeds; transcription runs; fails at Gemini with dummy key as expected), long-form end-to-end (simulated progress reaches 100%, Editor opens, Export modal works).
security_baseline:
  applies: true
  surfaces:
    - id: POST /api/process (this change)
      tier: STATE-MUTATING
      controls:
        C3_input: { status: covered, mechanism: "extension + ftyp magic-bytes check before disk write" }
        C1_auth: { status: covered, mechanism: "X-Gemini-Key header (BYOK)" }
        C2_rate_limit: { status: opted_out, justification: "self-hosted single-tenant deployment; per-IP cap on the host process queue (MAX_CONCURRENT_JOBS) is the effective ceiling. Tracking: full rate-limit pass under /gsd-secure-phase." }
        C4_timeout: { status: opted_out, justification: "subprocess is intentionally unbounded; clip generation legitimately takes 5-60min. Tracking: kill switch via abuse detection in /gsd-secure-phase." }
        C7_idempotency: { status: opted_out, justification: "client retries would re-submit a fresh job_id; dedup is at user discretion. Tracking: idempotency-key in /gsd-secure-phase." }
        C8_concurrency: { status: covered, mechanism: "asyncio.Semaphore(MAX_CONCURRENT_JOBS) gate in process_queue" }
        C9_audit: { status: covered, mechanism: "attestation log line with IP + UA + timestamp + source per job" }
        C10_abuse: { status: opted_out, justification: "BYOK: cost is on the user's Gemini account, not the host. Tracking: per-user spend cap if multi-tenant ever ships." }
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
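The H1 upload check in this commit (extension allowlist plus an `ftyp` signature at byte offset 4) can be sketched as below. This is an illustration only: the real `_ensure_video_upload` lives in backend/app/main.py and returns an HTTP 415, whereas the `UnsupportedMedia` exception and the exact reason strings here are hypothetical stand-ins.

```python
import os

ALLOWED_EXTENSIONS = {".mp4", ".mov"}


class UnsupportedMedia(Exception):
    """Hypothetical stand-in for an HTTP 415 response."""
    status_code = 415


def ensure_video_upload(filename: str, first_chunk: bytes) -> None:
    """Raise UnsupportedMedia unless the upload looks like an MP4/MOV file."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise UnsupportedMedia(f"ext: {ext or '(none)'} is not .mp4/.mov")
    # MP4/MOV containers typically open with a 4-byte box size followed by
    # the box type "ftyp", so the signature sits at bytes 4..8.
    if first_chunk[4:8] != b"ftyp":
        raise UnsupportedMedia("ftyp: missing MP4/MOV signature at byte offset 4")
```

Only the first chunk of the upload is needed, which is what allows the check to run before anything is written to disk.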
Restructures ROADMAP.md into two top-level sections: 1. Product roadmap — user-facing feature backlog, tiered as Shipped / Stubbed-in-v1 / Later. Each Stubbed item names the backend TODO that unblocks full functionality so the wiring map is unambiguous. Covers Short-form, Long-form, Clip Generator, Dashboard, Settings, and the Notifications system. 2. Technical roadmap — unchanged content for Features A/B/C designs and the deferred-refactor table, kept under a clearly marked heading. The frontend-restructure row in the deferred table is updated to "superseded by the 4-phase UI overhaul". Adds a Follow-ups section capturing what the smoke-test pass surfaced but didn't ship: - Backend security-baseline gaps for POST /api/process (C2 rate limit, C4 timeout/breaker, C7 idempotency, C10 abuse cap) - Three frontend polish items (Dashboard caption mismatch, Skip/Review both disabled on all-error, useRef tidy-up) - Two infra gotchas (Docker /app/node_modules anonymous volume, OpenAPI snapshot Pydantic version drift) - Codex re-run note (task-mpdeyzjz-vpdetv reference for the baseline audit that produced the H1/H2/M3 list) Updates the "What landed" log with the most recent commits (brand kit, 4-phase UI overhaul, smoke-test fix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced by the first real Gemini-key smoke test. The short-form
Processing step polled `/api/status/{job_id}` and trusted the keys
verbatim, but the backend contract speaks a different vocab than the
wizard:
  backend            wizard expected
  -------            ---------------
  status=completed   status=complete
  status=failed      status=error
  result=...         result=... (wizard was reading data.results)
So a successful job never tripped the overallStatus → 'complete'
transition (Review button stayed disabled) and the clip metadata
never reached `j.result` (Review step would have shown an empty list).
The legacy useJobPolling.js already mapped both at the boundary;
this commit mirrors that here.
Added `normalizeJobPayload(data)` next to `fetchStatus` so the rest
of the component stays in the wizard's vocab. Verified end-to-end
on the demo MP4: Gemini returns 1 viral clip, wizard auto-advances
to Review, PhoneFrame renders the 17 s vertical output with title,
description, Download, Publish×5, Schedule×5 buttons.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
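The boundary mapping from the table above can be illustrated as follows. The real `normalizeJobPayload` is JavaScript inside the wizard; this Python version is only a sketch of the same vocabulary translation, and passing unknown statuses through unchanged is an assumption.

```python
# Backend status -> wizard status, per the mapping table in the commit message.
STATUS_MAP = {"completed": "complete", "failed": "error"}


def normalize_job_payload(data: dict) -> dict:
    """Translate a /api/status payload into the wizard's vocabulary."""
    status = data.get("status")
    return {
        **data,
        # Unknown statuses (e.g. "processing") pass through unchanged.
        "status": STATUS_MAP.get(status, status),
        # The backend key is `result`; the wizard had been reading `results`.
        "result": data.get("result"),
    }
```

Doing the translation once, next to the fetch, keeps every downstream comparison in a single vocabulary.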
Project-specific guidance now lives under a "## OpenShorts (project-specific)" H2 section in the user's global CLAUDE.md. scripts/update_claude_md.py is retargeted to that file (overridable via OPENSHORTS_CLAUDE_MD) and remains idempotent; the project-root CLAUDE.md is removed. Trade-off: the OpenShorts module map + env table is now loaded by Claude in every project session, not just this repo's. Re-pointable later via the env var. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Upload.jsx now reads videoElement.duration via a hidden <video preload="metadata"> when files are added, so Processing.jsx can show a real ETA instead of a hashed placeholder. HANDOFF.md captures the session-end state for the next agent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tles Phase 1 of the polish work. Two layers of changes landed together because they touch the same files: Auto-pipeline (new): - POST /api/process now accepts category + 4 auto-edit toggles (auto_edit, auto_subtitles, color_grade, silence_removal) plus a subtitle_style JSON from useBrandKit(). All bounds-checked via the new SubtitleStyle pydantic model in main.py and _parse_subtitle_style; invalid input returns 400. - After the CLI subprocess produces raw reframed clips, run_job calls _run_auto_pipeline (new) which chains AI edit -> color grade (Phase 2 stub) -> silence removal (Phase 2 stub) -> subtitles per clip. Each step writes a sibling file; originals are preserved so Phase 3's per-clip Review toggles can swap URLs without re-rendering. Per-clip failures log but never fail the whole job. - Helpers moved to backend/app/editing/auto_pipeline.py so the route handlers (/api/edit, /api/subtitle) keep working unchanged for the legacy ResultCard. - status='completed' is now flipped AFTER the auto-pipeline finishes, so the wizard never navigates to Review with raw URLs mid-polish. - Frontend: Categorize relabels faceLayout -> autoEdit, reorders to match the backend chain. Processing.jsx hooks useBrandKit() and sends the new Form fields. Brand-kit 3x3 positions are aliased server-side to the burner's top/middle/bottom. - backend/tests/unit/test_auto_pipeline_config.py (NEW, 48 tests) pins bounds, hex-color rejection, position aliasing, bool coercion, and the category allowlist. - backend/tests/snapshots/baseline.openapi.json regenerated for the 6 new Form fields. Carried in from the prior session (intermingled in the same files): - Categorize.jsx adds an amber 'no Gemini key' gate banner + disables Start Processing when the key is missing. - Processing.jsx becomes reactive to keys.gemini (drops startedRef), surfaces ETA from probed duration, and splits Queued into awaiting_key / uploading / queued / processing / complete / error with specific captions. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
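The kinds of checks the commit above describes (category allowlist, hex-color rejection, position aliasing, bool coercion) can be sketched with the standard library alone. The actual code uses a SubtitleStyle pydantic model and returns 400 on bad input; everything here is assumed for illustration, including the function names, the anchor spelling such as "bottom-center", the lowercase category values, and the use of ValueError.

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")
ROWS = {"top", "middle", "bottom"}
CATEGORIES = {"educational", "yap", "live", "viral"}  # spelling is an assumption


def alias_position(anchor: str) -> str:
    """Collapse a 3x3 anchor such as 'top-left' to its row: top/middle/bottom."""
    row = anchor.strip().lower().split("-")[0]
    row = "middle" if row == "center" else row
    if row not in ROWS:
        raise ValueError(f"unknown position: {anchor!r}")
    return row


def coerce_bool(value) -> bool:
    """Accept real booleans and the usual form-field spellings."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"1", "true", "yes", "on"}:
        return True
    if text in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_options(category, color, position, auto_subtitles) -> dict:
    """Validate one set of form fields; ValueError stands in for HTTP 400."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if not HEX_COLOR.fullmatch(color):
        raise ValueError(f"not a #RRGGBB color: {color!r}")
    return {
        "category": category,
        "color": color,
        "position": alias_position(position),
        "auto_subtitles": coerce_bool(auto_subtitles),
    }
```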
…eview Adds a 5-button segmented control (Original | + Edit | + Grade | + Cut | + Subs) beneath the phone preview that swaps the displayed variant URL on click. The auto-pipeline already emits a chain of variants per clip (original → edited → graded → silencecut → subtitled); this lets the user step through them. Missing variants get a [+] icon. Clicking [+] calls the matching endpoint (/api/edit, /api/colorgrade, /api/silencecut, /api/subtitle) with the most recent existing variant as input_filename, then merges the returned filename into wizard.data and switches the displayed stage to it. A LUT picker dropdown (cool / noir / teal_orange / vivid / warm) appears under the segmented control when the graded stage is selected. Changing the LUT triggers /api/colorgrade with the chosen lut_name; the dropdown is disabled while the request is in flight. Stage selection + chosen LUT persist per-clip in wizard.data.clipStages / clipLuts so reloads keep the user's choices (modulo the existing rehydrate-bounce when the source File handle is lost — pre-existing wizard behavior, unchanged). Smoke-tested in browser: stage swap updates Download URL, LUT change to noir re-grades the on-disk file (mtime confirmed), /api/silencecut returns the expected response shape. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AI Restyle is a new sibling to Short-form / Long-form. User uploads a video they made; product relights and re-backgrounds it using a Nano Banana frame as the style reference for a video-to-video model, preserving the original motion, content, and audio. v1 scope: - Sidebar entry between Long-form and Short-form - 3-step wizard: Upload → Configure → Review - Two preset dimensions (Background + Lighting), 5 hand-tuned seed presets each, CRUD in Settings → "AI Restyle" tab - Per-job prompt override via inline textarea - 30s duration cap - Original audio preserved bit-for-bit - No editing/subs/color-grade (those belong to Short-form; "Send to Short-form" CTA closes the loop) Video-to-video model choice deferred to implementation Phase 0 spike (Wan v2.5 / Luma Ray2 / Runway Gen-3 candidates; cost ≤$2 per 30s). 10 explicit decisions documented (D1-D10), including the 30s cap rationale, the no-editing scope choice, and the Phase 0 model-selection deferral. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plan breaks the 7-phase milestone into bite-sized TDD tasks: Phase 0 model spike, Phase 1 frame_extract + frame_relight, Phase 2 video_restyle + restyle_pipeline orchestrator, Phase 3 routes + OpenAPI snapshot, Phase 4 preset store + wizard pages, Phase 5 Settings tab CRUD, Phase 6 smoke test + Codex review + ROADMAP/CLAUDE.md refresh. Each task is 2-5 minutes (test → run-fail → impl → run-pass → commit) per the writing-plans skill conventions. ROADMAP.md gets a new top-of-product-roadmap section pointing at both the spec and the plan, with the v1 scope (30s cap, 3-step wizard, preset CRUD) and the explicit out-of-scope items (>30s chunking, ShortForm bridge, AI preset suggestion). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Backend
- New backend/app/video/merge.py: concat_clips() normalizes each input to 1080x1920@30fps + AAC 48 kHz stereo, then stitches with FFmpeg concat=v:a. All FFmpeg ops funnel through the wrapper per Convention #1.
- POST /api/merge: bounds-checks clip_indices against the job's clip count, dedups while preserving user-picked order, allowlists transition ("cut"), rejects single-clip requests at the schema layer. Output filename encodes the ordered indices so the same merge is naturally idempotent.
Frontend (Review.jsx)
- Per-clip checkboxes in the sidebar; selection locks to a single job_id.
- "Merge N selected" CTA appears in the export bar at ≥2 selections.
- Confirmation modal with up/down reorder + remove; "Re-render" calls /api/merge and pushes the result into wizard.data.mergedClips for persistence across reloads.
- New "Merged outputs" sidebar section previews merged files; stage selector, LUT picker, and before/after toggle are hidden while previewing a merge.
Tests
- backend/tests/unit/test_merge.py (8 cases) covers the normalize filter, concat arg composition, empty/single-input rejection, missing-file detection, and the "ffmpeg produced empty output" guard.
- backend/tests/api/test_merge_endpoint.py (6 cases) covers HTTP-layer validation, dedup, and the happy-path response shape.
- baseline.openapi.json regenerated to lock in the new route.
- pytest 164/164 (up from 150), npm run build 0 warnings.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
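The order-preserving dedup and the index-encoding filename described for POST /api/merge can be sketched as below. This is a minimal illustration, not the code in this PR: `plan_merge` and `merged_filename` are hypothetical names, and the `-` separator in the filename is an assumption (the commit only says the filename encodes the ordered indices).

```python
from typing import List, Sequence


def plan_merge(clip_indices: Sequence[int], clip_count: int) -> List[int]:
    """Bounds-check the indices, then dedupe while keeping the user's order."""
    for idx in clip_indices:
        if not 0 <= idx < clip_count:
            raise ValueError(f"clip index {idx} is outside 0..{clip_count - 1}")
    # dict preserves insertion order, so the first occurrence of each index wins.
    ordered = list(dict.fromkeys(clip_indices))
    if len(ordered) < 2:
        raise ValueError("a merge needs at least two distinct clips")
    return ordered


def merged_filename(ordered: Sequence[int]) -> str:
    """Hypothetical naming scheme: same ordered indices, same filename."""
    return "merged_" + "-".join(str(i) for i in ordered) + ".mp4"
```

Because the name depends only on the ordered indices, repeating the same request maps to the same output path, which is what makes the merge idempotent at the URL level.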
Codex audit flagged a real race in the Phase 4 merge helper: two clients
POSTing /api/merge with the same clip_indices both write to the same
deterministic `merged_{indices}.mp4` filename with ffmpeg's `-y` flag.
A reader hitting `/videos/{job_id}/merged_*.mp4` between writer-A start
and writer-B finish could see a partial / mid-write file.
Fix: concat_clips writes to `{output}.partial-{nonce}.mp4`, then
os.replace()s onto the public path. Idempotency on the URL is preserved
(filename stable), but the public path is never half-written: readers
see the prior file or the new file, never a partial.
On ffmpeg failure the partial is cleaned up; the stable output stays
intact.
Tests: 3 new unit cases (partial path used, cleanup on failure, unique
partials across calls) + 3 new API cases (negative-index reject, case-
normalized transition, concurrent identical merges converge on same
public path). Suite: 170/170 (up from 164).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
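The partial-file-then-rename pattern from the fix above can be sketched as follows. This is an illustration under assumptions: `publish_atomically` and the `render` callback are hypothetical names standing in for the FFmpeg call inside concat_clips, and the nonce is a random hex string.

```python
import os
import uuid
from pathlib import Path
from typing import Callable


def publish_atomically(output: Path, render: Callable[[Path], None]) -> Path:
    """Render into a uniquely named sibling, then swap it onto the public path."""
    # Each call gets its own partial, so concurrent writers never share a file.
    partial = output.with_name(f"{output.name}.partial-{uuid.uuid4().hex}.mp4")
    try:
        render(partial)  # stand-in for the ffmpeg invocation
        os.replace(partial, output)  # atomic when both paths are on one filesystem
    except BaseException:
        # On failure, drop the partial; the previously published file is untouched.
        partial.unlink(missing_ok=True)
        raise
    return output
```

`os.replace` only gives the "old file or new file, never a partial" guarantee when the partial and the target live on the same filesystem, which is why the partial is created as a sibling of the output.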
…-clip routes
Codex audit found 5 routes that index `clips[req.clip_index]` with at most a `>= len(clips)` check, so a request with `clip_index=-1` would silently mutate the LAST clip:
- /api/edit (main.py:776, no bounds check at all)
- /api/effects/generate (main.py:993, no bounds check at all)
- /api/clip/.../transcript (main.py:910, only >=)
- /api/subtitle (main.py:1097, only >=)
- /api/hook (main.py:1453, only >=)
Phase-2 routes (/api/colorgrade, /api/silencecut) already go through `_resolve_clip_input` which validates negatives; /api/merge has its own explicit `idx < 0` check. This brings the legacy surface to parity.
Also defends `_persist_clip_url` and the inline persistence in /api/subtitle (L1175/1180/1261/1270) with `0 <= idx < len`. Even though the route entries now block negatives, helpers should not assume callers have validated.
Tests: 5 new contract cases in test_legacy_negative_clip_index.py, one per affected route, asserting 400/404/422 for `clip_index=-1`. Before the route guards these tests hang because the routes fall through to real Gemini/FFmpeg/Whisper work on the last clip (which has fake bytes).
Suite: 175/175 (up from 170).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
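The reason a lone `>= len(clips)` check is not enough is that Python treats negative indices as offsets from the end of the list. A small sketch of the two-sided guard follows; `get_clip` is a hypothetical helper name, not a function from main.py.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def get_clip(clips: Sequence[T], clip_index: int) -> T:
    """Return clips[clip_index], rejecting negatives as well as overruns."""
    # clips[-1] is valid Python and returns the LAST clip, so the lower
    # bound has to be checked explicitly.
    if not 0 <= clip_index < len(clips):
        raise IndexError(f"clip_index {clip_index} is outside 0..{len(clips) - 1}")
    return clips[clip_index]
```

In a route handler the `IndexError` would be translated into a 4xx response; the tests in this commit accept 400, 404, or 422.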
…d-C4)
Codex audit flagged that ``app.video.ffmpeg.run`` defaulted to ``timeout=None``, making any state-mutating ffmpeg call (``/api/merge``, ``/api/colorgrade``, ``/api/silencecut``, etc.) a DoS primitive: a hostile or corrupt input could pin a worker thread forever. The deferred C4 control was therefore exploitable today even with auth/rate-limit deferred.
Fix:
* ``DEFAULT_TIMEOUT`` (1800s, override via ``FFMPEG_TIMEOUT_SECONDS``) applied when caller passes ``timeout=None``.
* ``DEFAULT_PROBE_TIMEOUT`` (30s, override via ``FFPROBE_TIMEOUT_SECONDS``) applied to ``probe_resolution``/``probe_duration``.
* ``subprocess.TimeoutExpired`` wraps into ``FFmpegError(returncode=-1)`` so callers see a single exception type and the error message carries the configured timeout for triage.
Tunables on env vars (defensible 30-min worst case for a 50-clip merge of 60s sources; production can lower with ``FFMPEG_TIMEOUT_SECONDS=600``).
Tests: 6 new unit cases in tests/unit/test_ffmpeg_wrapper.py covering default application, explicit pass-through, timeout-to-FFmpegError wrapping, probe defaults, sanity of constant values.
Suite: 181/181 (up from 175).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
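A minimal sketch of the default-timeout behaviour described above, assuming a wrapper that takes the full argv. The constant and exception names mirror the commit message, but the body is illustrative and not the code in ``app/video/ffmpeg.py``.

```python
import os
import subprocess
from typing import Optional, Sequence

# Assumed to be read once at import time; the real wrapper may differ.
DEFAULT_TIMEOUT = float(os.environ.get("FFMPEG_TIMEOUT_SECONDS", "1800"))


class FFmpegError(RuntimeError):
    def __init__(self, message: str, returncode: int) -> None:
        super().__init__(message)
        self.returncode = returncode


def run(args: Sequence[str], timeout: Optional[float] = None) -> subprocess.CompletedProcess:
    """Run a command with a bounded wall-clock time and one exception type."""
    effective = DEFAULT_TIMEOUT if timeout is None else timeout
    try:
        proc = subprocess.run(list(args), capture_output=True, text=True, timeout=effective)
    except subprocess.TimeoutExpired as exc:
        # Fold the timeout into FFmpegError and carry the limit for triage.
        raise FFmpegError(f"command timed out after {effective:g}s", returncode=-1) from exc
    if proc.returncode != 0:
        raise FFmpegError(proc.stderr, proc.returncode)
    return proc
```

`subprocess.run` kills the child process when the timeout expires, so a stuck invocation releases its worker thread instead of pinning it.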
Codex audit (focus 2, 3 BLOCKERs) confirmed that ``/api/edit``, ``/api/subtitle``, and ``/api/hook`` were calling ``subprocess.run(['ffmpeg', ...])`` and ``subprocess.check_output(['ffprobe', ...])`` directly instead of going through ``app.video.ffmpeg``. That dodged:
* the wrapper's UTF-8 locale setup (the ``ai_filters.py`` site even had a bytes-encoding workaround for the missing locale)
* the Phase-5 B-2 default timeout (deferred-C4 DoS)
* the uniform ``FFmpegError`` surface
Migrated:
* ``app/editing/ai_filters.py``: ``apply_edits`` copy fallback, ``probe_resolution`` for input dimensions, main filter run. The bytes-encoding hack at L218-232 disappears because the wrapper sets ``LANG=C.UTF-8`` / ``LC_ALL=C.UTF-8``.
* ``app/overlays/subtitles_render.py``: ``burn_subtitles`` routes through ``ffmpeg_wrapper.run``; ``FFmpegError`` already carries the stderr payload so the manual decode/raise is gone.
* ``app/overlays/hooks.py``: both the input probe and the overlay burn-in now use the wrapper.
Added regression test ``tests/unit/test_ffmpeg_wrapper_invariant.py`` that pins the three migrated files: any future commit that re-introduces a direct ``subprocess.*(['ffmpeg' / 'ffprobe' ...])`` call in those modules fails the suite (6 cases, 2 invariants × 3 files).
Out of scope (documented in the test docstring): ``video/pipeline.py`` (per-frame Popen with stdin streaming), ``cli.py``, ``saas/pipeline.py``, and one remaining ffprobe call in ``main.py`` /api/effects/generate. Those need wrapper helpers we don't have yet and belong in a separate /gsd-secure-phase sweep.
Suite: 187/187 (up from 181). Live curl against /api/subtitle and /api/hook returns the expected 404/422 responses post-restart.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex audit (focus 1, BLOCKER) showed that /api/edit + auto_pipeline's apply_ai_edit executed Gemini-produced filter strings via FFmpeg ``-vf`` with only a comparison-operator cleanup pass. A malicious response like ``movie=/etc/passwd,scale=1:1`` would trigger filesystem reads through the ``movie`` filter. Same risk for ``amovie``, ``subtitles``, ``ass``, ``concat`` (file:= option), ``sendcmd``, ``asendcmd``.
Fix: strict allowlist + explicit deny list in ``app/utils/filters.py``:
* ``_ALLOWED_FILTERS`` enumerates safe filters used by the prompt (zoompan, eq, hue, curves, unsharp, …) plus pipeline essentials (scale, setsar, fps, format, fade, …) and a few common visual safe primitives (vignette, drawbox, lutyuv/rgb, gblur, …).
* ``_DISALLOWED_FILTERS`` explicitly bans movie/amovie/subtitles/ass/concat/sendcmd/asendcmd as a defense-in-depth backstop.
* Parser strips ``[label]`` brackets, splits the chain on both ``,`` and ``;`` (filter_complex), extracts the leading filter name, and fails closed on anything outside the allowlist.
* ``UnsafeFilterError(ValueError)`` is raised on rejection.
* ``VideoEditor.apply_edits`` calls the validator AFTER comparison-operator sanitization (since the post-sanitization form is what executes) and BEFORE the FFmpeg invocation.
* ``/api/edit`` route handler surfaces ``UnsafeFilterError`` as a 400 with a frontend-friendly message instead of a generic 500.
Tests: 14 new cases in ``tests/unit/test_filter_safety.py`` covering the Codex reproducer ``movie=/etc/passwd``, plus amovie/subtitles/ass/concat, chain-position attacks (evil filter after legit one, evil after ``;``), unknown filters, bracket-label handling, whitespace tolerance, error-message specificity, and TypeError on non-strings.
Suite: 208/208 (up from 187).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
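The fail-closed parsing described above can be sketched roughly as below. This is a simplified illustration, not the contents of ``app/utils/filters.py``: the allowlist is a small subset of the filters the commit names, `validate_filter_chain` is a hypothetical function name, and the quote-aware splitter is an assumption added so that expressions such as ``zoompan=z='min(zoom+0.0015,1.5)'`` are not cut at the inner comma.

```python
import re
from typing import List

# Subset for illustration; the real lists are longer.
ALLOWED_FILTERS = frozenset({
    "zoompan", "eq", "hue", "curves", "unsharp",
    "scale", "setsar", "fps", "format", "fade", "vignette",
})
DISALLOWED_FILTERS = frozenset({
    "movie", "amovie", "subtitles", "ass", "concat", "sendcmd", "asendcmd",
})

_LABEL = re.compile(r"\[[^\]]*\]")


class UnsafeFilterError(ValueError):
    pass


def _split_chain(chain: str) -> List[str]:
    """Split on ',' and ';' except inside single-quoted arguments."""
    parts, buf, quoted = [], [], False
    for ch in chain:
        if ch == "'":
            quoted = not quoted
        if ch in ",;" and not quoted:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def validate_filter_chain(chain: str) -> None:
    if not isinstance(chain, str):
        raise TypeError("filter chain must be a string")
    for segment in _split_chain(_LABEL.sub("", chain)):
        name = segment.strip().split("=", 1)[0].strip()
        # Deny list first for a specific message, then fail closed on unknowns.
        if name in DISALLOWED_FILTERS:
            raise UnsafeFilterError(f"filter '{name}' is explicitly banned")
        if name not in ALLOWED_FILTERS:
            raise UnsafeFilterError(f"filter '{name}' is not on the allowlist")
```

Because unknown names are rejected rather than passed through, the deny list is redundant in normal operation; it only matters if someone later adds a dangerous filter to the allowlist by mistake.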
Codex audit (focus 4, 2 BLOCKERs) flagged concurrent mutations on
``jobs[]`` and ``metadata.json``:
main.py:39 — jobs dict mutated by route handlers + executor threads
with no synchronization
main.py:1263 — _persist_clip_url's read-modify-write on metadata.json
can lose updates when /api/colorgrade and
/api/silencecut fire concurrently on the same clip
Fix:
* ``_JOB_LOCKS: Dict[str, threading.Lock]`` keyed by job_id, with a
guard lock around the dict-of-locks. ``_job_lock(job_id)`` creates
the lock lazily.
* ``_atomic_write_json(path, data)`` writes via ``.tmp-{pid}-{tid}``
+ ``os.replace`` so a crashed writer cannot leave a half-written
metadata.json on disk.
* ``_persist_clip_url`` now holds the per-job lock for the entire
read-modify-write window AND writes via atomic-rename.
* The inline metadata persistence in ``/api/subtitle`` (L1175-1190)
uses the same lock + atomic write — Codex specifically called out
that this path duplicated the unsynchronized pattern.
Threading.Lock (not asyncio.Lock) because mutators are called from
inside ``run_in_executor`` (sync code on worker threads). Lock is
per-job_id, so unrelated jobs don't contend.
Tests: 6 new cases in ``tests/unit/test_job_lock.py`` covering:
- lock identity (same lock for same job, different for different)
- lock type (threading.Lock acquire/release)
- persist writes to memory + disk in one shot
- atomic-write never leaves a partial file
- 32 concurrent _persist_clip_url calls across 4 clips on the same
job all land (vs. previous behavior where some updates would be
lost to stale-read writers).
Suite: 214/214 (up from 208).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
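The lock-plus-atomic-rename combination from the fix above can be sketched as follows. The function names echo the commit message without the leading underscore, but the bodies are a minimal illustration under assumptions: the metadata layout (a top-level ``clips`` list of dicts) and the explicit `metadata_path` parameter are hypothetical.

```python
import json
import os
import threading
from pathlib import Path
from typing import Any, Dict

_JOB_LOCKS: Dict[str, threading.Lock] = {}
_JOB_LOCKS_GUARD = threading.Lock()


def job_lock(job_id: str) -> threading.Lock:
    """Return the lock for this job, creating it lazily under a guard lock."""
    with _JOB_LOCKS_GUARD:
        lock = _JOB_LOCKS.get(job_id)
        if lock is None:
            lock = _JOB_LOCKS[job_id] = threading.Lock()
        return lock


def atomic_write_json(path: Path, data: Any) -> None:
    """Write to a per-writer temp file, then rename it over the target."""
    tmp = path.with_name(f"{path.name}.tmp-{os.getpid()}-{threading.get_ident()}")
    tmp.write_text(json.dumps(data), encoding="utf-8")
    os.replace(tmp, path)


def persist_clip_url(metadata_path: Path, job_id: str, clip_index: int, key: str, url: str) -> None:
    # Hold the per-job lock across the whole read-modify-write window so a
    # concurrent writer cannot save a stale copy over this update.
    with job_lock(job_id):
        meta = json.loads(metadata_path.read_text(encoding="utf-8"))
        clips = meta["clips"]
        if not 0 <= clip_index < len(clips):
            raise IndexError(f"clip_index {clip_index} out of range")
        clips[clip_index][key] = url
        atomic_write_json(metadata_path, meta)
```

The two mechanisms cover different failures: the lock prevents lost updates between threads, while the rename prevents a crashed writer from leaving a truncated file.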
… drift Pre-existing drift: the running FastAPI/Pydantic stack renders binary upload fields as ``contentMediaType: application/octet-stream`` instead of the older ``format: binary``. Path keys are identical — no contract change, just a library-version rendering tweak. Regenerating so each downstream commit can pass the contract test independently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 Task 1.1 of the AI Restyle plan. Extracts the frame at t=0 of a video to PNG via the central FFmpeg wrapper — the first frame is what Nano Banana (next task) will relight to seed the v2v style reference. Test uses a synthetic ``testsrc=duration=2`` clip so it runs anywhere ffmpeg is available, no binary checked into git, no dependency on ``demo-openshorts.mp4`` (which sits outside the backend volume mount). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raints Phase 1 Task 1.2 of the AI Restyle plan. Sends the first frame of the source video to Gemini's image-preview model with a user-supplied background+lighting prompt plus hard-coded safety constraints that pin the subject, pose, composition, and framing. Returns the relit PNG, which Phase 2 will hand to the video-to-video model as a style reference. Plan deviation: uses ``gemini-3.1-flash-image-preview`` to match the codebase's existing thumbnails/images.py call site, instead of the plan's stale ``gemini-2.5-flash-image-preview`` (the codebase moved forward between plan writing and execution). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 Task 2.1 of the AI Restyle plan. Two new modules:
1. ``backend/app/integrations/fal.py`` — public submit_and_poll +
upload_file helpers wrapping fal.ai's queue + storage REST APIs.
No new dependency: httpx only. The legacy SaaSShorts pipeline keeps
its private ``_fal_run`` for now; DRY refactor deferred to keep this
PR scoped to AI Restyle.
2. ``backend/app/ml/video_restyle.py`` — restyle_video(video, ref, out)
uploads both inputs, runs the v2v model, downloads the result.
Plan deviations:
- Used raw httpx instead of the ``fal-client`` SDK to match how
``app/saas/pipeline.py:_fal_run`` already calls fal.ai (and avoid a
new pinned dependency).
- Model defaults to ``fal-ai/wan/v2.5/turbo/video-to-video`` ahead of
the Phase 0 spike. The spike is parked until the user provides a
FAL_KEY for cross-model benchmarking; swap MODEL_ID once it runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 Task 2.2 of the AI Restyle plan. Async coroutine that drives duration probe → first-frame extract → Nano Banana relight → fal.ai v2v → audio mux, with each ML/FFmpeg call dispatched to a worker thread via ``loop.run_in_executor`` (matches Short-form's pattern so the FastAPI event loop is never blocked by ffmpeg / a fal.ai poll).
Mutates the supplied ``jobs[job_id]`` dict in place so the route handler + frontend polling see live progress + status. Any exception flips status to 'failed' and appends to logs; the coroutine itself never raises.
Plan deviation: file lives in ``backend/app/restyle/pipeline.py`` (new product folder, mirrors ``saas/``) rather than the plan's ``app/saas/restyle_pipeline.py``. Per CLAUDE.md Convention #7 (short-form / long-form / new products stay isolated), AI Restyle's backend gets its own package, the same rule the frontend follows for pages/AIRestyle/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
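The orchestration shape described above (blocking steps on worker threads, shared job dict mutated in place, no exception escaping) might look like the sketch below. It is not ``run_restyle_job`` itself: `run_job`, the `steps` list, and the 'processing' / 'completed' status strings are assumptions; only 'failed' and the ``logs`` / ``progress_pct`` / ``result`` keys come from the commit messages.

```python
import asyncio
from typing import Any, Callable, Dict, Sequence, Tuple

Step = Tuple[str, Callable[[], Any]]


async def run_job(jobs: Dict[str, Dict[str, Any]], job_id: str, steps: Sequence[Step]) -> None:
    """Run blocking steps off the event loop; report outcomes via jobs[job_id]."""
    job = jobs[job_id]
    loop = asyncio.get_running_loop()
    try:
        job["status"] = "processing"
        for n, (name, fn) in enumerate(steps, start=1):
            job["logs"].append(f"{name}: started")
            # Blocking work (ffmpeg, HTTP polling) runs in the default thread pool.
            job["result"] = await loop.run_in_executor(None, fn)
            job["progress_pct"] = round(100 * n / len(steps))
        job["status"] = "completed"
    except Exception as exc:
        # Never re-raise: pollers learn about the failure from the job dict.
        job["status"] = "failed"
        job["logs"].append(f"failed: {exc}")
```

Swallowing the exception is deliberate here because the coroutine runs as a background task with no caller left to catch it; the job dict is the only channel back to the client.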
Wires the AI Restyle pipeline into the public API:
- POST /api/restyle — multipart upload + background_prompt + lighting_prompt
form fields. Requires X-Gemini-Key + X-Fal-Key headers.
Persists upload via the existing _ensure_video_upload
guard (MIME+ftyp), enforces 500-char prompt cap and
2GB file cap, seeds jobs[job_id] with product="ai-restyle",
schedules run_restyle_job via FastAPI BackgroundTasks.
- GET /api/restyle/{job_id} → RestyleStatus pydantic model surfacing the
live jobs[job_id] {status, logs, progress_pct, result}.
First populated module under backend/app/routes/ — establishes the pattern
for the eventual main.py router split.
Plan deviations (from docs/superpowers/plans/2026-05-20-ai-restyle.md):
1. Import path is `app.restyle.pipeline` not `app.saas.restyle_pipeline`
(per D11, AI Restyle is its own product per CLAUDE.md Convention #7).
2. Auth check moved BEFORE prompt-length / file-signature checks (matches
the existing /api/process pattern — minor info-leak avoidance).
3. background_tasks.add_task(run_restyle_job, **kwargs) directly — no
asyncio.create_task wrapper. FastAPI 0.136.1 awaits async tasks natively.
4. Added file-size cap (MAX_FILE_SIZE_MB=2048) matching main.py — plan did
not enforce one, which would have invited a disk-fill DoS.
5. 8 tests, not 4 — adds happy-path enqueue (mocked pipeline), lighting-
prompt-too-long, non-MP4 rejection, and seeded GET happy path.
OpenAPI baseline regenerated; diff confirms /api/restyle and
/api/restyle/{job_id} are the only path additions (rest of contract
unchanged). Backend pytest 250/250 green (242 prior + 8 new).
security_baseline (per ~/.claude/skills/securing-http-and-llm-endpoints):
applies: true
surfaces:
- id: POST /api/restyle
tier: LLM-CALL # downstream Gemini Nano-Banana + fal.ai v2v
controls:
C1_auth: { status: covered, mechanism: "X-Gemini-Key + X-Fal-Key headers required; matches /api/process" }
C2_rate_limit: { status: opted_out, justification: "no per-IP/user RL anywhere in OpenShorts; systemic gap tracked for /gsd-secure-phase" }
C3_input: { status: covered, mechanism: "_ensure_video_upload MIME+ftyp; 500-char prompt cap; 2GB file cap; 30s duration cap downstream in restyle/pipeline" }
C4_timeout: { status: covered, mechanism: "integrations/fal.py 600s timeout w/ poll; ffmpeg.py FFMPEG_TIMEOUT_SECONDS=1800 default" }
C5_output_rate: { status: covered, mechanism: "async job; finite JSON result; no streaming surface" }
C6_redaction: { status: opted_out, justification: "user supplies own prompt text; no user PII flows into LLM beyond user-authored prompt" }
C9_audit: { status: partial, mechanism: "jobs[].logs captures pipeline steps; no separate audit log (systemic gap)" }
C10_abuse: { status: partial, mechanism: "30s duration cap bounds per-job spend to ~$1.24 worst case (Nano-Banana ~$0.04 + Wan v2.5 ~$0.04/s); no per-user/IP daily quota" }
- id: GET /api/restyle/{job_id}
tier: PUBLIC-READ # mirrors /api/status/{job_id} which is also unauth
controls:
C2_rate_limit: { status: opted_out, justification: "matches existing /api/status pattern; systemic gap" }
C3_input: { status: covered, mechanism: "job_id is path str; 404 on miss; no parser surface" }
C4_timeout: { status: covered, mechanism: "in-memory dict lookup; no I/O" }
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two dimensions (backgrounds + lightings), 5 hand-tuned seed presets each. Mirrors keysStore.js — localStorage-backed, broadcasts on a custom event + listens on the storage event for multi-tab consistency. Public surface: - getPresets() - setDefault(dimension, id) - upsertPreset(dimension, preset) - deletePreset(dimension, id) (refuses to delete the default) - useAIRestylePresets() React hook Shared dependency for the wizard (Configure step) and the Settings tab (preset CRUD). Built first so both downstream surfaces have a stable contract to import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "General → AI Restyle" panel under Settings. Two stacked lists
(Backgrounds + Lightings), each with star/edit/delete affordances and an
"Add preset" button. Edit modal enforces 40-char label cap + 500-char
prompt cap with a live counter; default preset cannot be deleted.
Plan deviation: the plan suggested a component-array tab list shape
(`{ id, label, component }`), but the existing Settings page uses
react-router child routes with a NavLink-driven left rail. Followed
the existing pattern instead — one nav entry in Settings/index.jsx +
one <Route> in App.jsx wires it up cleanly.
Wired through:
- frontend/src/pages/Settings/index.jsx — new "AI Restyle" item under General
- frontend/src/App.jsx — child route `general/ai-restyle`
No wizard yet — that ships in Phase 4b. The Settings UI is functional in
isolation: preset edits broadcast via the storage event so any future
subscribed component (i.e. the wizard's preset dropdowns) will re-render
without a reload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the full user-facing surface for AI Restyle. Sidebar entry sits
between Long-form and Short-form (Wand2 icon). Wizard mounts under
/ai-restyle/* with Wizard/History sibling tabs (mirrors ShortForm shape).
New pages:
- pages/AIRestyle/index.jsx — Outlet shell + Wizard/History NavLinks
- pages/AIRestyle/Wizard.jsx — 3-step state machine via useWizard
- pages/AIRestyle/History.jsx — placeholder ("tracking lands in follow-up")
- pages/AIRestyle/steps/Upload.jsx — client-side duration probe; 30s cap
- pages/AIRestyle/steps/Configure.jsx — preset dropdowns + 2 prompt textareas
+ POST /api/restyle
- pages/AIRestyle/steps/Review.jsx — 2s polling; phone preview with
Before/After toggle; Download +
Send-to-Short-form CTAs
Plan deviations:
1. Configure step uses TWO separate textareas (background + lighting)
instead of the plan's single combined textarea with newline-split. The
split-by-newline approach is fragile to multi-line user edits and
the backend already accepts the two prompts as separate form fields —
so the simpler UX matches the wire contract 1:1.
2. Configure step renders a yellow "Missing API keys → Settings" banner
when keys.gemini or keys.fal is empty (plan only surfaced this as an
error on Submit). Clearer UX, single Link to /settings/system/api-keys.
Send-to-Short-form: stashes {url, name} in sessionStorage under
'openshorts.shortForm.handoff' and navigates to /short-form. Full
wire-through into ShortForm/Upload to consume that payload is documented
as a follow-up per the plan's "stubbed in v1" note.
Sidebar position respects the handover constraint: between Long-form and
Short-form (Wand2 icon). Per CLAUDE.md Convention #7 the new pages/ live
in their own folder and DO NOT cross-import with ShortForm or LongForm —
the only cross-product touchpoint is the sessionStorage handoff, which
is a one-way browser-storage handshake, not a code dependency.
Frontend build: clean, 0 errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two Codex HIGH findings from the adversarial review of PR #35 (commit 0cc0c3e, Codex job task-mpeq4ynb-qd7sfx):
HIGH-1 (SSRF / internal-network fetch): ml/video_restyle.py:67 streamed `response["video"]["url"]` directly from the fal API response with no host validation. A compromised or malicious fal payload could redirect a server-side fetch at e.g. http://169.254.169.254/... (cloud-instance metadata) or any internal service the backend can reach.
HIGH-2 (credential exfiltration via queue URLs): integrations/fal.py:83-88 trusted `status_url` and `response_url` from the submit response and then issued `Authorization: Key <fal_key>` to those URLs. If those response fields were ever attacker-influenced, the fal API key would have been sent to an attacker-controlled host.
Fix: two validators in integrations/fal.py.
- require_fal_queue_url(): HTTPS-only, host must equal queue.fal.run. Called on status_url + response_url before any Authorization-header request is built.
- require_fal_download_url(): HTTPS-only, host must be fal.ai / fal.run / fal.media or a subdomain. Called on the response['video']['url'] before httpx.stream() opens the connection.
Both raise FalError on mismatch; the route handler propagates the failure through the existing run_restyle_job exception path (jobs[id].status = "failed", message lands in jobs[id].logs).
Tests:
- tests/unit/test_fal_integration.py:
  * test_submit_and_poll_rejects_attacker_controlled_status_url
  * test_submit_and_poll_rejects_attacker_controlled_response_url
  * test_submit_and_poll_rejects_non_https_queue_url
- tests/unit/test_video_restyle.py:
  * test_restyle_video_rejects_untrusted_download_url
- Updated existing test_video_restyle.py mocks from "cdn.fal" (not a real fal hostname) to "v3.fal.media/files/..." so the legitimate path still passes the allowlist.
Backend pytest 255/255 (was 250 + 5 new).
Codex MEDIUM follow-ups (deferred to separate issues / PRs):
- Prompt-injection hardening (delimit user prompts as untrusted data)
- AI Restyle write path bypasses _job_lock (consistency-only risk today)
- Gemini content-policy refusal catches broad Exception
- fal client poll/result responses miss raise_for_status
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
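The two host checks described in the fix can be sketched with the standard library's URL parser. The function names and allowed hosts come from the commit message; the bodies are an illustrative reconstruction, and `_https_host` is a hypothetical helper.

```python
from urllib.parse import urlsplit

QUEUE_HOST = "queue.fal.run"
DOWNLOAD_ROOTS = ("fal.ai", "fal.run", "fal.media")


class FalError(RuntimeError):
    pass


def _https_host(url: str) -> str:
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise FalError(f"refusing non-HTTPS or host-less URL: {url!r}")
    return parts.hostname  # urlsplit lowercases the hostname and drops userinfo


def require_fal_queue_url(url: str) -> str:
    """Exact-host match: these URLs receive the Authorization header."""
    if _https_host(url) != QUEUE_HOST:
        raise FalError(f"untrusted queue URL: {url!r}")
    return url


def require_fal_download_url(url: str) -> str:
    """Root-or-subdomain match for result downloads."""
    host = _https_host(url)
    # The leading dot matters: 'evilfal.media' must not match 'fal.media'.
    if not any(host == root or host.endswith("." + root) for root in DOWNLOAD_ROOTS):
        raise FalError(f"untrusted download URL: {url!r}")
    return url
```

Comparing the parsed hostname rather than a string prefix is what defeats tricks such as ``https://queue.fal.run@evil.example/`` or ``https://queue.fal.run.evil.example/``. Redirect handling is a separate concern that this sketch does not address.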
Addresses Codex HIGH-3 from the adversarial review of PR #35 (commit 0cc0c3e, Codex job task-mpeq4ynb-qd7sfx):
Original behavior: POST /api/restyle accepted up to 2GB (matching /api/process). The 30s duration cap was enforced AFTER the file landed on disk via the run_restyle_job probe (restyle/pipeline.py:71-76). An attacker could repeatedly upload 2GB files of any duration and force ~2GB of uploads/ disk churn per request before the duration check rejected them.
Fix:
1. Lower the AI Restyle cap to 250MB (overrideable via AI_RESTYLE_MAX_FILE_SIZE_MB env var). 30s of video at 67 Mbps already covers streaming-quality bitrates; pathological multi-GB 30s clips don't need to be accepted.
2. Add a Content-Length header preflight BEFORE allocating any disk space. Clients can lie about Content-Length, but for legitimate uploads (which honestly report size) this rejects fast.
3. Keep the streaming size check as backstop for clients that send chunked-transfer or that lie about Content-Length.
Tests:
- tests/api/test_ai_restyle.py:
  * test_post_restyle_rejects_oversize_content_length: monkeypatches MAX_FILE_SIZE_MB to 1MB, uploads 2MB, expects 413.
Backend pytest still 255/255 (test was already passing on the streaming backstop; the preflight is a defense-in-depth improvement the test does not specifically isolate).
Codex MEDIUM follow-ups remain (will land as separate issues / PRs):
- Prompt-injection hardening (delimit user prompts as untrusted data)
- AI Restyle write path bypasses _job_lock (consistency-only risk today)
- Gemini content-policy refusal catches broad Exception
- fal client poll/result responses miss raise_for_status
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
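The preflight-plus-backstop layering can be sketched framework-free as below. All names are hypothetical and the route handler in this PR may be structured differently; the point is only that the declared size is checked before any bytes are written, and the actual size is checked while streaming. The 250 MB figure lines up with the commit's bitrate argument: 250 MB × 8 / 30 s ≈ 67 Mbps.

```python
from typing import BinaryIO, Iterable, Optional

MAX_BYTES = 250 * 1024 * 1024  # assumed to mirror AI_RESTYLE_MAX_FILE_SIZE_MB=250


class PayloadTooLarge(Exception):
    """Would map to HTTP 413 in a route handler."""


def preflight_content_length(header: Optional[str], max_bytes: int = MAX_BYTES) -> None:
    """Reject early when the client honestly declares an oversize body."""
    # Missing or malformed headers fall through to the streaming check.
    if header is not None and header.isdigit() and int(header) > max_bytes:
        raise PayloadTooLarge(f"declared {header} bytes, cap is {max_bytes}")


def copy_with_cap(chunks: Iterable[bytes], dest: BinaryIO, max_bytes: int = MAX_BYTES) -> int:
    """Backstop for chunked uploads or a lying Content-Length."""
    written = 0
    for chunk in chunks:
        written += len(chunk)
        if written > max_bytes:
            raise PayloadTooLarge(f"body exceeded {max_bytes} bytes")
        dest.write(chunk)
    return written
```

For a multipart upload the Content-Length covers the whole request body, so it is slightly larger than the file itself; a caller that deletes the partially written file after `PayloadTooLarge` keeps disk churn bounded by the cap.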
Codex adversarial security review: complete
Ran
Findings
HIGH (3), all fixed inline in this PR:
Fixes added
MEDIUM (4), deferred to separate follow-ups:
Controls confirmed by Codex:
Overall risk assessment (Codex verbatim): "AI Restyle is not exposing obvious shell injection, raw API-key storage in job state, or React HTML injection. The main deploy blockers are server-side network trust around fal-provided URLs and disk-exhaustion pressure from accepting very large files before the 30s duration gate." Both addressed by
Test status
Backend pytest 255/255 green (+8 new test cases this PR). Frontend build clean.
Known systemic gaps (already documented in commit security_baseline YAML, NOT new findings)
Ready for review.
Reframe AI Restyle's ROADMAP.md section from "planned" to the Shipped / Stubbed in v1 / Later tiering used by Short-form and Long-form, now that the feature is in review on PR mutonby#35. - Shipped: enumerates the wizard, preset store, Settings CRUD, 7-step async pipeline, /api/restyle routes (with the 250MB cap + Content-Length preflight + fal hostname allowlist), and the Codex audit outcome. - Stubbed in v1: Phase 0 model spike (blocked on FAL_KEY), placeholder History tab, and the 4 deferred Codex MEDIUM follow-ups. - Later: keeps the pre-existing scope-deferral list and adds the saas/pipeline.py:_fal_run DRY cleanup as a separate-PR item. Also adds AI_RESTYLE_MAX_FILE_SIZE_MB=250 to .env.example so the new upload cap is discoverable and picked up by the auto-managed ENV table in ~/.claude/CLAUDE.md. scripts/update_claude_md.py was run; the module-map now includes integrations/fal.py, ml/frame_{extract,relight}.py, ml/video_restyle.py, restyle/pipeline.py, and routes/ai_restyle.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The API Keys section descriptions were written before AI Restyle landed and didn't mention that: - Gemini also powers Nano-Banana relighting (not just viral-moment detection / titles / thumbnails). - fal.ai is now needed for AI Restyle's video-to-video step. The fal panel still said "Optional" and "legacy SaaS UGC pipeline" only. Promotes fal.ai to "Required for AI Restyle" with an amber badge to match Gemini and Upload-Post. Adds a one-line cost hint (~$1.50 per 30s clip) and a pointer to fal.ai/dashboard/keys. Updates Gemini's description to call out AI Restyle's relight step + bumps the version label from "2.5 Flash" to "2.5 / 3.x" since Nano-Banana is on the 3.x preview channel. No behavioral changes; localStorage keys (gemini_key, falKey_v1) are unchanged, so existing user keys keep working. Frontend build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scope
Draft PR. Backend ML pipeline only: Phases 1 and 2 of the AI Restyle plan (`docs/superpowers/plans/2026-05-20-ai-restyle.md`). Phase 0 (model spike), Phase 3 (routes), and Phase 4 (React wizard) are explicitly out of scope and ship in follow-ups.
The pipeline is callable in isolation: import `app.restyle.pipeline.run_restyle_job` from a test or notebook, supply a small video + two prompts + a Gemini key + a fal.ai key, and it runs the full duration-probe → extract → relight → v2v → mux → publish sequence.
Commits
0dc1cfc 6aeb90f aba8a52 84f0f11 260b285
New backend modules
- `backend/app/ml/frame_extract.py`: first-frame PNG via the FFmpeg wrapper.
- `backend/app/ml/frame_relight.py`: Nano Banana relight with 4 hard-coded safety constraints (pose/composition/people/framing) baked into every prompt.
- `backend/app/integrations/fal.py`: public `submit_and_poll` + `upload_file` for fal.ai (queue + storage REST APIs). Httpx only, no new pinned dep.
- `backend/app/ml/video_restyle.py`: `restyle_video(video, ref, out)` via the integration layer.
- `backend/app/restyle/__init__.py` + `pipeline.py`: async `run_restyle_job` orchestrator; mutates a shared `jobs[job_id]` dict in place; never raises.
New tests
- `test_frame_extract.py`: `testsrc` fixture; no binary in git, runs anywhere FFmpeg is on PATH
- `test_frame_relight.py`
- `test_fal_integration.py`
- `test_video_restyle.py`
- `test_restyle_pipeline.py`
Backend gate: 242 / 242 passed (was 217 at branch point; 25 new tests, zero regressions).
Documented plan deviations
- `gemini-3.1-flash-image-preview` in `frame_relight.py` instead of the plan's `2.5`: the codebase already moved to `3.1` in `thumbnails/images.py:79`. Mirroring that.
- `httpx` in `integrations/fal.py` instead of the plan's `fal-client` SDK: `app/saas/pipeline.py:_fal_run` already does this; adding a new pinned dep for a small surface didn't pay for itself.
- `backend/app/restyle/` (new product folder) instead of the plan's `app/saas/restyle_pipeline.py`. Per CLAUDE.md Convention #7 (short-form / long-form / new products stay isolated). The frontend will follow with `frontend/src/pages/AIRestyle/` when Phase 4 lands.
- Model defaults to `fal-ai/wan/v2.5/turbo/video-to-video` ahead of the Phase 0 spike. Swap `MODEL_ID` in `video_restyle.py` after the spike runs.
What's NOT in this PR
- `/api/restyle` lands in the Phase 3 PR
- `/ai-restyle` sidebar + 3-step wizard land in Phase 4
- `FAL_KEY` plumbing through `main.py`: the integration helpers accept a key argument; the route handler will read it from the `X-Fal-Key` header in Phase 3
How to verify locally
Expected: `242 passed, 1 deselected`.
🤖 Generated with Claude Code