Skip to content

AI Restyle: Phase 1+2 backend ML pipeline (no routes/UI yet)#35

Open
vansteenbergenmatisse wants to merge 56 commits into
mutonby:mainfrom
vansteenbergenmatisse:feat/ai-restyle
Open

AI Restyle: Phase 1+2 backend ML pipeline (no routes/UI yet)#35
vansteenbergenmatisse wants to merge 56 commits into
mutonby:mainfrom
vansteenbergenmatisse:feat/ai-restyle

Conversation

@vansteenbergenmatisse
Copy link
Copy Markdown

Scope

Draft PR. Backend ML pipeline only — Phases 1 and 2 of the AI Restyle plan (docs/superpowers/plans/2026-05-20-ai-restyle.md). Phase 0 (model spike), Phase 3 (routes), and Phase 4 (React wizard) are explicitly out of scope and ship in follow-ups.

The pipeline is callable in isolation: import app.restyle.pipeline.run_restyle_job from a test or notebook, supply a small video + two prompts + a Gemini key + a fal.ai key, and it does the full duration-probe → extract → relight → v2v → mux → publish dance.

Commits

Commit Change
0dc1cfc chore(tests): regenerate OpenAPI baseline for FastAPI/Pydantic render drift
6aeb90f feat(ai-restyle): ml/frame_extract first-frame extractor
aba8a52 feat(ai-restyle): ml/frame_relight Nano Banana wrapper + safety constraints
84f0f11 feat(ai-restyle): integrations/fal + ml/video_restyle (v2v wrapper)
260b285 feat(ai-restyle): restyle/pipeline 7-step orchestrator

New backend modules

  • backend/app/ml/frame_extract.py — first-frame PNG via the FFmpeg wrapper.
  • backend/app/ml/frame_relight.py — Nano Banana relight with 4 hard-coded safety constraints (pose/composition/people/framing) baked into every prompt.
  • backend/app/integrations/fal.py — public submit_and_poll + upload_file for fal.ai (queue + storage REST APIs). Httpx only — no new pinned dep.
  • backend/app/ml/video_restyle.pyrestyle_video(video, ref, out) via the integration layer.
  • backend/app/restyle/__init__.py + pipeline.py — async run_restyle_job orchestrator; mutates a shared jobs[job_id] dict in place; never raises.

New tests

Suite Cases Notes
test_frame_extract.py 3 Synthetic testsrc fixture — no binary in git, runs anywhere FFmpeg is on PATH
test_frame_relight.py 7 Asserts model id, prompt safety constraints, content-policy refusal path
test_fal_integration.py 8 Submit/poll, sync fast-path, FAILED status, timeout, auth header, two-step upload
test_video_restyle.py 5 Model id, args shape, missing inputs, malformed response, output dir creation
test_restyle_pipeline.py 5 Happy path, mid-pipeline failure, > 30 s rejection, monotonic progress, missing-job-id resilience

Backend gate: 242 / 242 passed (was 217 at branch point — 25 new tests, zero regressions).

Documented plan deviations

  1. gemini-3.1-flash-image-preview in frame_relight.py instead of the plan's 2.5 — the codebase already moved to 3.1 in thumbnails/images.py:79. Mirroring that.
  2. Raw httpx in integrations/fal.py instead of the plan's fal-client SDK — app/saas/pipeline.py:_fal_run already does this; adding a new pinned dep for a small surface didn't pay for itself.
  3. backend/app/restyle/ (new product folder) instead of the plan's app/saas/restyle_pipeline.py. Per CLAUDE.md Convention PIC N WATCH #7 (short-form / long-form / new products stay isolated). The frontend will follow with frontend/src/pages/AIRestyle/ when Phase 4 lands.
  4. Phase 0 spike deferred — model defaults to fal-ai/wan/v2.5/turbo/video-to-video. Swap MODEL_ID in video_restyle.py after the spike runs.

What's NOT in this PR

  • No new HTTP route — /api/restyle lands in the Phase 3 PR
  • No frontend — /ai-restyle sidebar + 3-step wizard land in Phase 4
  • No FAL_KEY plumbing through main.py — the integration helpers accept a key argument; the route handler will read it from the X-Fal-Key header in Phase 3
  • No router split — same incremental migration plan as the rest of the codebase

How to verify locally

docker compose up --build backend
docker exec -w /app openshorts-backend pytest -m "not e2e" -q

Expected: 242 passed, 1 deselected.

🤖 Generated with Claude Code

vansteenbergenmatisse and others added 30 commits May 19, 2026 15:37
Locks in current behavior so the upcoming package restructure can be
verified one move at a time:

- tests/unit/: pure-Python tests for SmoothedCameraman, SpeakerTracker,
  VideoEditor filter sanitization + zoompan enforcement, generate_srt /
  format_srt_block / hex_to_ass_color, create_hook_image (real PIL),
  and translate.SUPPORTED_LANGUAGES.
- tests/api/: FastAPI TestClient contract checks. Captures the full
  openapi.json into tests/snapshots/baseline.openapi.json (32 routes)
  so any drift across the restructure fails loudly.
- tests/e2e/: real-ffmpeg pipeline smoke test, skipped unless a
  fixture video and all production deps are present.
- conftest.py stubs heavy ML deps (cv2, mediapipe, ultralytics, torch,
  yt_dlp, scenedetect, google.genai, faster_whisper) via sys.modules
  so unit + api tests run on a stock laptop.
- requirements-dev.txt + pyproject.toml pull pytest, respx, fastapi,
  pillow, boto3, etc. and configure the e2e marker.

Result on the unchanged flat codebase:
  pytest -m \"not e2e\" -> 62 passed in 0.6s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ject

Phase 1 step 0: create the package skeleton with __init__.py docstrings
in every target folder so subsequent moves have a destination. No code
moves yet — tests stay 62/62 green.

Extends pyproject.toml with [build-system] + [project] + setuptools
package discovery so `pip install -e .` exposes the new package.
requires-python pinned to >=3.9 to match the local dev venv (Docker
still uses 3.11).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 1: relocate the leaf module with the fewest reverse imports
(only app.py imports from s3_uploader). Adds a re-export shim at the old
path so existing `from s3_uploader import ...` keeps working through the
restructure. Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…abs.py

Phase 1 step 2: relocate the ElevenLabs dubbing client. Only app.py
imports from translate. Adds a re-export shim at the old path. Tests
stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 3: relocate the PIL hook-card generator + FFmpeg overlay
helper. Imported by app.py and the three root verify_*.py scripts.
Adds a re-export shim at the old path. Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ender}.py

Phase 1 step 4: separate concerns. Generation (faster-whisper + SRT
writing) lives in subtitles_generate.py; FFmpeg burn-in + ASS color
conversion lives in subtitles_render.py. Shim at the old path preserves
existing imports. Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pts + utils/filters

Phase 1 step 5: separate the Gemini-driven filter generator. The
VideoEditor class moves to openshorts/editing/ai_filters.py; the two
long Gemini prompt strings move to openshorts/editing/prompts.py as
functions; the previously private filter helpers (sanitize_filter_string,
enforce_zoompan_output_size, split_filter_chain) move to
openshorts/utils/filters.py so the future motion-graphics and audio
compositors can reuse them.

VideoEditor still re-exposes the helpers as static/classmethods so the
existing characterization tests pass unchanged. Shim at editor.py keeps
`from editor import VideoEditor` working. Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…riptions}.py

Phase 1 step 6: separate the thumbnail workflow into three modules.
Each concern is < 100 lines and independently testable. Shim at the old
path preserves existing imports. Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 7 (biggest split): main.py is broken into eight modules:
- openshorts/video/tracking.py        SmoothedCameraman, SpeakerTracker
- openshorts/video/scene_analysis.py  detect_scenes, analyze_scenes_strategy
- openshorts/video/reframing.py       create_general_frame
- openshorts/video/pipeline.py        process_video_to_vertical (the hot loop)
- openshorts/ml/detection.py          detect_face_candidates, detect_person_yolo
- openshorts/ml/transcription.py      transcribe_video
- openshorts/ml/viral_extraction.py   GEMINI_PROMPT_TEMPLATE, get_viral_clips
- openshorts/ingest/youtube.py        download_youtube_video, sanitize_filename

main.py becomes a thin shim that re-exports the public surface for backwards
compatibility AND preserves the CLI entrypoint (`python main.py -i ... -o ...`)
in a private `_cli()` function. Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 8: relocate the SaaS UGC pipeline as a single module. The
plan calls for an internal split into research / scripting / media /
compositing / pipeline; doing that as one move keeps the change small,
ships the file into the right folder, and defers the function-level
split to a follow-up commit. Shim at saasshorts.py preserves existing
`from saasshorts import ...` calls. No direct test coverage for this
module; the openapi.json contract still passes. Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oint

Phase 1 step 9 (minimum viable): expose the FastAPI app at
``openshorts.app:app`` so the Dockerfile / docker-compose entrypoint can
target the package path. The actual route handlers still live in the
root-level app.py (2256 lines, 32 routes) during the restructure; the
full split into routers (process, editing, subtitles, hooks, translation,
thumbnails, saasshorts, social + the future audio/layouts/motion_graphics
domains) is intentionally deferred to a follow-up commit to keep this
change focused. The plan and ROADMAP track the deferred work.

Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 10: introduce the single FFmpeg wrapper module that the
plan calls for. The scaffold exposes the helpers needed by the existing
call sites (run, probe_resolution, probe_duration, cut, extract_audio,
mux_video_audio, overlay_png) plus a build_filter_complex composer that
the future motion-graphics compositor and audio mixer will use to batch
overlay/eq/amix operations into a single ffmpeg invocation.

Migration of every existing ``subprocess.run(['ffmpeg', ...])`` call to
this wrapper is deferred — it's incremental per-caller work that
benefits from running between commits with the test suite green. The
ROADMAP documents the migration as a follow-up.

Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 step 11: update the container entrypoint to the new package path.
openshorts.app re-exports the FastAPI instance from the root-level app.py
(it inserts the repo root on sys.path itself, so no editable install is
needed). docker-compose.yml inherits this via the backend service.

Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3: replace the AWS-only stub with the complete set of env vars
the codebase reads via os.getenv:
- GEMINI_API_KEY (required — viral clip extractor)
- AWS_* (optional — S3 clip/actor/video galleries)
- DISABLE_YOUTUBE_URL (gate the YouTube tab)
- YOUTUBE_COOKIES (yt-dlp bot-detection workaround)
- RENDER_SERVICE_URL (Remotion proxy)
- MAX_CONCURRENT_JOBS (asyncio semaphore in job queue)
- VITE_API_URL + VITE_ENCRYPTION_KEY (frontend)

Documents that ELEVENLABS_API_KEY / UPLOAD_POST_API_KEY / FAL_KEY come
from the browser via headers (encrypted in localStorage), not server-
side env — they're listed at the bottom as commented hints in case a
deployer wants to wire a server default later.

Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4: enforce the "every Python module ships with a one-line
docstring" convention mechanically so CLAUDE.md stays in sync with
the codebase rather than relying on advisory adherence.

- scripts/update_claude_md.py walks openshorts/, parses each module's
  ast for its docstring + public surface, reads .env.example, and
  rewrites the three auto-managed sections of CLAUDE.md between
  marker comments (REPO-MAP, MODULE-MAP, ENV). It exits non-zero
  with a list of offenders if any module lacks a docstring — that
  failure mode is what enforces the convention.

- scripts/install_hooks.sh: one-liner that runs `pre-commit install`.

- .pre-commit-config.yaml: runs the updater on every commit. Since
  the hook regenerates CLAUDE.md and the resulting changes need to be
  re-staged, developers should rerun `git add CLAUDE.md && git commit`
  after a code change touches module structure.

CLAUDE.md itself stays untouched in this commit — the actual rewrite
with markers in the right place happens in Phase 2.

Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ions

Phase 2: replace the old top-down listing with a tree-aware version
that tells Claude (and humans) where new code lands.

Sections (in order):
- Project + quick start (docker compose / backend only / frontend only)
- "Where things go" decision table — the heart of the file. 11 rows
  mapping intent ("add a new HTTP endpoint") to destination
  ("openshorts/routes/<domain>.py"). Plus the removal checklist.
- Repo layout — top-level folders + backend-package subfolders with
  one-liner rules each. Top-level table is AUTO-MANAGED.
- Module map — every .py under openshorts/ with its docstring +
  public surface. AUTO-MANAGED.
- Processing pipeline — 11 stages with function-level references to
  the new module paths.
- API surface — 12-row table; full inventory in the openapi snapshot.
- Environment — AUTO-MANAGED from .env.example.
- Conventions — six opinionated rules including the "single FFmpeg
  wrapper" and "every module has a docstring" rules that the
  pre-commit hook enforces mechanically.
- Pointers, Tech stack.

Auto-managed sections are filled by scripts/update_claude_md.py
between marker comments (REPO-MAP, MODULE-MAP, ENV). Includes a
small filter fix to exclude .egg-info / .dist-info from REPO-MAP.

Tests stay 62/62 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five small CLAUDE.md files at directory boundaries. Each carries the
one rule that's easy to violate when working inside that subtree:

- openshorts/video/        — FFmpeg only via ffmpeg.py
- openshorts/layouts/      — subclass Layout; don't bypass in callers
- openshorts/motion_graphics/ — register effects + batch via compositor
- openshorts/audio/        — never mix audio inside video/
- openshorts/prompts/      — one .md per prompt; loaded by name

Per the brainstorming / web-research guidance the user requested:
sub-CLAUDE.md files at directory boundaries keep guidance scoped and
discoverable without bloating the root CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 5: write ROADMAP.md with three feature designs and a candid
account of what was deferred from Phase 1.

Ordering rationale (lowest blast radius first):
1. Motion Graphics Library  — reuses overlay pattern, ships first
   because its compositor is the prerequisite for A's audio batching.
2. Background Soundtracks + Ducking — self-contained at the audio
   layer once the FFmpeg wrapper is migrated.
3. Layout Templates (educational, side-by-side, picture-in-picture) —
   last because it touches the per-frame loop in pipeline.py.

Each feature section includes: rationale, architecture sketch, files
to add, integration points (referencing the new module paths from
the restructure), API surface, and risks.

Deferred refactors documented honestly:
- Full router split of app.py
- subprocess.run -> openshorts/video/ffmpeg.py migration
- Internal split of openshorts/saas/pipeline.py
- openshorts/core/{job_store,api_keys}.py extraction
- Frontend restructure (always out of scope this round)

Plus the commit log of what landed in this restructure and the revert
point: `git reset --hard pre-restructure-20260519-1526`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ assets/

Top-level cleanup per fastapi/full-stack-fastapi-template conventions.
The root is now a clear monorepo with three deployable services and no
loose Python files.

Layout:
  backend/    Python FastAPI (was: openshorts/ + root .py monoliths)
  frontend/   React + Vite dashboard (renamed from dashboard/)
  renderer/   Remotion service + compositions (was: render-service/ + remotion/)
  assets/     Committed fonts + screenshots (was: fonts/ + screenshots/)
  scripts/    Dev tooling (unchanged)

Highlights:
- openshorts/ Python package renamed to backend/app/ to match FastAPI
  template convention (uvicorn app.main:app).
- Root app.py (the 2256-line FastAPI monolith) moved to
  backend/app/main.py; its shim imports now point at app.integrations.s3,
  app.editing.ai_filters, app.overlays.subtitles_*, app.thumbnails.*,
  app.saas.pipeline, etc.
- Root main.py CLI moved to backend/app/cli.py with rewritten imports.
- All root .py shims deleted (editor.py, hooks.py, subtitles.py,
  translate.py, s3_uploader.py, thumbnail.py, saasshorts.py) plus the
  three verify_*.py scripts that the test suite replaced.
- backend/Dockerfile uses uvicorn app.main:app entrypoint.
- renderer/service/Dockerfile updated for new compositions/ path.
- docker-compose.yml updated: backend builds ./backend, frontend builds
  ./frontend, renderer builds renderer/service/Dockerfile.
- Font path now auto-resolves walking up from hooks.py until it finds
  assets/fonts/, so tests work whether run from repo root or backend/.
- scripts/update_claude_md.py: PACKAGE_ROOT -> backend/app, repo-map
  descriptions updated for new top-level layout.
- CLAUDE.md: hand-written sections rewritten for new paths; auto-managed
  REPO-MAP/MODULE-MAP/ENV sections regenerated by the updater.
- .gitignore: snapshot/fixture paths repointed under backend/tests/.

Tests: 62/62 green (pytest -m "not e2e" from backend/).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brand Kit feature: 9-anchor position grid, per-ratio (9:16/16:9) sizing,
font upload (system/bundled/user), and live chunk-cycling preview.
Subtitle + Hook modals pre-fill from the live brand kit via useBrandKit
so changes propagate without a page reload.

Backend: new font endpoints (/api/fonts, /api/fonts/upload, /api/fonts/file/*),
words_per_line threaded through SubtitleRequest -> generate_srt(max_words).
OpenAPI snapshot regenerated for the new font routes.

Also bundles the previous session's port refresh
(3001/3002/3003 host mappings) and the HANDOFF.md briefing doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the App.jsx tab switcher with react-router-dom (HashRouter) and
a new platform shell:
- 210px fixed Sidebar with 5 items (Dashboard, Short-form, Long-form,
  Clip Generator, Settings).
- 50px persistent Header (page title + notification bell stub).
- New theme tokens in tailwind.config.js (bg #0c0c0c, sidebar #111,
  surface #141414, indigo accent #5b5ef4, platform colors).

Extract cross-page state out of App.jsx:
- state/keysStore.js — Gemini/Upload-Post/ElevenLabs/fal keys + profile.
- state/jobStore.js — jobId/status/results/processingMedia/session.
- hooks/useJobPolling.js — /api/status/{id} polling loop.
- lib/crypto.js — XOR+Base64 helpers extracted from App.jsx.

Pages:
- /clip-generator carries the existing process flow (uses extracted stores).
- /settings keeps existing config UI (Gemini, Brand Kit, Upload-Post,
  ElevenLabs, fal.ai) until Phase 2 rebuilds with VS Code layout.
- /dashboard, /short-form, /long-form are stubs until phases 3-4.

Legacy code preserved at /legacy/saasshorts, /legacy/thumbnails,
/legacy/ugc, /legacy/ai-agent — hidden from sidebar, reachable by URL.

main.jsx wraps App in HashRouter; resolveView treats hash starting with
'#/' as in-app so deep links survive reloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rebuild Settings with a 180px left nav grouped into General / Platforms /
System. Each item is its own route under /settings/*:
- General: Brand Kit (live), subtitle style / color presets / export
  defaults (placeholders documenting future controls).
- Platforms: shared PlatformSection driven by :platform route param,
  one panel per YouTube/TikTok/Instagram/Snapchat/Facebook.
- System: API Keys (Gemini + Upload-Post + ElevenLabs + fal.ai with
  the same connect/profile flow as before), Processing history
  (placeholder).

Notification system:
- state/notificationsStore.js — localStorage-backed feed with
  pushNotification / markRead / clearNotifications + useNotifications().
- components/ui/NotificationBell.jsx — Header dropdown with unread
  badge, platform-colored dots, "Mark all read" + "Clear all".
- ResultCard + ScheduleWeekModal now push a 'submitted' (or
  'scheduled' / 'failed') notification per platform on /api/social/post.

UI primitives:
- components/ui/Tooltip.jsx — CSS-only group-hover label, no deps.
- components/ui/InfoIcon.jsx — small lucide Info wrapped in Tooltip,
  used next to API key panels.

Backend gap (plan TODO mutonby#9) still open: /api/social/post stays
synchronous, so 'submitted' is the terminal client-side status until
a publish_jobs queue + GET /api/social/publish/status/{id} exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the ShortForm placeholder with a 4-step wizard (Upload → Categorize
→ Processing → Review) and the supporting UI primitives. Backend integration
uses existing /api/process per file — batch endpoint is plan TODO #1.

Wizard state:
- New `useWizard` hook (useReducer + localStorage rehydrate) with optional
  `lock` flag on the Processing step. Persists step + serializable data;
  File handles don't survive JSON round-trips, so reloads recover the step
  index but require re-upload to retry a lost batch.
- Step indicator with back-navigation disabled during locked steps.

Steps:
- Upload: drag-drop or click-to-browse, up to 5 files, MP4/MOV ≤ 2 GB,
  client-side type + size validation.
- Categorize: 4 category cards per clip (Educational / Yap / Live / Viral,
  defaults pre-filled — AI categorization is plan TODO mutonby#2) plus an
  auto-edit settings panel (color grade, auto subs, silence removal,
  face-focus layout).
- Processing: parallel POST /api/process per file, per-row status, Skip
  enables once any clip completes, Review unlocks when every file
  reaches complete/error. SnakeGame fills the wait.
- Review: 230px clip list (left) + phone-framed video preview with
  Before/After (blob URL for the original) + export bar (Download,
  Publish, Schedule, Send to CapCut). Publish/Schedule pushes a
  notification via the bell store (real /api/social/post wiring blocked
  on plan TODO mutonby#9).

UI primitives:
- `PhoneFrame` — 9:16 bezel with notch (sm/md/lg).
- `SnakeGame` — self-contained 20×20 grid, arrows/WASD, space pause,
  auto-pauses on document.hidden.
- `PlatformBadge` — color-coded chip per platform, reuses the
  bg-platform-* tokens.
- `StatCard` — Dashboard stat panel (Phase 4 will consume it).

History:
- /short-form/history reads `openshorts.shortForm.history` (written by
  Processing on completion). Backend index endpoint is plan TODO mutonby#10.

App.jsx import updated to `./pages/ShortForm/index.jsx`; the old
single-file `ShortForm.jsx` is removed. Build verified: 1610 modules,
1264 KB JS chunk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final phase of the UI/UX overhaul. Replaces the LongForm + Dashboard
placeholders with a single-file 4-step wizard (Upload → Settings →
Processing → Editor) and a real Dashboard that consumes the
notifications store + local histories.

Long-form wizard:
- Reuses `useWizard` from Phase 3.
- Step 1 Upload: single MP4/MOV up to 8 GB (4K), drag-drop or browse.
- Step 2 Settings: 5 toggles — color grade, auto subtitles, chapter
  detection, description/tags, intro/outro. Each toggle is annotated
  with its backend TODO #.
- Step 3 Processing: simulated 5-stage progress bar with `SnakeGame`
  on the side. The real pipeline branches (silence removal, LUT,
  chapter detection, intro/outro) are plan TODOs mutonby#4mutonby#8; the timer
  here lets the rest of the wizard be exercised end-to-end.
- Step 4 Editor: 16:9 video preview + chapter timeline scrubber +
  right-panel tabs (Chapters / Subtitles / Export). Chapters are
  seeded from Step 3 (placeholder until backend TODO mutonby#6 ships).
  Inline rename + seek-on-click. "Export segment as short" opens a
  modal that documents the pending /api/long-form/export-segment
  route (plan TODO mutonby#7).
- History tab reads `openshorts.longForm.history` (written on
  processing complete).

Dashboard:
- Three StatCards: clips processed (sum of short-form clip counts +
  long-form edits), scheduled (notifications with status='scheduled'),
  published (notifications submitted/published). Deltas surface the
  next platform on deck and the latest publish.
- Upcoming uploads panel: filtered notifications list with platform
  badges and timestamps.
- Recent activity panel: last 8 notifications, any type.
- All values derive locally — the live backend feed lands with plan
  TODO mutonby#10 (GET /api/clips/recent).

Wiring:
- App.jsx import switched to ./pages/LongForm/index.jsx; the old
  single-file LongForm.jsx is removed.
- Build verified: 1616 modules, 1288 KB JS chunk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced by the first browser smoke test after the 4-phase UI overhaul
(npm run build was green but never exercised the running app), plus
follow-ups for the Codex adversarial review's HIGH/MEDIUM findings.

Bugs caught by smoke test:

- backend/app/main.py: run_job invoked `python -u main.py` which no
  longer exists post-restructure. Container log: "python: can't open
  file '/app/main.py'". Switched to `python -u -m app.cli`. Without
  this, every short-form Processing job exits with code 2.

- frontend/src/pages/LongForm/steps/Processing.jsx: under React
  StrictMode (dev) the `startedRef` gate paired with the cleanup
  `clearInterval` caused mount #1 to start the timer, cleanup to
  clear it, and mount mutonby#2 to bail early — simulated progress stuck
  at 0%. Removed the gate; idempotent setData prevents double-reset.

- frontend/src/components/ProcessingAnimation.jsx:228-229: two
  unescaped `>` chars in JSX text → `{'>'}`. Removes the only build
  warnings.

Codex adversarial review remediation:

- H1 / backend/app/main.py (input validation for STATE-MUTATING
  /api/process): new `_ensure_video_upload(filename, first_chunk)`
  rejects on extension (.mp4/.mov) and on missing MP4/MOV `ftyp`
  signature at byte offset 4. Validation runs before any disk write,
  so junk uploads no longer reach the pipeline. Returns 415 with a
  precise reason. Verified: text-content-with-.mp4-extension → 415
  ftyp; real-mp4-with-.txt-extension → 415 ext; real .mp4 → 200.

- H2 / frontend/src/hooks/useWizard.js + both Wizard.jsx callers:
  new optional `resetOnRehydrate(mergedData)` predicate. When it
  returns true (e.g. wizard state references a File that no longer
  survives JSON), the rehydrate force-resets to step 0 with
  initialData and clears localStorage. Eliminates the stranded-state
  bug where users could navigate past Categorize/Settings into a
  step that always fails.

- M3 / frontend/src/pages/ShortForm/steps/Processing.jsx: polling
  effect now uses an AbortController + cancelled flag, fetchStatus
  accepts a signal, and the setData updater skips writes when the
  job has already moved to 'complete' or 'error'. Drops stale
  'processing' responses that race past newer terminal updates.

Tests / verification:

- frontend npm run build: clean, 0 warnings (down from 2).
- backend pytest -m "not e2e": 61/62 pass (test_openapi_dump_matches
  _baseline drifts on pydantic-emitted contentMediaType vs the
  baseline's format:binary — pre-existing, unrelated to these diffs,
  no route added/removed).
- manual smoke: all 5 sidebar pages, all 4 legacy routes, short-form
  end-to-end (POST /api/process now succeeds; transcription runs;
  fails at Gemini with dummy key as expected), long-form end-to-end
  (simulated progress reaches 100%, Editor opens, Export modal works).

security_baseline:
  applies: true
  surfaces:
    - id: POST /api/process (this change)
      tier: STATE-MUTATING
      controls:
        C3_input: { status: covered, mechanism: "extension + ftyp magic-bytes check before disk write" }
        C1_auth:        { status: covered, mechanism: "X-Gemini-Key header (BYOK)" }
        C2_rate_limit:  { status: opted_out, justification: "self-hosted single-tenant deployment; per-IP cap on the host process queue (MAX_CONCURRENT_JOBS) is the effective ceiling. Tracking: full rate-limit pass under /gsd-secure-phase." }
        C4_timeout:     { status: opted_out, justification: "subprocess is intentionally unbounded — clip generation legitimately takes 5-60min. Tracking: kill switch via abuse detection in /gsd-secure-phase." }
        C7_idempotency: { status: opted_out, justification: "client retries would re-submit a fresh job_id; dedup is at user discretion. Tracking: idempotency-key in /gsd-secure-phase." }
        C8_concurrency: { status: covered, mechanism: "asyncio.Semaphore(MAX_CONCURRENT_JOBS) gate in process_queue" }
        C9_audit:       { status: covered, mechanism: "attestation log line with IP + UA + timestamp + source per job" }
        C10_abuse:      { status: opted_out, justification: "BYOK — cost is on the user's Gemini account, not the host. Tracking: per-user spend cap if multi-tenant ever ships." }

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restructures ROADMAP.md into two top-level sections:

1. Product roadmap — user-facing feature backlog, tiered as
   Shipped / Stubbed-in-v1 / Later. Each Stubbed item names the
   backend TODO that unblocks full functionality so the wiring map
   is unambiguous. Covers Short-form, Long-form, Clip Generator,
   Dashboard, Settings, and the Notifications system.

2. Technical roadmap — unchanged content for Features A/B/C
   designs and the deferred-refactor table, kept under a clearly
   marked heading. The frontend-restructure row in the deferred
   table is updated to "superseded by the 4-phase UI overhaul".

Adds a Follow-ups section capturing what the smoke-test pass
surfaced but didn't ship:

- Backend security-baseline gaps for POST /api/process
  (C2 rate limit, C4 timeout/breaker, C7 idempotency, C10 abuse cap)
- Three frontend polish items (Dashboard caption mismatch,
  Skip/Review both disabled on all-error, useRef tidy-up)
- Two infra gotchas (Docker /app/node_modules anonymous volume,
  OpenAPI snapshot Pydantic version drift)
- Codex re-run note (task-mpdeyzjz-vpdetv reference for the
  baseline audit that produced the H1/H2/M3 list)

Updates the "What landed" log with the most recent commits
(brand kit, 4-phase UI overhaul, smoke-test fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced by the first real Gemini-key smoke test. The short-form
Processing step polled `/api/status/{job_id}` and trusted the keys
verbatim, but the backend contract speaks a different vocab than the
wizard:

  backend       wizard expected
  ----          ----
  status=completed     status=complete
  status=failed        status=error
  result=...           result=...   (wizard was reading data.results)

So a successful job never tripped the overallStatus → 'complete'
transition (Review button stayed disabled) and the clip metadata
never reached `j.result` (Review step would have shown an empty list).
The legacy useJobPolling.js already mapped both at the boundary —
this mirror that here.

Added `normalizeJobPayload(data)` next to `fetchStatus` so the rest
of the component stays in the wizard's vocab. Verified end-to-end
on the demo MP4: Gemini returns 1 viral clip, wizard auto-advances
to Review, PhoneFrame renders the 17 s vertical output with title,
description, Download, Publish×5, Schedule×5 buttons.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Project-specific guidance now lives under a "## OpenShorts
(project-specific)" H2 section in the user's global CLAUDE.md.
scripts/update_claude_md.py is retargeted to that file (overridable
via OPENSHORTS_CLAUDE_MD) and remains idempotent; the project-root
CLAUDE.md is removed.

Trade-off: the OpenShorts module map + env table is now loaded by
Claude in every project session, not just this repo's. Re-pointable
later via the env var.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Upload.jsx now reads videoElement.duration via a hidden <video
preload="metadata"> when files are added, so Processing.jsx can show
a real ETA instead of a hashed placeholder. HANDOFF.md captures the
session-end state for the next agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tles

Phase 1 of the polish work. Two layers of changes landed together
because they touch the same files:

Auto-pipeline (new):
- POST /api/process now accepts category + 4 auto-edit toggles
  (auto_edit, auto_subtitles, color_grade, silence_removal) plus a
  subtitle_style JSON from useBrandKit(). All bounds-checked via the
  new SubtitleStyle pydantic model in main.py and _parse_subtitle_style;
  invalid input returns 400.
- After the CLI subprocess produces raw reframed clips, run_job calls
  _run_auto_pipeline (new) which chains AI edit -> color grade (Phase 2
  stub) -> silence removal (Phase 2 stub) -> subtitles per clip. Each
  step writes a sibling file; originals are preserved so Phase 3's
  per-clip Review toggles can swap URLs without re-rendering. Per-clip
  failures log but never fail the whole job.
- Helpers moved to backend/app/editing/auto_pipeline.py so the route
  handlers (/api/edit, /api/subtitle) keep working unchanged for the
  legacy ResultCard.
- status='completed' is now flipped AFTER the auto-pipeline finishes,
  so the wizard never navigates to Review with raw URLs mid-polish.
- Frontend: Categorize relabels faceLayout -> autoEdit, reorders to
  match the backend chain. Processing.jsx hooks useBrandKit() and
  sends the new Form fields. Brand-kit 3x3 positions are aliased
  server-side to the burner's top/middle/bottom.
- backend/tests/unit/test_auto_pipeline_config.py (NEW, 48 tests)
  pins bounds, hex-color rejection, position aliasing, bool coercion,
  and the category allowlist.
- backend/tests/snapshots/baseline.openapi.json regenerated for the
  6 new Form fields.

Carried in from the prior session (intermingled in the same files):
- Categorize.jsx adds an amber 'no Gemini key' gate banner + disables
  Start Processing when the key is missing.
- Processing.jsx becomes reactive to keys.gemini (drops startedRef),
  surfaces ETA from probed duration, and splits Queued into
  awaiting_key / uploading / queued / processing / complete / error
  with specific captions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
vansteenbergenmatisse and others added 21 commits May 20, 2026 14:16
…eview

Adds a 5-button segmented control (Original | + Edit | + Grade | + Cut | + Subs)
beneath the phone preview that swaps the displayed variant URL on click. The
auto-pipeline already emits a chain of variants per clip (original → edited →
graded → silencecut → subtitled); this lets the user step through them.

Missing variants get a [+] icon. Clicking [+] calls the matching endpoint
(/api/edit, /api/colorgrade, /api/silencecut, /api/subtitle) with the most
recent existing variant as input_filename, then merges the returned filename
into wizard.data and switches the displayed stage to it.

A LUT picker dropdown (cool / noir / teal_orange / vivid / warm) appears
under the segmented control when the graded stage is selected. Changing the
LUT triggers /api/colorgrade with the chosen lut_name; the dropdown is
disabled while the request is in flight. Stage selection + chosen LUT
persist per-clip in wizard.data.clipStages / clipLuts so reloads keep
the user's choices (modulo the existing rehydrate-bounce when the source
File handle is lost — pre-existing wizard behavior, unchanged).

Smoke-tested in browser: stage swap updates Download URL, LUT change to
noir re-grades the on-disk file (mtime confirmed), /api/silencecut
returns the expected response shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AI Restyle is a new sibling to Short-form / Long-form. User uploads a video
they made; product relights and re-backgrounds it using a Nano Banana frame
as the style reference for a video-to-video model, preserving the original
motion, content, and audio.

v1 scope:
- Sidebar entry between Long-form and Short-form
- 3-step wizard: Upload → Configure → Review
- Two preset dimensions (Background + Lighting), 5 hand-tuned seed presets
  each, CRUD in Settings → "AI Restyle" tab
- Per-job prompt override via inline textarea
- 30s duration cap
- Original audio preserved bit-for-bit
- No editing/subs/color-grade (those belong to Short-form; "Send to
  Short-form" CTA closes the loop)

Video-to-video model choice deferred to implementation Phase 0 spike
(Wan v2.5 / Luma Ray2 / Runway Gen-3 candidates; cost ≤$2 per 30s).

10 explicit decisions documented (D1-D10), including the 30s cap rationale,
the no-editing scope choice, and the Phase 0 model-selection deferral.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plan breaks the 7-phase milestone into bite-sized TDD tasks: Phase 0 model
spike, Phase 1 frame_extract + frame_relight, Phase 2 video_restyle +
restyle_pipeline orchestrator, Phase 3 routes + OpenAPI snapshot, Phase 4
preset store + wizard pages, Phase 5 Settings tab CRUD, Phase 6 smoke
test + Codex review + ROADMAP/CLAUDE.md refresh.

Each task is 2-5 minutes (test → run-fail → impl → run-pass → commit) per
the writing-plans skill conventions.

ROADMAP.md gets a new top-of-product-roadmap section pointing at both the
spec and the plan, with the v1 scope (30s cap, 3-step wizard, preset CRUD)
and the explicit out-of-scope items (>30s chunking, ShortForm bridge, AI
preset suggestion).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Backend
- New backend/app/video/merge.py: concat_clips() normalizes each input to
  1080x1920@30fps + AAC 48 kHz stereo, then stitches with FFmpeg concat=v:a.
  All FFmpeg ops funnel through the wrapper per Convention #1.
- POST /api/merge: bounds-checks clip_indices against the job's clip count,
  dedups while preserving user-picked order, allowlists transition ("cut"),
  rejects single-clip requests at the schema layer. Output filename encodes
  the ordered indices so the same merge is naturally idempotent.

Frontend (Review.jsx)
- Per-clip checkboxes in the sidebar; selection locks to a single job_id.
- "Merge N selected" CTA appears in the export bar at ≥2 selections.
- Confirmation modal with up/down reorder + remove; "Re-render" calls
  /api/merge and pushes the result into wizard.data.mergedClips for
  persistence across reloads.
- New "Merged outputs" sidebar section previews merged files; stage selector,
  LUT picker, and before/after toggle are hidden while previewing a merge.

Tests
- backend/tests/unit/test_merge.py (8 cases) covers the normalize filter,
  concat arg composition, empty/single-input rejection, missing-file
  detection, and the "ffmpeg produced empty output" guard.
- backend/tests/api/test_merge_endpoint.py (6 cases) covers HTTP-layer
  validation, dedup, and the happy-path response shape.
- baseline.openapi.json regenerated to lock in the new route.
- pytest 164/164 (up from 150), npm run build 0 warnings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex audit flagged a real race in the Phase 4 merge helper: two clients
POSTing /api/merge with the same clip_indices both write to the same
deterministic `merged_{indices}.mp4` filename with ffmpeg's `-y` flag.
A reader hitting `/videos/{job_id}/merged_*.mp4` between writer-A start
and writer-B finish could see a partial / mid-write file.

Fix: concat_clips writes to `{output}.partial-{nonce}.mp4`, then
os.replace()s onto the public path. Idempotency on the URL is preserved
(filename stable), but the public path is never half-written: readers
see the prior file or the new file, never a partial.

On ffmpeg failure the partial is cleaned up; the stable output stays
intact.

Tests: 3 new unit cases (partial path used, cleanup on failure, unique
partials across calls) + 3 new API cases (negative-index reject, case-
normalized transition, concurrent identical merges converge on same
public path). Suite: 170/170 (up from 164).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-clip routes

Codex audit found 5 routes that index `clips[req.clip_index]` with only a
`>= len(clips)` check, so a request with `clip_index=-1` would silently
mutate the LAST clip:

  - /api/edit            (main.py:776, no bounds check at all)
  - /api/effects/generate (main.py:993, no bounds check at all)
  - /api/clip/.../transcript (main.py:910, only >=)
  - /api/subtitle        (main.py:1097, only >=)
  - /api/hook            (main.py:1453, only >=)

Phase-2 routes (/api/colorgrade, /api/silencecut) already go through
`_resolve_clip_input` which validates negatives; /api/merge has its own
explicit `idx < 0` check. This brings the legacy surface to parity.

Also defends `_persist_clip_url` and the inline persistence in
/api/subtitle (L1175/1180/1261/1270) with `0 <= idx < len` — even though
the route entries now block negatives, helpers should not assume callers
have validated.

Tests: 5 new contract cases in test_legacy_negative_clip_index.py — one
per affected route, asserting 400/404/422 for `clip_index=-1`. Before
the route guards these tests hang because the routes fall through to
real Gemini/FFmpeg/Whisper work on the last clip (which has fake bytes).
Suite: 175/175 (up from 170).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d-C4)

Codex audit flagged that ``app.video.ffmpeg.run`` defaulted to
``timeout=None``, making any state-mutating ffmpeg call (`/api/merge`,
``/api/colorgrade``, ``/api/silencecut``, etc.) a DoS primitive: a
hostile or corrupt input could pin a worker thread forever. The deferred
C4 control was therefore exploitable today even with auth/rate-limit
deferred.

Fix:
* ``DEFAULT_TIMEOUT`` (1800s, override via ``FFMPEG_TIMEOUT_SECONDS``)
  applied when caller passes ``timeout=None``.
* ``DEFAULT_PROBE_TIMEOUT`` (30s, override via ``FFPROBE_TIMEOUT_SECONDS``)
  applied to ``probe_resolution``/``probe_duration``.
* ``subprocess.TimeoutExpired`` wraps into ``FFmpegError(returncode=-1)``
  so callers see a single exception type and the error message carries
  the configured timeout for triage.

Tunables on env vars (defensible 30-min worst case for a 50-clip merge
of 60s sources; production can lower with ``FFMPEG_TIMEOUT_SECONDS=600``).

Tests: 6 new unit cases in tests/unit/test_ffmpeg_wrapper.py covering
default application, explicit pass-through, timeout-to-FFmpegError
wrapping, probe defaults, sanity of constant values. Suite: 181/181
(up from 175).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex audit (focus 2, 3 BLOCKERs) confirmed that ``/api/edit``,
``/api/subtitle``, and ``/api/hook`` were calling
``subprocess.run(['ffmpeg', ...])`` and ``subprocess.check_output(['ffprobe', ...])``
directly instead of going through ``app.video.ffmpeg``. That dodged:

* the wrapper's UTF-8 locale setup (the ``ai_filters.py`` site even had
  a bytes-encoding workaround for the missing locale)
* the Phase-5 B-2 default timeout (deferred-C4 DoS)
* the uniform ``FFmpegError`` surface

Migrated:

* ``app/editing/ai_filters.py`` — ``apply_edits`` copy fallback,
  ``probe_resolution`` for input dimensions, main filter run. The
  bytes-encoding hack at L218-232 disappears because the wrapper sets
  ``LANG=C.UTF-8`` / ``LC_ALL=C.UTF-8``.
* ``app/overlays/subtitles_render.py`` — ``burn_subtitles``
  routes through ``ffmpeg_wrapper.run``; ``FFmpegError`` already carries
  the stderr payload so the manual decode/raise is gone.
* ``app/overlays/hooks.py`` — both the input probe and the overlay
  burn-in now use the wrapper.

Added regression test ``tests/unit/test_ffmpeg_wrapper_invariant.py``
that pins the three migrated files: any future commit that re-introduces
a direct ``subprocess.*(['ffmpeg' / 'ffprobe' ...])`` call in those
modules fails the suite (6 cases, 2 invariants × 3 files).

Out of scope (documented in the test docstring): ``video/pipeline.py``
(per-frame Popen with stdin streaming), ``cli.py``, ``saas/pipeline.py``,
and one remaining ffprobe call in ``main.py`` /api/effects/generate.
Those need wrapper helpers we don't have yet — separate /gsd-secure-phase
sweep.

Suite: 187/187 (up from 181). Live curl against /api/subtitle and
/api/hook returns the expected 404/422 responses post-restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex audit (focus 1, BLOCKER) showed that /api/edit + auto_pipeline's
apply_ai_edit executed Gemini-produced filter strings via FFmpeg ``-vf``
with only a comparison-operator cleanup pass. A malicious response like
``movie=/etc/passwd,scale=1:1`` would trigger filesystem reads through
the ``movie`` filter. Same risk for ``amovie``, ``subtitles``, ``ass``,
``concat`` (file:= option), ``sendcmd``, ``asendcmd``.

Fix: strict allowlist + explicit deny list in
``app/utils/filters.py``:

* ``_ALLOWED_FILTERS`` enumerates safe filters used by the prompt
  (zoompan, eq, hue, curves, unsharp, …) plus pipeline essentials
  (scale, setsar, fps, format, fade, …) and a few common visual safe
  primitives (vignette, drawbox, lutyuv/rgb, gblur, …).
* ``_DISALLOWED_FILTERS`` explicitly bans movie/amovie/subtitles/ass/
  concat/sendcmd/asendcmd as a defense-in-depth backstop.
* Parser strips ``[label]`` brackets, splits the chain on both ``,``
  and ``;`` (filter_complex), extracts the leading filter name, and
  fails closed on anything outside the allowlist.
* ``UnsafeFilterError(ValueError)`` is raised on rejection.
* ``VideoEditor.apply_edits`` calls the validator AFTER comparison-
  operator sanitization (since the post-sanitization form is what
  executes) and BEFORE the FFmpeg invocation.
* ``/api/edit`` route handler surfaces ``UnsafeFilterError`` as a 400
  with a frontend-friendly message instead of a generic 500.

Tests: 14 new cases in ``tests/unit/test_filter_safety.py`` covering
the Codex reproducer ``movie=/etc/passwd``, plus amovie/subtitles/ass/
concat, chain-position attacks (evil filter after legit one, evil after
``;``), unknown filters, bracket-label handling, whitespace tolerance,
error-message specificity, and TypeError on non-strings. Suite:
208/208 (up from 187).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex audit (focus 4, 2 BLOCKERs) flagged concurrent mutations on
``jobs[]`` and ``metadata.json``:

  main.py:39   — jobs dict mutated by route handlers + executor threads
                  with no synchronization
  main.py:1263 — _persist_clip_url's read-modify-write on metadata.json
                  can lose updates when /api/colorgrade and
                  /api/silencecut fire concurrently on the same clip

Fix:

* ``_JOB_LOCKS: Dict[str, threading.Lock]`` keyed by job_id, with a
  guard lock around the dict-of-locks. ``_job_lock(job_id)`` creates
  the lock lazily.
* ``_atomic_write_json(path, data)`` writes via ``.tmp-{pid}-{tid}``
  + ``os.replace`` so a crashed writer cannot leave a half-written
  metadata.json on disk.
* ``_persist_clip_url`` now holds the per-job lock for the entire
  read-modify-write window AND writes via atomic-rename.
* The inline metadata persistence in ``/api/subtitle`` (L1175-1190)
  uses the same lock + atomic write — Codex specifically called out
  that this path duplicated the unsynchronized pattern.

Threading.Lock (not asyncio.Lock) because mutators are called from
inside ``run_in_executor`` (sync code on worker threads). Lock is
per-job_id, so unrelated jobs don't contend.

Tests: 6 new cases in ``tests/unit/test_job_lock.py`` covering:

  - lock identity (same lock for same job, different for different)
  - lock type (threading.Lock acquire/release)
  - persist writes to memory + disk in one shot
  - atomic-write never leaves a partial file
  - 32 concurrent _persist_clip_url calls across 4 clips on the same
    job all land (vs. previous behavior where some updates would be
    lost to stale-read writers).

Suite: 214/214 (up from 208).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… drift

Pre-existing drift: the running FastAPI/Pydantic stack renders binary upload
fields as ``contentMediaType: application/octet-stream`` instead of the older
``format: binary``. Path keys are identical — no contract change, just a
library-version rendering tweak. Regenerating so each downstream commit can
pass the contract test independently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 Task 1.1 of the AI Restyle plan. Extracts the frame at t=0 of a
video to PNG via the central FFmpeg wrapper — the first frame is what
Nano Banana (next task) will relight to seed the v2v style reference.

Test uses a synthetic ``testsrc=duration=2`` clip so it runs anywhere
ffmpeg is available, no binary checked into git, no dependency on
``demo-openshorts.mp4`` (which sits outside the backend volume mount).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raints

Phase 1 Task 1.2 of the AI Restyle plan. Sends the first frame of the
source video to Gemini's image-preview model with a user-supplied
background+lighting prompt plus hard-coded safety constraints that pin
the subject, pose, composition, and framing. Returns the relit PNG,
which Phase 2 will hand to the video-to-video model as a style
reference.

Plan deviation: uses ``gemini-3.1-flash-image-preview`` to match the
codebase's existing thumbnails/images.py call site, instead of the
plan's stale ``gemini-2.5-flash-image-preview`` (the codebase moved
forward between plan writing and execution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 Task 2.1 of the AI Restyle plan. Two new modules:

1. ``backend/app/integrations/fal.py`` — public submit_and_poll +
   upload_file helpers wrapping fal.ai's queue + storage REST APIs.
   No new dependency: httpx only. The legacy SaaSShorts pipeline keeps
   its private ``_fal_run`` for now; DRY refactor deferred to keep this
   PR scoped to AI Restyle.

2. ``backend/app/ml/video_restyle.py`` — restyle_video(video, ref, out)
   uploads both inputs, runs the v2v model, downloads the result.

Plan deviations:
  - Used raw httpx instead of the ``fal-client`` SDK to match how
    ``app/saas/pipeline.py:_fal_run`` already calls fal.ai (and avoid a
    new pinned dependency).
  - Model defaults to ``fal-ai/wan/v2.5/turbo/video-to-video`` ahead of
    the Phase 0 spike. The spike is parked until the user provides a
    FAL_KEY for cross-model benchmarking; swap MODEL_ID once it runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 Task 2.2 of the AI Restyle plan. Async coroutine that drives
duration probe → first-frame extract → Nano Banana relight → fal.ai
v2v → audio mux, with each ML/FFmpeg call dispatched to a worker
thread via ``loop.run_in_executor`` (matches Short-form's pattern so
the FastAPI event loop is never blocked by ffmpeg / a fal.ai poll).

Mutates the supplied ``jobs[job_id]`` dict in place so the route
handler + frontend polling see live progress + status. Any exception
flips status to 'failed' and appends to logs — the coroutine itself
never raises.

Plan deviation: file lives in ``backend/app/restyle/pipeline.py``
(new product folder, mirrors ``saas/``) rather than the plan's
``app/saas/restyle_pipeline.py``. Per CLAUDE.md Convention mutonby#7
(short-form / long-form / new products stay isolated), AI Restyle's
backend gets its own package — same rule the frontend follows for
pages/AIRestyle/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the AI Restyle pipeline into the public API:
- POST /api/restyle  — multipart upload + background_prompt + lighting_prompt
                       form fields. Requires X-Gemini-Key + X-Fal-Key headers.
                       Persists upload via the existing _ensure_video_upload
                       guard (MIME+ftyp), enforces 500-char prompt cap and
                       2GB file cap, seeds jobs[job_id] with product="ai-restyle",
                       schedules run_restyle_job via FastAPI BackgroundTasks.
- GET  /api/restyle/{job_id} → RestyleStatus pydantic model surfacing the
                       live jobs[job_id] {status, logs, progress_pct, result}.

First populated module under backend/app/routes/ — establishes the pattern
for the eventual main.py router split.

Plan deviations (from docs/superpowers/plans/2026-05-20-ai-restyle.md):
1. Import path is `app.restyle.pipeline` not `app.saas.restyle_pipeline`
   (per D11 — AI Restyle is its own product per CLAUDE.md Convention mutonby#7).
2. Auth check moved BEFORE prompt-length / file-signature checks (matches
   the existing /api/process pattern — minor info-leak avoidance).
3. background_tasks.add_task(run_restyle_job, **kwargs) directly — no
   asyncio.create_task wrapper. FastAPI 0.136.1 awaits async tasks natively.
4. Added file-size cap (MAX_FILE_SIZE_MB=2048) matching main.py — plan did
   not enforce one, which would have invited a disk-fill DoS.
5. 8 tests, not 4 — adds happy-path enqueue (mocked pipeline), lighting-
   prompt-too-long, non-MP4 rejection, and seeded GET happy path.

OpenAPI baseline regenerated; diff confirms /api/restyle and
/api/restyle/{job_id} are the only path additions (rest of contract
unchanged). Backend pytest 250/250 green (242 prior + 8 new).

security_baseline (per ~/.claude/skills/securing-http-and-llm-endpoints):
  applies: true
  surfaces:
    - id: POST /api/restyle
      tier: LLM-CALL  # downstream Gemini Nano-Banana + fal.ai v2v
      controls:
        C1_auth:        { status: covered,   mechanism: "X-Gemini-Key + X-Fal-Key headers required; matches /api/process" }
        C2_rate_limit:  { status: opted_out, justification: "no per-IP/user RL anywhere in OpenShorts; systemic gap tracked for /gsd-secure-phase" }
        C3_input:       { status: covered,   mechanism: "_ensure_video_upload MIME+ftyp; 500-char prompt cap; 2GB file cap; 30s duration cap downstream in restyle/pipeline" }
        C4_timeout:     { status: covered,   mechanism: "integrations/fal.py 600s timeout w/ poll; ffmpeg.py FFMPEG_TIMEOUT_SECONDS=1800 default" }
        C5_output_rate: { status: covered,   mechanism: "async job; finite JSON result; no streaming surface" }
        C6_redaction:   { status: opted_out, justification: "user supplies own prompt text; no user PII flows into LLM beyond user-authored prompt" }
        C9_audit:       { status: partial,   mechanism: "jobs[].logs captures pipeline steps; no separate audit log (systemic gap)" }
        C10_abuse:      { status: partial,   mechanism: "30s duration cap bounds per-job spend to ~\$1.24 worst case (Nano-Banana ~\$0.04 + Wan v2.5 ~\$0.04/s); no per-user/IP daily quota" }
    - id: GET /api/restyle/{job_id}
      tier: PUBLIC-READ  # mirrors /api/status/{job_id} which is also unauth
      controls:
        C2_rate_limit:  { status: opted_out, justification: "matches existing /api/status pattern; systemic gap" }
        C3_input:       { status: covered,   mechanism: "job_id is path str; 404 on miss; no parser surface" }
        C4_timeout:     { status: covered,   mechanism: "in-memory dict lookup; no I/O" }

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two dimensions (backgrounds + lightings), 5 hand-tuned seed presets each.
Mirrors keysStore.js — localStorage-backed, broadcasts on a custom event
+ listens on the storage event for multi-tab consistency. Public surface:

- getPresets()
- setDefault(dimension, id)
- upsertPreset(dimension, preset)
- deletePreset(dimension, id)  (refuses to delete the default)
- useAIRestylePresets()        React hook

Shared dependency for the wizard (Configure step) and the Settings tab
(preset CRUD). Built first so both downstream surfaces have a stable
contract to import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "General → AI Restyle" panel under Settings. Two stacked lists
(Backgrounds + Lightings), each with star/edit/delete affordances and an
"Add preset" button. Edit modal enforces 40-char label cap + 500-char
prompt cap with a live counter; default preset cannot be deleted.

Plan deviation: the plan suggested a component-array tab list shape
(`{ id, label, component }`), but the existing Settings page uses
react-router child routes with a NavLink-driven left rail. Followed
the existing pattern instead — one nav entry in Settings/index.jsx +
one <Route> in App.jsx wires it up cleanly.

Wired through:
- frontend/src/pages/Settings/index.jsx — new "AI Restyle" item under General
- frontend/src/App.jsx — child route `general/ai-restyle`

No wizard yet — that ships in Phase 4b. The Settings UI is functional in
isolation: preset edits broadcast via the storage event so any future
subscribed component (i.e. the wizard's preset dropdowns) will re-render
without a reload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the full user-facing surface for AI Restyle. Sidebar entry sits
between Long-form and Short-form (Wand2 icon). Wizard mounts under
/ai-restyle/* with Wizard/History sibling tabs (mirrors ShortForm shape).

New pages:
- pages/AIRestyle/index.jsx       — Outlet shell + Wizard/History NavLinks
- pages/AIRestyle/Wizard.jsx      — 3-step state machine via useWizard
- pages/AIRestyle/History.jsx     — placeholder ("tracking lands in follow-up")
- pages/AIRestyle/steps/Upload.jsx     — client-side duration probe; 30s cap
- pages/AIRestyle/steps/Configure.jsx  — preset dropdowns + 2 prompt textareas
                                          + POST /api/restyle
- pages/AIRestyle/steps/Review.jsx     — 2s polling; phone preview with
                                          Before/After toggle; Download +
                                          Send-to-Short-form CTAs

Plan deviations:
1. Configure step uses TWO separate textareas (background + lighting)
   instead of the plan's single combined textarea with newline-split. The
   split-by-newline approach is fragile to multi-line user edits and
   the backend already accepts the two prompts as separate form fields —
   so the simpler UX matches the wire contract 1:1.
2. Configure step renders a yellow "Missing API keys → Settings" banner
   when keys.gemini or keys.fal is empty (plan only surfaced this as an
   error on Submit). Clearer UX, single Link to /settings/system/api-keys.

Send-to-Short-form: stashes {url, name} in sessionStorage under
'openshorts.shortForm.handoff' and navigates to /short-form. Full
wire-through into ShortForm/Upload to consume that payload is documented
as a follow-up per the plan's "stubbed in v1" note.

Sidebar position respects the handover constraint: between Long-form and
Short-form (Wand2 icon). Per CLAUDE.md Convention mutonby#7 the new pages/ live
in their own folder and DO NOT cross-import with ShortForm or LongForm —
the only cross-product touchpoint is the sessionStorage handoff, which
is a one-way browser-storage handshake, not a code dependency.

Frontend build: clean, 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two Codex HIGH findings from the adversarial review of PR mutonby#35
(commit 0cc0c3e, Codex job task-mpeq4ynb-qd7sfx):

HIGH-1 (SSRF / internal-network fetch):
  ml/video_restyle.py:67 streamed `response["video"]["url"]` directly from
  the fal API response with no host validation. A compromised or malicious
  fal payload could redirect a server-side fetch at e.g.
  http://169.254.169.254/... (cloud-instance metadata) or any internal
  service the backend can reach.

HIGH-2 (credential exfiltration via queue URLs):
  integrations/fal.py:83-88 trusted `status_url` and `response_url` from
  the submit response and then issued `Authorization: Key <fal_key>` to
  those URLs. If those response fields were ever attacker-influenced,
  the fal API key would have been sent to an attacker-controlled host.

Fix: two validators in integrations/fal.py.

- require_fal_queue_url(): HTTPS-only, host must equal queue.fal.run.
  Called on status_url + response_url before any Authorization-header
  request is built.

- require_fal_download_url(): HTTPS-only, host must be fal.ai / fal.run /
  fal.media or a subdomain. Called on the response['video']['url'] before
  httpx.stream() opens the connection.

Both raise FalError on mismatch; the route handler propagates the failure
through the existing run_restyle_job exception path (jobs[id].status =
"failed", message lands in jobs[id].logs).

Tests:
- tests/unit/test_fal_integration.py:
  * test_submit_and_poll_rejects_attacker_controlled_status_url
  * test_submit_and_poll_rejects_attacker_controlled_response_url
  * test_submit_and_poll_rejects_non_https_queue_url
- tests/unit/test_video_restyle.py:
  * test_restyle_video_rejects_untrusted_download_url
- Updated existing test_video_restyle.py mocks from "cdn.fal" (not a real
  fal hostname) to "v3.fal.media/files/..." so the legitimate path still
  passes the allowlist.

Backend pytest 255/255 (was 250 + 5 new).

Codex MEDIUM follow-ups (deferred to separate issues / PRs):
- Prompt-injection hardening (delimit user prompts as untrusted data)
- AI Restyle write path bypasses _job_lock (consistency-only risk today)
- Gemini content-policy refusal catches broad Exception
- fal client poll/result responses miss raise_for_status

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses Codex HIGH-3 from the adversarial review of PR mutonby#35
(commit 0cc0c3e, Codex job task-mpeq4ynb-qd7sfx):

Original behavior:
  POST /api/restyle accepted up to 2GB (matching /api/process). The 30s
  duration cap was enforced AFTER the file landed on disk via the
  run_restyle_job probe (restyle/pipeline.py:71-76). An attacker could
  repeatedly upload 2GB files of any duration and force ~2GB of
  uploads/ disk churn per request before the duration check rejected
  them.

Fix:
1. Lower the AI Restyle cap to 250MB (overrideable via
   AI_RESTYLE_MAX_FILE_SIZE_MB env var). 30s of video at 67 Mbps
   already covers streaming-quality bitrates; pathological multi-GB
   30s clips don't need to be accepted.
2. Add a Content-Length header preflight BEFORE allocating any disk
   space. Clients can lie about Content-Length, but for legitimate
   uploads (which honestly report size) this rejects fast.
3. Keep the streaming size check as backstop for clients that send
   chunked-transfer or that lie about Content-Length.

Tests:
- tests/api/test_ai_restyle.py:
  * test_post_restyle_rejects_oversize_content_length — monkeypatches
    MAX_FILE_SIZE_MB to 1MB, uploads 2MB, expects 413.

Backend pytest still 255/255 (test was already passing on the streaming
backstop; the preflight is a defense-in-depth improvement the test does
not specifically isolate).

Codex MEDIUM follow-ups remain (will land as separate issues / PRs):
- Prompt-injection hardening (delimit user prompts as untrusted data)
- AI Restyle write path bypasses _job_lock (consistency-only risk today)
- Gemini content-policy refusal catches broad Exception
- fal client poll/result responses miss raise_for_status

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vansteenbergenmatisse
Copy link
Copy Markdown
Author

Codex adversarial security review — complete

Ran /codex:rescue with a deep audit prompt covering OWASP top 10 + LLM-specific concerns across all new files in this PR (backend ML pipeline, FastAPI routes, frontend wizard + Settings). Codex session: 019e47d4-f0a7-7333-a93b-00d5c2956950, completed in 2m 54s.

Findings

HIGH (3) — all fixed inline in this PR:

# Finding Commit
H1 SSRF via untrusted v2v download URL (ml/video_restyle.py:67 streamed response['video']['url'] without host validation) d5eb949
H2 fal API-key exfiltration via attacker-influenced status_url / response_url echoed back from queue.fal.run submit response d5eb949
H3 Disk-DoS via 2GB upload-then-reject (30s duration cap was enforced AFTER the file landed on disk) 17b784b

Fixes added require_fal_queue_url() (HTTPS + queue.fal.run) and require_fal_download_url() (HTTPS + .fal.{ai,run,media} allowlist) in app/integrations/fal.py. Cap reduced from 2GB → 250MB on /api/restyle with Content-Length preflight before any disk write. 5 new tests cover the rejection paths.

MEDIUM (4) — deferred to separate follow-ups:

  1. Prompt-injection hardening — user prompts in ml/frame_relight.py:26-35 are interpolated raw alongside the hard-coded SAFETY_CONSTRAINTS block. Constraints come AFTER user input (safer order) but user text isn't delimited as untrusted data.
  2. Job-state lock bypass — AI Restyle's run_restyle_job mutates jobs[job_id] directly without _job_lock() (used by Short-form). Today's risk is consistency-only (single writer per job_id, reader is GET-only), but isolating AI Restyle state behind a small job-store helper would prevent future TOCTOU.
  3. Gemini content-policy refusalframe_relight.py:63-78 infers refusal from "no image parts" rather than catching the specific google-genai exception class. The pipeline's broad except Exception (restyle/pipeline.py:129-132) further masks types.
  4. fal client error mappingintegrations/fal.py:97-105 and :109-112 don't raise_for_status() on poll/result responses; permanent 401/403/404s during poll get retried until timeout.

Controls confirmed by Codex:

  • FFmpeg uses list-argv (no shell-string injection)
  • Upload filename traversal mitigated by os.path.basename()
  • API keys never stored in the AI Restyle job dict (kept on the run_restyle_job parameter stack)
  • GET /api/restyle/{job_id} is unauthenticated but uses UUID4 IDs and projects through RestyleStatus pydantic model
  • No dangerouslySetInnerHTML for preset prompts; React renders as text / textarea value
  • The falKey_v1 "encryption" in crypto.js is confirmed XOR+base64 obfuscation (pre-existing in repo, not introduced by this PR)

Overall risk assessment (Codex verbatim): "AI Restyle is not exposing obvious shell injection, raw API-key storage in job state, or React HTML injection. The main deploy blockers are server-side network trust around fal-provided URLs and disk-exhaustion pressure from accepting very large files before the 30s duration gate." — Both addressed by d5eb949 + 17b784b.

Test status

Backend pytest 255/255 green (+8 new test cases this PR). Frontend build clean.

Known systemic gaps (already documented in commit security_baseline YAML, NOT new findings)

  • C2 rate limiting: no per-IP/user RL anywhere in OpenShorts (tracked for /gsd-secure-phase)
  • C9 audit logging: jobs[].logs captures pipeline steps; no separate audit log
  • C10 cost cap: 30s duration bounds per-job spend to ~$1.24 worst case; no per-user daily quota

Ready for review.

@vansteenbergenmatisse vansteenbergenmatisse marked this pull request as ready for review May 21, 2026 00:36
Reframe AI Restyle's ROADMAP.md section from "planned" to the Shipped /
Stubbed in v1 / Later tiering used by Short-form and Long-form, now that
the feature is in review on PR mutonby#35.

- Shipped: enumerates the wizard, preset store, Settings CRUD, 7-step
  async pipeline, /api/restyle routes (with the 250MB cap + Content-Length
  preflight + fal hostname allowlist), and the Codex audit outcome.
- Stubbed in v1: Phase 0 model spike (blocked on FAL_KEY), placeholder
  History tab, and the 4 deferred Codex MEDIUM follow-ups.
- Later: keeps the pre-existing scope-deferral list and adds the
  saas/pipeline.py:_fal_run DRY cleanup as a separate-PR item.

Also adds AI_RESTYLE_MAX_FILE_SIZE_MB=250 to .env.example so the new
upload cap is discoverable and picked up by the auto-managed ENV table
in ~/.claude/CLAUDE.md.

scripts/update_claude_md.py was run; the module-map now includes
integrations/fal.py, ml/frame_{extract,relight}.py, ml/video_restyle.py,
restyle/pipeline.py, and routes/ai_restyle.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The API Keys section descriptions were written before AI Restyle landed
and didn't mention that:

- Gemini also powers Nano-Banana relighting (not just viral-moment
  detection / titles / thumbnails).
- fal.ai is now needed for AI Restyle's video-to-video step. The fal
  panel still said "Optional" and "legacy SaaS UGC pipeline" only.

Promotes fal.ai to "Required for AI Restyle" with an amber badge to
match Gemini and Upload-Post. Adds a one-line cost hint (~$1.50 per
30s clip) and a pointer to fal.ai/dashboard/keys. Updates Gemini's
description to call out AI Restyle's relight step + bumps the version
label from "2.5 Flash" to "2.5 / 3.x" since Nano-Banana is on the
3.x preview channel.

No behavioral changes; localStorage keys (gemini_key, falKey_v1) are
unchanged, so existing user keys keep working.

Frontend build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant