Skip to content

namprice227/trendsync

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

52 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TripStory AI: Multilingual Holiday Recap Studio

TripStory AI turns a pile of holiday clips into a polished travel recap plan and stitched video.

Companion Documents

For deep technical and product context regarding the architecture design, refer to:

  • PRD.md: Product vision, target audience, and MVP milestones.
  • SCHEMA.md: Core JSON contracts (Scene Memory, Story Plan).
  • PROMPTS.md: VLM and LLM system prompts.
  • API_SPEC.md: REST API contracts.
  • EVAL_PLAN.md: Evaluation metrics for narrative coherence and narration restraint.

The current product flow is:

  1. Upload clips and videos from a trip.
    • The editor shows upload progress in the Media tab.
    • Uploaded clips are analyzed and kept as source media; they do not appear in the edit timeline until a story plan selects clip moments.
  2. Answer context questions:
    • Where did you go?
    • How long was the trip?
    • Which places did you visit?
    • Who was there?
    • What moments mattered most?
    • What language and tone should the voiceover use?
  3. Optionally draft and approve an AI producer brief from the clip intelligence.
  4. Generate a multilingual story plan from persisted scene memory. This creates story beats, narration lines, voiceover segments, captions, and the edit_decisions timeline.
  5. Render the final playable recap video from the generated story plan.

The LLM layer is vendor-neutral. By default the app works with a deterministic fallback, and the backend can use OpenAI, Gemini, or DeepSeek from server-side environment variables.


Architecture

flowchart LR
    user[Expo web or mobile app] --> api[FastAPI API<br/>api_server.py]
    api --> db[(SQLite<br/>sessions + jobs)]
    api --> media[(trip_sessions/<br/>uploaded media + renders)]
    api --> redis[(Redis queue)]
    redis --> worker[RQ worker<br/>worker.py]
    worker --> db
    worker --> media
    worker --> ffmpeg[ffmpeg / ffprobe]
    worker --> llm[LLM providers<br/>DeepSeek / OpenAI / Gemini / custom]
    worker --> vision[Vision analysis<br/>Gemini or OpenAI-compatible]
    worker --> tts[TTS / transcription<br/>Gemini or OpenAI optional]
    api --> user
Loading

The API remains responsive during story generation and rendering. It creates a job row in SQLite, enqueues the job in Redis/RQ, and the frontend polls /sessions/{session_id} for phase, screen, progress_percent, and active job details.

sequenceDiagram
    participant UI as Expo app
    participant API as FastAPI
    participant DB as SQLite
    participant Redis as Redis/RQ
    participant W as Worker
    participant F as ffmpeg
    participant AI as AI providers

    UI->>API: POST /sessions/{id}/generate-story
    API->>DB: create jobs row, state=queued
    API->>Redis: enqueue run_tripstory_job(job_id)
    API-->>UI: 202-style session response
    UI->>API: GET /sessions/{id} polling
    Redis->>W: deliver job
    W->>DB: state=analyzing/planning
    W->>F: probe media and audio levels
    W->>AI: vision/story/TTS calls when configured
    W->>DB: progress + final session update
    API-->>UI: active_job_state/progress/current_step
Loading

Key Files

File Purpose
api_server.py FastAPI app for trip sessions, media upload, story generation, rendering, and eval
worker.py RQ worker entrypoint for queued story and render jobs
llm_provider.py Vendor-neutral OpenAI-compatible chat client with rate limiting and retries
media_intelligence.py ffprobe/ffmpeg clip analysis and optional Gemini vision/transcription
scene_memory.py Evidence-linked scene memory artifact builder
trip_story.py Multilingual narrative and voiceover generation
trip_renderer.py Lightweight video assembly from uploaded clips via ffmpeg
tts_provider.py Server-side TTS narration via Gemini or OpenAI
tripstory_logging.py Structured API/worker logging helpers
mobile/App.tsx Expo app root and navigation orchestration
mobile/src/screens/ Four screens: Dashboard, Context, Plan (editor), Output
mobile/src/components/ 20+ reusable components (TimelineStrip, ProducerBriefPanel, Sidebar, etc.)
mobile/src/api.ts Mobile API client
mobile/src/types.ts Trip session, context, media, and story types

Legacy trend-analysis modules are still present in the repository for reference, but the active mobile/API product is now TripStory.

Environment Setup

Recommended Miniconda setup from the repository root:

conda env create -p "$PWD/.conda/trendsync-py312" -f environment.yml
conda activate "$PWD/.conda/trendsync-py312"
python -m pip install -r requirements.txt

The Conda environment provides Python 3.12, ffmpeg, Node.js, and npm. Python package dependencies are installed with pip from requirements.txt.

If your shell has user-site Python packages visible, keep the Conda environment isolated while installing or running:

export PYTHONNOUSERSITE=1

Run Redis, Worker, And API

Start Redis and the worker for queued story/render jobs:

redis-server
python worker.py

The Conda environment includes Redis. If using a plain venv, install/run Redis separately.

Keep the worker in the foreground while developing, or run it under a process supervisor. If you suspend the worker terminal with shell job control, the active worker child and any child ffmpeg process can stop and block the queue. The media probes are hardened with -nostdin, stdin=DEVNULL, and subprocess timeouts, but a stopped old worker must still be restarted.

Then start the API:

python -m uvicorn api_server:app --host 0.0.0.0 --port 8010

Optional LLM configuration:

cp .env.example .env

Edit .env, set TRIPSTORY_LLM_PROVIDER to your preferred provider (openai, gemini, deepseek, or local), fill in the corresponding API key, and set GEMINI_API_KEY for video-frame understanding and TTS. Set TRIPSTORY_TTS_PROVIDER=gemini (or openai) to enable narration. The API loads .env on startup. The frontend never receives or submits provider API keys.

Story generation uses the backend provider from .env by default. Generated plans include a generation block that records whether the plan came from the configured LLM provider or the deterministic local fallback. If the UI or logs report a fallback, check the provider/key settings or the recorded fallback reason.

LLM calls are serialized server-side to avoid accidental concurrent requests. Tune retry/rate behavior with:

TRIPSTORY_LLM_MIN_INTERVAL_SECONDS=3
TRIPSTORY_LLM_MAX_RETRIES=2
TRIPSTORY_LLM_TIMEOUT=120
TRIPSTORY_STORY_MAX_TOKENS=4096

DeepSeek reasoning models can return an empty content field when the output budget is too small for the story-planning JSON, and can take longer than smaller chat models. Keep TRIPSTORY_STORY_MAX_TOKENS at 4096 or higher if the story job reports an empty provider response, and keep TRIPSTORY_LLM_TIMEOUT around 120 seconds if DeepSeek times out.

Backend logs are structured as timestamped key/value JSON fields in the API and worker terminals. Optional logging controls:

TRIPSTORY_LOG_LEVEL=INFO
TRIPSTORY_LOG_HTTP_REQUESTS=1
TRIPSTORY_LOG_FILE=/tmp/tripstory.log
TRIPSTORY_LOG_API_PAYLOADS=0

With HTTP request logging enabled, session polling emits structured app-state fields such as session_phase, session_progress_percent, active_job_state, and active_job_step in addition to Uvicorn's access log. Set TRIPSTORY_LOG_LEVEL=DEBUG to include request-start records too.

Queued jobs older than TRIPSTORY_STALE_JOB_SECONDS are marked failed on the next session poll, so a lost Redis/RQ job does not leave the frontend spinner running forever. SQLite connections use WAL mode with TRIPSTORY_SQLITE_TIMEOUT_SECONDS=20 by default so the API and worker can share the local job/session database more reliably. Media probing uses explicit ffmpeg timeouts so broken clips cannot occupy the only worker forever:

TRIPSTORY_FFMPEG_PROBE_TIMEOUT=30
TRIPSTORY_FFMPEG_AUDIO_TIMEOUT=45
TRIPSTORY_FFMPEG_RENDER_TIMEOUT=300
TRIPSTORY_FFMPEG_AUDIO_MIX_TIMEOUT=300
TRIPSTORY_FFMPEG_BIN=
TRIPSTORY_FFPROBE_BIN=

The renderer resolves ffmpeg and ffprobe from PATH, then from the active Python environment's bin directory. Set TRIPSTORY_FFMPEG_BIN and TRIPSTORY_FFPROBE_BIN only if your binaries live somewhere else.

Watch live logs in the terminals running python -m uvicorn ... and python worker.py, or follow the optional file:

tail -f /tmp/tripstory.log

To confirm provider calls are serialized, start one story generation and watch for one llm_request_attempt at a time followed by llm_request_complete or llm_request_retry. Narration uses the same pattern with tts_request_attempt.

Default models:

  • OpenAI: gpt-4o-mini
  • Gemini: gemini-3.1-flash-lite
  • DeepSeek: deepseek-v4-pro

You can also export the variables directly instead of using .env:

export TRIPSTORY_LLM_PROVIDER="openai"   # openai, gemini, deepseek, local, or custom
export OPENAI_API_KEY="your-openai-key"

Provider-specific key variables are OPENAI_API_KEY, GEMINI_API_KEY, and DEEPSEEK_API_KEY. Provider-specific model overrides are TRIPSTORY_OPENAI_MODEL, TRIPSTORY_GEMINI_MODEL, and TRIPSTORY_DEEPSEEK_MODEL.

Recommended DeepSeek + Gemini split:

export TRIPSTORY_LLM_PROVIDER="deepseek"
export DEEPSEEK_API_KEY="your-deepseek-key"
export TRIPSTORY_DEEPSEEK_MODEL="deepseek-v4-pro"
export GEMINI_API_KEY="your-gemini-key"
export TRIPSTORY_VISION_PROVIDER="gemini"
export TRIPSTORY_ENABLE_VISION_ANALYSIS=1

Note: Avoid setting TRIPSTORY_DEEPSEEK_THINKING=enabled with TRIPSTORY_DEEPSEEK_REASONING_EFFORT=high for story generation. The model spends its token budget on reasoning before writing the JSON response, which causes repeated timeouts and can block the worker for 5–10 minutes. Leave thinking disabled or use reasoning_effort=low if you need it.

Narration uses server-side TTS when configured. Recommended Gemini TTS setup:

TRIPSTORY_TTS_PROVIDER=gemini
TRIPSTORY_TTS_MODEL=gemini-3.1-flash-tts-preview
TRIPSTORY_TTS_VOICE=Kore
TRIPSTORY_TTS_MIN_INTERVAL_SECONDS=3
TRIPSTORY_TTS_MAX_RETRIES=2

Gemini TTS uses GEMINI_API_KEY and writes voiceover.wav. OpenAI TTS is still supported with TRIPSTORY_TTS_PROVIDER=openai, OPENAI_API_KEY, gpt-4o-mini-tts, and an OpenAI voice such as coral.

Clip speech transcription is off by default because it sends extracted audio to OpenAI:

TRIPSTORY_ENABLE_TRANSCRIPTION=1

Sampled-frame visual understanding is enabled when GEMINI_API_KEY is present and TRIPSTORY_ENABLE_VISION_ANALYSIS=1. This adds one serialized Gemini vision request per uploaded clip so the DeepSeek story planner can see visible subjects, scenes, actions, best frame descriptions, and avoid reasons.

TRIPSTORY_ENABLE_VISION_ANALYSIS=1
TRIPSTORY_VISION_PROVIDER=gemini
TRIPSTORY_VISION_MODEL=gemini-3.1-flash-lite
TRIPSTORY_VISION_MAX_FRAMES=3
TRIPSTORY_VISION_MIN_INTERVAL_SECONDS=3

For a custom OpenAI-compatible endpoint:

export TRIPSTORY_LLM_URL="http://localhost:8000/v1"
export TRIPSTORY_LLM_MODEL="your-model-name"
export TRIPSTORY_LLM_API_KEY="optional-key"

If TRIPSTORY_LLM_URL is not set, TripStory uses a local fallback that still returns a usable narrative plan.

Run The Mobile App

cd mobile
npm ci
npm run web -- --clear --port 8081

Default API URL:

  • Local iOS/Web: http://localhost:8010
  • Android emulator: http://10.0.2.2:8010
  • Expo web on another hostname, including LAN or Tailscale: same protocol and hostname on port 8010, for example http://100.68.189.117:8010 when the frontend is opened at http://100.68.189.117:8081.

If TRIPSTORY_CORS_ORIGINS is restricted in .env, add the exact frontend origin you open in the browser, such as:

TRIPSTORY_CORS_ORIGINS=https://mangasmith.com,https://www.mangasmith.com,http://100.68.189.117:8081

Restart the API after changing CORS settings.

Current Output

TripStory renders a stitched recap video, analyzes uploaded clips, saves scene_memory.json and the generated story plan beside it as JSON, and mixes generated narration into the video when server-side TTS is configured. Source clips are listed in the Media tab; the preview player currently focuses on the final rendered video.

Operations Notes

Use three long-running processes locally:

Process Command Responsibility
Redis redis-server Stores queued and started RQ jobs
Worker python worker.py Runs clip analysis, story planning, TTS, and render jobs
API python -m uvicorn api_server:app --host 0.0.0.0 --port 8010 Serves the frontend, persists sessions, and reports job progress

If the frontend spinner is stuck, check /sessions/{session_id} logs first. A healthy queued render behind another active job looks like active_job_state=queued. A blocked worker usually shows one old started RQ job, one busy worker, and no recent job_progress lines.

For ffmpeg-specific stalls, check process state. STAT T or Tl means the process is stopped by job control, not doing slow encoding. Restart the worker so it loads the latest subprocess hardening and releases the queue.

Implemented Now

  • FastAPI session API with health check, session creation, context save, media upload, story generation, and render endpoints.
  • Expo mobile app for connecting to the API, uploading video files with progress feedback, entering trip context, drafting a producer brief, choosing voiceover language, generating a story plan, and previewing the rendered output.
  • Vendor-neutral LLM client with OpenAI, Gemini, DeepSeek, custom OpenAI-compatible endpoints, and a deterministic local fallback when no API key or endpoint is configured.
  • Multilingual story-plan generation contract with title, language, tone, narrative arc, voiceover script, edit notes, and clip plan.
  • Server-side Gemini or OpenAI text-to-speech narration and ffmpeg audio mixing under the final video when TTS is configured.
  • Clip intelligence for uploaded videos: duration, resolution, scenes, blur/quality, face hits, audio levels, best-moment timestamps, scenic candidates, optional speech transcription, and optional sampled-frame visual summaries.
  • Persisted scene memory with transcript, visual summary, evidence, narrative role candidates, energy/confidence scores, risks, and a downloadable scene_memory.json artifact.
  • LLM-driven smart edit planning with concrete edit_decisions: clip ID, source start time, duration, role, transition, caption, audio strategy, and clip-grounded reasoning for every selected segment.
  • Separate normalized story_beats, narration_lines, and timeline-aligned voiceover_segments that make the generated plan inspectable and reusable.
  • Story-aware rendering that follows generated edit_decisions, trims around chosen moments, adds fades, creates a title/date card, supports portrait/landscape/square exports, and saves segment-timed SRT/VTT subtitles plus edit_decisions.json.
  • Target render duration controls for short social cuts, with the backend scaling segment durations and validating the final media length.
  • RQ + Redis queueing for story generation and rendering, with a SQLite-backed jobs table and frontend-visible job progress.
  • SQLite-backed session/project persistence with JSON backup compatibility, project listing, share tokens, and optional API token authentication.
  • Upload hardening with file type checks, upload size limits, render progress, event logs, and server-side cleanup-ready project deletion.
  • Mobile project library, favorite/exclude/reorder clip controls, scene pin/exclude controls, generated timeline timing/reorder/script editing, export controls, share action, upload progress, and render progress display.
  • Backend smoke test covering session creation, context save, upload, story generation, render, and persistence reload.

Remaining Production Gaps

  • Full identity provider login, password reset, billing, teams, and production-grade role management. The MVP has owner headers and optional API token auth.
  • A fully distributed production workflow system. The MVP uses RQ + Redis locally with SQLite job state.
  • Dedicated landmark-recognition model. The MVP can use sampled-frame visual summaries and context hints, but it does not run a specialist landmark classifier.
  • Native mobile camera capture, offline/resumable uploads, drag-and-drop gestures, source-clip preview before render, native share sheets, and a full nonlinear editor.
  • The older TrendFlow TikTok analysis modules are not folded into the TripStory product beyond remaining available as separate legacy modules.

Development Roadmap

  1. Stabilize the TripStory MVP: keep the backend smoke flow and mobile typecheck passing, and add a small manual QA checklist for real device uploads.
  2. Upgrade persistence/auth for deployment: replace owner headers with real account sessions, add roles, and move background work to a queue.
  3. Improve rendering and preview: add source-clip playback before render, true xfade transitions, music library selection, map cards, subtitle burn-in, and a timeline preview.
  4. Improve narration controls: add voice selection, playback, subtitles styling, and per-language narration tuning.
  5. Improve media understanding: extract thumbnails, true landmark recognition, GPS/date metadata when available, and stronger story-aware highlight selection.
  6. Build editing controls in mobile: [Implemented] mark favorites, reorder/exclude clips, pin/exclude scenes, edit generated segment scripts, adjust segment start/duration, reorder generated segments, choose tone/language, and select output aspect ratio. Still needed: source-clip playback, drag gestures, and richer timeline preview.
  7. Evaluate output: [Implemented] /eval/dashboard endpoint for TripStory narrative metrics. Still needed: automated narration density and restraint scoring.
  8. Prepare for production: [Partially implemented] deployment scripts and systemd service files added under scripts/ and deployment/. Still needed: resumable uploads, full auth, and monitoring.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors