Origami


Origami is a high-fidelity automated video generation platform that transforms PDF slides into cinematic narrated content. By leveraging advanced AI for script synthesis, professional Text-to-Speech (TTS), and programmatic video rendering, Origami streamlines the creation of engaging presentations, tutorials, and educational media directly from your static documents.

Features

  • PDF to Presentation: Upload PDF slides and automatically extract them into a sequence of video scenes.
  • AI-Powered Scripting: Integrated with Google Gemini AI and WebLLM (Local Browser Inference) to transform fragmented slide notes into coherent, professional scripts.
  • High-Quality TTS: Supports local and cloud-based Text-to-Speech using Kokoro-js.
    • Local Inference: Run TTS entirely locally via Dockerized Kokoro FastAPI.
    • Hybrid Voices: Create custom voice blends by mixing two models with adjustable weights.
  • Rich Media Support: Insert MP4 videos and GIFs seamlessly between slides.
  • Programmatic Video Rendering: Frame-perfect assembly powered by high-performance canvas rendering.
  • Smart Audio Engineering:
    • Auto-Ducking: Background music volume automatically lowers during voiceovers.
    • Normalization: Final render is automatically normalized to YouTube standards (-14 LUFS).
  • Interactive Slide Editor: Drag-and-drop reordering, real-time preview, and batch script updates.
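
The auto-ducking behaviour described above can be sketched as a pure gain envelope: the music gain ramps down while voiceover frames are active and back up afterwards. The function name, attenuation value, and ramp length are illustrative assumptions, not Origami's actual API.

```typescript
// Sketch of auto-ducking: compute a per-frame gain for the background
// music track given which frames contain voiceover. The duckGain and
// fadeFrames defaults are illustrative, not Origami's actual values.
function duckingEnvelope(
  voiceActive: boolean[], // one flag per audio frame
  duckGain = 0.25,        // music volume while the voice is speaking
  fadeFrames = 3          // frames to ramp between full and ducked gain
): number[] {
  const gains: number[] = [];
  let current = 1;
  const step = (1 - duckGain) / fadeFrames;
  for (const active of voiceActive) {
    const target = active ? duckGain : 1;
    if (current > target) current = Math.max(target, current - step);
    else if (current < target) current = Math.min(target, current + step);
    gains.push(Number(current.toFixed(4)));
  }
  return gains;
}
```

Ramping over a few frames rather than switching instantly avoids audible clicks when the voiceover starts and stops.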

Getting Started

Prerequisites

  • Node.js and npm (required to install dependencies and run the development server)
  • Docker (optional, for containerized deployment)

Installation

  1. Clone the repository:

    git clone https://github.com/IslandApps/Origami-AI.git
    cd Origami-AI
  2. Install dependencies:

    npm install
  3. Start the development server (runs both Vite and the rendering server):

    npm run dev

The application will be available at http://localhost:3000.

Deployment (Docker)

To deploy this application using Docker, you must first clone the repository, as the image is built locally from the source.

  1. Clone the repository:
    git clone https://github.com/IslandApps/Origami-AI.git
    cd Origami-AI

Using Docker Compose (Recommended)

A docker-compose.yml file is provided in the root directory. To start the application, run:

docker-compose up -d

Example docker-compose.yml:

services:
  origami-ai:
    build: .
    container_name: origami-ai
    ports:
      - "3000:3000"
    restart: unless-stopped
    environment:
      - PORT=3000
      - NODE_ENV=production

Using Docker CLI

  1. Build the image:

    docker build -t origami-ai .
  2. Run the container:

    docker run -d -p 3000:3000 --name origami-ai origami-ai

Self-Hosting with Dokploy

This project is designed to be easily self-hosted using Dokploy. Simply point Dokploy to this repository, and it will automatically detect the Dockerfile and docker-compose.yml to orchestrate the deployment.

Once deployed, the application will be available on port 3000.

Usage

1. Upload & Analyze

Drag and drop your presentation PDF into the main upload area. The application will extract the text from each page to create the initial slides.
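
Per-page text extraction of this kind typically builds on pdf.js, whose `getTextContent()` returns positioned fragments rather than whole lines. A minimal sketch of reassembling those fragments into page text, assuming pdf.js-style items (a `str` plus a transform matrix whose sixth entry is the baseline y position); the line tolerance is a guess, not Origami's actual logic:

```typescript
// Group pdf.js-style text items into visual lines by baseline y, then
// join them top-to-bottom. Illustrative sketch, not Origami's code.
interface TextItem {
  str: string;
  transform: number[]; // transform[5] is the baseline y position
}

function itemsToPageText(items: TextItem[], lineTolerance = 2): string {
  const lines: { y: number; parts: string[] }[] = [];
  for (const item of items) {
    const y = item.transform[5];
    const line = lines.find((l) => Math.abs(l.y - y) <= lineTolerance);
    if (line) line.parts.push(item.str);
    else lines.push({ y, parts: [item.str] });
  }
  return lines
    .sort((a, b) => b.y - a.y) // PDF y grows upward, so larger y = earlier line
    .map((l) => l.parts.join(" "))
    .join("\n");
}
```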

2. Configure & Enhance

Scroll down to the Configure Slides panel to manage your project globally:

  • Global Settings: Set a global voice (or create a custom Hybrid Voice), adjust post-slide delays, or run batch operations like "Find & Replace".
  • Media Assets: Click Insert Video to add MP4 clips or GIFs between slides.
  • Audio Mixing: Upload custom background music or select from the library (e.g., "Modern EDM"). Use the sliders to mix volume levels.

3. Create the Narrative

In the Slide Editor grid:

  • AI Scripting: Click the AI Fix Script button (Sparkles icon) to have Gemini rewrite raw slide text into a natural spoken script.
  • Manual Editing: Edit scripts directly. Highlight specific text sections to generate/regenerate audio for just that part.
  • Generate Output: Click the Generate TTS button (Speech icon) to create voiceovers.
  • Preview: Click the Play button to hear the result or click the slide thumbnail to expand the visual preview.

4. Render

Click the Download Video button. The application will:

  1. Pre-process the slide configuration in your browser.
  2. Render frames in parallel using your browser's resources.
  3. Process the final video and audio mix using client-side FFmpeg WASM.
  4. Normalize the audio to -14 LUFS and download the resulting MP4.
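
The normalization step in the pipeline above maps naturally onto FFmpeg's `loudnorm` filter, which ffmpeg.wasm drives with the same argv-style array as the ffmpeg CLI. A minimal sketch of building that argument list; the file names, helper name, and TP/LRA values are illustrative assumptions, not Origami's actual settings:

```typescript
// Build an ffmpeg argv array for a one-pass loudnorm to YouTube's
// -14 LUFS target. Illustrative sketch; file names are placeholders.
function loudnormArgs(input: string, output: string, lufs = -14): string[] {
  return [
    "-i", input,
    // integrated loudness target, true-peak ceiling, loudness range
    "-af", `loudnorm=I=${lufs}:TP=-1.5:LRA=11`,
    "-c:v", "copy", // video is already rendered; only the audio is re-encoded
    output,
  ];
}
```

With ffmpeg.wasm, an array like this would be passed to the instance's `exec()` call after the input file has been written into its virtual filesystem.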

Configuration

Open the Settings Modal (Gear Icon) to customize the application:

1. API Keys (Script Generation)

Configure the AI model used for script refinement ("AI Fix Script").

  • Google Gemini: Built-in and recommended. Requires a Google AI Studio API Key.
  • Custom/OpenAI-Compatible: Point to any OpenAI-compatible endpoint (e.g., LocalAI, Ollama, vLLM).
    • Base URL: Enter your provider's URL (e.g., http://localhost:11434/v1).
    • Model Name: Specify the model ID (e.g., llama-3).
    • API Key: Enter if required by your provider.
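
Because the custom provider option targets the standard OpenAI chat-completions shape, the same request works against LocalAI, Ollama, and vLLM. A minimal sketch of assembling such a request from the Base URL and Model Name settings above; the function name and prompt wording are illustrative, not Origami's actual code:

```typescript
// Build an OpenAI-compatible /chat/completions request for script
// refinement. Illustrative sketch; the system prompt is an assumption.
function buildScriptRequest(baseUrl: string, model: string, slideText: string) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/chat/completions`,
    body: {
      model,
      messages: [
        { role: "system", content: "Rewrite these slide notes as a natural spoken script." },
        { role: "user", content: slideText },
      ],
    },
  };
}
```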

2. Text-to-Speech (TTS)

  • Engine: Choose between the internal Web Worker (client-side) or a local Dockerized Kokoro instance (faster/server-side).
  • Audio Defaults: Set default voice models and quantization levels (q4/q8).

3. Background Music Library

You can build your own library of background music tracks that will be available in the dropdown menus:

  1. Navigate to the src/assets/music/ directory.
  2. Paste your .mp3 files here.
  3. The application will automatically detect these files and list them in the UI (e.g., my_cool_track.mp3 becomes "My Cool Track").
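
The filename-to-label convention implied by the example above (`my_cool_track.mp3` → "My Cool Track") can be sketched as a small transform. This is an assumption based on that single example; Origami's exact rules may differ:

```typescript
// Turn a music filename into a display label: strip the extension,
// split on underscores/hyphens, and title-case each word.
// Illustrative sketch, not Origami's actual implementation.
function trackLabel(filename: string): string {
  return filename
    .replace(/\.mp3$/i, "")
    .split(/[_-]+/)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}
```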

Tech Stack

  • Frontend: React 19, Vite, Tailwind CSS (v4)
  • Video Engine: FFmpeg WASM (Client-side)
  • AI: Google Gemini API & WebLLM (Local Browser Inference)
  • TTS: Kokoro (FastAPI / Web Worker)
  • Backend: Express.js (serving as a rendering orchestration layer)
  • Utilities: Lucide React (icons), dnd-kit (drag & drop), pdfjs-dist (PDF processing)

Project Structure

  • src/components/: React UI components (Slide Editor, Modals, Uploaders).
  • src/services/: Core logic for AI, TTS, PDF processing, and local storage.
  • server.ts: Express server handling static file serving and SPA routing.

Roadmap & TODO

  • YouTube Metadata Generator: Automatically generate optimized titles and descriptions using Gemini.
  • Thumbnail Generator: Create custom YouTube thumbnails based on slide content.
  • Voiceover Recording: Support for recording custom voiceovers directly within the app using a microphone.
  • Header Layout Optimization: Refactor and organize the application header for better aesthetics and usability.

Acknowledgements

This project is made possible by the following incredible open-source libraries and projects:

  • FFmpeg.wasm: Enabling frame-perfect video assembly directly in the browser.
  • WebLLM: Bringing high-performance local LLM inference to the web.
  • Kokoro-js: Providing high-quality, local Text-to-Speech capabilities.
  • Hugging Face Transformers: Powering state-of-the-art machine learning in the browser.
  • PDF.js: The standard for parsing and rendering PDF documents.
  • Lucide React: Beautifully crafted open-source icons.
  • dnd-kit: A modern, lightweight toolkit for drag-and-drop interfaces.
  • Dokploy: The open-source platform used for seamless self-hosting and deployment.
  • Google Antigravity: The AI-powered IDE used to build and refine this project.

License

MIT
