A web application that transcribes and summarizes audio/video content using OpenAI or AssemblyAI. Users can upload audio files or provide YouTube URLs to get AI-generated transcripts and summaries with plain text export.
- Audio Transcription - Upload MP3, WAV, MP4, or M4A files
- YouTube Support - Transcribe videos directly from YouTube URLs
- AI Summarization - Generate concise summaries with chunking for long transcripts
- Provider Switch - Toggle between OpenAI Whisper and AssemblyAI
- Model Picker - Choose transcription models per provider (Whisper-1, GPT-4o Transcription, Universal, Slam-1)
- Text Export - Download transcripts and summaries as plain text files
- Configurable - Customize API keys, models, and parameters per request
- Modern UI - Clean interface with drag-and-drop, light/dark themes
- Progress Tracking - Real-time upload progress indicators
Frontend:

- Framework: Next.js with App Router
- UI: Tailwind CSS + shadcn/ui components
- Key Features:
  - Drag-and-drop file upload
  - YouTube URL input
  - Real-time progress tracking
  - Provider switch between OpenAI Whisper and AssemblyAI
  - Model picker per provider (Whisper-1, GPT-4o Transcription, Universal, Slam-1)
  - Copy to clipboard & text download
  - Configurable API settings
Backend:

- Framework: FastAPI with async support
- Core Services:
  - OpenAI Whisper API integration
  - AssemblyAI transcription integration with speech model selection
  - Text summarization with smart chunking
  - YouTube audio extraction via `yt-dlp`
  - Plain text transcript export
  - In-memory session management
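The "smart chunking" step splits long transcripts into word-bounded pieces before summarization (controlled by `SUMMARY_CHUNK_WORDS`, default 1200). A minimal sketch of how such chunking might work — the function name and the merge step are assumptions, not the actual service code:

```python
def chunk_words(text: str, chunk_size: int = 1200) -> list[str]:
    """Split a transcript into chunks of at most chunk_size words.

    Hypothetical sketch of the chunking controlled by SUMMARY_CHUNK_WORDS;
    the backend summarizes each chunk, then merges the partial summaries.
    """
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```

Splitting on word boundaries keeps each chunk within the summarization model's context window without cutting sentences mid-word.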
Prerequisites:

- Node.js 18+ and `pnpm`
- Python 3.10+
- OpenAI API key (or an AssemblyAI API key when using that provider)
- `ffmpeg` (for YouTube support)
```bash
# Install dependencies
pnpm install

# Run development server
pnpm dev
```

Frontend runs on http://localhost:3000
```bash
# Navigate to backend directory
cd backend

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

# Run the server
uvicorn app.main:app --reload
```

Backend runs on http://localhost:8000
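Once the server is up, you can verify it against the `/health` endpoint. A small sketch using only the standard library (the helper name is ours, not part of the project):

```python
import urllib.request

def backend_healthy(base_url: str = "http://localhost:8000") -> bool:
    """Return True if the FastAPI backend answers its /health endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as unhealthy.
        return False
```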
Create a .env file in the backend/ directory:
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key (required unless provided per-request) | - |
| `OPENAI_API_BASE` | Custom API base for compatible endpoints | - |
| `ASSEMBLYAI_API_KEY` | AssemblyAI API key (required when using AssemblyAI) | - |
| `TRANSCRIPTION_PROVIDER` | Default provider: `openai` or `assemblyai` | `openai` |
| `STT_MODEL_NAME` | OpenAI speech-to-text model | `gpt-4o-transcription` |
| `ASSEMBLYAI_SPEECH_MODEL` | AssemblyAI speech model | `universal` |
| `SUMMARY_MODEL_NAME` | Summarization model | `gpt-4o-mini` |
| `SUMMARY_MAX_TOKENS` | Max tokens for summaries | `300` |
| `SUMMARY_CHUNK_WORDS` | Chunk size for long transcripts | `1200` |
| `REQUEST_TIMEOUT_SECONDS` | API call timeout | `600` |
| `SESSION_TTL_MINUTES` | Session lifetime | `240` |
| `MAX_UPLOAD_SIZE_MB` | File size limit | `200` |
| `CORS_ALLOW_ORIGINS` | Allowed origins | `*` |
| `LOG_LEVEL` | Logging verbosity | `INFO` |
Create a `.env.local` file in the project root (optional):

```env
TRANSCRIPTION_API_URL=http://localhost:8000
NEXT_PUBLIC_DEFAULT_OPENAI_STT_MODEL=gpt-4o-transcription
NEXT_PUBLIC_DEFAULT_ASSEMBLY_MODEL=universal
NEXT_PUBLIC_DEFAULT_SUMMARY_MODEL=gpt-4o-mini
NEXT_PUBLIC_DEFAULT_SUMMARY_MAX_TOKENS=300
NEXT_PUBLIC_DEFAULT_TRANSCRIPTION_PROVIDER=openai
```

Update `public/site-config.json` to change the app name, tagline, or icon paths:
```json
{
  "name": "Transcribly",
  "tagline": "Transcribe and summarize audio & video content",
  "assets": {
    "logo": "/logo.svg",
    "logoDark": "/logo-dark.svg",
    "favicon": "/favicon.svg",
    "appleTouchIcon": "/apple-touch-icon.png"
  }
}
```

Replace the files in `public/` (`logo.svg`, `logo-dark.svg`, `favicon.svg`, `apple-touch-icon.png`) with your own assets to rebrand the UI instantly.
The backend exposes the following endpoints:

- `POST /upload-audio` - Upload an audio file for transcription
- `POST /youtube-transcribe` - Transcribe a YouTube video by URL
- `GET /download-transcript?session_id=...` - Download transcript/summary as plain text
- `GET /health` - Health check endpoint
Requests can override defaults via:

- `X-API-Key` header
- `X-AssemblyAI-Key` header (AssemblyAI provider)
- JSON fields: `apiKey`, `assemblyApiKey`, `assemblyModel`, `sttModel`, `summaryModel`, `summaryMaxTokens`, `provider`
- Multipart form fields with the same names
```bash
# Build Docker image
cd backend
docker build -t transcription-backend .

# Run container
docker run \
  --env-file .env \
  -p 8000:8000 \
  transcription-backend
```

- Connect your Git repository to Cloudflare Pages
- Configure build settings:
  - Framework preset: Next.js
  - Build command: `pnpm install && pnpm build`
  - Build output directory: `.next`
- Set environment variable: `TRANSCRIPTION_API_URL=<your backend URL>`
See DEPLOYMENT.md for detailed instructions.
```
transcription-app/
├── app/                      # Next.js app router pages
│   ├── api/                  # API route handlers
│   ├── globals.css           # Global styles
│   ├── layout.tsx            # Root layout
│   └── page.tsx              # Home page
├── components/               # React components
│   ├── transcription-app.tsx
│   ├── file-upload.tsx
│   ├── youtube-input.tsx
│   └── transcript-display.tsx
├── lib/                      # Utility functions
├── hooks/                    # Custom React hooks
├── backend/                  # Python FastAPI backend
│   ├── app/
│   │   ├── main.py           # FastAPI app
│   │   ├── models.py         # Pydantic models
│   │   └── services/         # Core services
│   ├── Dockerfile
│   ├── requirements.txt
│   └── .env.example
├── public/                   # Static assets
└── package.json
```
Frontend:

- Next.js 15, React 19, TypeScript
- Tailwind CSS + shadcn/ui
- React Dropzone
- Radix UI primitives

Backend:

- FastAPI + Uvicorn
- OpenAI SDK
- yt-dlp (YouTube downloads)
- python-multipart (file uploads)
```bash
pnpm dev    # Start dev server
pnpm build  # Build for production
pnpm start  # Start production server
pnpm lint   # Run ESLint
```

```bash
uvicorn app.main:app --reload  # Dev server with hot reload
python -m pytest               # Run tests (if configured)
```

For issues and questions:
- Create an issue in the repository
- Check existing documentation in DEPLOYMENT.md and backend/README.md