nilukush/youtube-transcript

YouTube Transcript Fetcher

🚀 Try Live Demo | ⭐ Star on GitHub | 💻 CLI Guide

A powerful tool to fetch YouTube video transcripts via Web UI or CLI, with intelligent proxy support to bypass rate limiting.

Features

  • Web UI: Browser-based interface for fetching transcripts
  • CLI: Command-line interface for automation and scripting
  • Smart Proxy Support: Automatic proxy configuration to bypass YouTube rate limiting
  • Multiple Languages: Fetch transcripts in different languages
  • Multiple Formats: Output as plain text or JSON
  • Smart Caching: Database-backed caching to avoid redundant API calls

Quick Start 🚀

Option 1: Web UI (Easiest - No Installation) 🌐

🚀 Try Live Demo

Works instantly in your browser - no installation required!

Perfect for: Quick transcripts, testing, non-technical users


Option 2: CLI (Install Locally) 💻

Fetch transcripts from the command line. See Installation below for all installation methods.

# Example: Fetch transcript by URL
ytt fetch "https://youtu.be/dQw4w9WgXcQ"

Perfect for: Automation, scripting, power users


Option 3: Self-Hosted (Deploy Yourself) 🔧

Deploy your own instance:

📖 Deployment Guide

Perfect for: Production use, custom configuration, full control


Installation

Option 1: Homebrew (macOS/Linux) ⭐

brew tap nilukush/ytt
brew install youtube-transcript-tools

Why Homebrew?

  • ✅ Single command installation
  • ✅ Automatic dependency management
  • ✅ Easy updates: brew upgrade youtube-transcript-tools
  • ✅ Widely used package manager on macOS and Linux

Option 2: pipx (Isolated Environment)

pipx installs Python CLI tools in isolated environments. This avoids the PEP 668 "externally-managed-environment" error that pip raises on Homebrew-managed Python on macOS and on many recent Linux distributions.

# Install pipx (one-time setup)
brew install pipx
pipx ensurepath

# Install ytt
pipx install youtube-transcript-tools

Why pipx?

  • ✅ No system Python conflicts
  • ✅ PEP 668 compliant
  • ✅ Easy updates: pipx upgrade youtube-transcript-tools
  • ✅ Isolated from other tools

Option 3: pip (Virtual Environment)

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install
pip install youtube-transcript-tools

Option 4: pip (System-wide)

pip install youtube-transcript-tools

Note: If you see error: externally-managed-environment, use Option 1 (Homebrew), Option 2 (pipx), or Option 3 (virtual environment).


Option 5: From Source

git clone https://github.com/nilukush/youtube-transcript.git
cd youtube-transcript
pip install -e .

Development Installation

pip install -e ".[dev]"

Usage

Web UI

The Web UI provides the simplest way to fetch transcripts:

Starting the server:

uvicorn youtube_transcript.api.app:create_app --reload --host localhost --port 8888

Then open http://localhost:8888 in your browser.

Supported URL formats:

  • https://youtu.be/dQw4w9WgXcQ (shortened)
  • https://www.youtube.com/watch?v=dQw4w9WgXcQ (full URL)
  • dQw4w9WgXcQ (video ID only)
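
As an illustration of how the three input forms above can be reduced to a single video ID, here is a minimal sketch using only the Python standard library. The function name extract_video_id, the accepted host list, and the 11-character ID pattern are assumptions made for this example; the project's own parser in utils/ may behave differently.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters of letters, digits, "-" and "_".
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")

# Assumption: hosts accepted for the full watch-URL form.
_WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url_or_id: str) -> Optional[str]:
    """Return the video ID for the three supported input forms, else None."""
    candidate = url_or_id.strip()

    # Video ID only.
    if _VIDEO_ID.fullmatch(candidate):
        return candidate

    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Shortened link: the ID is the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in _WATCH_HOSTS and parsed.path == "/watch":
        # Full URL: the ID is the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None

    return video_id if _VIDEO_ID.fullmatch(video_id) else None
```

Returning None for unrecognized input keeps the caller in charge of the error message, which suits both a web form and a CLI.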

CLI

The CLI uses a fetch command to retrieve transcripts.

Quick examples:

# Fetch by URL
ytt fetch "https://youtu.be/dQw4w9WgXcQ"

# Fetch by video ID
ytt fetch dQw4w9WgXcQ

# Save to file
ytt fetch dQw4w9WgXcQ -o transcript.txt

# Output as JSON
ytt fetch dQw4w9WgXcQ --json

Advanced options:

# Language preference
ytt fetch dQw4w9WgXcQ --lang en

# Multiple languages
ytt fetch dQw4w9WgXcQ --lang en,es,fr

# Verbose mode
ytt fetch dQw4w9WgXcQ --verbose

All options:

Usage: ytt fetch [OPTIONS] URL_OR_ID

Options:
  --lang, -l      TEXT  Preferred language codes (comma-separated)
  --output, -o    TEXT  Output file path
  --json                Output in JSON format
  --verbose             Show detailed information
  --help, -h            Show this message
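
For scripting, the options above can be assembled programmatically. The following is a rough sketch, not part of the project: build_ytt_command and run_ytt are hypothetical helper names, and the sketch assumes that ytt is on PATH, prints the transcript to standard output when --output is omitted, and exits with a non-zero status on failure.

```python
import subprocess
from typing import List, Optional, Sequence


def build_ytt_command(
    url_or_id: str,
    langs: Optional[Sequence[str]] = None,
    output: Optional[str] = None,
    json_output: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argument list for `ytt fetch` from the documented options."""
    cmd = ["ytt", "fetch", url_or_id]
    if langs:
        # --lang takes a comma-separated list of language codes.
        cmd += ["--lang", ",".join(langs)]
    if output:
        cmd += ["--output", output]
    if json_output:
        cmd.append("--json")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_ytt(cmd: List[str]) -> str:
    """Run the command and return its standard output as text.

    Assumption: a failed fetch produces a non-zero exit status, which
    check=True turns into subprocess.CalledProcessError.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain characters such as & or ?.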

Troubleshooting

"No such command" Error

The fetch subcommand is required. Passing a URL or video ID directly to ytt fails.

Wrong:

ytt "https://youtu.be/dQw4w9WgXcQ"

Correct:

ytt fetch "https://youtu.be/dQw4w9WgXcQ"

"Transcript Not Found" Error

This usually means one of the following:

  • The video doesn't have captions/subtitles enabled
  • The transcript is disabled by the uploader
  • The video ID is incorrect

Verification: Check if the video has captions on YouTube:

  1. Open the video on YouTube
  2. Click the "..." (more) button
  3. Look for "Show transcript" option

Rate Limiting (HTTP 429)

If you experience rate limiting:

  1. The application automatically uses proxy configuration (if set by the service provider)
  2. Try again later - rate limits reset over time
  3. Some videos may have stricter rate limits than others
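
When calling the tool from a script, "try again later" can be automated with exponential backoff. The sketch below is illustrative only: RateLimitedError is a hypothetical exception standing in for however your code detects an HTTP 429, and the delay values are arbitrary defaults, not settings used by this project.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical marker for an HTTP 429 response; not part of this project."""


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), waiting exponentially longer after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter keeps parallel clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the retry logic can be exercised in tests without real waiting.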

CLI Not Found

If the ytt command is not found:

# Reinstall the package (run from the root of a source checkout)
pip install -e .

# Or use Python module directly
python -m youtube_transcript.cli fetch "https://youtu.be/dQw4w9WgXcQ"

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src/youtube_transcript --cov-report=html

# Run specific test file
pytest tests/test_fetcher.py -v

Code Quality

# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/

Project Structure

youtube-transcript/
├── src/youtube_transcript/
│   ├── api/              # FastAPI endpoints and web routes
│   ├── cache/            # Redis caching layer
│   ├── config/           # Configuration management
│   ├── models/           # SQLModel database models
│   ├── repository/       # Database repository layer
│   ├── services/         # Business logic (fetcher, orchestrator)
│   ├── static/           # CSS and static assets
│   ├── templates/        # Jinja2 HTML templates
│   ├── utils/            # URL parsing utilities
│   └── cli.py            # CLI entry point
├── tests/                # Pytest tests
└── pyproject.toml        # Project configuration

API Endpoints

The web server exposes the following endpoints:

  • GET / - Web UI homepage
  • GET /transcript?url=URL - Fetch transcript via GET
  • GET /transcript/{video_id} - Fetch transcript by video ID
  • GET /htmx/transcript?url=URL - HTMX endpoint for dynamic updates
  • GET /docs - Interactive API documentation (FastAPI auto-docs)
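
A small client for the two transcript endpoints might look like the sketch below. It assumes the local server address from the Web UI section (http://localhost:8888); the helper names are hypothetical, and because the response format is not documented here, the body is returned as raw text. The /docs page is the place to check the actual schema.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the server was started locally as shown in the Web UI section.
BASE_URL = "http://localhost:8888"


def transcript_url_for(video_url: str, base_url: str = BASE_URL) -> str:
    """Build the address for GET /transcript?url=URL, percent-encoding the video URL."""
    return f"{base_url}/transcript?{urlencode({'url': video_url})}"


def transcript_url_for_id(video_id: str, base_url: str = BASE_URL) -> str:
    """Build the address for GET /transcript/{video_id}."""
    return f"{base_url}/transcript/{quote(video_id, safe='')}"


def fetch_body(address: str, timeout: float = 30.0) -> str:
    """Send a GET request and return the response body as text.

    The body is returned undecoded beyond UTF-8 because its structure
    (JSON, HTML, or plain text) is not specified in this README.
    """
    with urlopen(address, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Encoding the video URL matters because it contains characters (":", "/", "?") that would otherwise be misread as part of the request itself.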

Performance

Metric                  Target    Status
Cached Response p95     < 500ms   ✅ Met
Uncached Response p95   < 10s     ✅ Met
Test Coverage           > 80%     ✅ Met (100%)
URL Parse Success       > 99.5%   ✅ Met
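
For readers unfamiliar with the p95 figures above: p95 is the latency that 95% of requests stay at or below. One common way to compute it from measured samples is the nearest-rank method, sketched here. This is an illustration of the metric, not the measurement code used by this project, and the sample values are made up.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest 1-based position covering pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up response times in milliseconds, for illustration only.
latencies_ms = [120, 80, 450, 95, 300, 110, 105, 130, 90, 100]
p95 = percentile_nearest_rank(latencies_ms, 95)
meets_cached_target = p95 < 500
```

With only ten samples the p95 is simply the slowest request, which is why percentile targets are normally evaluated over a much larger sample.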

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Write tests for your changes
  4. Ensure all tests pass
  5. Submit a pull request

For Application Owners

If you're deploying this application as a service, see DEPLOYMENT.md for:

  • Proxy configuration
  • Environment variables
  • Production deployment
  • Scaling considerations

License

MIT License - see LICENSE file for details.
