Text-to-Image, Text-to-Video, and Text-to-Speech in a single pipeline — runs on Apple Silicon (MPS), NVIDIA GPU (CUDA), or free Google Colab T4.
Generate images, videos, and narrated audio from text prompts, all from one Gradio interface. Designed to run for free on Google Colab (T4 GPU) or locally on an M1/M2/M3 Mac with Apple MPS acceleration.
| Mode | Model | Speed | Quality |
|---|---|---|---|
| Text-to-Image | SDXL-Turbo (Stability AI) | ~2s (MPS) | High |
| Text-to-Video | ModelScope text-to-video | ~30s (T4) | Medium |
| Text-to-Speech | Bark (Suno) | ~10s | High quality |
| Text-to-Speech | Edge TTS (Microsoft) | ~1s | Fast, free |
| Combined | Video + Audio narration | — | Full pipeline |
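SDXL-Turbo is distilled for very few sampling steps, which is a large part of why the image row is so fast. A minimal sketch of such a call with Diffusers, assuming the settings published on the SDXL-Turbo model card (one inference step, guidance disabled). `turbo_call_kwargs` and `generate_image` are hypothetical names and not necessarily how this repository's `ImageGenerator` is written; the float32 fallback on CPU is also an assumption.

```python
from typing import Any, Dict


def turbo_call_kwargs(prompt: str, steps: int = 1) -> Dict[str, Any]:
    """Build pipeline call arguments suited to SDXL-Turbo (hypothetical helper)."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,  # Turbo is distilled for 1-4 steps
        "guidance_scale": 0.0,         # classifier-free guidance disabled
    }


def generate_image(prompt: str, device: str = "cuda"):
    """Load SDXL-Turbo and render one image (illustrative, not the repo's API)."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from diffusers import AutoPipelineForText2Image

    # Assumption: half precision on GPU/MPS, full precision on CPU.
    dtype = torch.float32 if device == "cpu" else torch.float16
    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=dtype
    )
    pipe = pipe.to(device)
    return pipe(**turbo_call_kwargs(prompt)).images[0]
```

Calling `generate_image("A futuristic city at sunset", device="mps")` would download the model weights on first use, so expect the first run to take much longer than the steady-state figure in the table.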
Text Prompt
│
├──▶ ImageGenerator ──▶ SDXL-Turbo (stabilityai/sdxl-turbo)
│ AutoPipelineForText2Image
│ Apple MPS / CUDA / CPU
│
├──▶ VideoGenerator ──▶ ModelScope (damo-vilab/text-to-video-ms-1.7b)
│ TextToVideoSDPipeline
│
├──▶ AudioGenerator ──▶ Bark (suno/bark) — high quality
│ ──▶ Edge TTS — fast, free
│
└──▶ Pipeline ──▶ Combined: video + audio → final output
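The combined step (video + audio → final output) is not spelled out in the diagram. One common way to do it is to mux the two files with the `ffmpeg` CLI. The sketch below assumes `ffmpeg` is installed and on `PATH`; the function names and file paths are illustrative and are not taken from `src/pipeline.py`.

```python
import shutil
import subprocess
from typing import List


def build_mux_command(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Return an ffmpeg command that attaches an audio track to a video."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it exists
        "-i", video_path,    # input 0: generated video
        "-i", audio_path,    # input 1: generated narration
        "-map", "0:v:0",     # take the video stream from input 0
        "-map", "1:a:0",     # take the audio stream from input 1
        "-c:v", "copy",      # keep the video stream as-is (no re-encode)
        "-c:a", "aac",       # encode audio to AAC for MP4 containers
        "-shortest",         # stop at the end of the shorter stream
        output_path,
    ]


def mux(video_path: str, audio_path: str, output_path: str) -> str:
    """Run the mux command; raises if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_mux_command(video_path, audio_path, output_path), check=True)
    return output_path
```

With `-shortest`, narration that runs longer than the clip is cut off at the end of the video, so short clips may need shorter text or a looped video.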
- Open notebooks/AI_Content_Generator_MVP.ipynb
- Upload to Google Colab
- Runtime → Change runtime type → T4 GPU
- Run all cells → Gradio public URL is generated automatically
git clone https://github.com/apuroopy1-prog/ai-video-generator.git
cd ai-video-generator
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py
# Open http://localhost:7860
ai-content-generator/
├── app.py # Main Gradio web UI
├── quick_start.py # Minimal Python API example
├── requirements.txt
├── notebooks/
│ └── AI_Content_Generator_MVP.ipynb # Colab notebook
├── src/
│ ├── image_generator.py # SDXL-Turbo text-to-image
│ ├── video_generator.py # ModelScope text-to-video
│ ├── audio_generator.py # Bark + Edge TTS
│ └── pipeline.py # Combined video+audio pipeline
├── configs/ # Model config files
├── huggingface-space/ # HuggingFace Spaces deployment
└── outputs/ # Generated files (gitignored)
from src.image_generator import ImageGenerator
from src.video_generator import VideoGenerator
from src.audio_generator import AudioGenerator
# Generate image
img_gen = ImageGenerator()
image = img_gen.generate("A futuristic city at sunset, cyberpunk style")
# Generate video
vid_gen = VideoGenerator()
video = vid_gen.generate("A robot walking through a forest")
# Generate speech
audio_gen = AudioGenerator()
audio = audio_gen.generate("Welcome to the future of AI content creation")
| Hardware | Image | Video | Audio |
|---|---|---|---|
| Apple M1/M2/M3 (MPS) | ✅ Fast | ✅ Slow | ✅ |
| NVIDIA GPU (CUDA) | ✅ Fast | ✅ Fast | ✅ |
| Google Colab T4 | ✅ Fast | ✅ Fast | ✅ |
| CPU only | ✅ Very slow | | ✅ |
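The hardware tiers above suggest a CUDA, then MPS, then CPU fallback. A minimal sketch of that selection, assuming PyTorch is the backend; `choose_device` and `detect_device` are hypothetical helpers, not functions from this repository.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the fastest available backend: CUDA first, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available accelerators; fall back to CPU."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate.
        return "cpu"
    # torch.backends.mps is absent on older PyTorch builds, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return choose_device(torch.cuda.is_available(), mps_ok)
```

The returned string can be passed to a Diffusers pipeline with `pipe.to(detect_device())`.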
| Component | Technology |
|---|---|
| Text-to-Image | Diffusers, SDXL-Turbo, AutoPipelineForText2Image |
| Text-to-Video | ModelScope, TextToVideoSDPipeline |
| Text-to-Speech | Bark (suno), Edge TTS |
| UI | Gradio |
| Deep Learning | PyTorch (MPS + CUDA) |
| Models | HuggingFace Hub |
Apuroop Yarabarla — AI/ML Engineer & AI Product Owner