A production-grade content engine that crawls viral TikTok and Instagram posts, extracts hook patterns, and auto-generates ready-to-shoot reel scripts + SEO captions for @pixii_ai.
Built for the Pixii.ai Founding Engineer assignment.
One click: scrape > filter > analyze > extract > generate.
- Scrapes TikTok + Instagram via Apify (hashtag + keyword search, 3 different actors)
- Filters by language (fast-langdetect) and relevance (GPT-4o-mini)
- Analyzes media — Whisper transcribes spoken hooks, GPT-4o Vision reads text overlays from thumbnails
- Extracts hooks — classifies into 8 pattern types (Contrarian, How-To, Story-Open, etc.)
- Detects trends — repeated audio tracks across posts = trending
- Auto-generates 5 reel scripts for the top hooks — 4-part viral formula (Hook → Value Bomb → Proof → CTA) + SEO captions with 2026 keyword strategy
- Runs weekly via APScheduler cron job
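Trend detection treats an audio track that shows up across several posts as trending. The sketch below shows that idea as a frequency count. It is an illustration, not the engine's code: the `audio_id` field and the default threshold of 3 posts are assumptions.

```python
from collections import Counter
from typing import Iterable


def trending_audio(posts: Iterable[dict], min_posts: int = 3) -> list[tuple[str, int]]:
    """Return (audio_id, count) pairs for tracks used by at least `min_posts` posts.

    Assumption: each scraped post is a dict with an optional "audio_id" key.
    The real Apify actor output uses its own field names.
    """
    counts = Counter(p["audio_id"] for p in posts if p.get("audio_id"))
    # most_common() sorts by count, descending. Keep only tracks that repeat enough.
    return [(audio, n) for audio, n in counts.most_common() if n >= min_posts]


posts = [
    {"id": 1, "audio_id": "track-a"},
    {"id": 2, "audio_id": "track-b"},
    {"id": 3, "audio_id": "track-a"},
    {"id": 4, "audio_id": None},
    {"id": 5, "audio_id": "track-a"},
    {"id": 6, "audio_id": "track-b"},
]
print(trending_audio(posts, min_posts=2))  # [('track-a', 3), ('track-b', 2)]
```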
The entire pipeline is brand-aware — voice, audience, pain points, CTAs are stored in Supabase and dynamically injected into every prompt. Edit the brand profile in the UI, and all future generations adapt.
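One way to inject a stored brand profile into every prompt is shown below. This is a minimal sketch: `BrandConfig`, `brand_context`, and `build_prompt` are hypothetical names, and the fields only mirror the profile parts this README lists (voice, audience, pain points, CTAs). They are not the real `brand_config` schema.

```python
from dataclasses import dataclass, field


@dataclass
class BrandConfig:
    """Hypothetical shape of a brand_config row. The real column names may differ."""

    name: str
    voice: str
    audience: str
    pain_points: list[str] = field(default_factory=list)
    ctas: list[str] = field(default_factory=list)


def brand_context(brand: BrandConfig) -> str:
    """Render the brand profile as plain text to prepend to a prompt."""
    lines = [
        f"Brand: {brand.name}",
        f"Voice: {brand.voice}",
        f"Audience: {brand.audience}",
    ]
    # Optional sections are omitted entirely when empty.
    if brand.pain_points:
        lines.append("Pain points: " + "; ".join(brand.pain_points))
    if brand.ctas:
        lines.append("Preferred CTAs: " + "; ".join(brand.ctas))
    return "\n".join(lines)


def build_prompt(task: str, brand: BrandConfig) -> str:
    # The profile is read at call time, so editing it changes every future
    # generation without touching the task templates themselves.
    return f"{brand_context(brand)}\n\nTask:\n{task}"


# Made-up example values, not Pixii's real profile.
brand = BrandConfig(
    name="ExampleBrand",
    voice="direct, playful",
    audience="small e-commerce sellers",
    pain_points=["slow content production"],
    ctas=["Try it free"],
)
print(build_prompt("Write a reel hook.", brand))
```

Keeping the brand text separate from the task text means one profile edit reaches extraction, scripting, and captions alike.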
```
Frontend (Next.js 16 + Tailwind + shadcn/ui)
    ↕ REST API
Backend (FastAPI + Python 3.11)
├── Apify (TikTok scraper, Instagram API scraper, Instagram keyword scraper)
├── OpenAI GPT-4o (extraction, script generation)
├── OpenAI GPT-4o-mini (relevance filtering)
├── OpenAI Whisper (video transcription)
├── OpenAI GPT-4o Vision (thumbnail text extraction)
├── LiteLLM (model routing + automatic fallback)
└── APScheduler (weekly auto-mining)
    ↕
Supabase (hooks, generated_scripts, mining_config, brand_config)
```
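LiteLLM provides model routing and fallback in the backend. The underlying pattern is to try models in order and return the first success. The sketch below shows that pattern without depending on any provider. It is not LiteLLM's API: `complete_with_fallback`, `AllModelsFailed`, `call`, and the model names are hypothetical.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain fails."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order and return (model_used, output) from the first success.

    `call(model, prompt)` stands in for the real client (LiteLLM in this project).
    It is passed in so the sketch has no provider dependency.
    """
    errors: list[str] = []
    for model in models:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # any provider error moves on to the next model
            errors.append(f"{model}: {exc!r}")
    raise AllModelsFailed("; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    # Simulates an outage on the primary model.
    if model == "primary-model":
        raise TimeoutError("primary unavailable")
    return f"{model}: ok"


print(complete_with_fallback("Extract hooks from ...", ["primary-model", "backup-model"], fake_call))
# ('backup-model', 'backup-model: ok')
```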
| Tool | Purpose |
|---|---|
| Apify | TikTok + Instagram scraping (3 actors) |
| OpenAI GPT-4o | Hook extraction, reel script generation, thumbnail vision analysis |
| OpenAI GPT-4o-mini | Relevance filtering (cheap/fast) |
| OpenAI Whisper | Video audio transcription |
| LiteLLM | Model routing with automatic fallback |
| Supabase | Database (hooks, scripts, config, brand profile) |
| APScheduler | Weekly auto-mining cron |
| fast-langdetect | Language detection for non-English filtering |
Thumbnails, pattern badges, engagement scores, transcript snippets, video links.
After mining: trending audio detection + 5 auto-generated reel scripts with SEO captions.
Editable brand profile with AI regeneration — voice, audience, content pillars, CTAs.
Tag-based inputs for hashtags and keywords per platform.
- Python 3.11+
- Node.js 18+
- Supabase project (free tier works)
- API keys: OpenAI and Apify (Anthropic optional)
```bash
git clone https://github.com/your-username/hook-mining-engine.git
cd hook-mining-engine
cp .env.example .env
# Fill in your API keys in .env
```

Run these SQL files in order in your Supabase SQL editor:
```
supabase/migrations/001_create_hooks_table.sql
supabase/migrations/002_add_media_and_trends.sql
supabase/migrations/003_create_generated_scripts.sql
supabase/migrations/004_add_languages_config.sql
supabase/migrations/005_create_brand_config.sql
supabase/migrations/006_seed_mining_keywords.sql
```
```bash
cd backend
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r ../requirements.txt
cp ../.env .env
uvicorn app.main:app --reload --port 8000
```

In a second terminal, from the repository root:

```bash
cd frontend
npm install
cp ../.env.example .env.local
# Set NEXT_PUBLIC_API_URL=http://localhost:8000
npm run dev
```

"What does the user need?" — A social media manager doesn't want a database of hooks; they want ready-to-shoot scripts. So mining = scrape + filter + analyze + extract + generate, all in one flow.
Brand-aware pipeline — Brand config lives in Supabase, not hardcoded. Edit voice/audience/CTAs in the UI → all future scripts adapt. AI can regenerate any section.
Resilient by default — Every async step retries once on failure, then moves on. One failed transcription doesn't block 200 hooks. An LLM parse error on a single item skips that item, not the whole batch.
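Here is one way to get retry-once-then-skip behavior with asyncio, assuming each step is an async callable. `retry_once`, `process_all`, and `fake_transcribe` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_once(step: Callable[[], Awaitable[T]]) -> Optional[T]:
    """Run an async step, retry it once on failure, then give up with None."""
    for _ in range(2):  # first attempt + one retry
        try:
            return await step()
        except Exception:
            continue
    return None  # both attempts failed: the caller skips this item


async def process_all(items: list[str], worker: Callable[[str], Awaitable[T]]) -> list[T]:
    """Run `worker` on every item concurrently. Failed items are dropped, not fatal."""
    # `i=i` binds each item now, so every retry re-calls the worker with the right input.
    results = await asyncio.gather(*(retry_once(lambda i=i: worker(i)) for i in items))
    # Note: a worker that legitimately returns None would also be filtered out here.
    return [r for r in results if r is not None]


attempts: dict[str, int] = {}


async def fake_transcribe(post_id: str) -> str:
    """Stand-in for a Whisper call: 'flaky' fails once, 'broken' always fails."""
    attempts[post_id] = attempts.get(post_id, 0) + 1
    if post_id == "broken":
        raise RuntimeError("corrupt audio")
    if post_id == "flaky" and attempts[post_id] == 1:
        raise TimeoutError("transient error")
    return f"transcript:{post_id}"


print(asyncio.run(process_all(["a", "flaky", "broken", "b"], fake_transcribe)))
# ['transcript:a', 'transcript:flaky', 'transcript:b']
```

The same wrapper works around a per-item parse step. A malformed LLM response then costs one item instead of the batch.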
Research-backed prompts — The 4-part viral formula (Hook/Emotional Trigger → Value Bomb → Proof → CTA) is combined with tone/angle randomization (5 tones × 6 angles = 30 unique combinations) to reduce repetition across generated scripts.
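The tone/angle randomization amounts to sampling from the 30-pair cross product while keeping the section order fixed. In this sketch the tone and angle labels are hypothetical placeholders, because the README gives their counts but not their names. `pick_combos` and `script_prompt` are also illustrative names.

```python
import itertools
import random

# Hypothetical labels: the README specifies 5 tones x 6 angles but not their names.
TONES = ["playful", "urgent", "empathetic", "bold", "curious"]
ANGLES = ["myth-bust", "tutorial", "behind-the-scenes", "before/after", "hot take", "listicle"]
COMBOS = list(itertools.product(TONES, ANGLES))  # 5 * 6 = 30 (tone, angle) pairs

# The 4-part structure named in this README.
SECTIONS = ("Hook", "Value Bomb", "Proof", "CTA")


def pick_combos(n: int, rng: random.Random) -> list[tuple[str, str]]:
    """Draw n distinct (tone, angle) pairs so scripts in one batch differ in style."""
    return rng.sample(COMBOS, n)  # sampling without replacement: no repeated pair


def script_prompt(hook: str, tone: str, angle: str) -> str:
    """Build a generation prompt that fixes the section order but varies the style."""
    outline = "\n".join(f"{i}. {name}" for i, name in enumerate(SECTIONS, start=1))
    return (
        f"Write a reel script in a {tone} tone, using a {angle} angle.\n"
        f"Source hook: {hook}\n"
        f"Sections, in order:\n{outline}"
    )


rng = random.Random(42)  # seeded only to make the demo reproducible
hooks = ["hook 1", "hook 2", "hook 3", "hook 4", "hook 5"]
for hook, (tone, angle) in zip(hooks, pick_combos(len(hooks), rng)):
    print(script_prompt(hook, tone, angle), end="\n\n")
```

Sampling without replacement guarantees that the five scripts in one run never share a tone/angle pair.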
Cost-conscious — The language filter runs before any LLM call. The relevance filter uses GPT-4o-mini (cheap). Only the top 10 posts get media analysis, and only the top 5 hooks get scripts generated.
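This ordering can be sketched as a funnel in which each paid stage sees only the survivors of the cheaper stage before it. The sketch rests on three assumptions. First, posts already carry a language code from a local detector (the README uses fast-langdetect). Second, `is_relevant` stands in for the GPT-4o-mini call. Third, engagement is a single precomputed number. `Post` and `select_for_media_analysis` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Post:
    id: str
    caption: str
    engagement: float  # assumed: one precomputed engagement number per post
    lang: str  # assumed: set earlier by a local detector (no API cost)


def select_for_media_analysis(
    posts: list[Post],
    is_relevant: Callable[[Post], bool],
    allowed_langs: frozenset[str] = frozenset({"en"}),
    top_n: int = 10,
) -> list[Post]:
    """Return the posts worth paying for Whisper + Vision analysis."""
    # Stage 1 (free): drop posts outside the configured languages.
    in_language = [p for p in posts if p.lang in allowed_langs]
    # Stage 2 (cheap model): relevance is only checked on stage-1 survivors.
    relevant = [p for p in in_language if is_relevant(p)]
    # Stage 3 (expensive): only the top_n by engagement go on to media analysis.
    return sorted(relevant, key=lambda p: p.engagement, reverse=True)[:top_n]


posts = [
    Post("1", "AI product photos in 10s", 9_000, "en"),
    Post("2", "receta de pollo", 50_000, "es"),
    Post("3", "my cat", 70_000, "en"),
    Post("4", "listing images that convert", 12_000, "en"),
]
calls: list[str] = []


def fake_is_relevant(p: Post) -> bool:
    calls.append(p.id)  # record which posts would have cost a relevance call
    return "cat" not in p.caption.lower()


batch = select_for_media_analysis(posts, fake_is_relevant)
print([p.id for p in batch], calls)  # ['4', '1'] ['1', '3', '4']
```

The Spanish post never reaches the relevance model, and the off-topic post never reaches media analysis.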
- A/B tracking — Track which scripts get posted and their performance, feed results back into generation
- Multi-brand support — Brand config is already in Supabase; extend to multiple brands per account
- Webhook notifications — Slack/email alert when weekly mining finds high-engagement hooks
- Export to scheduling tools — One-click export scripts + captions to Buffer/Later/Hootsuite
Prantik Seal — prantik0004@gmail.com