Fat Loss Insights Engine is a local Python pipeline for discovering Instagram fat-loss creators, scoring profiles, scraping posts, enriching media, classifying content with AI, and surfacing patterns in a Streamlit dashboard.
- Discovers public Instagram accounts from seed hashtags.
- Scores candidate profiles and keeps the top profiles for scraping.
- Scrapes posts and stores them in SQLite.
- Downloads audio, runs transcription and OCR, and enriches post metadata.
- Classifies posts into content categories for marketing analysis.
- Generates summary metrics and a local Streamlit dashboard.
- Python 3.11+
- Instaloader for Instagram scraping
- yt-dlp for media download
- Groq for transcription
- EasyOCR for local OCR
- Google Gemini 2.0 Flash for classification and strategy generation
- SQLAlchemy with SQLite for storage
- Streamlit, Pandas, and Plotly for the dashboard
- main.py - pipeline orchestrator
- db.py - SQLAlchemy models and session factory
- config.py - constants, seed hashtags, and scoring settings
- scraper/ - Instagram discovery, scoring, and scraping helpers
- enrichment/ - audio download, transcription, and OCR
- classifier/ - classification prompts and batch classification
- analysis/ - metrics and insight generation
- dashboard/app.py - Streamlit dashboard
- Create and activate a virtual environment.
- Install dependencies: pip install -r requirements.txt
- Create a .env file with the required API and Instagram credentials.
The pipeline expects these values in .env:
- INSTAGRAM_USERNAME
- INSTAGRAM_PASSWORD or cookie-based auth via instagram_cookies.txt
- INSTAGRAM_COOKIE_BROWSER
- INSTAGRAM_COOKIE_FILE
- GEMINI_API_KEY
- PROBE_USERNAME for the optional rate-limit probe step
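
Checking the .env values before a long run can save a failed scrape. The sketch below uses only the standard library and is not the project's own loader. The helper names `parse_env_file` and `missing_settings` are hypothetical, and splitting the variables into "always required" and "one of the auth options" is an assumption based on the list above.

```python
import os

# Variable names come from the list above. Treating these two as always
# required, and the other three as interchangeable auth options, is an
# assumption of this sketch rather than documented project behaviour.
REQUIRED = ("INSTAGRAM_USERNAME", "GEMINI_API_KEY")
AUTH_ALTERNATIVES = (
    "INSTAGRAM_PASSWORD",
    "INSTAGRAM_COOKIE_BROWSER",
    "INSTAGRAM_COOKIE_FILE",
)


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_settings(env):
    """Return the names of settings that are absent or empty in `env`."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in AUTH_ALTERNATIVES):
        missing.append(" or ".join(AUTH_ALTERNATIVES))
    return missing


if __name__ == "__main__":
    env = dict(os.environ)
    if os.path.exists(".env"):
        # Values already set in the process environment take precedence.
        env = {**parse_env_file(".env"), **env}
    problems = missing_settings(env)
    print("Missing:", ", ".join(problems) if problems else "nothing")
```

PROBE_USERNAME is left out of the check because it is only needed for the optional probe step.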
Run the pipeline step by step or end to end:
python main.py --step discover
python main.py --step score
python main.py --step scrape
python main.py --step enrich
python main.py --step classify
python main.py --step analyze
python main.py --step all
python main.py --step probe
Launch the dashboard:
streamlit run dashboard/app.py
- SQLite database: fat_loss_insights.db
- Discovered profiles: data/discovered_profiles.txt and data/discovered_profiles.json
- Insights summary: data/insights_summary.json
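
To confirm a run produced the files listed above, a short script can inspect them without knowing the schema. This is a minimal sketch: the file paths come from this README, while `list_tables` and `summary_keys` are hypothetical helpers and nothing here assumes particular table or key names.

```python
import json
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file without assuming any schema."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def summary_keys(json_path):
    """Return the sorted top-level keys if the summary is a JSON object."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    db = Path("fat_loss_insights.db")
    summary = Path("data/insights_summary.json")
    # Check existence first: sqlite3.connect would create an empty file.
    if db.exists():
        print("Tables:", list_tables(db))
    if summary.exists():
        print("Summary keys:", summary_keys(summary))
```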
- Add delays between Instagram requests to avoid rate limits and account bans.
- Keep Gemini calls paced at roughly one request every 4.1 seconds.
- Do not use a personal Instagram account for scraping.
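
The pacing advice above can be enforced with a small helper that sleeps only as long as needed. This is a sketch, not the project's implementation: the `Pacer` class is hypothetical, and only the 4.1-second interval is taken from the note above.

```python
import time


class Pacer:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Interval from the note above: roughly one Gemini request every 4.1 seconds.
gemini_pacer = Pacer(4.1)
```

Calling `gemini_pacer.wait()` immediately before each request keeps the spacing even when the surrounding work takes a variable amount of time. A second instance with a longer interval could serve the same purpose for Instagram requests.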
The pipeline is resumable. Each stage writes to disk or the database before the next stage runs, so you can stop and continue later without starting over.
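
One way to picture resumability is a runner that records each finished stage and skips it on the next `--step all` run. The sketch below is illustrative only: the marker files, the `data/.markers` directory, and the `run_pipeline` function are hypothetical, and the real pipeline may track progress through the database instead. Leaving `probe` out of `all` is also an assumption, made because the probe step is described as optional.

```python
import argparse
from pathlib import Path

# Step order inferred from the commands listed in this README.
STEPS = ["discover", "score", "scrape", "enrich", "classify", "analyze"]


def steps_to_run(requested):
    """Expand `all` into the ordered step list; otherwise run the single step."""
    return list(STEPS) if requested == "all" else [requested]


def run_pipeline(requested, handlers, marker_dir):
    """Run steps in order; in `all` mode, skip steps already marked as done."""
    marker_dir = Path(marker_dir)
    marker_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for step in steps_to_run(requested):
        marker = marker_dir / f"{step}.done"
        if requested == "all" and marker.exists():
            continue
        handlers[step]()   # if this raises, no marker is written
        marker.touch()
        executed.append(step)
    return executed


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--step", choices=STEPS + ["all"], required=True)
    args = parser.parse_args()
    # Placeholder handlers standing in for the real stage functions.
    handlers = {
        name: (lambda name=name: print(f"running {name}")) for name in STEPS
    }
    run_pipeline(args.step, handlers, "data/.markers")
```

Because a marker is written only after its stage returns, an interrupted run resumes at the stage that failed. Requesting a single step explicitly always reruns it.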