A production-ready web scraping platform with a beautiful dark-themed dashboard UI, built with FastAPI and Next.js.
- Scrapling-Powered Scraping Engine: Three scraping modes powered by Scrapling:
  - Fast Mode: Pure HTTP fetching via `Fetcher` (fastest, for static sites)
  - Dynamic Mode: Playwright-based via `DynamicFetcher` (for JS-rendered pages)
  - Stealth Mode: Max anti-detection via `StealthyFetcher` (bypasses Cloudflare, WAFs, and bot protection)
  - Auto-fallback: fast → dynamic → stealth if no results found
- Advanced Anti-Detection: Scrapling's built-in TLS fingerprinting, CDP leak fix, WebRTC leak fix, canvas noise injection, headless bypass, and timezone matching, plus 50+ rotating User-Agents, header randomization, proxy rotation, CAPTCHA detection, and exponential backoff
- Beautiful Dashboard: Dark-themed UI built with Next.js, shadcn/ui, Recharts, and Framer Motion
- Job Scheduling: Run scraping jobs on demand or schedule them with cron expressions
- Real-time Updates: WebSocket support for live job progress tracking
- Data Export: Export scraped data as CSV or JSON
- Fully Dockerized: One command to start everything
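
As a rough illustration of the CSV/JSON export feature, here is a minimal sketch using only the Python standard library. It is not the project's actual `exporter.py`; the function names `rows_to_csv` and `rows_to_json` and the sample row fields are hypothetical.

```python
import csv
import io
import json


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text; columns are the union of all keys."""
    if not rows:
        return ""
    # Preserve first-seen key order so the column order is stable across runs.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # missing keys are filled with restval ("")
    return buffer.getvalue()


def rows_to_json(rows):
    """Serialize the same rows as pretty-printed JSON."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical scraped rows; real results may carry different fields.
sample = [
    {"title": "Example A", "url": "https://example.com/a"},
    {"title": "Example B", "url": "https://example.com/b", "price": "9.99"},
]
print(rows_to_csv(sample))
print(rows_to_json(sample))
```

Taking the union of keys keeps rows with extra fields (such as `price` above) from being dropped, at the cost of empty cells for rows that lack them.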
The frontend works standalone with realistic mock data: just run `npm run dev` to see the full UI.
| Dashboard | Jobs |
|---|---|
| Stats, activity charts, recent jobs | Create, manage, and monitor scraping jobs |
| Results | Settings |
|---|---|
| Search, filter, and export scraped data | Configure proxies, anti-detection, and exports |
- FastAPI: Async Python web framework
- SQLAlchemy + aiosqlite: Async SQLite database
- Scrapling: Advanced anti-bot scraping framework (3 fetcher modes)
- httpx: Async HTTP client
- Patchright + Playwright: Anti-detection headless browsers for JS-heavy pages
- APScheduler: Job scheduling
- BeautifulSoup4: HTML parsing
- Next.js 14: React framework with App Router
- shadcn/ui: Radix UI + Tailwind CSS component library
- Recharts: Charting library
- Framer Motion: Animations
- Lucide Icons: Icon set
- TypeScript: Type safety
`docker-compose up --build`

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
python -m patchright install chromium
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

cd frontend
npm install
npm run dev

Open http://localhost:3000 in your browser.
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/jobs` | Create a scraping job |
| GET | `/api/jobs` | List all jobs |
| GET | `/api/jobs/{id}` | Get job details + results |
| POST | `/api/jobs/{id}/run` | Run a job immediately |
| DELETE | `/api/jobs/{id}` | Delete a job |
| GET | `/api/results/{job_id}` | Get scraped results |
| GET | `/api/results/{job_id}/export?format=csv` | Export as CSV |
| GET | `/api/results/{job_id}/export?format=json` | Export as JSON |
| GET | `/api/stats` | Dashboard statistics |
| WS | `/ws/jobs/{id}` | Real-time job progress |
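
For a sense of how these endpoints might be called from a script, here is a small sketch using only the Python standard library. The payload fields (`url`, `mode`) are assumptions, since the request schema is not documented here; check the interactive docs at `/docs` for the real one. The helper names `build_create_job_request` and `export_url` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default backend address


def build_create_job_request(payload):
    """Build (but do not send) a POST /api/jobs request with a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def export_url(job_id, fmt="csv"):
    """Return the export URL for a job; fmt must be 'csv' or 'json'."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported export format: {fmt}")
    return f"{BASE_URL}/api/results/{job_id}/export?format={fmt}"


# Hypothetical payload: the field names are assumptions, not the real schema.
request = build_create_job_request({"url": "https://example.com", "mode": "fast"})
print(request.get_method(), request.full_url)
print(export_url(1, "json"))

# To actually send the request (requires the backend to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the example safe to run without a live backend.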
| Mode | Fetcher | Best For | Speed |
|---|---|---|---|
| Fast | `Fetcher` | Static HTML pages, APIs | Fastest |
| Dynamic | `DynamicFetcher` | JS-rendered SPAs, infinite scroll | Medium |
| Stealth | `StealthyFetcher` | Cloudflare, DataDome, bot-protected sites | Slowest |
The engine automatically falls back through modes (fast → dynamic → stealth) if no results are found.
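
The fallback chain can be sketched roughly as follows. This is an illustration of the idea, not the code in `engine.py`: the function `scrape_with_fallback` and the stand-in fetchers are hypothetical, and treating an exception the same as an empty result is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A fetcher takes a URL and returns the extracted items (possibly empty).
FetchFn = Callable[[str], List[str]]


def scrape_with_fallback(
    url: str, fetchers: Sequence[Tuple[str, FetchFn]]
) -> Tuple[Optional[str], List[str]]:
    """Try each (mode, fetcher) in order; return the first mode that yields results."""
    for mode, fetch in fetchers:
        try:
            items = fetch(url)
        except Exception:
            # Assumption: a failed fetch is treated like an empty result.
            items = []
        if items:
            return mode, items
    return None, []


# Stand-in fetchers for illustration only; a real setup would wrap Scrapling's
# Fetcher, DynamicFetcher, and StealthyFetcher here.
def fast(url: str) -> List[str]:
    return []  # pretend the static HTML contained nothing useful


def dynamic(url: str) -> List[str]:
    return ["<h1>Rendered title</h1>"]


def stealth(url: str) -> List[str]:
    return ["<h1>Rendered title</h1>"]


mode, items = scrape_with_fallback(
    "https://example.com",
    [("fast", fast), ("dynamic", dynamic), ("stealth", stealth)],
)
print(mode, len(items))
```

Ordering the fetchers from cheapest to most expensive means the slower browser-based modes only run when the faster ones come back empty.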
Scrapling built-in:
- TLS fingerprint mimicry
- CDP (Chrome DevTools Protocol) leak fix
- WebRTC leak prevention
- Canvas fingerprint noise injection
- Headless browser detection bypass
- Timezone and locale matching
- Adaptive element tracking (resilient to site layout changes)
Additional layers:
- 50+ real browser User-Agent strings
- Randomized request headers (Accept-Language, Accept-Encoding, Sec-Fetch-*)
- Configurable request delays (2-5s default)
- HTTP/SOCKS5 proxy support with rotation
- CAPTCHA detection (reCAPTCHA, hCaptcha, Cloudflare)
- Referer chain simulation
- Exponential backoff with configurable retries
- Cookie and session management
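
To make the retry behaviour concrete, here is a minimal sketch of exponential backoff with random jitter. It is an illustration under stated assumptions, not the code in `anti_detect.py`: the names `backoff_delays` and `with_retries`, the doubling factor, and the default values are all hypothetical.

```python
import random
import time


def backoff_delays(retries, base=1.0, factor=2.0, cap=30.0):
    """Return the capped exponential delay (in seconds) before each retry."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def with_retries(operation, retries=3, base=1.0, jitter=0.5, sleep=time.sleep):
    """Call operation(); on failure wait an exponential delay plus jitter, then retry."""
    delays = backoff_delays(retries, base=base)
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt] + random.uniform(0, jitter))


# Demo: an operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = with_retries(flaky, retries=3, base=0.1, sleep=waits.append)
print(result, len(waits))
```

Passing `sleep` as a parameter keeps the demo instant; in real use the default `time.sleep` applies the delays.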
smart-scraper/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI app entry
│   │   ├── api/
│   │   │   ├── routes.py        # All API endpoints
│   │   │   └── schemas.py       # Pydantic models
│   │   ├── scraper/
│   │   │   ├── engine.py        # Core scraping engine
│   │   │   ├── anti_detect.py   # Anti-detection system
│   │   │   ├── parsers.py       # HTML parsing
│   │   │   └── scheduler.py     # Job scheduling
│   │   ├── db/
│   │   │   ├── database.py      # SQLite setup
│   │   │   └── models.py        # DB models
│   │   └── export/
│   │       └── exporter.py      # CSV/JSON export
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── app/                 # Next.js pages
│   │   ├── components/          # React components
│   │   └── lib/                 # Utilities + API client
│   ├── package.json
│   └── Dockerfile
├── docker-compose.yml
└── README.md
See ROADMAP.md for planned features:
- Phase 1: JS Interactions (click, scroll, load more)
- Phase 2: Cookie/Session Injection (authenticated scraping)
- Phase 3: CAPTCHA Solving Service (2Captcha, CapSolver)
- Phase 4: LLM Auto-Selector (auto-generate CSS selectors)
- Phase 5: Site Template Library (pre-built configs for common sites)
- Phase 6: Webhook & Notifications
MIT