Government data is public. Accountability shouldn't require a PhD.
CivLib turns 91,000-row civic datasets into actionable intelligence — in seconds.
Live Demo · Features · Architecture · Quick Start · Screenshots
CivLib is an open-source civic intelligence platform built for Bengaluru (and any city that publishes open government data). It aggregates datasets from official portals, runs automated statistical audits, flags anomalies, and lets any citizen — researcher, journalist, or policymaker — interrogate the data in plain English.
No data science background required. No API keys to manage. Just ask a question.
Built in 48 hours for a civic-tech hackathon. Powered by Groq's LLaMA 3.1, FastAPI, Next.js 15, and a relentless belief that public data should be genuinely public.
Datasets are never pre-cached into a database. Every audit is a live JIT (Just-in-Time) fetch directly from government portals — CSV, XLSX, or PDF — streamed to the browser with real-time progress indicators.
Connecting to Supabase catalog... 5%
Downloading 2.3 MB of CSV data... 28%
Parsed 91,620 rows × 14 columns... 48%
Running anomaly detection... 80%
Detected 3 anomalies across 91,620 rows 100%
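The staged progress log above maps naturally onto Server-Sent Events. A minimal sketch of how such progress frames could be serialized on the backend (the field names `message`/`progress` are assumptions, not CivLib's exact schema):

```python
import json

def sse_progress(steps):
    """Yield SSE-formatted frames for each (message, percent) audit step."""
    for message, percent in steps:
        payload = json.dumps({"message": message, "progress": percent})
        # Each SSE event is a "data: <json>" line followed by a blank line
        yield f"data: {payload}\n\n"

frames = list(sse_progress([
    ("Connecting to Supabase catalog...", 5),
    ("Running anomaly detection...", 80),
]))
```

In FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.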
Every dataset gets a Groq-accelerated Llama-3.1-8b analysis streamed character-by-character into a terminal-style UI. The AI cites actual numbers, names specific outlier entities, and explains what the data means for citizens.
Ask questions in plain English. The system uses Groq to generate a pandas expression, executes it safely against the live dataframe, and returns a plain-language explanation:
- "Which ward has the highest complaint count?"
- "How many records are above the average budget allocation?"
- "Show me the top 5 outliers"
Select any two datasets from the catalog and run an AI-powered correlation analysis. The engine:
- Fetches and audits both datasets in parallel
- Identifies shared anomaly entities (locations/departments appearing as outliers in both)
- Synthesizes a 4โ5 sentence policy-grade insight using Llama 3.1
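The shared-anomaly step above could reduce to a case-insensitive intersection over each audit's flagged entities. A sketch with hypothetical function and argument names:

```python
def shared_anomalies(flags_a, flags_b):
    """Entities flagged as outliers in both audits (case-insensitive match)."""
    names_a = {name.lower() for name in flags_a}
    return [name for name in flags_b if name.lower() in names_a]

shared = shared_anomalies(["Whitefield", "Yelahanka"], ["whitefield", "Jayanagar"])
```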
When a dataset contains latitude/longitude columns (auto-detected via regex, no configuration needed), an interactive Leaflet map renders automatically with:
- Heat-colored markers (blue → red) scaled by relative metric value
- Tooltip with entity name, metric value, and coordinates
- Auto-fitting bounds for any geography
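The column auto-detection could look like the sketch below (the exact regexes are assumptions; the README only states that detection is regex-based):

```python
import re

LAT_RE = re.compile(r"\b(lat|latitude)\b", re.IGNORECASE)
LNG_RE = re.compile(r"\b(lng|lon|long|longitude)\b", re.IGNORECASE)

def find_geo_columns(columns):
    """Return (lat_col, lng_col) if both are present, else None."""
    lat = next((c for c in columns if LAT_RE.search(c)), None)
    lng = next((c for c in columns if LNG_RE.search(c)), None)
    return (lat, lng) if lat and lng else None

cols = ["Ward Name", "Latitude", "Longitude", "Complaints"]
```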
Filter any dataset to a specific Ward, District, Pincode, or any string value — without reloading. The backend re-runs the full statistical analysis on only the matching rows, so anomaly detection is always local to the slice.
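Because the filter accepts any string value, one plausible implementation is a case-insensitive substring match across every text column. A sketch (the matching strategy is an assumption):

```python
import pandas as pd

def filter_region(df, region):
    """Case-insensitive substring match of `region` against every string
    column, so a single filter works for Ward, District, or Pincode."""
    mask = pd.Series(False, index=df.index)
    for col in df.select_dtypes(include="object").columns:
        mask |= df[col].astype(str).str.contains(region, case=False, na=False)
    return df[mask]

df = pd.DataFrame({
    "ward": ["Whitefield", "Jayanagar", "Whitefield"],
    "complaints": [120, 90, 75],
})
sliced = filter_region(df, "whitefield")
```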
Four chart types โ Bar, Line, Area, Pie โ rendered via Recharts with intelligent metric selection:
- Mirrors the backend's `run_analytics` algorithm: skips ID/coordinate columns
- Picks the highest-variance numeric column as the primary metric
- Anomalous entities render as red bars
- Y-axis auto-formats (`22k`, `4.5M`) for readability
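The axis formatting lives in the TypeScript frontend; as a language-neutral sketch of the same compaction logic:

```python
def format_axis(value):
    """Compact axis labels: 22000 -> '22k', 4500000 -> '4.5M'."""
    for threshold, suffix in ((1_000_000, "M"), (1_000, "k")):
        if abs(value) >= threshold:
            return f"{value / threshold:g}{suffix}"
    return f"{value:g}"
```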
One-click report generation saves stats, AI analysis, anomaly flags, and NL query history to Supabase and returns a public shareable URL. Falls back to a downloadable JSON if the backend is unavailable.
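The Supabase-or-JSON fallback described above can be sketched as a try/except around the persistence call (`supabase_insert` below is a stand-in for the real client call, not CivLib's actual API):

```python
import json

def save_report(report, supabase_insert):
    """Persist to Supabase and return a shareable URL; on failure,
    fall back to a downloadable JSON blob."""
    try:
        report_id = supabase_insert(report)
        return {"mode": "shared", "url": f"/report/{report_id}"}
    except Exception:
        return {"mode": "download", "blob": json.dumps(report)}

def _down(_report):
    raise RuntimeError("backend unavailable")

ok = save_report({"stats": {}}, lambda r: "abc123")
fail = save_report({"stats": {}}, _down)
```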
The search engine expands queries with a synonym graph (accident → fatal, rto, traffic, motor), scores datasets by title/tags/headers/description relevance, and returns ranked results with confidence percentages.
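A minimal sketch of the expand-then-score idea (the synonym entries come from the example above, but the weights and scoring formula are assumptions):

```python
SYNONYMS = {"accident": {"fatal", "rto", "traffic", "motor"}}

def expand(terms):
    """Grow the query with its synonym-graph neighbours."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def score(dataset, terms):
    """Title matches weigh more than tag matches (illustrative weights)."""
    words = expand(terms)
    title_hits = sum(w in dataset["title"].lower() for w in words)
    tag_hits = sum(any(w in tag.lower() for tag in dataset["tags"]) for w in words)
    return 3 * title_hits + tag_hits

ds = {"title": "Road Traffic Accident Report", "tags": ["rto", "transport"]}
```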
┌─────────────────────────────────────────────────────────────┐
│                          FRONTEND                           │
│     Next.js 15 (App Router) · TypeScript · Tailwind CSS     │
│          Recharts · Framer Motion · React-Leaflet           │
└────────────────────┬────────────────────────────────────────┘
                     │ EventSource (SSE streaming)
                     │ REST (POST /api/*)
┌────────────────────┴────────────────────────────────────────┐
│                           BACKEND                           │
│              FastAPI · Python · Pandas · NumPy              │
│               Groq SDK (Llama-3.1-8b-instant)               │
└──────────┬───────────────────────┬──────────────────────────┘
           │                       │
┌──────────┴──────┐     ┌──────────┴───────────────────┐
│    Supabase     │     │  Live Open Government APIs   │
│   (Catalog DB   │     │  data.gov.in · catalog.data  │
│   + Reports)    │     │  .gov · Direct CSV/XLSX/PDF  │
└─────────────────┘     └──────────────────────────────┘
Data flow for a single dataset audit:
- Browser opens an `EventSource` to `/api/jit-stream/{id}`
- Backend fetches the raw file from the government portal URL stored in Supabase
- Pandas parses and cleans the dataframe (handles encoding issues, unstructured regional data)
- `run_analytics()` finds the highest-variance useful numeric column and runs 2σ outlier detection
- Results stream to the browser as SSE events with progress percentages
- On completion, the full payload is sent as the final `done` event
- React triggers Groq AI analysis as a separate POST call, streamed to the terminal
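On the receiving end, the browser's `EventSource` decodes this stream for free; a Python sketch of the equivalent parsing, useful for testing the backend without a browser (payload shapes are assumptions):

```python
import json

def parse_sse(stream_text):
    """Split a raw SSE body into decoded JSON payloads; the final
    event is expected to carry the full audit result ('done')."""
    events = []
    for block in stream_text.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(json.loads(line[len("data: "):]))
    return events

raw = 'data: {"progress": 48}\n\ndata: {"event": "done", "rows": 91620}\n\n'
events = parse_sse(raw)
```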
Semantic search across Bengaluru's civic data catalog. Type in natural language, filter by department.
Real-time streaming audit of 91,620 BBMP grievance records. Cross-dataset correlation active.
Llama-3.1-8b streams a 4-sentence analysis citing actual statistics and naming outlier entities.
Two datasets loaded simultaneously. Shared anomaly entities flagged in red.
Auto-detected lat/lng columns rendered as a heat-colored interactive map. Zero configuration.
- Node.js 18+
- Python 3.11+
- A Groq API key (free tier works)
- A Supabase project
git clone https://github.com/DILIP-SHEESH/bap.git
cd bap
cd backend
pip install -r requirements.txt
# Create .env
cat > .env << EOF
GROQ_API_KEY=your_groq_key_here
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_anon_key
EOF
# Run
uvicorn app.main:app --reload --port 8000

cd frontend
npm install
# Create .env.local
echo "NEXT_PUBLIC_API_URL=http://127.0.0.1:8000" > .env.local
# Run
npm run dev

# Fetch live datasets from data.gov.in and catalog.data.gov
curl -X POST "http://localhost:8000/api/seed-all"

Open http://localhost:3000
Run this SQL in your Supabase dashboard:
-- Dataset catalog
create table data_catalog (
id bigserial primary key,
title text not null,
description text,
source_url text,
direct_csv_link text,
tags text[],
column_headers text[]
);
-- Shareable audit reports
create table public_reports (
id text primary key,
dataset_title text,
stats jsonb,
flags jsonb,
ai_analysis text,
chart_data jsonb,
nl_queries jsonb,
created_at timestamptz default now()
);

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/jit-stream/{id}` | SSE stream — live fetch, clean, analyze |
| GET | `/api/jit-stream/{id}?region=Whitefield` | Same, with region filter applied |
| POST | `/api/ai-analyze` | Groq LLM analysis of stats + anomalies |
| POST | `/api/nl-query` | Natural language → pandas → answer |
| POST | `/api/correlate` | Cross-dataset AI correlation |
| POST | `/api/search` | Semantic dataset search |
| POST | `/api/save-report` | Persist audit report to Supabase |
| GET | `/api/get-report/{id}` | Retrieve a saved report |
| POST | `/api/seed` | Fetch datasets from CKAN by keyword |
| POST | `/api/seed-all` | Multi-domain live aggregation |
| GET | `/health` | Engine status |
The engine avoids naive ID-column detection through strict regex filtering, then selects the most statistically meaningful metric:
# 1. Filter out noise columns (IDs, coordinates, phone numbers, etc.)
skip_regex = re.compile(
r'\b(id|sl|no|sr|sno|pin|code|year|phone|mobile|lat|lng|latitude|longitude|index)\b',
re.IGNORECASE
)
useful = [c for c in numeric_cols if not skip_regex.search(str(c))]
# 2. Pick the highest-variance column (most meaningful signal)
best_col = max(useful, key=lambda c: df[c].var())
# 3. Flag outliers beyond 2 standard deviations
threshold = avg + (2.0 * std_dev)
anomalies = df[df[best_col] > threshold]

This ensures the chart and analysis are always about real civic metrics (complaint counts, budget allocations, incident rates) — never complaint IDs or row numbers.
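As an illustrative walk-through of the 2σ rule on made-up numbers (not real BBMP data), one ward far above its peers crosses the threshold while the rest do not:

```python
import pandas as pd

df = pd.DataFrame({
    "ward": [f"W{i}" for i in range(1, 11)],
    "complaints": [100] * 9 + [500],   # one ward far above its peers
})
avg = df["complaints"].mean()          # 140
std_dev = df["complaints"].std()       # ~126.5 (sample std, ddof=1)
threshold = avg + 2.0 * std_dev        # ~393
anomalies = df[df["complaints"] > threshold]
```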
| Layer | Technology |
|---|---|
| Frontend Framework | Next.js 15 (App Router, Turbopack) |
| UI Language | TypeScript |
| Styling | Tailwind CSS |
| Charts | Recharts |
| Maps | React-Leaflet + OpenStreetMap |
| Animations | Framer Motion |
| Backend Framework | FastAPI |
| Data Processing | Pandas, NumPy |
| AI Inference | Groq Cloud (Llama-3.1-8b-instant) |
| Database | Supabase (PostgreSQL) |
| Streaming | Server-Sent Events (SSE) |
| File Parsing | CSV · XLSX (openpyxl/xlrd) · PDF (pdfplumber) |
| Deployment | Vercel (frontend) · Render/Railway (backend) |
bap/
├── frontend/
│   ├── app/
│   │   ├── page.tsx                 # Home — dataset search & catalog
│   │   ├── dataset/[id]/
│   │   │   └── page.tsx             # Audit dashboard (main experience)
│   │   └── correlation/
│   │       └── page.tsx             # Standalone correlation engine
│   ├── components/
│   │   └── StopsMap.tsx             # Leaflet geo map component
│   └── public/
│       └── screenshots/             # App screenshots for README
│
└── backend/
    └── app/
        ├── main.py                  # All FastAPI endpoints
        └── database.py              # Supabase client init
Pull requests are welcome. For major changes, please open an issue first.
# Fork โ Clone โ Branch โ PR
git checkout -b feature/your-feature-name
git commit -m "feat: add your feature"
git push origin feature/your-feature-name

Commit-Men (BUILD FOR BENGALURU - 2026) · CivLib — making government accountable




