
Government data is public. Accountability shouldn't require a PhD.
CivLib turns 91,000-row civic datasets into actionable intelligence, in seconds.


Live Demo · Features · Architecture · Quick Start · Screenshots


🌆 What is CivLib?

CivLib is an open-source civic intelligence platform built for Bengaluru (and any city that publishes open government data). It aggregates datasets from official portals, runs automated statistical audits, flags anomalies, and lets any citizen (researcher, journalist, or policymaker) interrogate the data in plain English.

No data science background required. No API keys to manage. Just ask a question.

Built in 48 hours for a civic-tech hackathon. Powered by Groq's Llama 3.1, FastAPI, Next.js 15, and a relentless belief that public data should be genuinely public.


✨ Features

🔴 Live Streaming Data Acquisition

Datasets are never pre-cached into a database. Every audit is a live just-in-time (JIT) fetch directly from government portals (CSV, XLSX, or PDF), streamed to the browser with real-time progress indicators.

Connecting to Supabase catalog...          5%
Downloading 2.3 MB of CSV data...         28%
Parsed 91,620 rows × 14 columns...        48%
Running anomaly detection...              80%
Detected 3 anomalies across 91,620 rows  100%

🧠 AI-Powered Audit Terminal

Every dataset gets a Groq-accelerated Llama-3.1-8b analysis streamed character by character into a terminal-style UI. The AI cites actual numbers, names specific outlier entities, and explains what the data means for citizens.

💬 Natural Language Query Engine

Ask questions in plain English. The system uses Groq to generate a pandas expression, executes it safely against the live dataframe, and returns a plain-language explanation:

  • "Which ward has the highest complaint count?"
  • "How many records are above the average budget allocation?"
  • "Show me the top 5 outliers"
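The prompt-to-pandas step happens server-side via Groq; the "execute it safely" half can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name, the toy dataframe, and the use of `eval` with builtins blanked out are all assumptions, and blocking builtins is only a first line of defense, not full sandboxing.

```python
import pandas as pd

def run_generated_expression(df: pd.DataFrame, expr: str):
    """Evaluate an LLM-generated pandas expression with builtins
    disabled, so only the dataframe (and pandas) are reachable."""
    return eval(expr, {"__builtins__": {}}, {"df": df, "pd": pd})

# Toy dataframe standing in for a live civic dataset
df = pd.DataFrame({"ward": ["HSR", "Whitefield", "Jayanagar"],
                   "complaints": [120, 950, 300]})

# An expression the model might return for
# "Which ward has the highest complaint count?"
answer = run_generated_expression(df, "df.loc[df['complaints'].idxmax(), 'ward']")
# answer == "Whitefield"
```

The plain-language explanation is then produced by a second LLM call that sees the question, the expression, and the computed result.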

🔗 Cross-Dataset Correlation Engine

Select any two datasets from the catalog and run an AI-powered correlation analysis. The engine:

  • Fetches and audits both datasets in parallel
  • Identifies shared anomaly entities (locations/departments appearing as outliers in both)
  • Synthesizes a 4-5 sentence policy-grade insight using Llama 3.1
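The shared-anomaly step reduces to a case-insensitive set intersection over each audit's outlier flags. A minimal sketch, assuming each flag is a dict with an `entity` field (the field name is illustrative):

```python
def shared_anomalies(flags_a: list, flags_b: list) -> set:
    """Entities (wards, departments, ...) flagged as outliers in BOTH audits,
    normalized so 'Whitefield' and 'whitefield ' count as the same entity."""
    names_a = {f["entity"].strip().lower() for f in flags_a}
    names_b = {f["entity"].strip().lower() for f in flags_b}
    return names_a & names_b

shared = shared_anomalies(
    [{"entity": "Whitefield"}, {"entity": "Yelahanka"}],
    [{"entity": "whitefield "}, {"entity": "Bommanahalli"}],
)
# shared == {"whitefield"}
```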

🗺️ Auto-Detected Geo Map

When a dataset contains latitude/longitude columns (auto-detected via regex, no configuration needed), an interactive Leaflet map renders automatically with:

  • Heat-colored markers (blue → red) scaled by relative metric value
  • Tooltip with entity name, metric value, and coordinates
  • Auto-fitting bounds for any geography
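Header-based auto-detection of this kind usually comes down to a pair of regexes over column names. A sketch of the idea (these patterns are illustrative, not the app's actual ones):

```python
import re

LAT_RE = re.compile(r"^\s*(lat|latitude)\s*$|(^|_)lat(itude)?(_|$)", re.IGNORECASE)
LNG_RE = re.compile(r"^\s*(lng|lon|long|longitude)\s*$|(^|_)(lng|lon|long)(itude)?(_|$)",
                    re.IGNORECASE)

def detect_geo_columns(columns):
    """Return (lat_col, lng_col) when both are present, else None."""
    lat = next((c for c in columns if LAT_RE.search(c)), None)
    lng = next((c for c in columns if LNG_RE.search(c)), None)
    return (lat, lng) if lat and lng else None

cols = ["Ward Name", "Complaint Count", "Latitude", "Longitude"]
# detect_geo_columns(cols) == ("Latitude", "Longitude")
```

If no pair is found, the map panel simply never renders, which is why no configuration is needed.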

🔬 Surgical Region Slicer

Filter any dataset to a specific Ward, District, Pincode, or any string value, without reloading. The backend re-runs the full statistical analysis on only the matching rows, so anomaly detection is always local to the slice.
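Since the backend holds a pandas dataframe, the slice itself is a case-insensitive substring match over the text columns, with the full analytics pass re-run on the result. A sketch under that assumption (function name is illustrative):

```python
import pandas as pd

def slice_region(df: pd.DataFrame, value: str) -> pd.DataFrame:
    """Rows where any text column contains `value`, case-insensitively.
    Anomaly detection then runs on this slice only."""
    mask = pd.Series(False, index=df.index)
    for col in df.select_dtypes(include="object"):
        mask |= df[col].astype(str).str.contains(value, case=False,
                                                 na=False, regex=False)
    return df[mask]

df = pd.DataFrame({"ward": ["Whitefield", "Jayanagar"], "complaints": [950, 120]})
sliced = slice_region(df, "white")   # keeps only the Whitefield row
```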

📊 Master Visual Lab

Four chart types (Bar, Line, Area, Pie) rendered via Recharts with intelligent metric selection:

  • Mirrors the backend's run_analytics algorithm: skips ID/coordinate columns
  • Picks the highest-variance numeric column as the primary metric
  • Anomalous entities render as red bars
  • Y-axis auto-formats (22k, 4.5M) for readability
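The axis abbreviation lives in the Recharts tick formatter on the frontend; the logic itself is just threshold scaling, sketched here in Python for brevity:

```python
def abbreviate(value: float) -> str:
    """Compact chart labels: 22000 -> '22k', 4500000 -> '4.5M'."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "k")):
        if abs(value) >= threshold:
            # Drop a trailing .0 so whole numbers stay clean: 22.0k -> 22k
            text = f"{value / threshold:.1f}".rstrip("0").rstrip(".")
            return text + suffix
    return str(value)

# abbreviate(22_000) == "22k";  abbreviate(4_500_000) == "4.5M"
```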

📄 Shareable Audit Reports

One-click report generation saves stats, AI analysis, anomaly flags, and NL query history to Supabase and returns a public shareable URL. Falls back to a downloadable JSON if the backend is unavailable.

🔍 Semantic Dataset Search

The search engine expands queries with a synonym graph (accident → fatal, rto, traffic, motor), scores datasets by title/tags/headers/description relevance, and returns ranked results with confidence percentages.
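A toy version of the expand-then-score loop, using the one synonym-graph entry quoted above; the field weights are illustrative assumptions, not the engine's real values:

```python
SYNONYMS = {"accident": {"fatal", "rto", "traffic", "motor"}}

def expand(query: str) -> set:
    """Add graph neighbors of every query term."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms

def score(dataset: dict, terms: set) -> float:
    """Weighted term overlap: title matters most, then tags, then the rest."""
    weights = {"title": 3.0, "tags": 2.0, "column_headers": 1.0, "description": 1.0}
    total = 0.0
    for field, weight in weights.items():
        value = dataset.get(field, "")
        text = (" ".join(value) if isinstance(value, list) else str(value)).lower()
        total += weight * sum(1 for term in terms if term in text)
    return total

terms = expand("accident")   # {'accident', 'fatal', 'rto', 'traffic', 'motor'}
dataset = {"title": "Road Traffic Accident Records", "tags": ["rto", "fatal"]}
# score(dataset, terms) == 2*3.0 + 2*2.0 == 10.0
```

The confidence percentage shown in the UI would then just be the score normalized against the best match.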


๐Ÿ—๏ธ Architecture

┌─────────────────────────────────────────────────────────────┐
│                        FRONTEND                             │
│  Next.js 15 (App Router) · TypeScript · Tailwind CSS        │
│  Recharts · Framer Motion · React-Leaflet                   │
└──────────────────────┬──────────────────────────────────────┘
                      │  EventSource (SSE streaming)
                      │  REST (POST /api/*)
┌──────────────────────▼──────────────────────────────────────┐
│                        BACKEND                              │
│  FastAPI · Python · Pandas · NumPy                          │
│  Groq SDK (Llama-3.1-8b-instant)                            │
└──────────┬──────────────────────────┬───────────────────────┘
           │                          │
┌──────────▼──────┐        ┌──────────▼──────────────────────┐
│   Supabase      │        │   Live Open Government APIs     │
│   (Catalog DB   │        │   data.gov.in  ·  catalog.data  │
│    + Reports)   │        │   .gov · Direct CSV/XLSX/PDF    │
└─────────────────┘        └─────────────────────────────────┘

Data flow for a single dataset audit:

  1. Browser opens an EventSource to /api/jit-stream/{id}
  2. Backend fetches the raw file from the government portal URL stored in Supabase
  3. Pandas parses and cleans the dataframe (handles encoding issues, unstructured regional data)
  4. run_analytics() finds the highest-variance useful numeric column, runs 2σ outlier detection
  5. Results stream to the browser as SSE events with progress percentages
  6. On completion, the full payload is sent as the final done event
  7. React triggers Groq AI analysis as a separate POST call, streamed to the terminal
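On the wire, each SSE event in step 5 is just an `event:` line, a `data:` line, and a blank-line terminator. A minimal framing helper (names and payload fields are illustrative; the real endpoint streams frames like these via a FastAPI StreamingResponse with media_type="text/event-stream"):

```python
import json

def sse_event(event: str, payload: dict) -> str:
    """Frame one Server-Sent Event the way the browser's EventSource expects."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

frame = sse_event("progress", {"pct": 48, "msg": "Parsed 91,620 rows"})
# frame == 'event: progress\ndata: {"pct": 48, "msg": "Parsed 91,620 rows"}\n\n'
```

The browser side then subscribes with `addEventListener("progress", ...)` and closes the EventSource on the final `done` event.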

📸 Screenshots

🏠 Home: Dataset Discovery

Semantic search across Bengaluru's civic data catalog. Type in natural language, filter by department.

Home Screen


📊 Audit Dashboard: Live Data Intelligence

Real-time streaming audit of 91,620 BBMP grievance records. Cross-dataset correlation active.

Audit Dashboard


🧠 AI Inference Terminal

Llama-3.1-8b streams a 4-sentence analysis citing actual statistics and naming outlier entities.

AI Terminal


🔗 Cross-Dataset Correlation Engine

Two datasets loaded simultaneously. Shared anomaly entities flagged in red.

Correlation Engine


🗺️ Geo Location Map

Auto-detected lat/lng columns rendered as a heat-colored interactive map. Zero configuration.

Geo Map


⚡ Quick Start

Prerequisites: Node.js and npm, Python 3, a Groq API key, and a Supabase project.

1. Clone

git clone https://github.com/DILIP-SHEESH/bap.git
cd bap

2. Backend Setup

cd backend
pip install -r requirements.txt

# Create .env
cat > .env << EOF
GROQ_API_KEY=your_groq_key_here
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_anon_key
EOF

# Run
uvicorn app.main:app --reload --port 8000

3. Frontend Setup

cd frontend
npm install

# Create .env.local
echo "NEXT_PUBLIC_API_URL=http://127.0.0.1:8000" > .env.local

# Run
npm run dev

4. Seed the Catalog

# Fetch live datasets from data.gov.in and catalog.data.gov
curl -X POST "http://localhost:8000/api/seed-all"

Open http://localhost:3000 🚀


🗄️ Supabase Schema

Run this SQL in your Supabase dashboard:

-- Dataset catalog
create table data_catalog (
  id          bigserial primary key,
  title       text not null,
  description text,
  source_url  text,
  direct_csv_link text,
  tags        text[],
  column_headers text[]
);

-- Shareable audit reports
create table public_reports (
  id              text primary key,
  dataset_title   text,
  stats           jsonb,
  flags           jsonb,
  ai_analysis     text,
  chart_data      jsonb,
  nl_queries      jsonb,
  created_at      timestamptz default now()
);

🔌 API Reference

Method  Endpoint                                    Description
GET     /api/jit-stream/{id}                        SSE stream: live fetch, clean, analyze
GET     /api/jit-stream/{id}?region=Whitefield      Same, with region filter applied
POST    /api/ai-analyze                             Groq LLM analysis of stats + anomalies
POST    /api/nl-query                               Natural language → pandas → answer
POST    /api/correlate                              Cross-dataset AI correlation
POST    /api/search                                 Semantic dataset search
POST    /api/save-report                            Persist audit report to Supabase
GET     /api/get-report/{id}                        Retrieve a saved report
POST    /api/seed                                   Fetch datasets from CKAN by keyword
POST    /api/seed-all                               Multi-domain live aggregation
GET     /health                                     Engine status

🧮 How Anomaly Detection Works

The engine avoids naively picking ID columns by regex-filtering them out, then selects the most statistically meaningful metric:

import re
import pandas as pd

# df: the live dataframe parsed during the JIT fetch

# 1. Filter out noise columns (IDs, coordinates, phone numbers, etc.)
skip_regex = re.compile(
  r'\b(id|sl|no|sr|sno|pin|code|year|phone|mobile|lat|lng|latitude|longitude|index)\b',
  re.IGNORECASE
)
numeric_cols = df.select_dtypes(include='number').columns
useful = [c for c in numeric_cols if not skip_regex.search(str(c))]

# 2. Pick the highest-variance column (most meaningful signal)
best_col = max(useful, key=lambda c: df[c].var())

# 3. Flag outliers beyond 2 standard deviations above the mean
threshold = df[best_col].mean() + (2.0 * df[best_col].std())
anomalies = df[df[best_col] > threshold]

This ensures the chart and analysis are always about real civic metrics (complaint counts, budget allocations, incident rates), never complaint IDs or row numbers.


🛠️ Tech Stack

Layer               Technology
Frontend Framework  Next.js 15 (App Router, Turbopack)
UI Language         TypeScript
Styling             Tailwind CSS
Charts              Recharts
Maps                React-Leaflet + OpenStreetMap
Animations          Framer Motion
Backend Framework   FastAPI
Data Processing     Pandas, NumPy
AI Inference        Groq Cloud (Llama-3.1-8b-instant)
Database            Supabase (PostgreSQL)
Streaming           Server-Sent Events (SSE)
File Parsing        CSV · XLSX (openpyxl/xlrd) · PDF (pdfplumber)
Deployment          Vercel (frontend) · Render/Railway (backend)

📂 Project Structure

bap/
├── frontend/
│   ├── app/
│   │   ├── page.tsx                  # Home: dataset search & catalog
│   │   ├── dataset/[id]/
│   │   │   └── page.tsx              # Audit dashboard (main experience)
│   │   └── correlation/
│   │       └── page.tsx              # Standalone correlation engine
│   ├── components/
│   │   └── StopsMap.tsx              # Leaflet geo map component
│   └── public/
│       └── screenshots/              # App screenshots for README
│
└── backend/
    └── app/
        ├── main.py                   # All FastAPI endpoints
        └── database.py               # Supabase client init

🤝 Contributing

Pull requests are welcome. For major changes, please open an issue first.

# Fork → Clone → Branch → PR
git checkout -b feature/your-feature-name
git commit -m "feat: add your feature"
git push origin feature/your-feature-name

Team

Commit-Men (Build for Bengaluru 2026). CivLib: making government accountable.
