Law-Aware AI Chatbot

An agentic RAG (Retrieval-Augmented Generation) system tailored for Indian legal research. It interprets complex legal queries, synthesizes case law, and generates answers drawn from multiple sources to assist legal research.

Features

Intelligent Legal Research: Automated scraping of judgments from the eCourts portal
Multi-Modal Data Processing: Handles both online judgments and offline legal PDFs
Advanced RAG Pipeline: Combines semantic search (ChromaDB) with knowledge graphs (Neo4j)
Agentic Reasoning: Uses LangChain/LangGraph for complex query orchestration
Legal Entity Recognition: Extracts judges, acts, precedents, and case relationships
CAPTCHA Handling: Integrated EasyOCR for automated form solving

Tech Stack

Automation & Scraping: Python, Selenium, undetected-chromedriver, EasyOCR
Document Processing: PyMuPDF, pdfplumber, LangChain text splitters, spaCy NLP
Databases: ChromaDB (Vector store), Neo4j (Knowledge Graph)
AI & Orchestration: LangChain/LangGraph, Groq/OpenAI/Anthropic/Gemini APIs
UI: Streamlit/Gradio chatbot interface, Flask web app with Web3 integration

Installation

Clone the repository

```bash
git clone <repository-url>
cd law-aware-ai
```

Install dependencies
```
uv sync
```

Set up environment variables

```bash
cp .env.example .env
# Edit .env with your API keys and database credentials
```

Install additional NLP models (optional)
```
python -m spacy download en_core_web_sm
```

Usage

Command Line Interface

```bash
# Scrape judgments from eCourts
python main.py scrape --case-number "CRL.A. 123/2023" --court "Delhi High Court" --max-pages 5

# Process PDF documents
python main.py process --pdf-file data/pdfs/sample.pdf
python main.py process --pdf-dir data/pdfs/

# Store processed documents in vector database
python main.py store --processed-dir data/processed/
python main.py store --json-file data/processed/sample.json

# Populate and query knowledge graph
python main.py graph --populate --processed-dir data/processed/
python main.py graph --query "cases by judge Justice Singh"

# Run intelligent legal research agent
python main.py agent --query "Explain Section 302 IPC interpretation in murder cases"
python main.py agent --query "Compare death penalty approaches in Indian courts" --verbose

# Start web interface with Mars chatbot
python main.py web
python main.py web --host 127.0.0.1 --port 8000 --debug
```

Demo Scripts

```bash
# Phase 1-2 Demo: Data processing and vector storage
python demo_phase2.py

# Phase 3 Demo: Knowledge graph construction
python demo_phase3.py

# Phase 4 Demo: Agentic RAG pipeline
python demo_phase4.py
```

Project Phases

✅ Phase 1: Infrastructure & Setup (Complete)

Project structure with modular src/ layout
uv-based dependency management
Pydantic settings with environment variables
CLI interface with subcommands
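
To illustrate the subcommand layout, here is a minimal sketch using the standard-library `argparse`. It covers only a few of the commands listed under Usage. The flag names come from those examples, but the defaults, help strings, and the `build_parser` function name are assumptions. The project's actual `main.py` may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose subcommands mirror some commands shown under Usage."""
    parser = argparse.ArgumentParser(prog="main.py", description="Law-Aware AI CLI (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    scrape = sub.add_parser("scrape", help="Scrape judgments from eCourts")
    scrape.add_argument("--case-number", required=True)
    scrape.add_argument("--court", required=True)
    scrape.add_argument("--max-pages", type=int, default=5)  # default is an assumption

    process = sub.add_parser("process", help="Process PDF documents")
    source = process.add_mutually_exclusive_group(required=True)
    source.add_argument("--pdf-file")  # a single PDF
    source.add_argument("--pdf-dir")   # a directory of PDFs

    agent = sub.add_parser("agent", help="Run the legal research agent")
    agent.add_argument("--query", required=True)
    agent.add_argument("--verbose", action="store_true")

    web = sub.add_parser("web", help="Start the web interface")
    web.add_argument("--host", default="127.0.0.1")     # default is an assumption
    web.add_argument("--port", type=int, default=8000)  # default is an assumption
    web.add_argument("--debug", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
example = build_parser().parse_args(
    ["scrape", "--case-number", "CRL.A. 123/2023", "--court", "Delhi High Court"]
)
print(example.command, example.case_number, example.max_pages)
```

Using `dest="command"` lets the entry point dispatch on `args.command` to the matching handler.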

✅ Phase 2: Data Processing & Vector Storage (Complete)

eCourts web scraping with CAPTCHA handling
PDF text extraction and entity recognition
Legal-aware text chunking for semantic search
ChromaDB vector storage with metadata
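
As a rough illustration of legal-aware chunking, the sketch below splits text at structural boundaries (numbered paragraphs and lines starting with "Section N" or "Article N") and then packs whole units into chunks up to a size limit, so a chunk never cuts a paragraph in half. The boundary pattern, the `max_chars` default, and the function names are all assumptions. The project itself lists LangChain text splitters, which this standard-library version only approximates.

```python
import re

# Assumed boundary pattern: a line that starts with "Section 12", "Article 21",
# or a numbered paragraph marker such as "12."
BOUNDARY = re.compile(r"^(?:Section\s+\d+[A-Z]?|Article\s+\d+[A-Z]?|\d+\.)\s", re.MULTILINE)


def split_on_boundaries(text: str) -> list[str]:
    """Split text into units that each begin at a structural boundary."""
    starts = [m.start() for m in BOUNDARY.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first boundary
    starts.append(len(text))
    units = (text[a:b].strip() for a, b in zip(starts, starts[1:]))
    return [u for u in units if u]


def chunk_legal_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack whole units into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for unit in split_on_boundaries(text):
        if current and len(current) + len(unit) + 1 > max_chars:
            chunks.append(current)
            current = unit
        else:
            current = f"{current}\n{unit}" if current else unit
    if current:
        chunks.append(current)
    return chunks


sample = (
    "1. The appellant was convicted under Section 302 IPC.\n"
    "2. The High Court upheld the conviction.\n"
    "3. The appeal is dismissed."
)
for chunk in chunk_legal_text(sample, max_chars=60):
    print(repr(chunk))
```

A unit longer than `max_chars` is kept whole in this sketch rather than split further. A production splitter would need a fallback for very long paragraphs.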

✅ Phase 3: Knowledge Graph Construction (Complete)

Neo4j graph database integration
Legal relationship modeling (cases, judges, acts, precedents)
Graph population pipeline from processed documents
Natural language graph querying
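
One way to picture the relationship model is as a function that turns a processed document into parameterized Cypher statements. The sketch below is illustrative only: the node labels (`Case`, `Judge`, `Act`), relationship types (`DECIDED`, `CITES_ACT`, `CITES`), property names, and the input dictionary shape are assumptions, not the project's actual schema.

```python
def case_to_cypher(case: dict) -> list[tuple[str, dict]]:
    """Return (query, parameters) pairs that upsert a case and its relationships."""
    case_number = case["case_number"]
    statements = [(
        "MERGE (c:Case {case_number: $case_number}) SET c.court = $court",
        {"case_number": case_number, "court": case.get("court")},
    )]
    for judge in case.get("judges", []):
        statements.append((
            "MATCH (c:Case {case_number: $case_number}) "
            "MERGE (j:Judge {name: $name}) "
            "MERGE (j)-[:DECIDED]->(c)",
            {"case_number": case_number, "name": judge},
        ))
    for act in case.get("acts", []):
        statements.append((
            "MATCH (c:Case {case_number: $case_number}) "
            "MERGE (a:Act {name: $name}) "
            "MERGE (c)-[:CITES_ACT]->(a)",
            {"case_number": case_number, "name": act},
        ))
    for cited in case.get("precedents", []):
        statements.append((
            "MATCH (c:Case {case_number: $case_number}) "
            "MERGE (p:Case {case_number: $cited}) "
            "MERGE (c)-[:CITES]->(p)",
            {"case_number": case_number, "cited": cited},
        ))
    return statements


# Hypothetical processed document, using the case number from the Usage examples.
doc = {
    "case_number": "CRL.A. 123/2023",
    "court": "Delhi High Court",
    "judges": ["Justice Singh"],
    "acts": ["IPC"],
    "precedents": [],
}
for query, params in case_to_cypher(doc):
    print(query, params)
```

`MERGE` keeps population idempotent, so re-running the pipeline over the same documents does not create duplicate nodes or edges. Each pair could then be executed through a Neo4j driver session, with the query text and its parameters passed separately.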

✅ Phase 4: Agentic RAG Pipeline (Complete)

Intelligent query routing (vector/graph/hybrid/complex)
LangGraph orchestration for multi-step reasoning
Legal entity extraction and context enrichment
Comprehensive answer synthesis from multiple sources
Confidence scoring and reasoning transparency
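
The four routes named above (vector, graph, hybrid, complex) can be illustrated with a simple keyword heuristic. This is only a sketch: the hint word lists and the `route_query` function are assumptions, and the actual system may well use an LLM or LangGraph node to classify queries instead.

```python
import re

# Hypothetical hint lists; a real router would likely be learned or LLM-driven.
GRAPH_HINTS = re.compile(r"\b(relate[sd]?|cite[sd]?|overrule[sd]?|judge|precedents?|between)\b", re.I)
COMPLEX_HINTS = re.compile(r"\b(compare|contrast|analy[sz]e|trend|evolution|approaches)\b", re.I)
VECTOR_HINTS = re.compile(r"\b(what|explain|summar(?:y|ize|ise)|interpretation|judgment|held)\b", re.I)


def route_query(query: str) -> str:
    """Return one of 'vector', 'graph', 'hybrid', or 'complex' for a query."""
    if COMPLEX_HINTS.search(query):
        return "complex"  # multi-step reasoning across sources
    wants_graph = bool(GRAPH_HINTS.search(query))
    wants_vector = bool(VECTOR_HINTS.search(query))
    if wants_graph and wants_vector:
        return "hybrid"   # combine semantic search with graph lookups
    if wants_graph:
        return "graph"    # structural or relational question
    return "vector"       # default: semantic search over chunks


for q in [
    "Explain Section 302 IPC interpretation in murder cases",
    "cases by judge Justice Singh",
    "Compare death penalty approaches in Indian courts",
]:
    print(route_query(q), "<-", q)
```

Defaulting to the vector route means an unrecognized query still gets a semantic-search answer rather than an error.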

📋 Phase 5: User Interface & Deployment

Streamlit/Gradio chatbot interface
Conversation memory and context management
Production deployment with monitoring

Scrape judgments from eCourts

```bash
python main.py scrape --case-number "CRL.A. 123/2023" --court "Delhi High Court"
```

Process PDF documents

```bash
python main.py process --pdf-dir data/judgments/
```


### Development

```bash
# Install dev dependencies
uv sync --dev

# Run tests
pytest

# Format code
black src/
```

Project Structure

```
law-aware-ai/
├── config/
│   ├── settings.py          # Configuration management
│   └── __init__.py
├── data/
│   ├── judgments/           # Raw judgment PDFs
│   ├── processed/           # Processed text data
│   ├── vector_db/           # ChromaDB storage
│   └── graph_db/            # Neo4j data
├── src/
│   ├── data_processing/
│   │   ├── ecourts_scraper.py    # eCourts portal scraper
│   │   └── pdf_processor.py      # PDF text extraction
│   ├── vector_store/        # ChromaDB integration
│   ├── graph_store/         # Neo4j knowledge graph
│   ├── agent/               # LangChain orchestration
│   └── ui/                  # Streamlit/Gradio interfaces
├── main.py                  # CLI entry point
├── pyproject.toml           # Project configuration
├── .env.example             # Environment template
└── README.md
```

Architecture

Data Flow

Data Acquisition: Scrape eCourts portal and process offline PDFs
Text Processing: Extract and clean legal text, identify sections
Entity Extraction: Use NLP to identify legal entities and relationships
Vector Storage: Chunk and embed text in ChromaDB for semantic search
Knowledge Graph: Store relationships in Neo4j for structural queries
Agent Orchestration: Route queries to appropriate data sources
Response Synthesis: Generate comprehensive legal answers
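
To make the entity-extraction step in the flow above more concrete, here is a small pattern-based sketch that pulls statute references, case numbers, and judge names out of judgment text. It uses regular expressions as a stand-in for the spaCy-based NLP the project lists. The patterns, the short list of act abbreviations, and the `extract_entities` function are assumptions for illustration.

```python
import re

# Assumed patterns, modelled on the examples used elsewhere in this README.
SECTION_RE = re.compile(r"\bSection\s+(\d+[A-Z]?)\s+(?:of\s+the\s+)?(IPC|CrPC|CPC)\b")
CASE_NO_RE = re.compile(r"\b([A-Z]+(?:\.[A-Z]+)*\.)\s+(\d+)/(\d{4})\b")
JUDGE_RE = re.compile(r"\bJustice\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return statute sections, case numbers, and judge names found in text."""
    return {
        "sections": [f"Section {num} {act}" for num, act in SECTION_RE.findall(text)],
        "case_numbers": [
            f"{prefix} {num}/{year}" for prefix, num, year in CASE_NO_RE.findall(text)
        ],
        "judges": JUDGE_RE.findall(text),
    }


text = "In CRL.A. 123/2023, Justice Singh held that Section 302 IPC applies."
print(extract_entities(text))
```

Fixed patterns like these miss many real-world citation formats, which is one reason a trained NLP model is the more robust choice for this stage.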

Agentic Router

Factual Queries: "What is the judgment in case X?" → ChromaDB
Relational Queries: "How does case X relate to case Y?" → Neo4j
Complex Queries: Multi-source reasoning with LLM synthesis

Development Roadmap

Phase 1: Data acquisition and processing ✅
Phase 2: Vector storage and semantic search
Phase 3: Knowledge graph construction
Phase 4: Agentic RAG implementation
Phase 5: UI development and deployment

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This system is designed for legal research assistance and should not be considered a substitute for professional legal advice. Always consult qualified legal professionals for specific legal matters.
