Neerajdec2005/Law-Aware-AI

Law-Aware AI Chatbot

An agentic RAG (Retrieval-Augmented Generation) system built for Indian legal research. It interprets complex legal queries, synthesizes case law, and returns actionable, research-oriented answers.

Features

  • Intelligent Legal Research: Automated scraping of judgments from eCourts portal
  • Multi-Source Data Processing: Handles both judgments scraped online and offline legal PDFs
  • Advanced RAG Pipeline: Combines semantic search (ChromaDB) with knowledge graphs (Neo4j)
  • Agentic Reasoning: Uses LangChain/LangGraph for complex query orchestration
  • Legal Entity Recognition: Extracts judges, acts, precedents, and case relationships
  • CAPTCHA Handling: Uses EasyOCR to read CAPTCHA images automatically during scraping

Tech Stack

  • Automation & Scraping: Python, Selenium, undetected-chromedriver, EasyOCR
  • Document Processing: PyMuPDF, pdfplumber, LangChain text splitters, spaCy NLP
  • Databases: ChromaDB (Vector store), Neo4j (Knowledge Graph)
  • AI & Orchestration: LangChain/LangGraph, Groq/OpenAI/Anthropic/Gemini APIs
  • UI: Streamlit/Gradio chatbot interface, Flask web app with Web3 integration

Installation

  1. Clone the repository

    git clone <repository-url>
    cd Law-Aware-AI

  2. Install dependencies

    uv sync

  3. Set up environment variables

    cp .env.example .env
    # Edit .env with your API keys and database credentials

  4. Install additional NLP models (optional)

    python -m spacy download en_core_web_sm

Usage

Command Line Interface

# Scrape judgments from eCourts
python main.py scrape --case-number "CRL.A. 123/2023" --court "Delhi High Court" --max-pages 5

# Process PDF documents
python main.py process --pdf-file data/pdfs/sample.pdf
python main.py process --pdf-dir data/pdfs/

# Store processed documents in vector database
python main.py store --processed-dir data/processed/
python main.py store --json-file data/processed/sample.json

# Populate and query knowledge graph
python main.py graph --populate --processed-dir data/processed/
python main.py graph --query "cases by judge Justice Singh"

# Run intelligent legal research agent
python main.py agent --query "Explain Section 302 IPC interpretation in murder cases"
python main.py agent --query "Compare death penalty approaches in Indian courts" --verbose

# Start web interface with Mars chatbot
python main.py web
python main.py web --host 127.0.0.1 --port 8000 --debug
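
The subcommand layout shown in the commands above could be wired with the standard library's argparse. This is a minimal sketch, not the project's actual main.py: the `build_parser` name and every default value are assumptions, and only the `scrape` and `web` subcommands are covered.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring two of the documented subcommands (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    scrape = subparsers.add_parser("scrape", help="Scrape judgments from eCourts")
    scrape.add_argument("--case-number", required=True)
    scrape.add_argument("--court", required=True)
    scrape.add_argument("--max-pages", type=int, default=5)  # default is an assumption

    web = subparsers.add_parser("web", help="Start the web interface")
    web.add_argument("--host", default="127.0.0.1")  # default is an assumption
    web.add_argument("--port", type=int, default=8000)  # default is an assumption
    web.add_argument("--debug", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["scrape", "--case-number", "CRL.A. 123/2023", "--court", "Delhi High Court"]
)
print(args.command, args.case_number, args.court, args.max_pages)
```

Setting `dest="command"` with `required=True` makes argparse reject a bare `python main.py` call, so dispatch code can branch on `args.command` without a None check.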

Demo Scripts

# Phase 1-2 Demo: Data processing and vector storage
python demo_phase2.py

# Phase 3 Demo: Knowledge graph construction
python demo_phase3.py

# Phase 4 Demo: Agentic RAG pipeline
python demo_phase4.py

Project Phases

✅ Phase 1: Infrastructure & Setup (Complete)

  • Project structure with modular src/ layout
  • uv-based dependency management
  • Pydantic settings with environment variables
  • CLI interface with subcommands
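
The project names Pydantic for its settings. To stay dependency-free, the sketch below swaps in a frozen standard-library dataclass to show the same idea of reading configuration from environment variables. The variable names (`NEO4J_URI`, `CHROMA_PERSIST_DIR`, `GROQ_API_KEY`) and the defaults are assumptions, not the contents of the project's .env.example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; field names are illustrative only."""
    neo4j_uri: str
    chroma_persist_dir: str
    groq_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (the process environment by default)."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),  # assumed default
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "data/vector_db"),
        groq_api_key=env.get("GROQ_API_KEY"),  # None when the key is not set
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"NEO4J_URI": "bolt://db.example:7687"})
print(settings)
```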

✅ Phase 2: Data Processing & Vector Storage (Complete)

  • eCourts web scraping with CAPTCHA handling
  • PDF text extraction and entity recognition
  • Legal-aware text chunking for semantic search
  • ChromaDB vector storage with metadata
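
One way to read "legal-aware" chunking is to split on numbered paragraphs, which judgments commonly use, and then pack whole paragraphs into size-bounded chunks. The sketch below works under that assumption. The function name, the regex, and the 1,000-character limit are hypothetical, and the project itself lists LangChain text splitters for this step.

```python
import re
from typing import List

# Zero-width match at the start of a numbered paragraph such as "12. The appellant ..."
# (an assumed convention; real judgments vary).
PARA_START = re.compile(r"^(?=\d+\.\s)", re.MULTILINE)


def chunk_judgment(text: str, max_chars: int = 1000) -> List[str]:
    """Pack numbered paragraphs into chunks without splitting inside a paragraph."""
    paragraphs = [p.strip() for p in PARA_START.split(text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


sample = "1. First para.\n2. Second para.\n3. Third para."
for chunk in chunk_judgment(sample, max_chars=20):
    print(repr(chunk))
```

A paragraph longer than `max_chars` is kept whole in this sketch, which trades a strict size bound for never cutting a paragraph in half.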

✅ Phase 3: Knowledge Graph Construction (Complete)

  • Neo4j graph database integration
  • Legal relationship modeling (cases, judges, acts, precedents)
  • Graph population pipeline from processed documents
  • Natural language graph querying
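
The relationship model above (cases, judges, acts, precedents) could map to Cypher `MERGE` statements like the ones built below. This is a sketch only: the node labels, the relationship types (`DECIDED_BY`, `CITES_ACT`, `CITES`), and the shape of the input dictionary are assumptions, not the project's actual schema. The function returns query and parameter pairs, so it can be inspected without a running Neo4j instance.

```python
from typing import Any, Dict, List, Tuple

Statement = Tuple[str, Dict[str, Any]]


def case_to_cypher(case: Dict[str, Any]) -> List[Statement]:
    """Turn one processed case (hypothetical dict shape) into parameterized Cypher."""
    number = case["number"]
    statements: List[Statement] = [
        ("MERGE (c:Case {number: $number}) SET c.court = $court",
         {"number": number, "court": case["court"]}),
    ]
    # (property key, label, relationship type) for each related entity kind.
    links = [
        ("judges", "Judge", "DECIDED_BY"),
        ("acts", "Act", "CITES_ACT"),
    ]
    for key, label, rel in links:
        for name in case.get(key, []):
            statements.append((
                f"MATCH (c:Case {{number: $number}}) "
                f"MERGE (n:{label} {{name: $name}}) "
                f"MERGE (c)-[:{rel}]->(n)",
                {"number": number, "name": name},
            ))
    for cited in case.get("precedents", []):
        statements.append((
            "MATCH (c:Case {number: $number}) "
            "MERGE (p:Case {number: $cited}) "
            "MERGE (c)-[:CITES]->(p)",
            {"number": number, "cited": cited},
        ))
    return statements


example = {
    "number": "CRL.A. 123/2023",
    "court": "Delhi High Court",
    "judges": ["Justice Singh"],
    "acts": ["Indian Penal Code"],
    "precedents": ["CRL.A. 45/2019"],  # made-up case number for illustration
}
for query, params in case_to_cypher(example):
    print(query, params)
```

Using `MERGE` rather than `CREATE` keeps population idempotent, so re-running the pipeline over the same documents should not duplicate nodes. Values travel as parameters instead of being formatted into the query string.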

✅ Phase 4: Agentic RAG Pipeline (Complete)

  • Intelligent query routing (vector/graph/hybrid/complex)
  • LangGraph orchestration for multi-step reasoning
  • Legal entity extraction and context enrichment
  • Comprehensive answer synthesis from multiple sources
  • Confidence scoring and reasoning transparency
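
For the hybrid route, hits from the vector store and the knowledge graph have to be merged into a single ranking. The README does not say how this is done. The sketch below uses reciprocal rank fusion (RRF), a common technique swapped in here purely for illustration; the function name, the document IDs, and the constant `k=60` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists; items ranked high in many lists score best."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["case_a", "case_b", "case_c"]  # hypothetical ChromaDB result order
graph_hits = ["case_b", "case_d"]             # hypothetical Neo4j result order
print(reciprocal_rank_fusion([vector_hits, graph_hits]))
```

RRF only needs rank positions, which is convenient when the two sources return scores on incomparable scales.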

📋 Phase 5: User Interface & Deployment

  • Streamlit/Gradio chatbot interface
  • Conversation memory and context management
  • Production deployment with monitoring
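
Conversation memory for a chatbot often amounts to keeping the last few turns and replaying them as context. The sketch below shows that idea with a bounded deque. The class name, the five-turn default, and the context format are assumptions, since this phase is not described in more detail.

```python
from collections import deque
from typing import Deque, Tuple


class ConversationMemory:
    """Keep only the most recent turns (hypothetical helper, not project code)."""

    def __init__(self, max_turns: int = 5) -> None:
        # A deque with maxlen silently drops the oldest turn when full.
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add_turn(self, user: str, assistant: str) -> None:
        self._turns.append((user, assistant))

    def as_context(self) -> str:
        """Render stored turns as plain text to prepend to the next prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self._turns)


memory = ConversationMemory(max_turns=2)
memory.add_turn("What is Section 302 IPC?", "It deals with punishment for murder.")
memory.add_turn("Which court heard the appeal?", "Delhi High Court.")
memory.add_turn("Who was the judge?", "Justice Singh.")
print(memory.as_context())
```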

Development

# Install dev dependencies
uv sync --dev

# Run tests
pytest

# Format code
black src/

Project Structure

law-aware-ai/
├── config/
│   ├── settings.py          # Configuration management
│   └── __init__.py
├── data/
│   ├── judgments/           # Raw judgment PDFs
│   ├── processed/           # Processed text data
│   ├── vector_db/           # ChromaDB storage
│   └── graph_db/            # Neo4j data
├── src/
│   ├── data_processing/
│   │   ├── ecourts_scraper.py    # eCourts portal scraper
│   │   └── pdf_processor.py      # PDF text extraction
│   ├── vector_store/        # ChromaDB integration
│   ├── graph_store/         # Neo4j knowledge graph
│   ├── agent/               # LangChain orchestration
│   └── ui/                  # Streamlit/Gradio interfaces
├── main.py                  # CLI entry point
├── pyproject.toml           # Project configuration
├── .env.example             # Environment template
└── README.md

Architecture

Data Flow

  1. Data Acquisition: Scrape eCourts portal and process offline PDFs
  2. Text Processing: Extract and clean legal text, identify sections
  3. Entity Extraction: Use NLP to identify legal entities and relationships
  4. Vector Storage: Chunk and embed text in ChromaDB for semantic search
  5. Knowledge Graph: Store relationships in Neo4j for structural queries
  6. Agent Orchestration: Route queries to appropriate data sources
  7. Response Synthesis: Generate comprehensive legal answers
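
Step 3 relies on NLP for entity extraction (the stack names spaCy). As a dependency-free illustration of the same idea, the sketch below pulls statutory references such as "Section 302 IPC" out of text with a regular expression. The pattern and the short list of act names are assumptions and cover only a few citation styles.

```python
import re
from typing import List, Tuple

# Matches "Section 302 IPC" or "Section 34 of the Indian Penal Code".
# The act alternatives are a small illustrative subset, not a full list.
SECTION_REF = re.compile(
    r"\bSection\s+(\d+[A-Z]?)\s+(?:of\s+the\s+)?(IPC|CrPC|Indian Penal Code)\b"
)


def extract_section_refs(text: str) -> List[Tuple[str, str]]:
    """Return (section number, act) pairs in the order they appear."""
    return SECTION_REF.findall(text)


sentence = (
    "The accused was convicted under Section 302 IPC read with "
    "Section 34 of the Indian Penal Code."
)
print(extract_section_refs(sentence))
```

Pairs like these could feed both ChromaDB metadata (step 4) and act nodes in Neo4j (step 5).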

Agentic Router

  • Factual Queries: "What is the judgment in case X?" → ChromaDB
  • Relational Queries: "How does case X relate to case Y?" → Neo4j
  • Complex Queries: Multi-source reasoning with LLM synthesis
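
A simple keyword heuristic shows the shape of this routing decision. The project describes LangChain/LangGraph orchestration for routing; the sketch below is a rule-based stand-in, and the `Route` enum, the cue words, and the function name are all assumptions.

```python
from enum import Enum


class Route(Enum):
    VECTOR = "vector"    # semantic search in ChromaDB
    GRAPH = "graph"      # structural query against Neo4j
    COMPLEX = "complex"  # multi-source reasoning with LLM synthesis


# Illustrative cue words only; a real router would likely use an LLM classifier.
RELATIONAL_CUES = ("relate", "cited by", "cites", "overrule", "by judge")
COMPLEX_CUES = ("compare", "explain", "analyse", "analyze")


def route_query(query: str) -> Route:
    """Pick a data source from surface cues in the query text."""
    q = query.lower()
    if any(cue in q for cue in COMPLEX_CUES):
        return Route.COMPLEX
    if any(cue in q for cue in RELATIONAL_CUES):
        return Route.GRAPH
    return Route.VECTOR  # default: treat as a factual lookup


for question in (
    "What is the judgment in case X?",
    "How does case X relate to case Y?",
    "Compare death penalty approaches in Indian courts",
):
    print(question, "->", route_query(question).value)
```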

Development Roadmap

  • Phase 1: Infrastructure and setup ✅
  • Phase 2: Data processing and vector storage ✅
  • Phase 3: Knowledge graph construction ✅
  • Phase 4: Agentic RAG pipeline ✅
  • Phase 5: User interface and deployment

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This system is designed for legal research assistance and should not be considered a substitute for professional legal advice. Always consult qualified legal professionals for specific legal matters.

About

Retrieval-augmented AI that guides citizens to legal assistance based on IPC sections, articles, and related provisions.
