baranzeyn/Lexly


Lexly 📄⚖️

AI-Powered Legal Contract Analysis Assistant with a RAG System

Lexly is an intelligent, scalable web application designed to analyze legal contracts (Rental, Employment, Commercial) using Large Language Models (LLMs). It extracts potential risks, unfair clauses, and power imbalances while grounding its analysis in real Supreme Court precedents using a Retrieval-Augmented Generation (RAG) architecture.

🎯 The Problem & The Solution

Reading contracts is tedious, and relying purely on standard LLMs for legal advice often leads to "hallucinations" (e.g., flagging standard legal clauses as high-risk). Lexly addresses this by combining rule-based NLP, strict prompt engineering, and a vector database (RAG) to improve the accuracy of the analysis and keep API usage cost-effective.
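The rule-based step can be pictured as a keyword/regex filter that keeps only the paragraphs worth sending to the LLM. The sketch below is a minimal illustration under stated assumptions: the pattern list, the function names (`split_paragraphs`, `select_risky_paragraphs`), and the sample contract are hypothetical and are not taken from the Lexly codebase.

```python
import re

# Hypothetical trigger terms; the project's real keyword list is not given in this README.
RISK_PATTERNS = [
    r"\bwaive[sd]?\b",
    r"\bpenalt(?:y|ies)\b",
    r"\bterminat(?:e|es|ed|ion)\b",
    r"\bwithout (?:prior )?notice\b",
    r"\bindemnif(?:y|ies|ication)\b",
]
RISK_RE = re.compile("|".join(RISK_PATTERNS), re.IGNORECASE)


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def select_risky_paragraphs(text: str) -> list[str]:
    """Keep only paragraphs that match at least one risk pattern."""
    return [p for p in split_paragraphs(text) if RISK_RE.search(p)]


contract = """The tenant shall pay rent on the first day of each month.

The tenant waives the right to receive legal notice for damages.

The landlord may terminate the lease without prior notice."""

# Only the second and third paragraphs match a pattern and would be forwarded.
risky = select_risky_paragraphs(contract)
```

Only the matching paragraphs would be placed in the prompt, which is where the token savings come from. A filter like this trades recall for cost: a risky clause phrased without any trigger term is dropped, so the pattern list needs care.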

🇹🇷 Current MVP Scope (Turkish Law Optimized): While the backend architecture is designed to support multiple jurisdictions, the current prompt engineering rules and the RAG vector database are tuned specifically for the Turkish legal system (Türk Borçlar Kanunu, İş Kanunu, etc.). The database is populated with real Yargıtay (Supreme Court of Türkiye) precedents, so the analysis of Turkish contracts is grounded in actual case law. Other regions (US, EU, UK) are currently available as Proof-of-Concept (PoC) integrations.

🚀 Key Features (Engineering Highlights)

  • 🧠 Retrieval-Augmented Generation (RAG): Integrates ChromaDB and HuggingFace's all-MiniLM-L6-v2 to perform semantic searches on uploaded contracts. It automatically retrieves relevant Yargıtay precedents to ground the AI's analysis in real legal context.
  • ⚙️ Smart Pre-filtering & Token Optimization: Instead of sending massive raw PDFs to the LLM, Lexly uses a custom rule-based NLP module (Regex/Keyword matching) to extract only "risky" paragraphs. This reduces token consumption and API costs by up to 50%, ensuring faster response times.
  • 🛡️ Data Privacy (PII Masking): Automatically redacts sensitive information (TC Identity Numbers, IBANs, Phone Numbers) using Regex before the data ever leaves the server.
  • ⚖️ Edge-Case Handling & Data Accuracy: Utilizes advanced prompt engineering to prevent LLM hallucinations. For example, it successfully distinguishes between a "Standard Subletting Ban" (Low Risk) and a "Waiver of Legal Notice for Damages" (High Risk) within the same paragraph.
  • 📄 Document Parsing: Extracts text from .pdf and .docx files using PyMuPDF and python-docx, structured much like a web scraping/crawling pipeline.
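As an illustration of the PII masking feature listed above, the following sketch redacts Turkish IBANs, mobile phone numbers, and 11-digit TC identity numbers with regular expressions. The patterns, placeholder labels, and the `mask_pii` name are assumptions made for this example; the project's actual expressions are not shown in this README, and real-world formats vary more than these patterns cover.

```python
import re

# Illustrative patterns only (assumptions, not the project's real expressions).
# Order matters: IBANs are masked first so their digit runs are not
# mistaken for phone numbers or identity numbers.
PII_PATTERNS = [
    # TR + 2 check digits + 22 digits, optionally grouped in blocks of four
    ("[IBAN]", re.compile(r"\bTR\d{2}(?: ?\d{4}){5} ?\d{2}\b")),
    # Turkish mobile numbers such as +90 532 123 45 67 or 05321234567
    ("[PHONE]", re.compile(r"(?<!\d)(?:\+90|0)[ -]?5\d{2}[ -]?\d{3}[ -]?\d{2}[ -]?\d{2}(?!\d)")),
    # TC identity number: 11 digits, first digit non-zero (checksum not verified here)
    ("[TC_ID]", re.compile(r"(?<!\d)[1-9]\d{10}(?!\d)")),
]


def mask_pii(text: str) -> str:
    """Replace each matched identifier with a fixed placeholder."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = (
    "Tenant ID 12345678901, IBAN TR33 0006 1005 1978 6457 8413 26, "
    "phone +90 532 123 45 67."
)
masked = mask_pii(sample)
print(masked)  # Tenant ID [TC_ID], IBAN [IBAN], phone [PHONE].
```

Running the masking step before any call to an external API means the placeholders, not the raw identifiers, are what the model sees.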

🛠️ Tech Stack

  • Backend: Python 3.13, FastAPI, Pydantic (Data Validation)
  • AI Integration: Google Gemini API (gemini-2.5-flash-lite for rate-limit resilience)
  • Vector Database & Embeddings: ChromaDB, sentence-transformers
  • Data Processing: Regex, PyMuPDF (fitz), python-docx
  • Frontend: Vanilla JavaScript, HTML5, CSS3 (No heavy frameworks, highly optimized)
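To show what the vector database and embedding entries in this stack are for, here is a standard-library sketch of the retrieval step: embed a clause, score it against stored precedents by cosine similarity, and return the top matches. It deliberately swaps in a toy bag-of-words vector and an in-memory dictionary in place of sentence-transformers and ChromaDB so that it runs without extra dependencies. The precedent texts are placeholders, not real court decisions, and all names (`embed`, `cosine`, `retrieve`, `PRECEDENTS`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Placeholder summaries for illustration; not real court decisions.
PRECEDENTS = {
    "precedent-001": "eviction of a tenant for unpaid rent requires written notice",
    "precedent-002": "non-compete clause in an employment contract must be limited in time",
    "precedent-003": "security deposit must be returned to the tenant after the lease ends",
}


def retrieve(clause: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k precedent ids most similar to the clause, best first."""
    query = embed(clause)
    scored = [(pid, cosine(query, embed(text))) for pid, text in PRECEDENTS.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


top = retrieve("the landlord may evict the tenant without notice for unpaid rent")
```

In a RAG pipeline the retrieved texts are then added to the LLM prompt as context. A real embedding model captures meaning rather than word overlap, so it would also match paraphrases that this toy vector misses.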

📁 Modular Project Structure

The project follows Separation of Concerns (SoC) principles for high scalability:

lexly/
├── app/
│   ├── api/            # API Route definitions
│   ├── models/         # Pydantic schemas (Request/Response validation)
│   ├── prompts/        # System prompts and region-based configurations
│   ├── services/       # Core business logic (AI, PDF Parsing, RAG operations)
│   └── utils/          # Helper functions (Text cleaning, Hash generation)
├── data/               # Static datasets (Precedent JSONs)
├── chroma_db/          # Persistent Vector Database (Ignored in Git)
├── main.py             # FastAPI application entry point
└── index.html          # Client-side interface
