AI-powered GDPR compliance analysis tool. Describe a privacy scenario or your system architecture in plain English — get a grounded, structured report citing specific GDPR articles, recitals, and EDPB guidelines with confidence scores and source links.
Built with RAG (Retrieval-Augmented Generation) over the full GDPR text, BDSG, TTDSG, and EDPB guidance. Every cited article is backed by retrieved knowledge-base chunks, not hallucinated.
Not legal advice. Output is informational only. A qualified professional must interpret the results.
Describe a scenario; get a severity rating, violated articles with confidence scores, actionable recommendations, and honest "retrieval gap" notes when something is relevant but ungrounded.
uv run gdpr-check analyze "A company sends marketing emails to users without getting consent"

Describe your system; get an auto-generated data map, risk level, and findings across 10+ compliance areas (legal basis, transparency, international transfers, processor agreements, security, retention, DPIA, data subject rights, and more).
uv run gdpr-check assess "SaaS collecting emails via web form, sends newsletters via Mailchimp, data in PostgreSQL on AWS eu-central-1"

Same analysis engines exposed via FastAPI:
uv run gdpr-check serve
# POST /api/v1/analyze/violation
# POST /api/v1/analyze/compliance
# GET /health

uv run gdpr-check stats # aggregated cost, latency, token usage
uv run gdpr-check history # recent analysis runs

Simple scenario — marketing emails without consent:
- Severity: HIGH
- Articles flagged: Art. 6 (0.95), Art. 7 (0.93), Art. 21 (0.97), Art. 13 (0.80), Art. 14 (0.72), Art. 17 (0.75)
- Includes 10 actionable recommendations and retrieval gap notes for ePrivacy Directive, Art. 83 fines, Art. 5 principles
Complex scenario — healthcare CRM with genetic data, children's data, AI/automated decisions, 5 third-party processors, multi-region AWS, transfers to US/UK/Japan:
- Severity: CRITICAL
- 15 findings across: special category data, consent validity, automated decision-making, DPIA, transparency, retention, international transfers (split by US processors vs research universities), processor contracts, security, data protection by design, children's data, data subject rights, breach notification readiness, ROPA
Minimal scenario — offline calculator app with no data collection:
- Severity: LOW
- All areas compliant, with a scope verification note to audit for inadvertent SDK telemetry
| Layer | Technology |
|---|---|
| LLM | Anthropic Claude API (Claude Sonnet 4) |
| Vector store | ChromaDB with sentence-transformers embeddings |
| Retrieval | Dense + BM25 hybrid search |
| API | FastAPI |
| CLI | Typer + Rich |
| Database | SQLite (analysis persistence + query telemetry) |
| Language | Python 3.11+ |
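
The stack table lists dense + BM25 hybrid search but does not say how the two rankings are combined. One common option is reciprocal rank fusion (RRF). The sketch below assumes that technique; the function name, the chunk IDs, and the constant `k = 60` are illustrative and not taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one list.

    Each chunk scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. Ties are broken by chunk ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs; real IDs depend on how the knowledge base is built.
dense_hits = ["gdpr-art-6", "gdpr-art-7", "gdpr-rec-32"]
bm25_hits = ["gdpr-art-7", "gdpr-art-21", "gdpr-art-6"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Chunks that rank well in both lists rise to the top, while chunks found by only one retriever are still kept further down.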
Five-stage pipeline with validation:
- Extract — Structured entities from natural language (actors, data types, processing activities, jurisdiction, special categories)
- Classify — Topic tags (consent, transfers, security, children, AI/automated decisions, etc.) to steer retrieval
- Retrieve — Hybrid dense + BM25 search over ChromaDB; topic-aware routing pulls from main GDPR collection plus specialized v2 collections (DPIA, RoPA, TOM, consent guidance, EDPB guidelines)
- Reason — LLM generates structured JSON report grounded only in retrieved chunks
- Validate — Second pass removes or corrects citations not provable from retrieved context
Compliance assessment pipeline:
- Intake — Free text or JSON DataMap → normalized DataMap (LLM parses prose into structured data categories, flows, third parties, storage)
- Map — Hybrid retrieval across main + v2 collections based on classified topics and data map signals
- Assess — LLM produces ComplianceAssessment with findings per compliance area, relevant articles, remediation, and technical guidance
- API / Persistence — FastAPI routes; projects, analyses, and documents stored in SQLite
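
The Validate stage listed above can be pictured as a filter over the report's citations. The sketch below is a simplification under stated assumptions: the `Citation` class and the chunk dictionaries are hypothetical shapes, and it only checks that a cited article appears in the metadata of some retrieved chunk, whereas the pipeline described here uses a second LLM pass. Ungrounded citations are returned separately so they could be reported as retrieval gap notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    article: str       # e.g. "Art. 6"
    confidence: float  # illustrative values only


def validate_citations(
    citations: list[Citation], retrieved_chunks: list[dict]
) -> tuple[list[Citation], list[Citation]]:
    """Split citations into (grounded, ungrounded).

    A citation counts as grounded when at least one retrieved chunk
    names the same article in its (hypothetical) "article" field.
    """
    grounded_articles = {chunk["article"] for chunk in retrieved_chunks}
    grounded = [c for c in citations if c.article in grounded_articles]
    ungrounded = [c for c in citations if c.article not in grounded_articles]
    return grounded, ungrounded


chunks = [
    {"article": "Art. 6", "text": "Lawfulness of processing ..."},
    {"article": "Art. 7", "text": "Conditions for consent ..."},
]
draft = [Citation("Art. 6", 0.95), Citation("Art. 7", 0.93), Citation("Art. 83", 0.60)]

kept, gaps = validate_citations(draft, chunks)
print("kept:", [c.article for c in kept])
print("gaps:", [c.article for c in gaps])
```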
Prerequisites: Python 3.11+, uv, Anthropic API key
- Clone and install
  git clone https://github.com/prathameshpatil7/gdpr-ai.git
  cd gdpr-ai
  uv sync
- Configure environment
  cp .env.example .env
  # Set ANTHROPIC_API_KEY and optional paths (CHROMA_PATH, LOG_DB_PATH, etc.)
- Build the knowledge base (one-time, no API calls)
  uv run python scripts/scrape_gdpr.py
  uv run python scripts/scrape_bdsg.py
  uv run python scripts/scrape_ttdsg.py
  uv run python scripts/translate_sources.py
  uv run python scripts/chunk_and_embed.py
- Run an analysis
  uv run gdpr-check analyze "Your scenario here"
  uv run gdpr-check assess "Your system description here"
The web UI provides a visual interface for both analysis modes.
Start the backend and frontend:
# Terminal 1: Backend
uv run gdpr-check serve
# Terminal 2: Frontend
cd frontend
npm install
npm run dev

Open http://localhost:5173
- Analyze — run violation analysis or compliance assessment with live results
- History — browse, filter, and search past analyses
- Dashboard — usage stats, cost tracking, severity distribution charts
- Settings — theme toggle, connection status, about info
Start the server:
uv run gdpr-check serve

Violation analysis:
curl -X POST http://localhost:8000/api/v1/analyze/violation \
-H "Content-Type: application/json" \
-d '{"scenario": "A German hospital accidentally emails patient test results to the wrong patient."}'

Compliance assessment:
curl -X POST http://localhost:8000/api/v1/analyze/compliance \
-H "Content-Type: application/json" \
-d '{"system_description": "Mobile fitness app tracking location and heart rate, stored on AWS eu-central-1, anonymized analytics shared with US research partner."}'

Gold-standard test scenarios with automated scoring:
uv run python tests/run_eval.py --scenarios SC-V-001,SC-C-001
uv run python tests/run_eval.py --mode violation_analysis --dry-run

Metrics:
- Article recall — % of expected GDPR articles found
- Article precision — % of flagged articles that are expected (recitals excluded from penalty)
- Finding coverage — % of expected compliance areas addressed
- Law recall — % of expected legal instruments cited
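
As a rough illustration of the first two metrics, the sketch below computes article recall and precision over sets of citation labels. This README does not show how `tests/run_eval.py` identifies recitals or normalizes labels, so the `Recital` prefix check, the function name, and the handling of empty sets are assumptions.

```python
def article_metrics(expected: set[str], flagged: set[str]) -> tuple[float, float]:
    """Return (recall, precision) as percentages.

    Recitals are dropped from the flagged set before precision is computed,
    so citing an unexpected recital is not penalized.
    """
    # Assumption: recital citations are labelled with a "Recital" prefix.
    flagged_articles = {f for f in flagged if not f.startswith("Recital")}
    hits = expected & flagged_articles
    # Assumption: an empty denominator is scored as 100% by convention.
    recall = 100.0 * len(hits) / len(expected) if expected else 100.0
    precision = 100.0 * len(hits) / len(flagged_articles) if flagged_articles else 100.0
    return recall, precision


expected = {"Art. 6", "Art. 7", "Art. 21", "Art. 13"}
flagged = {"Art. 6", "Art. 7", "Art. 21", "Art. 17", "Art. 14", "Recital 47"}

recall, precision = article_metrics(expected, flagged)
print(f"recall={recall:.1f}% precision={precision:.1f}%")
```

Here three of the four expected articles are found (75% recall), and three of the five flagged articles are expected (60% precision); the recital does not count against precision.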
Filters: --mode, --scenarios, --difficulty, --category
Regression detection: --check-baseline warns on >5pp drops, exits 1 on >10pp drops.
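
The thresholds in that sentence translate into a small decision rule. The sketch below is one assumed reading of it, not the code behind `--check-baseline`: it compares scores in percentage points and returns the exit code the sentence implies. The metric names and dictionary layout are hypothetical.

```python
WARN_DROP_PP = 5.0   # warn when a metric falls by more than this
FAIL_DROP_PP = 10.0  # fail (exit code 1) when it falls by more than this


def check_baseline(
    baseline: dict[str, float], current: dict[str, float]
) -> tuple[int, list[str]]:
    """Return (exit_code, messages) for a run compared against a baseline."""
    exit_code = 0
    messages: list[str] = []
    for metric, old in baseline.items():
        # A metric missing from the current run is treated as 0.0 here.
        drop = old - current.get(metric, 0.0)
        if drop > FAIL_DROP_PP:
            messages.append(f"FAIL {metric}: -{drop:.1f}pp")
            exit_code = 1
        elif drop > WARN_DROP_PP:
            messages.append(f"WARN {metric}: -{drop:.1f}pp")
    return exit_code, messages


baseline = {"article_recall": 90.0, "article_precision": 80.0, "law_recall": 95.0}
current = {"article_recall": 83.0, "article_precision": 79.0, "law_recall": 84.0}

code, notes = check_baseline(baseline, current)
print(code, notes)
```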
Gold set: 30 violation scenarios (SC-V-*) + 20 compliance scenarios (SC-C-*) in gold/test_scenarios.yaml
| Source | Usage | License |
|---|---|---|
| EU GDPR (consolidated) | Articles + recitals, chunked and embedded | EU law (public) |
| BDSG / TTDSG | Scraped from gesetze-im-internet.de, translated at index time | German public law |
| EDPB guidelines | Chunked guidance (breach notification, transfers, consent) | EDPB reuse policy |
| gdpr-info.eu | Fallback HTML source | Unofficial consolidation |
Every chunk carries source, source_url, and license metadata for traceability.
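
A minimal sketch of what such a chunk record could look like, assuming a plain dataclass. Only the field names `source`, `source_url`, and `license` come from the text above; the class name, the `text` field, the validation rule, and the example values are hypothetical.

```python
from dataclasses import asdict, dataclass

REQUIRED_FIELDS = ("source", "source_url", "license")


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    source_url: str
    license: str

    def __post_init__(self) -> None:
        # Reject chunks whose traceability fields are empty or whitespace.
        for name in REQUIRED_FIELDS:
            if not getattr(self, name).strip():
                raise ValueError(f"chunk is missing required metadata: {name}")


def export_chunk(chunk: Chunk) -> dict[str, str]:
    """Serialize a chunk together with all of its metadata fields."""
    return asdict(chunk)


chunk = Chunk(
    text="Processing shall be lawful only if ...",
    source="EU GDPR (consolidated)",
    source_url="https://example.org/gdpr/art-6",  # placeholder URL
    license="EU law (public)",
)
exported = export_chunk(chunk)
print(sorted(exported))
```

Validating at construction time means a chunk without provenance never reaches the index, which keeps every later citation traceable.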
| Operation | Approximate cost |
|---|---|
| Single violation analysis | €0.02–0.08 |
| Single compliance assessment | €0.08–0.17 |
| Complex compliance (healthcare CRM) | ~€0.17 |
| Offline calculator (minimal) | ~€0.03 |
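
As a worked example of the table above, the sketch below estimates the cost of one full pass over the gold set (30 violation and 20 compliance scenarios), using the per-run ranges from the first two rows. Working in integer cents avoids floating-point rounding. The function name is illustrative, and real costs depend on token usage per scenario.

```python
# Per-run cost ranges from the table, in euro cents (low, high).
VIOLATION_CENTS = (2, 8)
COMPLIANCE_CENTS = (8, 17)


def batch_cost_cents(n_violation: int, n_compliance: int) -> tuple[int, int]:
    """Return the (low, high) cost estimate in cents for a batch of runs."""
    low = n_violation * VIOLATION_CENTS[0] + n_compliance * COMPLIANCE_CENTS[0]
    high = n_violation * VIOLATION_CENTS[1] + n_compliance * COMPLIANCE_CENTS[1]
    return low, high


low, high = batch_cost_cents(n_violation=30, n_compliance=20)
print(f"€{low / 100:.2f} to €{high / 100:.2f}")  # prints "€2.20 to €5.80"
```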
- Not legal advice — requires professional interpretation
- English runtime — German legal sources translated at index time
- Indexed law only — if an article isn't in the knowledge base, you'll see "retrieval gap" notes instead of hallucinated citations
- ePrivacy gaps — cookie/electronic marketing scenarios may be incomplete unless TTDSG chunks cover the pattern
- Latency — full runs are typically 20–190s depending on mode and complexity (multi-stage LLM pipeline)
| Release | Focus |
|---|---|
| v1 | Violation analysis CLI |
| v2 | Compliance assessment, local REST API, eval framework, SQLite persistence (current) |
| v3 | Web UI (React dashboard), auth, rate limits, feedback, PDF export |
| v4 | Near-100% accuracy architecture (deterministic retrieval, verification, confidence), gap tracker, German-first multilingual retrieval, document upload, website scanning, ToS/privacy, optional commercial path |
Details: docs/README.md and docs/phase-0-overview/03-target-users.md.
Copyright © 2026 Prathamesh Patil. All rights reserved.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
You may view, fork, and modify the code for personal and educational purposes. However:
- Any use of this software in a networked service (SaaS, API, web app) requires you to release your complete source code under AGPL-3.0
- Commercial use without open-sourcing your derivative work is prohibited under this license
- For commercial licensing (closed-source use, enterprise deployment, white-labeling): contact prathamesh.p9594@gmail.com
See LICENSE for the full license text.
Retain all source, source_url, and license fields when exporting or redistributing chunks. Third-party datasets with specific license constraints (e.g., CC BY-NC-SA) must keep their original attribution.