rushichavda/claim-triage-system

πŸ₯ Claim Triage & Resolution Agentic System

Production-grade multi-agent system for healthcare claim denial triage and automated appeal generation

Python 3.11+ LangGraph License: MIT


📋 Overview

🎯 Status: ALL ASSIGNMENT DELIVERABLES COMPLETE

This system implements a production-grade multi-agent orchestration workflow that:

  • ✅ Ingests claim denial PDFs with byte-level offset tracking
  • ✅ Extracts structured data with confidence scoring
  • ✅ Retrieves relevant policies using OpenAI text-embedding-3-small (1536-dim embeddings)
  • ✅ Reasons over policies to decide: Appeal | NoAppeal | Escalate
  • ✅ Generates appeals with verifiable citations (hallucination prevention)
  • ✅ Provides human-in-the-loop review interface
  • ✅ Executes with guarded permissions and full audit trail
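
The retrieval step can be pictured as a nearest-neighbour search over embedding vectors. The sketch below is a dependency-free illustration using cosine similarity. The 3-dimensional toy vectors, the policy IDs, and the helper names `cosine` and `top_k` are hypothetical stand-ins for the 1536-dimensional OpenAI embeddings and the ChromaDB query used by the system; it is not the repository's retriever.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k (policy_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy index: hypothetical policy IDs with made-up 3-dim vectors.
POLICY_INDEX = {
    "policy-emergency": [0.9, 0.1, 0.0],
    "policy-duplicate": [0.1, 0.9, 0.0],
    "policy-prior-auth": [0.0, 0.2, 0.9],
}

hits = top_k([0.8, 0.2, 0.0], POLICY_INDEX, k=2)
print([doc_id for doc_id, _ in hits])  # ['policy-emergency', 'policy-duplicate']
```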

Assignment Deliverables:

  • ✅ System Dossier (2 pages) - Complete architecture, agent taxonomy, threat model
  • ✅ Monitoring & Postmortem Playbook (1 page) - KPIs, alerts, incident runbooks
  • ✅ Model Card & Documentation (1 page) - Capabilities, limitations, ethical considerations
  • ✅ Business Case & 90-Day Rollout Plan (1 page) - ROI model, KPI improvements, milestones
  • ✅ Prototype Repo - Runnable Docker stack with 6 specialized agents
  • ✅ 20 Test Cases - 10 synthetic + 10 adversarial with gold labels
  • ✅ CI/CD Harness - Regression suite with hallucination gating
  • ✅ Zero-Hallucination Enforcement - Citation Verifier gates the hallucination rate below 2%

🎯 Key Features

  • Zero-Hallucination Tolerance: Every claim must link to a verifiable source (document ID + byte offsets)
  • HIPAA-Ready: Encryption-at-rest, tokenized PHI in logs, redaction mechanics
  • Production-Grade: Stateful agents, CI/CD gating, adversarial robustness testing
  • Observable: Full audit trail, LangSmith tracing, Prometheus metrics

πŸ—οΈ Architecture

Multi-Agent System

┌─────────────────┐
│  Ingest Service │ ─── PDF Parser (byte-level offsets)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Extractor Agent │ ─── LLM + Instructor (structured output)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Retriever Agent │ ─── OpenAI Embeddings + ChromaDB
└────────┬────────┘
         │
         ▼
┌─────────────────────┐
│ Policy Reasoner     │ ─── LLM-based reasoning
│ (Appeal/NoAppeal/   │
│  Escalate)          │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Appeal Drafter      │ ─── Generates appeals with citations
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Citation Verifier   │ ─── Semantic similarity check
│ (Hallucination Det.)│
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Human Review UI     │ ─── Streamlit (Approve/Reject/Modify)
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Execution Adapter   │ ─── Writeback with permissions
└─────────────────────┘
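
One way to read the diagram is as a chain of functions that each take and return a shared state object. The sketch below is plain Python rather than LangGraph. The `ClaimState` fields, the stub agents, the keyword rule inside `policy_reasoner`, and the routing rule that only an `Appeal` decision reaches the drafter and verifier are all assumptions made for illustration; the repository's workflow may be wired differently.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ClaimState:
    """Hypothetical shared state passed from agent to agent."""
    denial_text: str
    decision: str = ""                               # "Appeal" | "NoAppeal" | "Escalate"
    draft: str = ""
    trail: list[str] = field(default_factory=list)   # names of agents that ran

Agent = Callable[[ClaimState], ClaimState]

def stub(name: str) -> Agent:
    """Build a placeholder agent that only records that it ran."""
    def run(state: ClaimState) -> ClaimState:
        state.trail.append(name)
        return state
    return run

extractor = stub("extractor")
retriever = stub("retriever")
citation_verifier = stub("citation_verifier")
human_review = stub("human_review")

def policy_reasoner(state: ClaimState) -> ClaimState:
    # Keyword rule standing in for LLM-based reasoning (assumption).
    state.decision = "Appeal" if "duplicate" in state.denial_text.lower() else "Escalate"
    state.trail.append("policy_reasoner")
    return state

def appeal_drafter(state: ClaimState) -> ClaimState:
    state.draft = f"Appeal draft for: {state.denial_text}"
    state.trail.append("appeal_drafter")
    return state

def run_pipeline(state: ClaimState) -> ClaimState:
    for agent in (extractor, retriever, policy_reasoner):
        state = agent(state)
    if state.decision == "Appeal":                   # assumed routing rule
        for agent in (appeal_drafter, citation_verifier):
            state = agent(state)
    return human_review(state)                       # every case ends with a human

result = run_pipeline(ClaimState("Denied: duplicate claim submission"))
print(result.decision, result.trail)
```

Keeping every agent behind the same `ClaimState -> ClaimState` signature is what makes the stages easy to reorder or swap for a graph framework later.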

Tech Stack

| Component | Technology |
|---|---|
| Orchestration | LangGraph (stateful multi-agent workflow) |
| LLM | OpenAI GPT-4o (reasoning, extraction, drafting) |
| Embeddings | OpenAI text-embedding-3-small (1536 dimensions) |
| Vector Store | ChromaDB (embedded in development mode; separate service on port 8001 in production mode) |
| Database | PostgreSQL + pgvector (ACID audit logs) |
| Caching | Redis (state management, rate limiting) |
| API | FastAPI (async, type-safe) |
| UI | Streamlit (human review interface) |
| Observability | LangSmith + Prometheus + Grafana |
| Security | Fernet encryption, Presidio (PHI redaction) |

🚀 Quick Start

Choose your setup based on your needs:

  • Development Mode - Lightweight, no Docker, for testing and development
  • Production Mode - Full stack with Docker, monitoring, and all services

📦 Development Mode (Recommended for Testing)

Perfect for: Local development, testing, debugging

What's included: Core agents, ChromaDB (embedded), no Docker needed

Step 1: Prerequisites

# Required
- Python 3.11+
- OpenAI API key

Step 2: Setup Environment

# Clone the repository and enter it
git clone https://github.com/rushichavda/claim-triage-system.git
cd claim-triage-system

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Step 3: Generate Test Data

# Set your API key
export OPENAI_API_KEY="sk-proj-your-key-here"

# Generate 25 test files (5 policies + 20 test cases)
python scripts/generate_data_simple.py

Output: data/policy_docs/ and data/test_cases/ populated with test files

Step 4: Index Policies

# Index policy documents into ChromaDB (embedded mode)
python scripts/index_policies_openai.py

Output: ChromaDB vector store created at data/vector_store/

Step 5: Run Tests

# Run unit tests
pytest tests/unit/ -v

# Run regression suite (validates all 20 test cases)
python scripts/run_regression_suite.py

Expected: Hallucination rate <2%, Evidence coverage >85%

Step 6: Test Single Claim

# Process a single claim denial
python scripts/test_single_claim.py

That's it! You now have a working development environment.

What you DON'T need in dev mode:

  • ❌ Docker / Docker Compose
  • ❌ PostgreSQL database
  • ❌ Redis cache
  • ❌ Streamlit UI
  • ❌ Prometheus / Grafana monitoring

🐳 Production Mode (Full Stack)

Perfect for: Production deployment, demos, full feature testing

What's included: All services, monitoring, UI, databases

Step 1: Prerequisites

# Required
- Python 3.11+
- Docker & Docker Compose
- OpenAI API key
- 8GB RAM minimum (16GB recommended)

Step 2: Setup Environment

# Clone the repository and enter it
git clone https://github.com/rushichavda/claim-triage-system.git
cd claim-triage-system

# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Step 3: Generate Test Data

# Create a virtual environment (the data and indexing scripts run on the host)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .

# Set your API key
export OPENAI_API_KEY="sk-proj-your-key-here"

# Generate test data
python scripts/generate_data_simple.py

Step 4: Index Policies

# Index policies (uses the virtual environment from Step 3)
python scripts/index_policies_openai.py

Step 5: Start Docker Stack

# Start all services
docker-compose up -d

# Check services are running
docker-compose ps

# View logs
docker-compose logs -f app

Services started:

  • ✅ FastAPI (port 8000) - REST API
  • ✅ PostgreSQL (port 5432) - Database
  • ✅ ChromaDB (port 8001) - Vector store
  • ✅ Redis (port 6379) - Cache
  • ✅ Streamlit (port 8501) - UI
  • ✅ Prometheus (port 9090) - Metrics
  • ✅ Grafana (port 3000) - Dashboards

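Step 6: Access Services

  • REST API: http://localhost:8000
  • Human review UI (Streamlit): http://localhost:8501
  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000
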
Step 7: Run Demo

# Process sample claims through full workflow
./run_demo.sh --docker

That's it! The full production stack is now running.


🧪 Quick Validation

After setup, verify everything works:

# Health check
curl http://localhost:8000/health

# Process a test claim
curl -X POST http://localhost:8000/api/v1/claims/process \
  -F "file=@data/test_cases/synthetic/denial_001_duplicate.pdf"

# Check metrics (port 9090 is Prometheus, so query its HTTP API for the series)
curl 'http://localhost:9090/api/v1/query?query=hallucination_rate'


🧪 Testing

Test Structure

tests/
├── unit/              # Unit tests for individual agents
├── integration/       # Integration tests (multi-agent flows)
├── adversarial/       # Red-team adversarial test cases
└── regression/        # Regression harness (20 test cases)

Run Tests

# All tests
make test

# Unit tests only
make test-unit

# Integration tests
make test-integration

# Adversarial tests
make test-adversarial

CI Gating

# CI policy: Block merge if:
- hallucination_rate > 2%
- evidence_coverage < 85%
- test_pass_rate < 100%
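
The gating policy above amounts to three threshold checks. As a minimal sketch, assuming the metrics arrive as fractions in a dictionary (the function name `gate_failures` and the dictionary shape are hypothetical, not taken from the repository's CI harness):

```python
def gate_failures(metrics: dict[str, float]) -> list[str]:
    """Return the reasons a merge should be blocked; an empty list means the gate passes."""
    failures = []
    if metrics["hallucination_rate"] > 0.02:
        failures.append("hallucination_rate > 2%")
    if metrics["evidence_coverage"] < 0.85:
        failures.append("evidence_coverage < 85%")
    if metrics["test_pass_rate"] < 1.0:
        failures.append("test_pass_rate < 100%")
    return failures

passing = gate_failures(
    {"hallucination_rate": 0.015, "evidence_coverage": 0.90, "test_pass_rate": 1.0}
)
blocked = gate_failures(
    {"hallucination_rate": 0.03, "evidence_coverage": 0.80, "test_pass_rate": 0.95}
)
print(passing)  # []
print(blocked)  # all three reasons
```

A CI job could call this after the regression suite and exit non-zero whenever the returned list is non-empty.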

📊 Data Schemas

Core Models

  • Claim & ClaimDenial: Structured claim data with PHI fields (encrypted)
  • Citation & CitationSpan: Byte-level source tracking
  • AuditEvent & AuditLog: Immutable audit trail
  • Decision & DecisionRationale: Policy reasoning output
  • Appeal & AppealDraft: Generated appeals with citations

Example: Citation with Byte Offsets

{
  "citation_id": "cit-123",
  "claim_text": "Policy Section 4.2.1 states...",
  "source_span": {
    "document_id": "doc-456",
    "start_byte": 1234,
    "end_byte": 1450,
    "extracted_text": "Emergency services exception...",
    "page_number": 3,
    "extraction_confidence": 0.95
  },
  "verified": true,
  "verification_score": 0.92
}
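
Byte offsets make a citation mechanically checkable: slicing the source document at the recorded offsets should reproduce the quoted text. The sketch below illustrates that check under two assumptions that the example above does not state: `end_byte` is exclusive and the document is UTF-8. The class `SourceSpan` and the function `span_matches` are hypothetical names that mirror only a subset of the JSON fields.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceSpan:
    document_id: str
    start_byte: int
    end_byte: int        # assumed exclusive
    extracted_text: str

def span_matches(document: bytes, span: SourceSpan) -> bool:
    """True if the bytes in [start_byte, end_byte) decode to exactly extracted_text."""
    if not (0 <= span.start_byte < span.end_byte <= len(document)):
        return False
    try:
        actual = document[span.start_byte:span.end_byte].decode("utf-8")
    except UnicodeDecodeError:
        return False  # offsets cut through a multi-byte character
    return actual == span.extracted_text

doc = "Section 4.2.1: Emergency services exception applies.".encode("utf-8")
quote = "Emergency services exception"
start = doc.find(quote.encode("utf-8"))

good = SourceSpan("doc-456", start, start + len(quote.encode("utf-8")), quote)
shifted = SourceSpan("doc-456", start + 1, start + 1 + len(quote.encode("utf-8")), quote)

print(span_matches(doc, good))     # True
print(span_matches(doc, shifted))  # False
```

An exact-match check like this catches wrong or stale offsets; it complements, rather than replaces, the semantic similarity score stored in `verification_score`.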

πŸ” Security & Compliance

HIPAA Controls

  • ✅ Encryption-at-Rest: Fernet (AES-128 in CBC mode with HMAC-SHA256 authentication) for PHI fields
  • ✅ Tokenized Logging: PHI replaced with deterministic tokens
  • ✅ Redaction: Presidio-based PII/PHI detection
  • ✅ Least Privilege: Guarded execution permissions (READ_ONLY | WRITE_APPEALS | ADMIN)
  • ✅ Audit Trail: Immutable append-only logs with full lineage
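
Deterministic tokenization means the same PHI value always maps to the same opaque token, so log lines stay correlatable without exposing the value. The sketch below shows one common way to do this with a keyed HMAC from the standard library. It is an assumption about the approach, not the repository's implementation; `phi_token`, the `PHI_` prefix, the 16-character truncation, and the demo key are all hypothetical.

```python
import hashlib
import hmac

def phi_token(value: str, key: bytes, prefix: str = "PHI") -> str:
    """Map a PHI value to a stable, non-reversible token using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:16]}"  # truncated for readable logs (assumption)

DEMO_KEY = b"demo-only-key"  # hypothetical; a real key would come from a secret store

token_a = phi_token("Jane Doe", DEMO_KEY)
token_b = phi_token("Jane Doe", DEMO_KEY)
token_c = phi_token("John Roe", DEMO_KEY)

print(token_a == token_b)  # True: same input, same token
print(token_a == token_c)  # False: different patients stay distinguishable
```

A keyed HMAC is used here instead of a bare hash because low-entropy values such as names or dates of birth are otherwise easy to recover by guessing.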

Threat Model

| Risk | Severity | Mitigation | Detection |
|---|---|---|---|
| Hallucination | CRITICAL | Citation verification, semantic similarity | Audit events, CI gating |
| PHI Leak | CRITICAL | Encryption, tokenization, redaction | Structured logging, alerts |
| Prompt Injection | HIGH | Input validation, sandboxing | Adversarial tests |
| Data Poisoning | HIGH | Document hash verification | Content integrity checks |
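
The data-poisoning mitigation, document hash verification, can be illustrated with the standard library: record a digest when a policy document is indexed and compare it again before the document is used. This is a sketch under assumptions; the choice of SHA-256 and the names `sha256_hex` and `is_untampered` are hypothetical and not confirmed by the repository.

```python
import hashlib
import hmac

def sha256_hex(content: bytes) -> str:
    """Hex digest recorded when a policy document is first indexed."""
    return hashlib.sha256(content).hexdigest()

def is_untampered(content: bytes, expected_hex: str) -> bool:
    """Compare the current digest to the recorded one in constant time."""
    return hmac.compare_digest(sha256_hex(content), expected_hex)

original = b"Policy 4.2.1: Emergency services are exempt from prior authorization."
recorded = sha256_hex(original)

tampered = original.replace(b"exempt from", b"subject to")

print(is_untampered(original, recorded))  # True
print(is_untampered(tampered, recorded))  # False
```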

📈 Monitoring & Observability

Key Metrics (Prometheus)

# Hallucination rate (CRITICAL)
hallucination_rate = failed_citations / total_citations

# Evidence coverage
evidence_coverage = verified_citations / total_claims

# False accept rate
false_accept_rate = incorrect_approvals / total_decisions

# Cost per case (average over all processed cases)
cost_per_case = (total_tokens * token_cost) / total_cases
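
The three quality ratios above can be computed directly from raw counters. The sketch below does so in plain Python; `RunCounts`, `ratio`, and `quality_metrics` are hypothetical names, and returning 0.0 for an empty denominator is a design choice made here rather than documented behaviour.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunCounts:
    """Raw counters for one evaluation run (field names mirror the formulas above)."""
    failed_citations: int
    verified_citations: int
    total_citations: int
    total_claims: int
    incorrect_approvals: int
    total_decisions: int

def ratio(numerator: int, denominator: int) -> float:
    """Divide safely; an empty denominator yields 0.0 instead of raising."""
    return numerator / denominator if denominator else 0.0

def quality_metrics(c: RunCounts) -> dict[str, float]:
    return {
        "hallucination_rate": ratio(c.failed_citations, c.total_citations),
        "evidence_coverage": ratio(c.verified_citations, c.total_claims),
        "false_accept_rate": ratio(c.incorrect_approvals, c.total_decisions),
    }

counts = RunCounts(
    failed_citations=3, verified_citations=180, total_citations=200,
    total_claims=200, incorrect_approvals=1, total_decisions=20,
)
metrics = quality_metrics(counts)
print(metrics)
```

With these illustrative counts the hallucination rate is 1.5% and evidence coverage is 90%, so neither threshold would fire. Note that an empty run reports 0.0 evidence coverage, which would trip the coverage threshold instead of passing silently.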

Alerts

  • 🚨 Hallucination rate > 2%: Block deployments, escalate to humans
  • ⚠️ Evidence coverage < 85%: Review appeal quality
  • ⚠️ Avg latency > 30s: Scale infrastructure

πŸ› οΈ Development

Project Structure

claim-triage-system/
├── services/
│   ├── agents/           # All agent implementations
│   │   ├── extractor/
│   │   ├── retriever/
│   │   ├── policy_reasoner/
│   │   ├── citation_verifier/
│   │   ├── appeal_drafter/
│   │   └── executor/
│   ├── ingest/           # PDF parsing
│   ├── orchestrator/     # LangGraph workflow
│   ├── human_review/     # Streamlit UI
│   └── shared/           # Schemas, utilities
├── data/
│   ├── policy_docs/      # Policy document store
│   ├── test_cases/       # 20 test cases (10 synthetic + 10 adversarial)
│   └── vector_store/     # ChromaDB persistence
├── tests/
├── docs/
├── docker-compose.yml
├── pyproject.toml
└── README.md

Add a New Agent

  1. Create directory: services/agents/your_agent/
  2. Implement: your_agent_agent.py with clear API contract
  3. Add to workflow: Update services/orchestrator/workflow.py
  4. Write tests: tests/unit/test_your_agent.py
  5. Update docs
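
A "clear API contract" in step 2 could be as small as a typed protocol that every agent satisfies. The sketch below is a hypothetical shape for such a contract; `Agent`, `AgentResult`, and `DenialCodeAgent` are illustrative names and do not come from the repository.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass
class AgentResult:
    agent: str          # which agent produced the result
    output: dict        # structured payload for the next stage
    confidence: float   # 0.0 to 1.0

@runtime_checkable
class Agent(Protocol):
    """Contract each agent is assumed to satisfy: a name and a run() method."""
    name: str

    def run(self, payload: dict) -> AgentResult: ...

class DenialCodeAgent:
    """Example agent: normalizes a denial code found in the payload."""
    name = "denial_code"

    def run(self, payload: dict) -> AgentResult:
        code = payload.get("denial_code", "").strip().upper()
        return AgentResult(self.name, {"denial_code": code}, 1.0 if code else 0.0)

agent = DenialCodeAgent()
result = agent.run({"denial_code": " co-97 "})
print(isinstance(agent, Agent), result.output, result.confidence)
```

A unit test for a new agent can then assert both that it satisfies the protocol and that `run` returns the expected `AgentResult` for a fixed payload.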

📖 Documentation

Core Documentation (Assignment Deliverables)

  1. System Dossier (2 pages): docs/SYSTEM_DOSSIER.md

    • Complete architecture diagrams and service maps
    • Agent taxonomy with detailed specifications
    • Data contracts and schemas
    • Threat model & risk management framework
    • HIPAA compliance controls
  2. Monitoring & Postmortem Playbook (1 page): docs/MONITORING_PLAYBOOK.md

    • Key metrics (KPIs) and alert thresholds
    • Canary deployment & rollback procedures
    • Incident runbook for hallucination events
    • Grafana dashboard configuration
  3. Model Card & Documentation (1 page): docs/MODEL_CARD.md

    • Training/prompt provenance
    • Model capabilities and limitations
    • Recommended usage and required human checks
    • Ethical considerations and bias mitigation
  4. Business Case & 90-Day Rollout Plan (1 page): docs/BUSINESS_CASE.md

    • Expected ROI model (638% 3-year ROI)
    • Measurable KPI improvements
    • Detailed 90-day rollout milestones
    • Risk mitigation and staffing changes


🤝 Contributing

# Setup pre-commit hooks
pre-commit install

# Run full CI pipeline locally
make all

# Format code before commit
make format

📜 License

MIT License - see LICENSE


πŸ™ Acknowledgments

  • LangGraph: Stateful multi-agent orchestration
  • OpenAI: GPT-4o for reasoning and text-embedding-3-small for embeddings


Built with ❤️ for healthcare compliance automation
