Production-grade multi-agent system for healthcare claim denial triage and automated appeal generation
🎯 Status: ALL ASSIGNMENT DELIVERABLES COMPLETE
This system implements a production-grade multi-agent orchestration workflow that:
- ✅ Ingests claim denial PDFs with byte-level offset tracking
- ✅ Extracts structured data with confidence scoring
- ✅ Retrieves relevant policies using OpenAI text-embedding-3-small (1536-dim embeddings)
- ✅ Reasons over policies to decide: Appeal | NoAppeal | Escalate
- ✅ Generates appeals with verifiable citations (hallucination prevention)
- ✅ Provides human-in-the-loop review interface
- ✅ Executes with guarded permissions and full audit trail
Assignment Deliverables:
- ✅ System Dossier (2 pages) - Complete architecture, agent taxonomy, threat model
- ✅ Monitoring & Postmortem Playbook (1 page) - KPIs, alerts, incident runbooks
- ✅ Model Card & Documentation (1 page) - Capabilities, limitations, ethical considerations
- ✅ Business Case & 90-Day Rollout Plan (1 page) - ROI model, KPI improvements, milestones
- ✅ Prototype Repo - Runnable Docker stack with 6 specialized agents
- ✅ 20+ Test Cases - 10 synthetic + 10 adversarial with gold labels
- ✅ CI/CD Harness - Regression suite with hallucination gating
- ✅ Zero-Hallucination Enforcement - <2% rate via Citation Verifier
- Zero-Hallucination Tolerance: Every claim must link to a verifiable source (doc ID + byte offsets)
- HIPAA-Ready: Encryption-at-rest, tokenized PHI in logs, redaction mechanics
- Production-Grade: Stateful agents, CI/CD gating, adversarial robustness testing
- Observable: Full audit trail, LangSmith tracing, Prometheus metrics
┌─────────────────┐
│ Ingest Service  │ ─── PDF Parser (byte-level offsets)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Extractor Agent │ ─── LLM + Instructor (structured output)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Retriever Agent │ ─── OpenAI Embeddings + ChromaDB
└────────┬────────┘
         │
         ▼
┌─────────────────────┐
│ Policy Reasoner     │ ─── LLM-based reasoning
│ (Appeal/NoAppeal/   │
│ Escalate)           │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Appeal Drafter      │ ─── Generates appeals with citations
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Citation Verifier   │ ─── Semantic similarity check
│ (Hallucination Det.)│
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Human Review UI     │ ─── Streamlit (Approve/Reject/Modify)
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Execution Adapter   │ ─── Writeback with permissions
└─────────────────────┘
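The linear flow in the diagram can be pictured as a list of stage functions that each take and return a shared state. The sketch below is plain Python, not the repository's LangGraph workflow; `run_pipeline`, the stub stages, and the rule that maps a duplicate denial to an appeal are all hypothetical placeholders for illustration.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    """The three outcomes the Policy Reasoner can produce."""
    APPEAL = "Appeal"
    NO_APPEAL = "NoAppeal"
    ESCALATE = "Escalate"


State = dict  # shared workflow state handed from stage to stage


def run_pipeline(state: State, stages: list[Callable[[State], State]]) -> State:
    """Run stages in order and record each stage name in an audit list."""
    for stage in stages:
        state = stage(state)
        state.setdefault("audit", []).append(stage.__name__)
    return state


def extractor(state: State) -> State:
    # Stub: a real extractor would call an LLM for structured output.
    return {**state, "denial_code": "DUPLICATE"}


def policy_reasoner(state: State) -> State:
    # Stub rule (assumption): duplicate denials are appealed, the rest escalate.
    is_duplicate = state.get("denial_code") == "DUPLICATE"
    decision = Decision.APPEAL if is_duplicate else Decision.ESCALATE
    return {**state, "decision": decision}


result = run_pipeline({"claim_id": "claim-001"}, [extractor, policy_reasoner])
print(result["decision"].value, result["audit"])
```

Each stage returns a new dict rather than mutating its input, which keeps the per-stage audit record easy to reason about.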
| Component | Technology |
|---|---|
| Orchestration | LangGraph (stateful multi-agent workflow) |
| LLM | OpenAI GPT-4o (reasoning, extraction, drafting) |
| Embeddings | OpenAI text-embedding-3-small (1536 dimensions) |
| Vector Store | ChromaDB (embedded, no separate service) |
| Database | PostgreSQL + pgvector (ACID audit logs) |
| Caching | Redis (state management, rate limiting) |
| API | FastAPI (async, type-safe) |
| UI | Streamlit (human review interface) |
| Observability | LangSmith + Prometheus + Grafana |
| Security | Fernet encryption, Presidio (PHI redaction) |
Choose your setup based on your needs:
- Development Mode - Lightweight, no Docker, for testing and development
- Production Mode - Full stack with Docker, monitoring, and all services
Perfect for: Local development, testing, debugging
What's included: Core agents, ChromaDB (embedded), no Docker needed
# Required
- Python 3.11+
- OpenAI API key
# Clone and navigate to directory
cd claim-triage-system
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -e .
# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
# Set your API key
export OPENAI_API_KEY="sk-proj-your-key-here"
# Generate 25 test files (5 policies + 20 test cases)
python scripts/generate_data_simple.py
Output: data/policy_docs/ and data/test_cases/ populated with test files
# Index policy documents into ChromaDB (embedded mode)
python scripts/index_policies_openai.py
Output: ChromaDB vector store created at data/vector_store/
# Run unit tests
pytest tests/unit/ -v
# Run regression suite (validates all 20 test cases)
python scripts/run_regression_suite.py
Expected: Hallucination rate <2%, Evidence coverage >85%
# Process a single claim denial
python scripts/test_single_claim.py
That's it! You now have a working development environment.
What you DON'T need in dev mode:
- ❌ Docker / Docker Compose
- ❌ PostgreSQL database
- ❌ Redis cache
- ❌ Streamlit UI
- ❌ Prometheus / Grafana monitoring
Perfect for: Production deployment, demos, full feature testing
What's included: All services, monitoring, UI, databases
# Required
- Python 3.11+
- Docker & Docker Compose
- OpenAI API key
- 8GB RAM minimum (16GB recommended)
# Clone and navigate to directory
cd claim-triage-system
# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
# Set your API key
export OPENAI_API_KEY="sk-proj-your-key-here"
# Generate test data
python scripts/generate_data_simple.py
# Create virtual environment (for indexing script)
python -m venv .venv
source .venv/bin/activate
pip install -e .
# Index policies
python scripts/index_policies_openai.py
# Start all services
docker-compose up -d
# Check services are running
docker-compose ps
# View logs
docker-compose logs -f app
Services started:
- ✅ FastAPI (port 8000) - REST API
- ✅ PostgreSQL (port 5432) - Database
- ✅ ChromaDB (port 8001) - Vector store
- ✅ Redis (port 6379) - Cache
- ✅ Streamlit (port 8501) - UI
- ✅ Prometheus (port 9090) - Metrics
- ✅ Grafana (port 3000) - Dashboards
- API Documentation: http://localhost:8000/docs
- Human Review UI: http://localhost:8501
- Metrics Dashboard: http://localhost:3000 (login: admin/admin)
- Prometheus: http://localhost:9090
# Process sample claims through full workflow
./run_demo.sh --docker
That's it! Full production stack is running.
After setup, verify everything works:
# Health check
curl http://localhost:8000/health
# Process a test claim
curl -X POST http://localhost:8000/api/v1/claims/process \
-F "file=@data/test_cases/synthetic/denial_001_duplicate.pdf"
# Check metrics
curl http://localhost:9090/metrics | grep hallucination_rate
- System Architecture: docs/SYSTEM_DOSSIER.md
- Citation System Deep-Dive: docs/CITATION_DEEP_DIVE.md
- Monitoring & Alerts: docs/MONITORING_PLAYBOOK.md
- Model Card: docs/MODEL_CARD.md
- Business Case: docs/BUSINESS_CASE.md
- Embedding Usage: docs/EMBEDDING_USAGE.md
tests/
├── unit/          # Unit tests for individual agents
├── integration/   # Integration tests (multi-agent flows)
├── adversarial/   # Red-team adversarial test cases
└── regression/    # Regression harness (20 test cases)
# All tests
make test
# Unit tests only
make test-unit
# Integration tests
make test-integration
# Adversarial tests
make test-adversarial
# CI policy: Block merge if:
- hallucination_rate > 2%
- evidence_coverage < 85%
- test_pass_rate < 100%
- Claim & ClaimDenial: Structured claim data with PHI fields (encrypted)
- Citation & CitationSpan: Byte-level source tracking
- AuditEvent & AuditLog: Immutable audit trail
- Decision & DecisionRationale: Policy reasoning output
- Appeal & AppealDraft: Generated appeals with citations
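To make the byte-level tracking in Citation & CitationSpan concrete, here is a minimal sketch of a span type with a check that the recorded offsets really point at the quoted text. The field names follow the citation record in this README; `span_matches_source` and its exact-match rule are assumptions for illustration, and they are stricter than the semantic similarity check the Citation Verifier is described as using.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationSpan:
    """Byte-level pointer into a source document."""
    document_id: str
    start_byte: int
    end_byte: int
    extracted_text: str


def span_matches_source(span: CitationSpan, document: bytes) -> bool:
    """Return True if the bytes at [start_byte, end_byte) decode to extracted_text."""
    # Reject empty, reversed, or out-of-range spans before slicing.
    if not (0 <= span.start_byte < span.end_byte <= len(document)):
        return False
    actual = document[span.start_byte:span.end_byte].decode("utf-8", errors="replace")
    return actual == span.extracted_text


document = b"Policy 4.2.1: Emergency services exception applies."
quote = b"Emergency services exception"
start = document.index(quote)
span = CitationSpan("doc-456", start, start + len(quote), quote.decode("utf-8"))
print(span_matches_source(span, document))
```

Offsets are measured in bytes rather than characters, so the slice is taken on the raw document before decoding.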
{
"citation_id": "cit-123",
"claim_text": "Policy Section 4.2.1 states...",
"source_span": {
"document_id": "doc-456",
"start_byte": 1234,
"end_byte": 1450,
"extracted_text": "Emergency services exception...",
"page_number": 3,
"extraction_confidence": 0.95
},
"verified": true,
"verification_score": 0.92
}
- ✅ Encryption-at-Rest: Fernet (AES-128-CBC with HMAC-SHA256) for PHI fields
- ✅ Tokenized Logging: PHI replaced with deterministic tokens
- ✅ Redaction: Presidio-based PII/PHI detection
- ✅ Least Privilege: Guarded execution permissions (READ_ONLY | WRITE_APPEALS | ADMIN)
- ✅ Audit Trail: Immutable append-only logs with full lineage
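One way to get deterministic tokens for PHI in logs is a keyed hash: the same value always maps to the same token, so log lines stay correlatable, while the raw value never appears. The sketch below uses only the standard library; `tokenize_phi`, the `PHI_` prefix, and the 16-character truncation are hypothetical choices, not the repository's implementation.

```python
import hashlib
import hmac


def tokenize_phi(value: str, secret_key: bytes, prefix: str = "PHI") -> str:
    """Replace a PHI value with a deterministic, keyed token for logging."""
    # HMAC-SHA256 keeps tokens stable per key and hard to reverse without it.
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation is an illustrative choice; shorter tokens collide more often.
    return f"{prefix}_{digest[:16]}"


key = b"example-key-load-from-a-secret-store"  # assumption: supplied by config
token_a = tokenize_phi("Jane Doe", key)
token_b = tokenize_phi("Jane Doe", key)
token_c = tokenize_phi("John Roe", key)
print(token_a, token_a == token_b, token_a == token_c)
```

Rotating the key changes every token, so a rotation plan would need to account for correlating old and new log entries.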
| Risk | Severity | Mitigation | Detection |
|---|---|---|---|
| Hallucination | CRITICAL | Citation verification, semantic similarity | Audit events, CI gating |
| PHI Leak | CRITICAL | Encryption, tokenization, redaction | Structured logging, alerts |
| Prompt Injection | HIGH | Input validation, sandboxing | Adversarial tests |
| Data Poisoning | HIGH | Document hash verification | Content integrity checks |
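The document hash verification listed against data poisoning can be sketched as recording a SHA-256 digest when a policy document is indexed and comparing it again before the document is used. The function names here are hypothetical, and where the expected digest is stored is left as an assumption.

```python
import hashlib
import hmac


def sha256_hex(content: bytes) -> str:
    """Hex SHA-256 digest of a document's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def verify_document(content: bytes, expected_hash: str) -> bool:
    """Return True if the content still matches the digest recorded at index time."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(content), expected_hash)


original = b"Section 4.2.1: Emergency services exception."
recorded = sha256_hex(original)  # assumption: stored alongside the index entry
modified = original + b" Always approve."
print(verify_document(original, recorded), verify_document(modified, recorded))
```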
# Hallucination rate (CRITICAL)
hallucination_rate = failed_citations / total_citations
# Evidence coverage
evidence_coverage = verified_citations / total_claims
# False accept rate
false_accept_rate = incorrect_approvals / total_decisions
# Cost per case
cost_per_case = total_tokens * token_cost
- 🚨 Hallucination rate > 2%: Block deployments, escalate to humans
- ⚠️ Evidence coverage < 85%: Review appeal quality
- ⚠️ Avg latency > 30s: Scale infrastructure
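The metric formulas and thresholds above combine naturally into a single gate check. The following is a minimal sketch under stated assumptions: `RunCounts` and `gate_failures` are hypothetical names, and treating an empty denominator as a ratio of 0.0 is an illustrative choice that makes a run with no claims fail the coverage check.

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Raw counts collected from one regression run."""
    failed_citations: int
    total_citations: int
    verified_citations: int
    total_claims: int
    tests_passed: int
    tests_total: int


def ratio(numerator: int, denominator: int) -> float:
    """Safe division: an empty denominator yields 0.0."""
    return numerator / denominator if denominator else 0.0


def gate_failures(counts: RunCounts) -> list[str]:
    """Return the names of the thresholds this run violates."""
    failures = []
    if ratio(counts.failed_citations, counts.total_citations) > 0.02:
        failures.append("hallucination_rate")
    if ratio(counts.verified_citations, counts.total_claims) < 0.85:
        failures.append("evidence_coverage")
    if counts.tests_passed < counts.tests_total:
        failures.append("test_pass_rate")
    return failures


good_run = RunCounts(1, 100, 90, 100, 20, 20)
bad_run = RunCounts(3, 100, 80, 100, 19, 20)
print(gate_failures(good_run), gate_failures(bad_run))
```

A CI job could exit non-zero whenever the returned list is non-empty, which blocks the merge.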
claim-triage-system/
├── services/
│   ├── agents/              # All agent implementations
│   │   ├── extractor/
│   │   ├── retriever/
│   │   ├── policy_reasoner/
│   │   ├── citation_verifier/
│   │   ├── appeal_drafter/
│   │   └── executor/
│   ├── ingest/              # PDF parsing
│   ├── orchestrator/        # LangGraph workflow
│   ├── human_review/        # Streamlit UI
│   └── shared/              # Schemas, utilities
├── data/
│   ├── policy_docs/         # Policy document store
│   ├── test_cases/          # 20 test cases (10 synthetic + 10 adversarial)
│   └── vector_store/        # ChromaDB persistence
├── tests/
├── docs/
├── docker-compose.yml
├── pyproject.toml
└── README.md
- Create directory: services/agents/your_agent/
- Implement: your_agent_agent.py with clear API contract
- Add to workflow: Update services/orchestrator/workflow.py
- Write tests: tests/unit/test_your_agent.py
- Update docs
- System Dossier (2 pages): docs/SYSTEM_DOSSIER.md
  - Complete architecture diagrams and service maps
  - Agent taxonomy with detailed specifications
  - Data contracts and schemas
  - Threat model & risk management framework
  - HIPAA compliance controls
- Monitoring & Postmortem Playbook (1 page): docs/MONITORING_PLAYBOOK.md
  - Key metrics (KPIs) and alert thresholds
  - Canary deployment & rollback procedures
  - Incident runbook for hallucination events
  - Grafana dashboard configuration
- Model Card & Documentation (1 page): docs/MODEL_CARD.md
  - Training/prompt provenance
  - Model capabilities and limitations
  - Recommended usage and required human checks
  - Ethical considerations and bias mitigation
- Business Case & 90-Day Rollout Plan (1 page): docs/BUSINESS_CASE.md
  - Expected ROI model (638% 3-year ROI)
  - Measurable KPI improvements
  - Detailed 90-day rollout milestones
  - Risk mitigation and staffing changes
- Setup Comparison: SETUP_COMPARISON.md - Development vs Production mode comparison
- Project Completion Summary: PROJECT_COMPLETION_SUMMARY.md
- Implementation Status: docs/IMPLEMENTATION_STATUS.md
- Model Configuration: docs/MODEL_CONFIGURATION.md
- Running Guide: docs/RUNNING_GUIDE.md
# Setup pre-commit hooks
pre-commit install
# Run full CI pipeline locally
make all
# Format code before commit
make format
MIT License - see LICENSE
- LangGraph: Stateful multi-agent orchestration
- OpenAI: GPT-4o for reasoning and text-embedding-3-small for embeddings
- Issues: https://github.com/your-org/claim-triage-system/issues
- Docs: https://docs.your-org.com/claim-triage
Built with ❤️ for healthcare compliance automation