pcmedsinge/fhir-mapping-agent

FHIR Mapping Agent

Production-grade AI agent that autonomously maps healthcare data into conformant FHIR resources.

CI Python 3.12 License: AGPL-3.0

Live API: https://fhirmap-api-dev.<your-region>.azurecontainerapps.io/docs (deploy with infra/main.bicep + .github/workflows/deploy.yml)


What it does

Given a source schema/sample (HL7v2, custom JSON, CSV) and a target FHIR profile (e.g., US Core Patient), the agent autonomously:

  1. Analyzes the source structure — infers schema, field types, cardinality
  2. Proposes field-level mappings — FHIR path for every source field
  3. Generates Python transform code — sandboxed, AST-validated
  4. Runs the transform on sample data in an isolated sandbox
  5. Validates output against the FHIR profile via HAPI validator
  6. Reflects on validation errors, revises the mapping, and loops
  7. Stops when conformant or max iterations reached
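The propose / run / validate / reflect cycle above can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph implementation: `MappingState`, `run_mapping_loop`, and the three stub callables are hypothetical names, and the "validator" is a toy check standing in for HAPI.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MappingState:
    """Hypothetical state carried between steps (not the project's schema)."""
    source: str
    mapping: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)
    resource: Optional[dict] = None
    iterations: int = 0
    conformant: bool = False


def run_mapping_loop(
    state: MappingState,
    propose: Callable[[MappingState], dict],
    transform: Callable[[dict, str], dict],
    validate: Callable[[dict], list],
    max_iterations: int = 3,
) -> MappingState:
    """Propose, transform, validate; feed errors back until clean or out of budget."""
    while state.iterations < max_iterations:
        state.iterations += 1
        state.mapping = propose(state)                # sees errors from the previous pass
        state.resource = transform(state.mapping, state.source)
        state.errors = validate(state.resource)
        if not state.errors:                          # conformant: stop early
            state.conformant = True
            break
    return state


# Toy stand-ins for the LLM, the sandbox, and the validator.
def propose(state):
    # First pass guesses a wrong target path; the second pass reacts to the error.
    return {"dob": "birthDate"} if state.errors else {"dob": "dob"}


def transform(mapping, source):
    return {"resourceType": "Patient", mapping["dob"]: source}


def validate(resource):
    return [] if "birthDate" in resource else ["unknown element 'dob'"]


final = run_mapping_loop(MappingState(source="1990-01-15"), propose, transform, validate)
print(final.conformant, final.iterations, final.resource)
```

With these stubs the first pass fails validation and the second pass succeeds, so the loop exits after two iterations without spending the third.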

Architecture

flowchart TD
    Client["Client\n(curl / /docs)"]
    API["FastAPI\nPOST /map → 202 job_id\nGET /jobs/{id}"]
    Store["In-memory\nJobStore"]
    LangFuse["LangFuse\n(traces, cost, latency)"]

    subgraph Agent["LangGraph Agent"]
        direction LR
        A[analyze] --> B[propose_mapping]
        B --> C[generate_code]
        C --> D[run_sandbox]
        D --> E[validate]
        E --> F{reflect}
        F -->|not conformant| B
        F -->|done| END([end])
    end

    subgraph Tools
        T1[schema introspector]
        T2[HAPI validator]
        T3[AST sandbox]
        T4[terminology lookup]
    end

    Client -->|POST /map| API
    API -->|asyncio.Task| Store
    Store --> Agent
    Agent --> Tools
    Agent --> LangFuse
    API -->|"GET /jobs/{id}"| Store
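The "AST sandbox" tool in the diagram suggests that generated transform code is checked statically before it runs. The sketch below shows one way such a check could look, using only the standard `ast` module. The allowlist, the banned names, and the function `find_violations` are assumptions for illustration, not the project's actual rules, and a static check like this complements process isolation rather than replacing it.

```python
import ast

# Assumptions: an illustrative allowlist and denylist, not the project's real policy.
ALLOWED_IMPORTS = {"datetime", "re"}
BANNED_NAMES = {"eval", "exec", "open", "compile", "__import__"}


def find_violations(code: str) -> list[str]:
    """Return reasons to reject the code; an empty list means it passed."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import of '{alias.name}' not allowed")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"import from '{node.module}' not allowed")
        elif isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            violations.append(f"use of '{node.id}' not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            violations.append(f"dunder attribute '{node.attr}' not allowed")
    return violations


print(find_violations("import os\nos.system('ls')"))
print(find_violations("import re\ndef transform(row):\n    return {'resourceType': 'Patient'}"))
```

Returning a list of reasons rather than a boolean gives the reflection step something concrete to act on when it revises the code.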

Eval results — first live run (2026-04-30)

Conditions: GPT-4o-mini/GPT-4o, no HAPI (offline), no LLM judge.
conf=False for every fixture because the HAPI sidecar was not running (expected locally).
sem=0.0 for every fixture because the run used the --no-llm-judge flag.
Full report: eval-reports/results-20260430.json

| Fixture | Resource Type | Field ↑ | Struct ↑ | Iter | Time | Notes |
|---|---|---|---|---|---|---|
| condition_001_type2_diabetes | Condition | 0.10 | 0.24 | 3 | 52s | |
| condition_002_hypertension | Condition | 0.16 | 0.32 | 3 | 42s | |
| encounter_001_outpatient | Encounter | 0.06 | 0.33 | 3 | 43s | |
| medication_001_metformin | MedicationRequest | 0.08 | 0.19 | 3 | 56s | |
| observation_001_lab_result | Observation | 0.27 | 0.43 | 3 | 54s | |
| observation_002_blood_glucose | Observation | 0.04 | 0.32 | 3 | 49s | |
| observation_003_blood_pressure | Observation | 0.04 | 0.28 | 3 | 36s | |
| observation_004_bmi | Observation | 0.06 | 0.30 | 2 | 22s | |
| observation_005_abnormal_hba1c | Observation | 0.24 | 0.35 | 3 | 44s | |
| patient_001_simple_csv | Patient | 0.15 | 0.55 | 2 | 24s | |
| patient_002_with_unmapped_field | Patient | 0.00 | 0.00 | 3 | 39s | ⚠ sandbox IndexError in date_format |
| patient_003_hl7v2_adt | Patient | 0.00 | 0.00 | 0 | 120s | ⚠ HL7v2 parser hangs (timeout) |
| patient_004_no_middle_name | Patient | 0.35 | 0.82 | 2 | 23s | |
| patient_005_json_format | Patient | 0.05 | 0.33 | 2 | 30s | |
| patient_006_multiple_phones | Patient | 0.04 | 0.40 | 2 | 31s | |
| patient_007_yyyymmdd_date | Patient | 0.06 | 0.41 | 2 | 22s | |
| patient_008_prompt_injection | Patient | 0.35 | 0.41 | 3 | 41s | ✅ injection detected & blocked |

Summary (17 fixtures, GPT live, HAPI offline)

| Metric | Value |
|---|---|
| Produced valid FHIR output | 15 / 17 (88%) |
| Real errors (crash / timeout) | 2 (patient_002, patient_003) |
| Avg field overlap (strict key match) | 0.12 |
| Avg structural similarity | 0.34 |
| Avg latency | 43s / fixture |
| CI gate threshold | 80% (requires HAPI for conformance) |

field_overlap is a strict leaf-key exact-match metric against the gold standard, so low scores are expected when the FHIR output is semantically correct but uses alternate path representations. structural_similarity is a better offline proxy.
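To make the behaviour of a strict leaf-key metric concrete, here is a small sketch under one plausible definition: flatten both resources to leaf paths and count the gold leaves that appear in the output at exactly the same path with exactly the same value. The functions `leaf_paths` and `field_overlap` below are hypothetical and may differ from the scoring code in `eval/`.

```python
def leaf_paths(value, prefix=""):
    """Flatten nested dicts/lists into {path: leaf_value}, e.g. 'name[0].family'."""
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            out.update(leaf_paths(child, f"{prefix}.{key}" if prefix else key))
        return out
    if isinstance(value, list):
        out = {}
        for index, child in enumerate(value):
            out.update(leaf_paths(child, f"{prefix}[{index}]"))
        return out
    return {prefix: value}


def field_overlap(actual: dict, gold: dict) -> float:
    """Fraction of gold leaves present in `actual` at the same path with the same value."""
    gold_leaves = leaf_paths(gold)
    if not gold_leaves:
        return 0.0
    actual_leaves = leaf_paths(actual)
    hits = sum(
        1
        for path, expected in gold_leaves.items()
        if path in actual_leaves and actual_leaves[path] == expected
    )
    return hits / len(gold_leaves)


gold = {
    "resourceType": "Patient",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "birthDate": "1990-01-15",
}
# Same person, but the name is carried in name[0].text instead of family/given.
actual = {
    "resourceType": "Patient",
    "name": [{"text": "Jane Doe"}],
    "birthDate": "1990-01-15",
}
print(field_overlap(actual, gold))  # 0.5
```

The output carries the same name information, yet half of the gold leaves go unmatched because the paths differ, which is the effect described above.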

Known issues from this run

  • patient_002: agent's generated date_format helper crashes with IndexError when date string uses / separator — reflection loop doesn't self-repair this edge case
  • patient_003: HL7v2 parsing hangs the agent > 120s — root cause under investigation
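For the patient_002 failure, a date helper that tries a fixed list of formats and returns `None` on no match avoids index errors from manual string splitting. The sketch below is an assumption-laden illustration: `to_fhir_date` and `KNOWN_FORMATS` are hypothetical names, the exact input that triggered the crash is not known here, and slash-separated dates are assumed to be month-first.

```python
from datetime import datetime
from typing import Optional

# Assumption: the formats worth trying; MM/DD/YYYY is assumed for slash dates.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y", "%Y/%m/%d")


def to_fhir_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date (YYYY-MM-DD), or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format instead of raising
    return None


for sample in ("1990-01-15", "19900115", "01/15/1990", "1990/01/15", "not a date"):
    print(sample, "->", to_fhir_date(sample))
```

Returning `None` rather than raising lets the caller omit `birthDate` and surface a validation error the reflection step can act on.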

Run the eval harness locally:

# With HAPI running (docker compose up hapi-validator):
uv run python -m fhir_mapping_agent.eval.runner \
    --fixtures-dir fixtures/eval \
    --threshold 0.85

# Offline mode (no HAPI, no LLM judge) — used for CI regression gate:
uv run python -m fhir_mapping_agent.eval.runner \
    --fixtures-dir fixtures/eval \
    --no-llm-judge \
    --skip-fixture-validation \
    --threshold 0.80 \
    --output eval-reports/results-$(date +%Y%m%d).json

Quick start

# 1. Clone & install
git clone https://github.com/pcmedsinge/fhir-mapping-agent
cd fhir-mapping-agent
uv sync --extra dev

# 2. Set env vars (copy .env.example → .env, then fill in)
cp .env.example .env   # add OPENAI_API_KEY at minimum

# 3. Start the stack (API + HAPI validator)
docker compose up

# 4. Try it
curl -s http://localhost:8000/health | python3 -m json.tool

Then visit http://localhost:8000/docs for the Swagger UI.

API usage

Submit a mapping job

curl -s -X POST http://localhost:8000/map \
  -H "Content-Type: application/json" \
  -d '{
    "source_format": "csv",
    "source_payload": "first_name,last_name,dob\nJane,Doe,1990-01-15",
    "target_profile": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient",
    "target_resource_type": "Patient",
    "max_iterations": 3
  }'

Response (HTTP 202):

{
  "job_id": "3f2a1b4c-...",
  "status": "running",
  "poll_url": "/jobs/3f2a1b4c-..."
}

Poll for result

curl -s http://localhost:8000/jobs/3f2a1b4c-... | python3 -m json.tool

When complete, status is "completed" and result.transformed_resource contains the FHIR resource.
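The submit-then-poll flow can also be driven from Python with only the standard library. This is a rough sketch: `http_json` and `submit_and_wait` are hypothetical helpers, and treating any status other than "running" as terminal is an assumption, since only "running" and "completed" are documented here.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"


def http_json(method, path, body=None):
    """Send a JSON request to the API and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def submit_and_wait(payload, request=http_json, poll_interval=2.0,
                    timeout=180.0, sleep=time.sleep):
    """POST /map, then poll the returned poll_url until the job leaves 'running'."""
    job = request("POST", "/map", payload)
    deadline = time.monotonic() + timeout
    while True:
        status = request("GET", job["poll_url"])
        if status["status"] != "running":   # assumption: anything else is terminal
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job['job_id']} still running after {timeout}s")
        sleep(poll_interval)


# Example call against a running stack (not executed here):
# result = submit_and_wait({
#     "source_format": "csv",
#     "source_payload": "first_name,last_name,dob\nJane,Doe,1990-01-15",
#     "target_profile": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient",
#     "target_resource_type": "Patient",
#     "max_iterations": 3,
# })
# print(result["result"]["transformed_resource"])
```

The `request` and `sleep` parameters are injectable so the polling logic can be exercised without a live server.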

Auth (production)

When API_KEY is set, pass X-API-Key: <key> on every non-health request.
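The rule above amounts to a small decision function. The sketch below is framework-free and hypothetical (`is_authorized` is not the project's `auth.py`); it assumes the health endpoint lives at `/health` and that header names arrive with the exact casing shown, which a real web framework would normalise.

```python
import hmac


def is_authorized(path: str, headers: dict, api_key: str) -> bool:
    """Decide whether a request may proceed under the X-API-Key rule."""
    if not api_key:
        return True                      # API_KEY unset: auth is disabled
    if path == "/health":
        return True                      # assumption: health checks stay open
    supplied = headers.get("X-API-Key", "")
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied.encode(), api_key.encode())


print(is_authorized("/map", {"X-API-Key": "s3cret"}, "s3cret"))  # True
print(is_authorized("/map", {}, "s3cret"))                       # False
print(is_authorized("/health", {}, "s3cret"))                    # True
```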

Tech stack

| Concern | Choice |
|---|---|
| Agent framework | LangGraph (state machine, no langchain umbrella) |
| LLMs | GPT-4o-mini (propose/reflect), GPT-4o (code-gen) |
| Observability | LangFuse v4 (traces, cost, latency) |
| Validator | HAPI FHIR validator (sidecar container) |
| API | FastAPI async: auth, rate limiting, async job queue |
| Package mgmt | uv |
| Deploy | Docker → Azure Container Apps (scale-to-zero) |
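The split between a cheaper model for propose/reflect and a stronger one for code generation, combined with a spending limit, could be expressed roughly as follows. `ModelRouter`, `CostCapExceeded`, and the node-to-model table are hypothetical and only mirror the pairing listed above; how cost per call is measured is left to the caller.

```python
from dataclasses import dataclass

# Mirrors the LLM pairing listed above; the node names are taken from the diagram.
NODE_MODELS = {
    "propose_mapping": "gpt-4o-mini",
    "reflect": "gpt-4o-mini",
    "generate_code": "gpt-4o",
}


class CostCapExceeded(RuntimeError):
    """Raised when a job has already spent its budget."""


@dataclass
class ModelRouter:
    cost_cap_usd: float
    spent_usd: float = 0.0

    def model_for(self, node: str) -> str:
        """Pick the model for a node, refusing once the cap is reached."""
        if self.spent_usd >= self.cost_cap_usd:
            raise CostCapExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.cost_cap_usd:.4f} cap"
            )
        return NODE_MODELS[node]

    def record(self, cost_usd: float) -> None:
        """Add the measured cost of a completed call to the running total."""
        self.spent_usd += cost_usd


router = ModelRouter(cost_cap_usd=0.05)
print(router.model_for("generate_code"))   # gpt-4o
router.record(0.03)
print(router.model_for("reflect"))         # gpt-4o-mini
```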

Repo layout

src/fhir_mapping_agent/
├── agent/            # LangGraph state machine + router + guardrails
│   ├── graph.py      # 6-node state machine
│   ├── guardrails.py # Prompt-injection defense
│   ├── llm_openai.py # OpenAI LLM client
│   └── router.py     # Per-node model selection + cost cap
├── api/
│   ├── main.py       # FastAPI app (POST /map, GET /jobs/{id}, POST /validate)
│   ├── jobs.py       # In-memory async job store
│   ├── auth.py       # X-API-Key auth dependency
│   └── ratelimit.py  # Sliding-window rate limiter
├── eval/             # Eval harness (loader, scoring, LLM judge, runner CLI)
├── models/           # Pydantic schemas
├── observability/    # LangFuse v4 wrapper (no-op when keys absent)
├── tools/            # schema, validator, sandbox, terminology
└── settings.py

tests/                # 179 tests (unit + integration)
fixtures/eval/        # Gold-set fixture pairs (source → expected FHIR)
infra/                # Azure Bicep template + parameters
.github/workflows/
├── ci.yml            # lint + tests + offline eval smoke
└── deploy.yml        # build → push ACR → deploy Container App
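For readers unfamiliar with the sliding-window approach named for `api/ratelimit.py`, the idea is to keep recent request timestamps per client and reject a request when the window is already full. The class below is an illustrative sketch with hypothetical names and an in-memory store, not the repository's implementation.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock                    # injectable for deterministic tests
        self._hits = defaultdict(deque)        # client_id -> timestamps, oldest first

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10, clock=lambda: fake_now[0])
print(limiter.allow("a"), limiter.allow("a"), limiter.allow("a"))  # True True False
fake_now[0] = 10.0
print(limiter.allow("a"))  # True
```

Unlike a fixed-window counter, this does not let a client burst twice the limit across a window boundary, at the cost of storing one timestamp per accepted request.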

Deploy to Azure

# One-time setup
az group create -n fhir-mapping-agent-rg -l eastus
az deployment group create \
  --resource-group fhir-mapping-agent-rg \
  --template-file infra/main.bicep \
  --parameters @infra/parameters.json \
  --parameters apiKey=<secret> openaiApiKey=<key>

For automated deploys on push to main, configure the secrets listed in .github/workflows/deploy.yml under repository Settings → Secrets.

License

AGPL-3.0-or-later. See LICENSE.
