Production-grade AI agent that autonomously maps healthcare data into conformant FHIR resources.
Live API: https://fhirmap-api-dev.<your-region>.azurecontainerapps.io/docs
(deploy with infra/main.bicep + .github/workflows/deploy.yml)
Given a source schema/sample (HL7v2, custom JSON, CSV) and a target FHIR profile (e.g., US Core Patient), the agent autonomously:
- Analyzes the source structure — infers schema, field types, cardinality
- Proposes field-level mappings — FHIR path for every source field
- Generates Python transform code — sandboxed, AST-validated
- Runs the transform on sample data in an isolated sandbox
- Validates output against the FHIR profile via HAPI validator
- Reflects on validation errors, revises the mapping, and loops
- Stops when conformant or max iterations reached
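The loop described above can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph graph: `MappingState`, `run_loop`, and the three toy callables are hypothetical names, and the "missing birthDate" rule is a stand-in for a real validator.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MappingState:
    """Hypothetical state carried between steps of the loop."""
    source: dict
    mapping: dict = field(default_factory=dict)
    output: Optional[dict] = None
    errors: list = field(default_factory=list)
    iterations: int = 0
    conformant: bool = False


def run_loop(source: dict,
             propose: Callable[[dict, list], dict],
             transform: Callable[[dict, dict], dict],
             validate: Callable[[dict], list],
             max_iterations: int = 3) -> MappingState:
    """Propose, transform, validate; feed errors back until clean or out of budget."""
    state = MappingState(source=source)
    while state.iterations < max_iterations:
        state.iterations += 1
        # Errors from the previous round drive the revised mapping (the "reflect" step).
        state.mapping = propose(state.source, state.errors)
        state.output = transform(state.source, state.mapping)
        state.errors = validate(state.output)
        if not state.errors:
            state.conformant = True
            break
    return state


# Toy stand-ins for the LLM, the sandboxed transform, and the validator.
def toy_propose(source: dict, errors: list) -> dict:
    mapping = {"first_name": "name.given", "last_name": "name.family"}
    if any("birthDate" in e for e in errors):
        mapping["dob"] = "birthDate"  # revision triggered by validator feedback
    return mapping


def toy_transform(source: dict, mapping: dict) -> dict:
    return {path: source[key] for key, path in mapping.items() if key in source}


def toy_validate(output: dict) -> list:
    # Toy rule for illustration only; not a statement about any FHIR profile.
    return [] if "birthDate" in output else ["missing element: birthDate"]
```

With the toy callables, the first pass omits `dob`, the validator complains, and the second pass adds the mapping and terminates.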
flowchart TD
Client["Client\n(curl / /docs)"]
API["FastAPI\nPOST /map → 202 job_id\nGET /jobs/{id}"]
Store["In-memory\nJobStore"]
Agent["LangGraph Agent"]
LangFuse["LangFuse\n(traces, cost, latency)"]
subgraph Agent
direction LR
A[analyze] --> B[propose_mapping]
B --> C[generate_code]
C --> D[run_sandbox]
D --> E[validate]
E --> F{reflect}
F -->|not conformant| B
F -->|done| END([end])
end
subgraph Tools
T1[schema introspector]
T2[HAPI validator]
T3[AST sandbox]
T4[terminology lookup]
end
Client -->|POST /map| API
API -->|asyncio.Task| Store
Store --> Agent
Agent --> Tools
Agent --> LangFuse
API -->|GET /jobs/{id}| Store
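To make the "AST sandbox" tool in the diagram more concrete, here is one way a static check on generated transform code might look. It is a sketch under assumptions: the allow-list, the deny-list, and the function name `check_transform_source` are all hypothetical, and the project's actual rules are not shown in this README.

```python
import ast

# Hypothetical policy; the real allow/deny lists may differ.
ALLOWED_IMPORTS = {"datetime", "re", "json"}
FORBIDDEN_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def check_transform_source(source: str) -> list:
    """Return a list of policy violations found by walking the parsed AST."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = (node.module or "").split(".")[0]
            if module not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {node.module or '.'}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                problems.append(f"call not allowed: {node.func.id}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder access is a common route around name-based checks.
            problems.append(f"dunder attribute access: {node.attr}")
    return problems
```

A static pass like this only rejects obviously unsafe code before execution; it does not replace running the transform in an isolated process.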
Conditions: GPT-4o-mini/GPT-4o, no HAPI (offline), no LLM judge.
conf=False across the board = HAPI sidecar not running (expected locally).
sem=0.0 across the board = --no-llm-judge flag.
Full report: eval-reports/results-20260430.json
| Fixture | Resource Type | Field ↑ | Struct ↑ | Iter | Time | Notes |
|---|---|---|---|---|---|---|
| condition_001_type2_diabetes | Condition | 0.10 | 0.24 | 3 | 52s | |
| condition_002_hypertension | Condition | 0.16 | 0.32 | 3 | 42s | |
| encounter_001_outpatient | Encounter | 0.06 | 0.33 | 3 | 43s | |
| medication_001_metformin | MedicationRequest | 0.08 | 0.19 | 3 | 56s | |
| observation_001_lab_result | Observation | 0.27 | 0.43 | 3 | 54s | |
| observation_002_blood_glucose | Observation | 0.04 | 0.32 | 3 | 49s | |
| observation_003_blood_pressure | Observation | 0.04 | 0.28 | 3 | 36s | |
| observation_004_bmi | Observation | 0.06 | 0.30 | 2 | 22s | |
| observation_005_abnormal_hba1c | Observation | 0.24 | 0.35 | 3 | 44s | |
| patient_001_simple_csv | Patient | 0.15 | 0.55 | 2 | 24s | |
| patient_002_with_unmapped_field | Patient | 0.00 | 0.00 | 3 | 39s | ⚠ sandbox IndexError in date_format |
| patient_003_hl7v2_adt | Patient | 0.00 | 0.00 | 0 | 120s | ⚠ HL7v2 parser hangs — timeout |
| patient_004_no_middle_name | Patient | 0.35 | 0.82 | 2 | 23s | |
| patient_005_json_format | Patient | 0.05 | 0.33 | 2 | 30s | |
| patient_006_multiple_phones | Patient | 0.04 | 0.40 | 2 | 31s | |
| patient_007_yyyymmdd_date | Patient | 0.06 | 0.41 | 2 | 22s | |
| patient_008_prompt_injection | Patient | 0.35 | 0.41 | 3 | 41s | ✅ injection detected & blocked |
Summary (17 fixtures, GPT live, HAPI offline)
| Metric | Value |
|---|---|
| Produced FHIR output (conformance not checked, HAPI offline) | 15 / 17 (88%) |
| Real errors (crash / timeout) | 2 (patient_002, patient_003) |
| Avg field overlap (strict key match) | 0.12 |
| Avg structural similarity | 0.34 |
| Avg latency | 43s / fixture |
| CI gate threshold | 80% (requires HAPI for conformance) |
field_overlap is a strict leaf-key exact-match metric against the gold standard; low scores are expected when FHIR output is semantically correct but uses alternate path representations. structural_similarity is a better offline proxy.
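A small sketch shows why a strict leaf-key metric scores low on semantically equivalent output. It assumes the metric compares sets of leaf paths and divides by the gold count; the harness's real definition may differ (for example, it may also compare values), and `leaf_paths` and `field_overlap` here are illustrative names.

```python
def leaf_paths(value, prefix: str = "") -> set:
    """Collect dotted paths to every leaf; list positions collapse to '[]'."""
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            paths |= leaf_paths(child, f"{prefix}.{key}" if prefix else key)
        return paths
    if isinstance(value, list):
        paths = set()
        for child in value:
            paths |= leaf_paths(child, prefix + "[]")
        return paths
    return {prefix}


def field_overlap(gold: dict, produced: dict) -> float:
    """Fraction of gold leaf paths that also appear in the produced resource."""
    gold_paths = leaf_paths(gold)
    if not gold_paths:
        return 0.0
    return len(gold_paths & leaf_paths(produced)) / len(gold_paths)


gold = {"birthDate": "1990-01-15",
        "name": [{"family": "Doe", "given": ["Jane"]}]}
# Same person, but the name is expressed through name.text instead.
produced = {"birthDate": "1990-01-15",
            "name": [{"text": "Jane Doe"}]}
score = field_overlap(gold, produced)  # only birthDate matches: 1 of 3 gold paths
```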
Known issues from this run
- patient_002: the agent's generated date_format helper crashes with IndexError when the date string uses a / separator; the reflection loop doesn't self-repair this edge case
- patient_003: HL7v2 parsing hangs the agent for more than 120s; root cause under investigation
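The generated `date_format` helper from the patient_002 run is not reproduced here, so the following is only a sketch of a more tolerant alternative: try a fixed list of formats with `datetime.strptime` instead of splitting on a single separator. The format list and the name `to_fhir_date` are assumptions.

```python
from datetime import datetime
from typing import Optional

# Assumed candidate formats; order matters for ambiguous inputs.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%Y", "%Y%m%d")


def to_fhir_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date (YYYY-MM-DD), or None if no format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format rather than crashing
    return None
```

Returning `None` instead of raising lets the transform omit the field and leaves the decision to the validator. Note that `%m/%d/%Y` is a US-style assumption; day-first sources would need a different order.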
Run the eval harness locally:
# With HAPI running (docker-compose up hapi-validator):
uv run python -m fhir_mapping_agent.eval.runner \
--fixtures-dir fixtures/eval \
--threshold 0.85
# Offline mode (no HAPI, no LLM judge) — used for CI regression gate:
uv run python -m fhir_mapping_agent.eval.runner \
--fixtures-dir fixtures/eval \
--no-llm-judge \
--skip-fixture-validation \
--threshold 0.80 \
--output eval-reports/results-$(date +%Y%m%d).json
# 1. Clone & install
git clone https://github.com/your-org/fhir-mapping-agent
cd fhir-mapping-agent
uv sync --extra dev
# 2. Set env vars (copy .env.example → .env, then fill in)
cp .env.example .env # add OPENAI_API_KEY at minimum
# 3. Start the stack (API + HAPI validator)
docker compose up
# 4. Try it
curl -s http://localhost:8000/health | python3 -m json.tool
Then visit http://localhost:8000/docs for the Swagger UI.
curl -s -X POST http://localhost:8000/map \
-H "Content-Type: application/json" \
-d '{
"source_format": "csv",
"source_payload": "first_name,last_name,dob\nJane,Doe,1990-01-15",
"target_profile": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient",
"target_resource_type": "Patient",
"max_iterations": 3
}'
Response (HTTP 202):
{
"job_id": "3f2a1b4c-...",
"status": "running",
"poll_url": "/jobs/3f2a1b4c-..."
}
curl -s http://localhost:8000/jobs/3f2a1b4c-... | python3 -m json.tool
When complete, status is "completed" and result.transformed_resource contains the FHIR resource.
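For scripted use, the submit-then-poll flow can be wrapped in a small client. This sketch uses only the standard library and relies on what the responses above show: a `poll_url` field and a `status` of `"running"` until the job finishes. Any status other than `"running"` is treated as terminal, since other values are not documented here; `wait_for_job` and `http_get_json` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def http_get_json(url: str, api_key: Optional[str] = None) -> dict:
    """GET a URL and decode the JSON body; sends X-API-Key when a key is given."""
    request = urllib.request.Request(url)
    if api_key:
        request.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def wait_for_job(poll_url: str,
                 fetch: Callable[[str], dict] = http_get_json,
                 base_url: str = "http://localhost:8000",
                 interval: float = 2.0,
                 timeout: float = 180.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll the job endpoint until its status is no longer "running"."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(base_url + poll_url)
        if job.get("status") != "running":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still running after {timeout}s: {poll_url}")
        sleep(interval)
```

The `fetch` and `sleep` parameters are injectable so the polling logic can be exercised without a running server; for an authenticated deployment, pass something like `lambda url: http_get_json(url, api_key=...)`.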
When API_KEY is set, pass X-API-Key: <key> on every non-health request.
| Concern | Choice |
|---|---|
| Agent framework | LangGraph (state machine, no langchain umbrella) |
| LLMs | GPT-4o-mini (propose/reflect), GPT-4o (code-gen) |
| Observability | LangFuse v4 (traces, cost, latency) |
| Validator | HAPI FHIR validator (sidecar container) |
| API | FastAPI async — auth, rate limiting, async job queue |
| Package mgmt | uv |
| Deploy | Docker → Azure Container Apps (scale-to-zero) |
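The split between GPT-4o-mini and GPT-4o in the table, combined with a per-run cost cap, could be modelled roughly as below. This is an illustrative sketch, not the contents of router.py: the class, the default model for unlisted nodes, and the way cost is reported by the caller are all assumptions.

```python
# Node-to-model assignment taken from the table above.
NODE_MODELS = {
    "propose_mapping": "gpt-4o-mini",
    "reflect": "gpt-4o-mini",
    "generate_code": "gpt-4o",
}


class CostCapExceeded(RuntimeError):
    """Raised when a run has already spent its budget."""


class ModelRouter:
    def __init__(self, cost_cap_usd: float, default_model: str = "gpt-4o-mini"):
        self.cost_cap_usd = cost_cap_usd
        self.default_model = default_model  # assumed fallback for unlisted nodes
        self.spent_usd = 0.0

    def model_for(self, node: str) -> str:
        """Pick the model for a node, refusing once the cap is reached."""
        if self.spent_usd >= self.cost_cap_usd:
            raise CostCapExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.cost_cap_usd:.4f} cap")
        return NODE_MODELS.get(node, self.default_model)

    def record_cost(self, usd: float) -> None:
        """Add the cost of a finished call; the caller supplies the figure."""
        self.spent_usd += usd
```

Checking the cap before each call means a run can overshoot by at most one call's cost, which keeps the router independent of any pricing table.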
src/fhir_mapping_agent/
├── agent/ # LangGraph state machine + router + guardrails
│ ├── graph.py # 6-node state machine
│ ├── guardrails.py # Prompt-injection defense
│ ├── llm_openai.py # OpenAI LLM client
│ └── router.py # Per-node model selection + cost cap
├── api/
│ ├── main.py # FastAPI app (POST /map, GET /jobs/{id}, POST /validate)
│ ├── jobs.py # In-memory async job store
│ ├── auth.py # X-API-Key auth dependency
│ └── ratelimit.py # Sliding-window rate limiter
├── eval/ # Eval harness (loader, scoring, LLM judge, runner CLI)
├── models/ # Pydantic schemas
├── observability/ # LangFuse v4 wrapper (no-op when keys absent)
├── tools/ # schema, validator, sandbox, terminology
└── settings.py
tests/ # 179 tests (unit + integration)
fixtures/eval/ # Gold-set fixture pairs (source → expected FHIR)
infra/ # Azure Bicep template + parameters
.github/workflows/
├── ci.yml # lint + tests + offline eval smoke
└── deploy.yml # build → push ACR → deploy Container App
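As background for the sliding-window rate limiter listed under api/, here is a generic in-memory version of the technique. It is not the code in ratelimit.py; the class name, the per-key deque, and the injectable clock are choices made for this sketch.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests per key within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Keying by API key or client address is a deployment decision. State held in process memory like this resets on restart and is not shared between replicas.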
# One-time setup
az group create -n fhir-mapping-agent-rg -l eastus
az deployment group create \
--resource-group fhir-mapping-agent-rg \
--template-file infra/main.bicep \
--parameters @infra/parameters.json \
--parameters apiKey=<secret> openaiApiKey=<key>
For automated deploys on push to main, configure the secrets listed in
.github/workflows/deploy.yml under repository Settings → Secrets.
AGPL-3.0-or-later. See LICENSE.