Scout is a distributed data analytics platform for exploring clinical data, with a focus on radiology reports. It processes large volumes of HL7 messages into a Delta Lake and makes them accessible through interactive analytics and notebooks.
- Documentation: https://washu-scout.readthedocs.io/en/latest/
- Issue Tracker: https://xnat.atlassian.net/jira/software/projects/SCOUT/summary
- AI Assistant Guide: See CLAUDE.md for comprehensive codebase documentation

Components:
- Analytics: Apache Superset for no-code visualizations and SQL queries
- Notebooks: JupyterHub with PySpark for programmatic data analysis
- Data Lake: Delta Lake on MinIO with Hive Metastore catalog
- Query Engine: Trino for distributed SQL queries
- Orchestration: Temporal workflows for HL7 ingestion pipeline
- Monitoring: Grafana dashboards for observability
- Deployment: Automated Ansible deployment on Kubernetes (K3s)

HL7 Logs → Temporal → Extractor → Bronze (MinIO) → Transformer → Silver (Delta Lake)
                                                                          ↓
                                                                        Trino ← Superset/JupyterHub
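
To make the Extractor → Transformer step more concrete, here is a minimal Python sketch of turning one raw HL7 v2 message into a flat row. It is illustrative only and is not Scout's actual extractor or transformer code: the function names, the output column names, and the sample message are all hypothetical. The real field mappings are described in the Data Schema documentation.

```python
def parse_hl7_segments(message: str) -> dict[str, list[list[str]]]:
    """Group the pipe-delimited fields of an HL7 v2 message by segment ID."""
    segments: dict[str, list[list[str]]] = {}
    # HL7 v2 separates segments with carriage returns; also accept newlines,
    # since messages stored in log files are often re-wrapped that way.
    for line in message.replace("\n", "\r").split("\r"):
        if not line.strip():
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def to_silver_row(message: str) -> dict[str, str]:
    """Flatten a few MSH header fields into a row (column names are hypothetical)."""
    msh = parse_hl7_segments(message)["MSH"][0]
    # In MSH the field separator itself counts as MSH-1, so MSH-n sits at index n-1.
    return {
        "sent_at": msh[6],             # MSH-7: date/time of message
        "message_type": msh[8],        # MSH-9: message type
        "message_control_id": msh[9],  # MSH-10: message control ID
    }


if __name__ == "__main__":
    # Synthetic message for illustration only; it contains no real data.
    sample = (
        "MSH|^~\\&|RIS|HOSP|SCOUT|LAKE|20240101120000||ORU^R01|MSG0001|P|2.5\r"
        "OBX|1|TX|||No acute findings."
    )
    print(to_silver_row(sample))
```

A production pipeline would also need to handle component separators, escape sequences, and repeated fields, which this sketch ignores.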
From the ansible/ directory:
# 1. Configure your environment
cp inventory.example.yaml inventory.yaml
# Edit inventory.yaml with your hosts, paths, and secrets
# 2. Deploy full platform
make all
# Or deploy individual components
make install-k3s # Kubernetes cluster
make install-lake # MinIO + Hive
make install-trino # OPA + Trino
make install-superset # Superset
make install-orchestrator # Temporal workflow engine
make install-jupyter # JupyterHub notebooks
make install-monitor # Grafana monitoring

See ansible/README.md for detailed deployment documentation.
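
Once the platform is deployed, ingestion is started by launching a Temporal workflow whose `--input` is a JSON document. When the log path or table name varies, it can help to build that payload with a JSON serializer rather than hand-quoting it in the shell. The sketch below shows one way to do that in Python. `build_ingest_command` is a hypothetical helper, not part of Scout, and the namespace, task queue, workflow type, and input keys are copied from the manual command that follows.

```python
import json
import shlex


def build_ingest_command(logs_root_path: str, report_table_name: str) -> list[str]:
    """Assemble the kubectl/temporal argument list that starts an ingest workflow."""
    # json.dumps guarantees the workflow input is valid JSON, whatever the values contain.
    workflow_input = json.dumps(
        {"logsRootPath": logs_root_path, "reportTableName": report_table_name}
    )
    return [
        "kubectl", "exec", "-n", "temporal", "-i",
        "deployment/temporal-admintools", "--",
        "temporal", "workflow", "start",
        "--task-queue", "ingest-hl7-log",
        "--type", "IngestHl7LogWorkflow",
        "--input", workflow_input,
    ]


if __name__ == "__main__":
    cmd = build_ingest_command("/data/hl7", "reports")
    # Print a shell-quoted command for review; to execute it instead,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Running the script only prints the command. With the arguments shown it should be equivalent to the manual invocation: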
kubectl exec -n temporal -i deployment/temporal-admintools -- temporal workflow start \
--task-queue ingest-hl7-log \
--type IngestHl7LogWorkflow \
--input '{"logsRootPath": "/data/hl7", "reportTableName": "reports"}'

In Superset SQL Lab:
SELECT * FROM delta.default.reports WHERE modality = 'CT' LIMIT 100;

In JupyterHub:
df = spark.read.table("delta.default.reports")
df.filter(df.modality == "MRI").show()

- ansible/: Deployment automation (Ansible playbooks and roles)
- docs/: User and developer documentation
- explorer/: Web-based landing page (React/TypeScript)
- extractor/: HL7 processing services (Python/TypeScript)
- orchestrator/: Temporal workflows (TypeScript)
- helm/: Helm chart configurations
- tests/: Integration and unit tests

For comprehensive information, visit our documentation:
- Quickstart: Getting started with Scout Analytics and Notebooks
- Data Schema: HL7 field mappings and table structure
- Services: Architecture and component overview
- Ingestion: HL7 report processing workflows
- Tips & Tricks: Usage tips for Grafana, Superset, and JupyterHub

See LICENSE file for license information.