Scout - Medical Data Intelligence Platform

A distributed data analytics platform for intelligent, intuitive exploration of clinical data with a focus on radiology reports. Scout processes large volumes of HL7 messages into a Delta Lake, making them accessible through interactive analytics and notebooks.

Quick Links

Documentation: https://washu-scout.readthedocs.io/en/latest/
Issue Tracker: https://xnat.atlassian.net/jira/software/projects/SCOUT/summary
AI Assistant Guide: See CLAUDE.md for comprehensive codebase documentation

Key Features

Analytics: Apache Superset for no-code visualizations and SQL queries
Notebooks: JupyterHub with PySpark for programmatic data analysis
Data Lake: Delta Lake on MinIO with Hive Metastore catalog
Query Engine: Trino for distributed SQL queries
Orchestration: Temporal workflows for the HL7 ingestion pipeline
Monitoring: Grafana dashboards for observability
Deployment: Automated Ansible deployment on Kubernetes (K3s)

Architecture

HL7 Logs → Temporal → Extractor → Bronze (MinIO) → Transformer → Silver (Delta Lake)
                                                                        ↓
                                                              Trino ← Superset/JupyterHub
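To make the Extractor and Transformer stages more concrete, here is a minimal sketch of turning one HL7 v2 message into a flat row. It assumes standard HL7 v2 encoding (segments separated by carriage returns, fields by the default `|` separator) and picks a few illustrative fields. The functions `parse_segments`, `get_field`, and `hl7_to_row` and the column names are hypothetical. They are not Scout's extractor code, and Scout's actual table layout is described in the Data Schema documentation.

```python
from typing import Dict, List

# Synthetic ORU-style radiology message, for illustration only.
SAMPLE = (
    "MSH|^~\\&|RIS|HOSP|SCOUT|WASHU|202401011200||ORU^R01|MSG0001|P|2.3\r"
    "PID|1||12345^^^HOSP^MR||DOE^JANE\r"
    "OBR|1|||CT CHEST\r"
    "OBX|1|TX|||No acute findings.\r"
    "OBX|2|TX|||Impression: normal."
)


def parse_segments(message: str) -> Dict[str, List[List[str]]]:
    """Split an HL7 v2 message into segments, grouped by segment ID."""
    segments: Dict[str, List[List[str]]] = {}
    # Segments end with \r; also accept \r\n and \n, which are common in log files.
    normalized = message.replace("\r\n", "\r").replace("\n", "\r")
    for line in normalized.split("\r"):
        if not line.strip():
            continue
        fields = line.split("|")  # assumes the default field separator
        segments.setdefault(fields[0], []).append(fields)
    return segments


def get_field(fields: List[str], n: int) -> str:
    """Return HL7 field n (1-based) of a segment, or '' if it is absent."""
    # MSH-1 is the field separator itself, so MSH field numbers shift by one.
    idx = n - 1 if fields[0] == "MSH" else n
    return fields[idx] if 0 <= idx < len(fields) else ""


def hl7_to_row(message: str) -> Dict[str, str]:
    """Flatten a few illustrative fields into one dict (hypothetical columns)."""
    segs = parse_segments(message)
    msh = segs["MSH"][0]
    pid = segs.get("PID", [["PID"]])[0]
    obr = segs.get("OBR", [["OBR"]])[0]
    # OBX-5 holds the observation value; join all OBX lines into report text.
    report_text = "\n".join(get_field(obx, 5) for obx in segs.get("OBX", []))
    return {
        "message_control_id": get_field(msh, 10),
        "patient_id": get_field(pid, 3).split("^")[0],  # first PID-3 component
        "service_name": get_field(obr, 4),
        "report_text": report_text,
    }


if __name__ == "__main__":
    print(hl7_to_row(SAMPLE))
```

A real pipeline also has to handle escape sequences, repeated fields (`~`), and non-default encoding characters declared in MSH-2. This sketch deliberately ignores them.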

Getting Started

Deploy Scout

From the ansible/ directory:

# 1. Configure your environment
cp inventory.example.yaml inventory.yaml
# Edit inventory.yaml with your hosts, paths, and secrets

# 2. Deploy full platform
make all

# Or deploy individual components
make install-k3s          # Kubernetes cluster
make install-lake         # MinIO + Hive
make install-trino        # OPA + Trino
make install-superset     # Superset
make install-orchestrator # Temporal workflow engine
make install-jupyter      # JupyterHub notebooks
make install-monitor      # Grafana monitoring

See ansible/README.md for detailed deployment documentation.
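If you script partial deployments, the individual make targets above can be driven from Python. The sketch below is a hypothetical helper: `COMPONENT_TARGETS`, `plan`, and `deploy` are illustrative names, not part of the repository. It assumes that the order in which the targets are listed is a safe install order. `make all` remains the documented way to deploy the whole platform. By default it performs a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Targets in the order listed in the README. Treating this as a valid
# install order is an assumption.
COMPONENT_TARGETS: List[str] = [
    "install-k3s",
    "install-lake",
    "install-trino",
    "install-superset",
    "install-orchestrator",
    "install-jupyter",
    "install-monitor",
]


def plan(targets: Sequence[str], only: Sequence[str] = ()) -> List[List[str]]:
    """Return the make commands to run, keeping the canonical order."""
    unknown = set(only) - set(targets)
    if unknown:
        raise ValueError(f"unknown targets: {sorted(unknown)}")
    selected = [t for t in targets if not only or t in only]
    return [["make", t] for t in selected]


def deploy(ansible_dir: Path, only: Sequence[str] = (), dry_run: bool = True) -> List[List[str]]:
    """Print (and optionally run) the selected targets from the ansible/ directory."""
    commands = plan(COMPONENT_TARGETS, only)
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises on the first failing component and stops the run.
            subprocess.run(cmd, cwd=ansible_dir, check=True)
    return commands
```

For example, `deploy(Path("ansible"), only=["install-trino", "install-k3s"])` prints the k3s command before the Trino command regardless of argument order. Pass `dry_run=False` to execute the commands.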

Ingest HL7 Reports

kubectl exec -n temporal -i deployment/temporal-admintools -- temporal workflow start \
  --task-queue ingest-hl7-log \
  --type IngestHl7LogWorkflow \
  --input '{"logsRootPath": "/data/hl7", "reportTableName": "reports"}'
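When ingestion is triggered from a script rather than typed by hand, building the command as an argument list avoids shell-quoting mistakes in the JSON payload. This is a minimal sketch with a hypothetical helper name (`ingest_command`). It reproduces only the command and the two input fields shown above and does not assume any other workflow inputs.

```python
import json
import shlex
from typing import List


def ingest_command(logs_root_path: str, report_table_name: str = "reports") -> List[str]:
    """Build the kubectl argv that starts IngestHl7LogWorkflow (hypothetical helper)."""
    # json.dumps escapes quotes and backslashes in paths, so the payload stays valid JSON.
    payload = json.dumps({"logsRootPath": logs_root_path, "reportTableName": report_table_name})
    return [
        "kubectl", "exec", "-n", "temporal", "-i", "deployment/temporal-admintools", "--",
        "temporal", "workflow", "start",
        "--task-queue", "ingest-hl7-log",
        "--type", "IngestHl7LogWorkflow",
        "--input", payload,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(ingest_command("/data/hl7")))
```

To run it directly, pass the list to `subprocess.run(cmd, check=True)`. No shell is involved, so the payload needs no extra quoting.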

Query Data

In Superset SQL Lab:

SELECT * FROM delta.default.reports WHERE modality = 'CT' LIMIT 100;

In JupyterHub:

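from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the notebook's configured session
df = spark.read.table("delta.default.reports")
df.filter(df.modality == "MRI").show()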
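The same table can also be queried through Trino from a standalone Python script. This sketch assumes the `trino` Python client package (`pip install trino`) and a reachable Trino endpoint. The host, port, and user are placeholders, and authentication, which a real deployment will likely require, is omitted. The function names `reports_query` and `fetch_reports` are hypothetical. The modality is bound as a query parameter rather than formatted into the SQL string.

```python
from typing import Any, List, Tuple


def reports_query(modality: str, limit: int = 100) -> Tuple[str, List[Any]]:
    """Build a parameterized version of the SQL Lab query shown above."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # The modality is a qmark parameter; the limit is inlined as a validated int.
    sql = f"SELECT * FROM delta.default.reports WHERE modality = ? LIMIT {int(limit)}"
    return sql, [modality]


def fetch_reports(modality: str, host: str = "trino.example.org", port: int = 8080,
                  user: str = "analyst", limit: int = 100) -> List[Any]:
    """Run the query via the trino DB-API client (placeholder connection settings)."""
    import trino  # third-party; imported lazily so reports_query works without it

    conn = trino.dbapi.connect(host=host, port=port, user=user,
                               catalog="delta", schema="default")
    try:
        cur = conn.cursor()
        sql, params = reports_query(modality, limit)
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        conn.close()
```

Keeping query construction separate from the connection makes the SQL easy to check without a running cluster.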

Project Structure

ansible/: Deployment automation (Ansible playbooks and roles)
docs/: User and developer documentation
explorer/: Web-based landing page (React/TypeScript)
extractor/: HL7 processing services (Python/TypeScript)
orchestrator/: Temporal workflows (TypeScript)
helm/: Helm chart configurations
tests/: Integration and unit tests

Documentation

For comprehensive information, visit our documentation:

Quickstart: Getting started with Scout Analytics and Notebooks
Data Schema: HL7 field mappings and table structure
Services: Architecture and component overview
Ingestion: HL7 report processing workflows
Tips & Tricks: Usage tips for Grafana, Superset, and JupyterHub

License

See the LICENSE file for license information.
