DataTools4Heart Data Ingestion Suite

This repository includes the mapping definitions to convert hospital-specific patient data into the DT4H CDM, which is specified at https://github.com/DataTools4Heart/common-data-model.

The data ingestion suite is built around several concepts that together make up the mapping process: schemas, terminology services, concept maps, mappings, and mapping jobs. These are instantiated for each clinical site according to the characteristics of that site's data.

Schemas

Schemas provide well-defined representations of the source healthcare resources, such as conditions, medications, and lab measurements. The mappings process these representations to extract the actual data from the data source.
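
For illustration only (this is not the exact toFHIR schema format; the names and fields are made up), a schema describing a source table of lab measurements would list its fields and types roughly like this:

{
  "name": "lab_results",
  "elements": [
    { "name": "patient_id", "type": "string",   "required": true },
    { "name": "test_code",  "type": "string",   "required": true },
    { "name": "value",      "type": "decimal" },
    { "name": "unit",       "type": "string" },
    { "name": "taken_at",   "type": "dateTime" }
  ]
}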

Terminology services

Terminology services are used to convert codes from one terminology system to another, e.g. from ICD-9 to ICD-10.
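
If the terminology service follows the standard FHIR terminology API (an assumption here; the server address is a placeholder), a code translation can be requested with the ConceptMap $translate operation:

# Translate an ICD-9-CM code to ICD-10-CM via the FHIR $translate operation
curl "http://<terminology-server>/fhir/ConceptMap/\$translate?system=http://hl7.org/fhir/sid/icd-9-cm&code=410.01&targetsystem=http://hl7.org/fhir/sid/icd-10-cm"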

Concept maps

Concept maps contain auxiliary information that is needed during the mapping process. They have a tabular format and can be viewed as key-value mappings: the first column holds the keys, and the remaining columns hold the corresponding values. Concept maps are typically used to convert proprietary value sets and codes to their CDM counterparts and to perform unit conversions, i.e. converting units used in the original data sources into the units used in the CDM via conversion functions. Mappings refer to concept maps through mapping context configurations.
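
As an illustration (the column names and values are hypothetical, not taken from this repository's concept maps), a concept map for converting glucose units to the CDM unit could look like the following, with the first column acting as the key:

source_unit,target_unit,conversion_factor
mg/dL,mmol/L,0.0555
g/L,mmol/L,5.55

A mapping would look up the source unit and multiply the measured value by the conversion factor to rescale it into the CDM unit.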

Mappings

Mappings are the actual scripts that transform the original EHR data entities into the CDM, more specifically into the profiles that compose the CDM. A mapping extracts information from the original data source and places the extracted data into the specified fields of the target CDM profile. Mappings can utilise terminology services and concept maps to apply code and unit conversions and to look up any auxiliary information they need.
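
As a rough sketch of the idea (the field names, source columns, and the concept-map lookup below are illustrative and not verified against this repository's mappings), a mapping is essentially a FHIR resource template whose {{...}} placeholders are FHIRPath expressions evaluated on each source record:

{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [{ "system": "http://loinc.org", "code": "{{test_code}}" }]
  },
  "valueQuantity": {
    "value": "{{value}}",
    "unit": "{{mpp:getConcept(%unitConceptMap, unit, 'target_unit')}}"
  }
}

The hypothetical unitConceptMap lookup shows how a mapping can pull the CDM unit out of a concept map while filling the template.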

Mapping jobs

Mapping jobs configure the input and output specifications for mappings: where the source data is located (a database or the file system) and where the mapped resources will be written (the file system or a FHIR server). Each mapping job also lists one or more mappings to be executed.
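
Conceptually, a mapping job ties a source, a sink, and a list of mappings together, as in the following sketch (YAML used for readability; the actual toFHIR job format and field names may differ):

# Illustrative sketch only; not the exact toFHIR job schema
source:
  type: file-system          # or a database connection
  folder: /data/source-csvs
sink:
  type: fhir-server          # or a file-system folder
  url: http://onfhir:8080/fhir
mappings:
  - patient-mapping
  - encounter-mapping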

toFhir

The toFhir folder contains DT4H-specific utility functions to be injected into the toFHIR engine. These functions are used in FHIRPath expressions within mappings to extract the source data and map it to the CDM as needed.
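
For example, such a function would be invoked inside a FHIRPath placeholder of a mapping template (the function name below is hypothetical, purely to illustrate the call site):

"class": "{{dt4h:mapEncounterClass(admission_type)}}"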


Deployment Guide (with Nginx)

Requirements

  • Git
  • Docker

Downloading DT4H Mapping Configurations

DT4H mapping configurations are maintained in the project’s GitHub repository. Create a working directory (referred to as <workspaceDir> below) and navigate into it; for example:
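
mkdir dt4h-workspace   # any name works; this directory is <workspaceDir> in the rest of this guide
cd dt4h-workspace

Then clone the two repositories: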

git clone https://github.com/DataTools4Heart/data-ingestion-suite.git
git clone https://github.com/DataTools4Heart/common-data-model.git

onFHIR & toFHIR Deployment

  1. Download the password file for SRDC’s private Docker repository (please contact us for access).

  2. Copy the file into <workspaceDir>.

  3. Run the following scripts:

sh ./data-ingestion-suite/docker/server/pull.sh
sh ./data-ingestion-suite/docker/server/run.sh

If the scripts lack execute permission, grant it and run them again:

chmod +x ./data-ingestion-suite/docker/server/pull.sh
chmod +x ./data-ingestion-suite/docker/server/run.sh

Running Behind Nginx Configuration

This deployment has been tested with Nginx, and its use is recommended. To use our predefined Nginx Docker container, run:

sh ./data-ingestion-suite/docker/proxy/run.sh

If your host machine is already running Nginx, add the following proxy configuration and restart your Nginx:

location /dt4h/tofhir/api {
    proxy_pass http://127.0.0.1:6085/tofhir;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}

location /dt4h/tofhir {
    proxy_pass http://127.0.0.1:6082/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}

location /dt4h/tofhir/kibana/ {
    proxy_pass http://127.0.0.1:6601/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}

location /dt4h/onfhir {
    proxy_pass http://127.0.0.1:6080/fhir;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}
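
After adding the configuration, validate it and reload Nginx (the reload command assumes a systemd-based host):

sudo nginx -t                  # check the configuration for syntax errors
sudo systemctl reload nginx    # apply the new configuration without downtime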

MIMIC Dataset Deployment (Optional)

Only follow this section if you are working with MIMIC-IV v3.1; the mappings may not work correctly with other MIMIC versions.

MIMIC Dataset Preparation

  • Ensure your dataset includes a hosp/ folder containing the required CSV files, such as admissions.csv, patients.csv, etc.

Example folder path:

C:/development/data/mimic-iv-3.1/
└── hosp/
    ├── admissions.csv
    ├── emar.csv
    ├── patients.csv
    └── ...

Update Dataset Path

Uncomment line 77 in:

./data-ingestion-suite/docker/server/docker-compose-tofhir.yml

Replace only the part before the colon (:) with your dataset path.
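
For reference, a Docker Compose volume entry has the form <host-path>:<container-path>; the example below is illustrative, so keep the container-side path that already appears after the colon in the actual file:

volumes:
  - C:/development/data/mimic-iv-3.1:/data/mimic   # replace only the host path before the colon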

Execute Mappings for MIMIC Dataset

  1. Navigate to http://<hostname>/dt4h/tofhir
  2. Click the hosp project and click Open
  3. Click Executions
  4. Click the green arrow next to the mimic-hosp-csv-to-fhir-server entry
  5. Click the double-right-arrow icon to select all mappings
  6. Click Run
  7. Use the Refresh icon to monitor execution status and check the mapping results inside the mimic-hosp-csv-to-fhir-server job.

View logs and errors at: http://<hostname>/dt4h/tofhir/kibana

In Kibana, click the top-left menu and choose Discover under Analytics.


General Execution of Mappings

  1. Navigate to http://<hostname>/dt4h/tofhir
  2. Click your project and click Open
  3. Click Executions
  4. Click the green arrow next to the “x-deploy” entry
  5. Click the double-right-arrow icon to select all mappings
  6. Click Run
  7. Use the Refresh icon to monitor execution status and check mapping results inside the “x-deploy” job.

View logs and errors at: http://<hostname>/dt4h/tofhir/kibana

In Kibana, click the top-left menu and choose Discover under Analytics.


Automated Docker Container Update (Optional)

If you’re installing for the first time, you can skip this section; it applies only to updating an existing installation.

1. Stop all running containers

sh ./data-ingestion-suite/docker/server/stop.sh

2. Pull the latest updates

cd common-data-model
git pull
cd ..

cd data-ingestion-suite
git pull
cd ..
sh ./data-ingestion-suite/docker/server/pull.sh

3. Restart all containers

sh ./data-ingestion-suite/docker/server/run.sh
sh ./data-ingestion-suite/docker/proxy/restart.sh  # Optional

Clean Installation from Scratch (Optional)

Use this section to completely remove all containers, volumes, and data, then perform a fresh installation.

1. Stop containers and remove all data

Run the clean-and-stop script to stop all containers and remove associated volumes:

Warning: This will permanently delete all persisted data including FHIR resources and mapping execution history.

sh ./data-ingestion-suite/docker/proxy/stop.sh  # Optional
sh ./data-ingestion-suite/docker/server/clean-and-stop.sh

2. Pull the latest updates

cd common-data-model
git pull
cd ..

cd data-ingestion-suite
git pull
cd ..
sh ./data-ingestion-suite/docker/server/pull.sh

3. Start all containers

After cleaning, follow the original deployment steps to reinstall:

sh ./data-ingestion-suite/docker/server/run.sh
sh ./data-ingestion-suite/docker/proxy/restart.sh  # Optional

For more details, visit toFHIR.io or onfhir.io.
