drci-foch/OSIRIS-RWD_mapping


OSIRIS-RWD Mapping

Pipeline that extracts cancer patient data from the Hopital Foch databases and maps it to the OSIRIS-RWD format.

Project structure

run_pipeline.py          # Pipeline entry point
config.py                # Paths, constants
db.py                    # Database connection helpers (EDS, Axia, CHIMIO)
utils.py                 # Shared utilities (I/O, type conversion)
steps/
  s01_create_cohort.py   # Step 1: cancer cohort from EDS (PostgreSQL)
  s02_patient_admin.py   # Step 2: patient demographics from Axia (Oracle)
  s03_primary_cancer.py  # Step 3: primary cancer data from CHIMIO (Oracle)
  s04_medication.py      # Step 4: medication data from CHIMIO (Oracle)
.env                     # Credentials (gitignored)
.env.example             # Template for .env
data/                    # Output files (gitignored)

Overview

| Step | Module | Source | Output |
|------|--------|--------|--------|
| 1 | steps/s01_create_cohort.py | EDS (PostgreSQL) | data/cancer_ipp.csv |
| 2 | steps/s02_patient_admin.py | Axia (Oracle) | data/osiris_rwd_export.json |
| 3 | steps/s03_primary_cancer.py | CHIMIO (Oracle) | enriches JSON with primaryCancer, cancerOrder, tnmEvent, patient measures |
| 4 | steps/s04_medication.py | CHIMIO (Oracle) | enriches JSON with medication (ATC codes, drug names, dates) |
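
Steps 3 and 4 both read the existing JSON export, add fields, and write it back. The sketch below illustrates that round trip only; it is not the project's actual code. The helper name `enrich_export` and the assumption that the export is a JSON list of patient objects are hypothetical (see documentation.md for the real structure).

```python
import json
from pathlib import Path
from typing import Callable


def enrich_export(path: Path, enrich: Callable[[list[dict]], list[dict]]) -> list[dict]:
    """Load the export, apply one enrichment function, and save the result.

    Assumption: the file holds a JSON list of patient dicts.
    """
    patients = json.loads(path.read_text(encoding="utf-8"))
    patients = enrich(patients)
    # ensure_ascii=False keeps accented characters readable in the output file
    path.write_text(
        json.dumps(patients, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return patients
```

Because each step only adds keys to an existing file, a single step can be re-run without rebuilding the cohort.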

Getting started

Prerequisites

  • Python 3.10+
  • Access to EDS V2 (PostgreSQL), Axia (Oracle) and CHIMIO (Oracle) databases
  • Oracle Client 12c installed (required for Axia thick mode connection)

Installation

python -m venv venv
venv\Scripts\activate          # Windows; on Linux/macOS: source venv/bin/activate
pip install psycopg2 oracledb python-dotenv

Configuration

Copy .env.example to .env and fill in your database credentials:

cp .env.example .env

Warning: .env is in .gitignore and must never be committed.
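
As an illustration of how the credentials might be read at startup, here is a minimal sketch that fails early when a setting is missing. The variable names (`EDS_HOST`, `EDS_USER`, `EDS_PASSWORD`) and the `load_settings` helper are assumptions made for this example; use the names listed in .env.example.

```python
import os

try:
    from dotenv import load_dotenv  # provided by python-dotenv
except ImportError:  # lets the sketch run even without the package
    load_dotenv = None

# Hypothetical variable names; the real ones are defined in .env.example
REQUIRED_VARS = ("EDS_HOST", "EDS_USER", "EDS_PASSWORD")


def load_settings(required=REQUIRED_VARS, environ=None):
    """Return the required settings, or raise if any is missing or empty."""
    if environ is None:
        if load_dotenv is not None:
            load_dotenv()  # reads .env into the process environment
        environ = os.environ
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {name: environ[name] for name in required}
```

Checking all variables before opening any connection avoids a run that fails halfway through after the first database has already been queried.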

Usage

# Run the full pipeline
python run_pipeline.py

# Run a single step
python run_pipeline.py --step 1    # cohort only
python run_pipeline.py --step 2    # patient admin only (uses existing cohort)
python run_pipeline.py --step 3    # primary cancer only (uses existing JSON)
python run_pipeline.py --step 4    # medication only (uses existing JSON)
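
The `--step` option could be handled along these lines. This is a sketch using `argparse`, not the actual contents of run_pipeline.py; `STEP_NAMES`, `parse_args` and `steps_to_run` are hypothetical names.

```python
import argparse

# Step numbers as listed in the Overview; the labels are illustrative
STEP_NAMES = {
    1: "create_cohort",
    2: "patient_admin",
    3: "primary_cancer",
    4: "medication",
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="OSIRIS-RWD mapping pipeline")
    parser.add_argument(
        "--step",
        type=int,
        choices=sorted(STEP_NAMES),
        help="run a single step (default: run all steps in order)",
    )
    return parser.parse_args(argv)


def steps_to_run(step):
    """Return every step number when no step is given, else just that one."""
    return sorted(STEP_NAMES) if step is None else [step]
```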

Adding a new step

  1. Create steps/s05_my_step.py with a run(patients, ...) function
  2. Add the call in run_pipeline.py
  3. Done
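
A new step module might look like the sketch below. Only the `run(patients, ...)` entry point comes from the steps above; the `lookup` parameter and the `patientId` and `myStepData` keys are placeholders invented for this example, not OSIRIS-RWD field names.

```python
# steps/s05_my_step.py (hypothetical)
from copy import deepcopy


def run(patients, lookup=None):
    """Attach extra data to each patient record and return the new list.

    patients: list of patient dicts loaded from the JSON export (assumed shape)
    lookup:   dict mapping a patient identifier to the data to attach
    """
    lookup = lookup or {}
    enriched = []
    for patient in patients:
        record = deepcopy(patient)  # leave the caller's records untouched
        record["myStepData"] = lookup.get(record.get("patientId"), [])
        enriched.append(record)
    return enriched
```

In run_pipeline.py the call would then be something like `patients = s05_my_step.run(patients, lookup)`, placed after the existing steps.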

Documentation

See documentation.md for the full technical documentation: data sources, variable mappings, ICD-10 criteria, pseudonymization, and JSON structure.
