13 changes: 13 additions & 0 deletions analysis/example.do
@@ -0,0 +1,13 @@
// Stata cannot read compressed CSV files directly, so decompress to a plain CSV file first
!gunzip output/input.csv.gz

// now import the uncompressed CSV with `import delimited`
import delimited using output/input.csv


// your analysis code goes here


// save all .dta outputs with `gzsave` and a .dta.gz extension;
// in subsequent actions, load them with `gzuse`
gzsave output/stata.dta.gz
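
Note that plain `gunzip` replaces `output/input.csv.gz` with `output/input.csv`, so the compressed file is no longer present afterwards. A minimal sketch of the same decompress step that keeps the original in place, written in Python with the standard library only. The function name `decompress_keep_original` is hypothetical and not part of this change:

```python
import gzip
import shutil
from pathlib import Path


def decompress_keep_original(gz_path: str) -> Path:
    """Write foo.csv next to foo.csv.gz and leave the .gz file untouched."""
    src = Path(gz_path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dst = src.with_suffix("")  # output/input.csv.gz -> output/input.csv
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in chunks so a large extract is never held in memory at once.
        shutil.copyfileobj(fin, fout)
    return dst
```

From the do-file itself, `!gunzip -k output/input.csv.gz` should have the same effect where the installed gzip supports the `-k` (keep) flag.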
10 changes: 10 additions & 0 deletions analysis/example.py
@@ -0,0 +1,10 @@
import pandas as pd
import pyarrow.feather

df = pd.read_csv("output/input.csv.gz")


# pandas writes feather files with LZ4 compression by default (when pyarrow has LZ4 support)
df.to_feather("output/python.feather.lz4")
pyarrow.feather.write_feather(df, "output/python.feather.raw", compression="uncompressed")
pyarrow.feather.write_feather(df, "output/python.feather.zstd", compression="zstd")
7 changes: 7 additions & 0 deletions analysis/example.r
@@ -0,0 +1,7 @@
# read compressed .csv file
df <- readr::read_csv("output/input.csv.gz")

# write .feather outputs: default compression (LZ4 where available), uncompressed, and zstd
arrow::write_feather(df, "output/r.feather.lz4")
arrow::write_feather(df, "output/r.feather.raw", compression = "uncompressed")
arrow::write_feather(df, "output/r.feather.zstd", compression = "zstd")
21 changes: 21 additions & 0 deletions project.yaml
@@ -6,3 +6,24 @@ actions:
    outputs:
      highly_sensitive:
        dataset: output/dataset.csv.gz

  python_example:
    run: python:latest analysis/example.py
    needs: [generate_study_population]
    outputs:
      highly_sensitive:
        cohort: output/python.feather*

  stata_example:
    run: stata-mp:latest analysis/example.do
    needs: [generate_study_population]
    outputs:
      highly_sensitive:
        cohort: output/stata.dta.gz

  r_example:
    run: r:latest analysis/example.r
    needs: [generate_study_population]
    outputs:
      highly_sensitive:
        cohort: output/r.feather*
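
The trailing `*` in `output/python.feather*` and `output/r.feather*` is what lets a single output entry cover all three compression variants each script writes. A quick check of that, assuming the pattern behaves like shell-style globbing; this is an assumption about the job runner, approximated here with Python's `fnmatch`:

```python
from fnmatch import fnmatch

pattern = "output/python.feather*"

# The three files written by analysis/example.py.
written = [
    "output/python.feather.lz4",
    "output/python.feather.raw",
    "output/python.feather.zstd",
]
# An output from another action, which the pattern should not pick up.
not_covered = "output/stata.dta.gz"

matches = [path for path in written if fnmatch(path, pattern)]
```

Under that assumption, `matches` contains all three feather files and `not_covered` is left out.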