Details of the purpose and any published outputs from this project can be found at the link above.
The contents of this repository MUST NOT be considered an accurate or valid representation of the study or its purpose. This repository may reflect an incomplete or incorrect analysis with no further ongoing work. The content has ONLY been made public to support the OpenSAFELY open science and transparency principles and to support the sharing of re-usable code for other subsequent users. No clinical, policy or safety conclusions must be drawn from the contents of this repository.
The OpenSAFELY framework is a Trusted Research Environment (TRE) for electronic health records research in the NHS, with a focus on public accountability and research quality.
Read more at OpenSAFELY.org.
As standard, research projects have an MIT license.
COMPARES-vaccines: a COMmon Protocol for the Analysis of Relative Effectiveness and Safety of Covid-19 vaccine products
This repository contains analytic code for a common analytic protocol, applicable to a chosen Covid-19 vaccination campaign in England, to make head-to-head comparisons between the vaccine products used in that campaign.
The Protocol accommodates the following campaign-specific characteristics:
- start and end dates
- vaccine products
- study eligibility criteria
This repository should be forked when starting an analysis for the next campaign (this workflow is still to be decided).
- The `codelists/` directory contains all the codelists used to define variables in the analysis.
- The `analysis/` directory contains the executable scripts used to conduct the analysis.
- The `project.yaml` defines run-order and dependencies for all the analysis scripts. This file should not be edited directly. To make changes to the yaml, edit and run the `analysis/lib-0/create-project.R` script instead.
- Non-disclosive model outputs, including tables, figures, etc, are available via the OpenSAFELY job server.
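To illustrate why the `project.yaml` is generated rather than hand-edited, here is a minimal sketch of building one action per combination of arguments. It is written in Python for illustration only; the repository's own generator is the R script `create-project.R`. The action names, the `extract_<cohort>` dependency, the script path and the output paths are all hypothetical, and the `r:latest` run prefix is an assumption about how R actions are invoked.

```python
import json


def make_action(name, script, args, needs, output_path):
    """Return one project.yaml-style action as a nested dict."""
    # "r:latest" is assumed here; the real project may pin a different image.
    run = " ".join(["r:latest", script, *args])
    return {
        name: {
            "run": run,
            "needs": list(needs),
            "outputs": {"moderately_sensitive": {"output": output_path}},
        }
    }


def make_project(cohorts, methods, specs):
    """Build one balancing action per (cohort, method, spec) combination."""
    actions = {}
    for cohort in cohorts:
        for method in methods:
            for spec in specs:
                suffix = f"{cohort}_{method}_{spec}"
                actions.update(
                    make_action(
                        name=f"balance_{suffix}",        # hypothetical naming scheme
                        script="analysis/balance.R",     # hypothetical path
                        args=[cohort, method, spec],
                        needs=[f"extract_{cohort}"],     # hypothetical upstream action
                        output_path=f"output/{suffix}/balance/*",
                    )
                )
    return {"actions": actions}


project = make_project(["cohortA"], ["match", "weight"], ["A", "B"])
# JSON is a subset of YAML, so this output is readable as YAML as well.
print(json.dumps(project, indent=2))
```

Generating the actions in a loop keeps the number of hand-maintained entries small as cohorts, methods and specifications are added.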
The analysis scripts in the analysis/ directory are organised into sub-directories as follows:
-
  - `design.R` defines the campaign-specific design elements (or parameters) used throughout the study (start and end dates, eligibility, products, etc). It also defines matching and weighting specification, look-up dictionaries, and other useful objects. This script is run at the start of all subsequent R scripts, including the `create-project.R` script, to ensure study-wide parameters are passed to the dataset definition via the `project.yaml`.
  - `create-project.R` creates the `project.yaml` file defining action outputs and dependencies.
  - `utility.R` defines functions used throughout the codebase. This script is run at the start of all subsequent R scripts.
-
  - `dataset_definition.py` is the script defining the dataset to extract from the database, using ehrQL.
  - `dummy_dataset_definition.R` defines a custom dummy dataset. This can be used instead of the dummy data created by ehrQL when it is necessary to have more control over the structure of the data, such as more realistic vaccination dates or event rates. If the dataset definition is updated, this script must also be updated to ensure variable names and types match.
  - `variables.py` contains some function and variable definitions to be read in by the dataset definition.
  - `codelist.py` pulls the codelists from the `codelists/` directory to be usable in the dataset definition.
-
  - `balance.R` (`cohort`, `method`, `spec`) balances characteristics across vaccine groups. It uses different methods, each of which attempts to obtain balance on baseline variables:
    - If `method = "match"` then it runs a matching algorithm to pair recipients of product A with recipients of product B, with matching criteria determined by `spec`. It outputs a dataset containing the matching "weights" (0/1) and a matching ID.
    - If `method = "weight"` then it fits a propensity model for the probability of receipt of product A versus product B, with the model determined by `spec`. It outputs a dataset containing the person-specific weights.
    - If `method = "lmw"` then it derives the weights that are implied by a linear outcome regression model, with one model for each product. These weights can be obtained independently of the outcome. It outputs a dataset containing the person-specific weights.
  - `report.R` (`cohort`, `method`, `spec`) describes baseline information for the matched, weighted or lmw cohort, e.g. "Table 1"-style cohort characteristics and post-weighting balance checks.
  - `combine-weights.R` (`cohort`) combines weights across all weighted and matched analyses for the given cohort. It also calculates the effective sample size based on the weights.
  - `match-coverage.R` (`cohort`, `spec`) describes matching rates over calendar time.
-
  - `aj.R` (`cohort`, `method`, `spec`, `subgroup`, `outcome`) derives Aalen-Johansen survival estimates for each product and calculates relative risk and risk differences. This is largely based on the OpenSAFELY Kaplan-Meier reusable action, with an extension to AJ estimates.
  - `plr.R` (`cohort`, `method`, `spec`, `subgroup`, `outcome`) compares cumulative incidence curves between products using pooled logistic regression.
  - `combine-contrasts.R` collects treatment contrasts from the `aj.R` and `plr.R` scripts.
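As a rough illustration of two quantities mentioned in the script descriptions above, the sketch below computes an effective sample size from a set of weights and a weighted Aalen-Johansen cumulative incidence for one event type, followed by a risk difference and relative risk between two products. It is written in Python for illustration and is not the repository's R code. It assumes Kish's formula for the effective sample size (the scripts may use a different definition), codes censoring as `0`, and uses made-up toy data.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq if total_sq > 0 else 0.0


def aalen_johansen(times, causes, cause_of_interest, weights=None):
    """Cumulative incidence of one event type, allowing for competing events.

    times:  follow-up time for each person
    causes: 0 = censored, any other value = type of event observed
    Returns a list of (time, cumulative incidence) at each distinct event time.
    """
    if weights is None:
        weights = [1.0] * len(times)
    # Weighted counts of exits, events of any type, and events of interest per time.
    by_time = {}
    for t, c, w in zip(times, causes, weights):
        rec = by_time.setdefault(t, {"exits": 0.0, "any_event": 0.0, "interest": 0.0})
        rec["exits"] += w
        if c != 0:
            rec["any_event"] += w
        if c == cause_of_interest:
            rec["interest"] += w
    n_at_risk = sum(weights)
    surv = 1.0  # probability of being event-free just before the current time
    cif = 0.0
    curve = []
    for t in sorted(by_time):
        rec = by_time[t]
        if rec["any_event"] > 0:
            cif += surv * rec["interest"] / n_at_risk
            surv *= 1 - rec["any_event"] / n_at_risk
            curve.append((t, cif))
        n_at_risk -= rec["exits"]
    return curve


def risk_at(curve, horizon):
    """Cumulative incidence at `horizon` (the curve is a step function)."""
    risk = 0.0
    for t, value in curve:
        if t <= horizon:
            risk = value
    return risk


# Toy data: event type 1 is the outcome, type 2 is a competing event.
times = [1, 2, 3, 4, 5]
curve_a = aalen_johansen(times, [1, 0, 2, 1, 0], cause_of_interest=1)
curve_b = aalen_johansen(times, [0, 1, 0, 0, 2], cause_of_interest=1)
risk_a, risk_b = risk_at(curve_a, 5), risk_at(curve_b, 5)
risk_difference = risk_a - risk_b
relative_risk = risk_a / risk_b
print(risk_a, risk_b, risk_difference, relative_risk)

# With 0/1 matching weights, the effective sample size is the number matched.
print(effective_sample_size([1, 1, 0, 1, 0]))
```

Passing person-specific weights through the `weights` argument replaces the simple counts with weighted sums, which is one common way to apply weighting or matching output to a cumulative incidence estimate.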
Scripts may take one or more arguments:
- `cohort`, the name of the cohort to be analysed, defined in the `design.R` script.
- `spec`, the matching or weighting specification, taking values `A`, `B`, `C`, etc for convenience, and fully defined in the `design.R` script. For matching, `spec` is the set of variables to match on. For weighting, `spec` is the model formula passed to the `weightit()` function.
- `method`, taking values `match`, `weight` or `lmw`.
- `subgroup`, the subgroup variable. Cumulative incidences will be calculated separately within each level of this variable. Choose `all` for no subgroups (i.e., the main analysis), or give the name of a specific variable to stratify on. This variable must be exactly matched in the matching run if using `method="match"`, and must be used as a stratification variable if using `method="weight"` (this requirement is under review).
- `outcome`, the outcome of interest, for example `covid_admitted` or `covid_death`.
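The constraints on these arguments can be expressed as a small validation step. The sketch below is a Python illustration, not part of the repository: the `DESIGN` dictionary, its cohort name and its matching variables are hypothetical stand-ins for what `design.R` defines. Only the rule for `method="match"` is checked; the stratification rule for `method="weight"` is left out because it is described as under review.

```python
VALID_METHODS = ("match", "weight", "lmw")

# Hypothetical stand-in for the objects defined in design.R.
DESIGN = {
    "cohorts": ["cohortA"],
    "outcomes": ["covid_admitted", "covid_death"],
    "match_specs": {
        "A": ["age_band", "sex"],
        "B": ["age_band", "sex", "region"],
    },
}


def check_arguments(cohort, method, spec, subgroup, outcome, design):
    """Return a list of problems with the arguments (empty if none found)."""
    errors = []
    if cohort not in design["cohorts"]:
        errors.append(f"unknown cohort {cohort!r}")
    if method not in VALID_METHODS:
        errors.append(f"unknown method {method!r}")
    if outcome not in design["outcomes"]:
        errors.append(f"unknown outcome {outcome!r}")
    if method == "match":
        match_vars = design["match_specs"].get(spec)
        if match_vars is None:
            errors.append(f"unknown matching spec {spec!r}")
        elif subgroup != "all" and subgroup not in match_vars:
            # A subgroup variable must be exactly matched in the matching run.
            errors.append(
                f"subgroup {subgroup!r} is not exactly matched under spec {spec!r}"
            )
    return errors


print(check_arguments("cohortA", "match", "A", "region", "covid_death", DESIGN))
print(check_arguments("cohortA", "match", "B", "region", "covid_death", DESIGN))
```

Running a check like this before an analysis script starts would surface an invalid combination, such as a subgroup that was not part of the matching specification, before any estimates are produced.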
This will appear on the job server page for the ECHO project when it is ready to be run for the first time.
Draft version 0.1 of the protocol is available as a PDF file.
A poster describing the protocol, presented at ISCB46, is available as a PDF file.