Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
d01ee0f
Add weight.statistics when hazard = TRUE
remlapmot Jun 2, 2026
7b267ab
Bump version
remlapmot Jun 2, 2026
285eca1
Populate outcome.model when hazard = TRUE
remlapmot Jun 2, 2026
c8bf595
Add warning if numerator and denominator are the same
remlapmot Jun 2, 2026
b059192
Update NEWS.md
remlapmot Jun 2, 2026
43546a6
Improve SEQuential helpfile
remlapmot Jun 2, 2026
1107b49
Document
remlapmot Jun 2, 2026
7a7b133
Add proper tests for selection.random
remlapmot Jun 3, 2026
9d467d2
Add behavioural test for weight.lower/weight.upper truncation
remlapmot Jun 4, 2026
f166621
Add behavioural test for weight.p99 truncation
remlapmot Jun 4, 2026
84309b3
Add behavioural test for followup.include/trial.include
remlapmot Jun 4, 2026
14065fe
Add behavioural test for followup.class
remlapmot Jun 4, 2026
6bb857b
Add behavioural test for weight.lag_condition
remlapmot Jun 4, 2026
5a3ddbe
Add behavioural test for followup.min / followup.max
remlapmot Jun 4, 2026
f222950
Add behavioural test for weight.eligible_cols
remlapmot Jun 4, 2026
7feb05a
Fix numerator()/denominator() accessors returning NULL
remlapmot Jun 4, 2026
265f96e
Document that weight truncation applies only to the outcome-model fit
remlapmot Jun 4, 2026
607166d
Document
remlapmot Jun 4, 2026
9b7603e
Add thanks to Francesca
remlapmot Jun 4, 2026
ef7ef87
Pass the formula cache to inline.pred() in the weight models
remlapmot Jun 6, 2026
702b739
Fix argument ordering
remlapmot Jun 7, 2026
9536e79
Document
remlapmot Jun 7, 2026
650d9c9
Update NEWS.md
remlapmot Jun 2, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion SEQTaRget/DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: SEQTaRget
Type: Package
Title: Sequential Trial Emulation
Version: 1.4.2.9001
Version: 1.4.2.9002
Authors@R: c(person(given = "Ryan",
family = "O'Dea",
role = c("aut", "cre"),
Expand Down
11 changes: 10 additions & 1 deletion SEQTaRget/NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,22 @@
* Speed up the hazard ratio calculation by fitting the Cox model with the survival C fitters directly on a prebuilt design matrix instead of `coxph(formula, data)`, avoiding the `model.frame`/`model.matrix` rebuild on every bootstrap iteration: `survival::coxph.fit()` for the non-competing-event model and `survival::agreg.fit()` for the competing-event Fine-Gray (counting-process) model. The hazard ratio and CIs are unchanged.
* Fix the competing-event Fine-Gray hazard fit to use the `finegray()` case weights (`fgwt`), which are required for a valid subdistribution-hazard estimate and were previously omitted. This is a no-op for the current hazard simulation (which has only administrative censoring, so all `fgwt` are 1) but corrects the estimate should the simulated data ever carry random censoring.
* Report competing events per treatment arm in the `@info` slot as `info$compevent.unique` and `info$compevent.nonunique`, mirroring the structure of `info$outcome.unique` / `info$outcome.nonunique`. Both are grouped by baseline treatment; the non-unique table counts all competing event occurrences in the expanded data and the unique table counts distinct subjects who experienced the competing event. Both are `NA` when no `compevent` is specified.
* From a `SEQuential()` fit, populate `weight.statistics` and `outcome.model` when `hazard = TRUE`.
* Warn when the `numerator` and `denominator` weight models are given identical covariates. In that case the stabilized weights all equal 1 (i.e., no weighting), which is usually a typo in the `denominator` argument.
* Improve the `SEQuential()` helpfile by adding a per-protocol example.
* Add behavioural tests that `selection.random = TRUE` retains all treated trial-starts, subsamples control trial-starts to the requested `selection.prob` fraction, and is reproducible under a fixed seed; rename the previous smoke test that did not actually exercise the feature.
* Add behavioural tests for; `weight.lower`/`weight.upper` truncation, `weight.p99` truncation, `followup.include`/`trial.include`, `followup.class`, `weight.lag_condition`, `followup.min`/`followup.max`, and `weight.eligible_cols`.
* Fix `numerator()` and `denominator()` returning `NULL` for every weighted model; they now return the fitted per-arm numerator/denominator weight models.
* Document that weight truncation applies only to the outcome-model fit.
* Pass the formula cache to `inline.pred()` in the weight models
* Fix `SEQOpts()` argument ordering.

# SEQTaRget v1.4.2

* Remove mention of units from time in docs.
* Improve memory usage in the bootstrapping.
* Fix off-by-one labeling in survival output so that `followup = k` correctly represents survival after `k` intervals, adding a row at `followup = survival.max + 1` for the final interval's estimate.
* Fix expansion bug where subjects experiencing the outcome early were incorrectly carried forward with `outcome=0` rows from subsequent periods by truncating each trial at the first event row
* Fix expansion bug where subjects experiencing the outcome early were incorrectly carried forward with `outcome=0` rows from subsequent periods by truncating each trial at the first event row (thanks, @francescazaccagnino)
* Add `expand.only` option to `SEQopts()`. When `TRUE`, `SEQuential()` returns the expanded `data.table` directly and skips the analysis steps, for users who want to inspect or store the expanded dataset on its own.
* Fix `followup.spline = TRUE` so the basis is genuinely non-linear. Splines are now built into the model formula via `splines::ns()` instead of being applied as a single-column transform of `followup`, and the new `followup.spline.df` option (default `4`) controls the number of basis functions. The treatment-by-followup interaction now uses the same spline basis. Knots are baked from the full expanded `followup` once at fit time so the basis is identical at fit and prediction time across bootstraps and survival grids. Internally, formula column extraction now uses `all.vars()`, so user-supplied covariates may include `ns()`, `bs()`, `I()`, `factor()`, `poly()` etc. without breaking expansion.
* Rename `format.time()` to `format_time()` because it wasn't an S3 method and hence was causing roxygen2 to write incorrect information in its helpfile.
Expand Down
15 changes: 8 additions & 7 deletions SEQTaRget/R/SEQopts.R
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,13 @@
#' @param excused Logical: in the case of censoring, whether there is an excused condition, default is `FALSE`
#' @param excused.cols List: list of column names for treatment switch excuses - should be the same length, and ordered the same as \code{treat.level}
#' @param fastglm.method Integer: decomposition method for fastglm (`0L`-column-pivoted QR, `1L`-unpivoted QR, `2L`-LLT Cholesky, `3L`-LDLT Cholesky), default is `2L`
#' @param glm.package Character: package to use for fitting GLMs, either `"fastglm"` (default) or `"parglm"`. When `"parglm"` is selected the `nthreads` option controls the number of threads passed to `parglm::parglm.fit()`. For most realistic SEQTaRget workloads (expanded datasets up to approximately a few million rows) `"fastglm"` is faster; `"parglm"` may help only on substantially larger datasets where the parallel chunking outweighs its setup overhead.
#' @param parglm.control A control object from [parglm::parglm.control()] to pass to `parglm::parglm.fit()`. Only used when `glm.package = "parglm"`. Defaults to `parglm::parglm.control(method = "FAST")`. If you encounter a `chol(): decomposition failed` error (e.g. with near-singular model matrices on large datasets), pass `parglm.control = parglm::parglm.control(method = "LAPACK")` to use the more numerically stable QR decomposition instead, or switch to using the fastglm backend.
#' @param followup.class Logical: treat followup as a class, e.g. expands every time to it's own indicator column, default is `FALSE`
#' @param followup.include Logical: whether or not to include 'followup' and 'followup_squared' in the outcome model, default is `TRUE`
#' @param followup.max Numeric: maximum time to expand about, default is `Inf` (no maximum)
#' @param followup.min Numeric: minimum follow-up time since trial enrollment to include, must be non-negative, default is `0`
#' @param followup.spline Logical: treat followup as a natural cubic spline (`splines::ns()`), default is `FALSE`
#' @param followup.spline.df Integer: degrees of freedom passed to `splines::ns()` when `followup.spline = TRUE`. With `df = k`, `ns()` places `k - 1` interior knots at quantiles of `followup`. Must be `>= 1`; `df = 1` is equivalent to a linear term and is generally not what you want. Default is `4` (3 interior knots).
#' @param glm.package Character: package to use for fitting GLMs, either `"fastglm"` (default) or `"parglm"`. When `"parglm"` is selected the `nthreads` option controls the number of threads passed to `parglm::parglm.fit()`. For most realistic SEQTaRget workloads (expanded datasets up to approximately a few million rows) `"fastglm"` is faster; `"parglm"` may help only on substantially larger datasets where the parallel chunking outweighs its setup overhead.
#' @param hazard Logical: hazard error calculation instead of survival estimation, default is `FALSE`
#' @param indicator.baseline String: identifier for baseline variables in \code{covariates, numerator, denominator} - intended as an override
#' @param indicator.squared String: identifier for squared variables in \code{covariates, numerator, denominator} - intended as an override
Expand All @@ -39,6 +38,7 @@
#' @param nthreads Integer: number of threads to use for data.table processing, default is [data.table::getDTthreads()]
#' @param numerator String: numerator covariates to the right hand side of a formula object
#' @param parallel Logical: define if the SEQuential process is run in parallel, default is `FALSE`
#' @param parglm.control A control object from [parglm::parglm.control()] to pass to `parglm::parglm.fit()`. Only used when `glm.package = "parglm"`. Defaults to `parglm::parglm.control(method = "FAST")`. If you encounter a `chol(): decomposition failed` error (e.g. with near-singular model matrices on large datasets), pass `parglm.control = parglm::parglm.control(method = "LAPACK")` to use the more numerically stable QR decomposition instead, or switch to using the fastglm backend.
#' @param plot.colors Character: Colors for output plot if \code{km.curves = TRUE}, defaulted to ggplot2 defaults
#' @param plot.labels Character: Color labels for output plot if \code{km.curves = TRUE} in order e.g. \code{c("risk.0", "risk.1")}
#' @param plot.subtitle Character: Subtitle for output plot if \code{km.curves = TRUE}
Expand All @@ -57,11 +57,11 @@
#' @param visit.denominator String: visit denominator covariates to the right hand side of a formula object
#' @param visit.numerator String: visit numerator covariates to the right hand side of a formula object
#' @param weight.eligible_cols List: list of column names for indicator columns defining which weights are eligible for weight models - in order of \code{treat.level}
#' @param weight.lower Numeric: IPCW weights truncated at this lower bound, must be non-negative, default is `0`
#' @param weight.lower Numeric: IPCW weights truncated at this lower bound, must be non-negative, default is `0`. Truncation is applied only to the weights used to fit the outcome model; the weights reported in \code{weight.statistics} and in the returned data (when \code{data.return = TRUE}) are the untruncated values.
#' @param weight.lag_condition Logical: whether weights should be conditioned on treatment lag value, default `TRUE`
#' @param weight.p99 Logical: forces weight truncation at 1st and 99th percentile weights, will override provided \code{weight.upper} and \code{weight.lower}
#' @param weight.p99 Logical: forces weight truncation at 1st and 99th percentile weights, will override provided \code{weight.upper} and \code{weight.lower}. The percentiles are taken from the untruncated weight distribution (as reported in \code{weight.statistics}), and as with \code{weight.lower}/\code{weight.upper} the truncation affects only the weights used to fit the outcome model.
#' @param weight.preexpansion Logical: whether weighting should be done on pre-expanded data, default `TRUE`
#' @param weight.upper Numeric: weights truncated at upper end at this weight, default is `Inf`
#' @param weight.upper Numeric: weights truncated at upper end at this weight, default is `Inf`. As with \code{weight.lower}, truncation affects only the weights used to fit the outcome model, not those reported in \code{weight.statistics} or the returned data.
#' @param weighted Logical: whether or not to perform weighted analysis, default is `FALSE`
#' @returns An object of class 'SEQopts'
#' @export
Expand All @@ -72,11 +72,12 @@ SEQopts <- function(bootstrap = FALSE, bootstrap.nboot = 100, bootstrap.sample =
cense = NA, cense.denominator = NA, cense.eligible = NA, cense.numerator = NA,
compevent = NA, covariates = NA, data.return = FALSE, denominator = NA,
deviation = FALSE, deviation.col = NA, deviation.conditions = c(NA, NA), deviation.excused = FALSE, deviation.excused_cols = c(NA, NA),
excused = FALSE, excused.cols = c(NA, NA), expand.only = FALSE, fastglm.method = 2L, glm.package = "fastglm", parglm.control = NULL,
excused = FALSE, excused.cols = c(NA, NA), expand.only = FALSE, fastglm.method = 2L,
followup.class = FALSE, followup.include = TRUE, followup.max = Inf, followup.min = 0, followup.spline = FALSE, followup.spline.df = 4L,
glm.package = "fastglm",
hazard = FALSE, indicator.baseline = "_bas", indicator.squared = "_sq",
km.curves = FALSE, multinomial = FALSE, ncores = availableCores(omit = 1L), nthreads = getDTthreads(),
numerator = NA, parallel = FALSE, plot.colors = c("#F8766D", "#00BFC4", "#555555"), plot.labels = NA, plot.subtitle = NA, plot.title = NA, plot.type = "survival",
numerator = NA, parallel = FALSE, parglm.control = NULL, plot.colors = c("#F8766D", "#00BFC4", "#555555"), plot.labels = NA, plot.subtitle = NA, plot.title = NA, plot.type = "survival",
risk.times = NA,
seed = NULL, selection.first_trial = FALSE, selection.prob = 0.8, selection.random = FALSE, subgroup = NA, survival.max = Inf,
treat.level = c(0, 1), trial.include = TRUE,
Expand Down
42 changes: 33 additions & 9 deletions SEQTaRget/R/SEQuential.R
Original file line number Diff line number Diff line change
Expand Up @@ -29,15 +29,37 @@
#' @examples
#' \donttest{
#' data <- SEQdata
#' model <- SEQuential(data, id.col = "ID",
#' time.col = "time",
#' eligible.col = "eligible",
#' treatment.col = "tx_init",
#' outcome.col = "outcome",
#' time_varying.cols = c("N", "L", "P"),
#' fixed.cols = "sex",
#' method = "ITT",
#' options = SEQopts())
#'
#' # Intention-to-treat (ITT) effect: subjects are assigned to the treatment
#' # arm defined by their baseline treatment and followed regardless of any later
#' # treatment changes, so no weighting is required.
#' SEQuential(data, id.col = "ID",
#' time.col = "time",
#' eligible.col = "eligible",
#' treatment.col = "tx_init",
#' outcome.col = "outcome",
#' time_varying.cols = c("N", "L", "P"),
#' fixed.cols = "sex",
#' method = "ITT",
#' options = SEQopts())
#'
#' # Per-protocol effect via artificial censoring: subjects are censored when they
#' # deviate from their assigned strategy, and inverse-probability-of-censoring
#' # weights adjust for the resulting selection bias. The denominator models the
#' # probability of remaining uncensored given the time-varying confounders, while
#' # the numerator uses only the baseline covariates to stabilize the weights (so
#' # the two formulas must differ - identical formulas give weights of 1).
#' SEQuential(data, id.col = "ID",
#' time.col = "time",
#' eligible.col = "eligible",
#' treatment.col = "tx_init",
#' outcome.col = "outcome",
#' time_varying.cols = c("N", "L", "P"),
#' fixed.cols = "sex",
#' method = "censoring",
#' options = SEQopts(weighted = TRUE,
#' numerator = "sex",
#' denominator = "N + L + P + sex"))
#' }
#'
#' @export
Expand Down Expand Up @@ -248,6 +270,8 @@ SEQuential <- function(data, id.col, time.col, eligible.col, treatment.col, outc
label <- subgroups[[i]]
models <- lapply(analytic, function(x) x$model[[i]])
hazard[[label]] <- internal.hazard(models, params, formula_cache)
outcome[[label]] <- lapply(models, function(x) clean_fastglm(x$model))
weights[[label]] <- lapply(analytic, function(x) x$weighted_stats)
}
}
rm(analytic)
Expand Down
14 changes: 10 additions & 4 deletions SEQTaRget/R/class_methods.R
Original file line number Diff line number Diff line change
Expand Up @@ -184,8 +184,11 @@ setMethod("show", "SEQoutput", function(object) {
numerator <- function(object) {
if (!is(object, "SEQoutput")) stop("Object is not of class SEQoutput")
if (!object@params@weighted) stop("SEQuential process was not weighted")
return(list(numerator0 = lapply(object@weight.statistics, function(x) x[[1]]$n0.coef),
numerator1 = lapply(object@weight.statistics, function(x) x[[1]]$n1.coef)))
arm <- function(i) lapply(object@weight.statistics, function(x) {
coefs <- x[[1]]$coef.numerator
if (length(coefs) >= i) coefs[[i]] else NULL
})
return(list(numerator0 = arm(1L), numerator1 = arm(2L)))
}

#' Retrieves Denominator Models from SEQuential object
Expand All @@ -198,8 +201,11 @@ numerator <- function(object) {
denominator <- function(object) {
if (!is(object, "SEQoutput")) stop("Object is not of class SEQoutput")
if (!object@params@weighted) stop("SEQuential process was not weighted")
return(list(denominator0 = lapply(object@weight.statistics, function(x) x[[1]]$d0.coef),
denominator1 = lapply(object@weight.statistics, function(x) x[[1]]$d1.coef)))
arm <- function(i) lapply(object@weight.statistics, function(x) {
coefs <- x[[1]]$coef.denominator
if (length(coefs) >= i) coefs[[i]] else NULL
})
return(list(denominator0 = arm(1L), denominator1 = arm(2L)))
}

#' Retrieves Outcome Models from SEQuential object
Expand Down
8 changes: 8 additions & 0 deletions SEQTaRget/R/class_setters.R
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,14 @@ parameter.simplifier <- function(params) {
warning("Without LTFU or Visit, weighted ITT model is not supported, automatically changed to weighted = FALSE")
params@weighted <- FALSE
}

if (params@weighted && params@method != "ITT" &&
!is.na(params@numerator) && !is.na(params@denominator) &&
identical(params@numerator, params@denominator)) {
warning("Numerator and denominator weight models use identical covariates ('", params@numerator,
"'); the stabilized weights will all equal 1 (i.e., no weighting). The denominator should ",
"typically include the time-varying confounders that the numerator omits - check for a typo in either or both of 'numerator' and 'denominator'.")
}
if (params@followup.class && params@followup.spline) stop("Followup cannot be both a class and a spline, please select one.")
if (!params@plot.type %in% c("survival", "risk", "inc")) stop("Supported plot types are 'survival', 'risk', and 'inc' (in the case of censoring), please select one.")

Expand Down
Loading
Loading