CausalInference · remlapmot · Jun 8, 2026 · Jun 2, 2026 · Jun 2, 2026 · Jun 2, 2026
diff --git a/SEQTaRget/DESCRIPTION b/SEQTaRget/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: SEQTaRget
 Type: Package
 Title: Sequential Trial Emulation
-Version: 1.4.2.9001
+Version: 1.4.2.9002
 Authors@R: c(person(given = "Ryan",
                     family = "O'Dea",
                     role = c("aut", "cre"),

diff --git a/SEQTaRget/NEWS.md b/SEQTaRget/NEWS.md
@@ -5,13 +5,22 @@
 * Speed up the hazard ratio calculation by fitting the Cox model with the survival C fitters directly on a prebuilt design matrix instead of `coxph(formula, data)`, avoiding the `model.frame`/`model.matrix` rebuild on every bootstrap iteration: `survival::coxph.fit()` for the non-competing-event model and `survival::agreg.fit()` for the competing-event Fine-Gray (counting-process) model. The hazard ratio and CIs are unchanged.
 * Fix the competing-event Fine-Gray hazard fit to use the `finegray()` case weights (`fgwt`), which are required for a valid subdistribution-hazard estimate and were previously omitted. This is a no-op for the current hazard simulation (which has only administrative censoring, so all `fgwt` are 1) but corrects the estimate should the simulated data ever carry random censoring.
 * Report competing events per treatment arm in the `@info` slot as `info$compevent.unique` and `info$compevent.nonunique`, mirroring the structure of `info$outcome.unique` / `info$outcome.nonunique`. Both are grouped by baseline treatment; the non-unique table counts all competing event occurrences in the expanded data and the unique table counts distinct subjects who experienced the competing event. Both are `NA` when no `compevent` is specified.
+* From a `SEQuential()` fit, populate `weight.statistics` and `outcome.model` when `hazard = TRUE`.
+* Warn when the `numerator` and `denominator` weight models are given identical covariates. In that case the stabilized weights all equal 1 (i.e., no weighting), which is usually a typo in the `denominator` argument.
+* Improve the `SEQuential()` helpfile by adding a per-protocol example.
+* Add behavioural tests that `selection.random = TRUE` retains all treated trial-starts, subsamples control trial-starts to the requested `selection.prob` fraction, and is reproducible under a fixed seed; rename the previous smoke test that did not actually exercise the feature.
+* Add behavioural tests for; `weight.lower`/`weight.upper` truncation, `weight.p99` truncation, `followup.include`/`trial.include`, `followup.class`, `weight.lag_condition`, `followup.min`/`followup.max`, and `weight.eligible_cols`.
+* Fix `numerator()` and `denominator()` returning `NULL` for every weighted model; they now return the fitted per-arm numerator/denominator weight models.
+* Document that weight truncation applies only to the outcome-model fit.
+* Pass the formula cache to `inline.pred()` in the weight models
+* Fix `SEQOpts()` argument ordering.
 
 # SEQTaRget v1.4.2
 
 * Remove mention of units from time in docs.
 * Improve memory usage in the bootstrapping.
 * Fix off-by-one labeling in survival output so that `followup = k` correctly represents survival after `k` intervals, adding a row at `followup = survival.max + 1` for the final interval's estimate.
-* Fix expansion bug where subjects experiencing the outcome early were incorrectly carried forward with `outcome=0` rows from subsequent periods by truncating each trial at the first event row
+* Fix expansion bug where subjects experiencing the outcome early were incorrectly carried forward with `outcome=0` rows from subsequent periods by truncating each trial at the first event row (thanks, @francescazaccagnino)
 * Add `expand.only` option to `SEQopts()`. When `TRUE`, `SEQuential()` returns the expanded `data.table` directly and skips the analysis steps, for users who want to inspect or store the expanded dataset on its own.
 * Fix `followup.spline = TRUE` so the basis is genuinely non-linear. Splines are now built into the model formula via `splines::ns()` instead of being applied as a single-column transform of `followup`, and the new `followup.spline.df` option (default `4`) controls the number of basis functions. The treatment-by-followup interaction now uses the same spline basis. Knots are baked from the full expanded `followup` once at fit time so the basis is identical at fit and prediction time across bootstraps and survival grids. Internally, formula column extraction now uses `all.vars()`, so user-supplied covariates may include `ns()`, `bs()`, `I()`, `factor()`, `poly()` etc. without breaking expansion.
 * Rename `format.time()` to `format_time()` because it wasn't an S3 method and hence was causing roxygen2 to write incorrect information in its helpfile.

diff --git a/SEQTaRget/R/SEQopts.R b/SEQTaRget/R/SEQopts.R
@@ -22,14 +22,13 @@
 #' @param excused Logical: in the case of censoring, whether there is an excused condition, default is `FALSE`
 #' @param excused.cols List: list of column names for treatment switch excuses - should be the same length, and ordered the same as \code{treat.level}
 #' @param fastglm.method Integer: decomposition method for fastglm (`0L`-column-pivoted QR, `1L`-unpivoted QR, `2L`-LLT Cholesky, `3L`-LDLT Cholesky), default is `2L`
-#' @param glm.package Character: package to use for fitting GLMs, either `"fastglm"` (default) or `"parglm"`. When `"parglm"` is selected the `nthreads` option controls the number of threads passed to `parglm::parglm.fit()`. For most realistic SEQTaRget workloads (expanded datasets up to approximately a few million rows) `"fastglm"` is faster; `"parglm"` may help only on substantially larger datasets where the parallel chunking outweighs its setup overhead.
-#' @param parglm.control A control object from [parglm::parglm.control()] to pass to `parglm::parglm.fit()`. Only used when `glm.package = "parglm"`. Defaults to `parglm::parglm.control(method = "FAST")`. If you encounter a `chol(): decomposition failed` error (e.g. with near-singular model matrices on large datasets), pass `parglm.control = parglm::parglm.control(method = "LAPACK")` to use the more numerically stable QR decomposition instead, or switch to using the fastglm backend.
 #' @param followup.class Logical: treat followup as a class, e.g. expands every time to it's own indicator column, default is `FALSE`
 #' @param followup.include Logical: whether or not to include 'followup' and 'followup_squared' in the outcome model, default is `TRUE`
 #' @param followup.max Numeric: maximum time to expand about, default is `Inf` (no maximum)
 #' @param followup.min Numeric: minimum follow-up time since trial enrollment to include, must be non-negative, default is `0`
 #' @param followup.spline Logical: treat followup as a natural cubic spline (`splines::ns()`), default is `FALSE`
 #' @param followup.spline.df Integer: degrees of freedom passed to `splines::ns()` when `followup.spline = TRUE`. With `df = k`, `ns()` places `k - 1` interior knots at quantiles of `followup`. Must be `>= 1`; `df = 1` is equivalent to a linear term and is generally not what you want. Default is `4` (3 interior knots).
+#' @param glm.package Character: package to use for fitting GLMs, either `"fastglm"` (default) or `"parglm"`. When `"parglm"` is selected the `nthreads` option controls the number of threads passed to `parglm::parglm.fit()`. For most realistic SEQTaRget workloads (expanded datasets up to approximately a few million rows) `"fastglm"` is faster; `"parglm"` may help only on substantially larger datasets where the parallel chunking outweighs its setup overhead.
 #' @param hazard Logical: hazard error calculation instead of survival estimation, default is `FALSE`
 #' @param indicator.baseline String: identifier for baseline variables in \code{covariates, numerator, denominator} - intended as an override
 #' @param indicator.squared String: identifier for squared variables in \code{covariates, numerator, denominator} - intended as an override
@@ -39,6 +38,7 @@
 #' @param nthreads Integer: number of threads to use for data.table processing, default is [data.table::getDTthreads()]
 #' @param numerator String: numerator covariates to the right hand side of a formula object
 #' @param parallel Logical: define if the SEQuential process is run in parallel, default is `FALSE`
+#' @param parglm.control A control object from [parglm::parglm.control()] to pass to `parglm::parglm.fit()`. Only used when `glm.package = "parglm"`. Defaults to `parglm::parglm.control(method = "FAST")`. If you encounter a `chol(): decomposition failed` error (e.g. with near-singular model matrices on large datasets), pass `parglm.control = parglm::parglm.control(method = "LAPACK")` to use the more numerically stable QR decomposition instead, or switch to using the fastglm backend.
 #' @param plot.colors Character: Colors for output plot if \code{km.curves = TRUE}, defaulted to ggplot2 defaults
 #' @param plot.labels Character: Color labels for output plot if \code{km.curves = TRUE} in order e.g. \code{c("risk.0", "risk.1")}
 #' @param plot.subtitle Character: Subtitle for output plot if \code{km.curves = TRUE}
@@ -57,11 +57,11 @@
 #' @param visit.denominator String: visit denominator covariates to the right hand side of a formula object
 #' @param visit.numerator String: visit numerator covariates to the right hand side of a formula object
 #' @param weight.eligible_cols List: list of column names for indicator columns defining which weights are eligible for weight models - in order of \code{treat.level}
-#' @param weight.lower Numeric: IPCW weights truncated at this lower bound, must be non-negative, default is `0`
+#' @param weight.lower Numeric: IPCW weights truncated at this lower bound, must be non-negative, default is `0`. Truncation is applied only to the weights used to fit the outcome model; the weights reported in \code{weight.statistics} and in the returned data (when \code{data.return = TRUE}) are the untruncated values.
 #' @param weight.lag_condition Logical: whether weights should be conditioned on treatment lag value, default `TRUE`
-#' @param weight.p99 Logical: forces weight truncation at 1st and 99th percentile weights, will override provided \code{weight.upper} and \code{weight.lower}
+#' @param weight.p99 Logical: forces weight truncation at 1st and 99th percentile weights, will override provided \code{weight.upper} and \code{weight.lower}. The percentiles are taken from the untruncated weight distribution (as reported in \code{weight.statistics}), and as with \code{weight.lower}/\code{weight.upper} the truncation affects only the weights used to fit the outcome model.
 #' @param weight.preexpansion Logical: whether weighting should be done on pre-expanded data, default `TRUE`
-#' @param weight.upper Numeric: weights truncated at upper end at this weight, default is `Inf`
+#' @param weight.upper Numeric: weights truncated at upper end at this weight, default is `Inf`. As with \code{weight.lower}, truncation affects only the weights used to fit the outcome model, not those reported in \code{weight.statistics} or the returned data.
 #' @param weighted Logical: whether or not to perform weighted analysis, default is `FALSE`
 #' @returns An object of class 'SEQopts'
 #' @export
@@ -72,11 +72,12 @@ SEQopts <- function(bootstrap = FALSE, bootstrap.nboot = 100, bootstrap.sample =
                     cense = NA, cense.denominator = NA, cense.eligible = NA, cense.numerator = NA,
                     compevent = NA, covariates = NA, data.return = FALSE, denominator = NA,
                     deviation = FALSE, deviation.col = NA, deviation.conditions = c(NA, NA), deviation.excused = FALSE, deviation.excused_cols = c(NA, NA),
-                    excused = FALSE, excused.cols = c(NA, NA), expand.only = FALSE, fastglm.method = 2L, glm.package = "fastglm", parglm.control = NULL,
+                    excused = FALSE, excused.cols = c(NA, NA), expand.only = FALSE, fastglm.method = 2L,
                     followup.class = FALSE, followup.include = TRUE, followup.max = Inf, followup.min = 0, followup.spline = FALSE, followup.spline.df = 4L,
+                    glm.package = "fastglm",
                     hazard = FALSE, indicator.baseline = "_bas", indicator.squared = "_sq",
                     km.curves = FALSE, multinomial = FALSE, ncores = availableCores(omit = 1L), nthreads = getDTthreads(),
-                    numerator = NA, parallel = FALSE, plot.colors = c("#F8766D", "#00BFC4", "#555555"), plot.labels = NA, plot.subtitle = NA, plot.title = NA, plot.type = "survival",
+                    numerator = NA, parallel = FALSE, parglm.control = NULL, plot.colors = c("#F8766D", "#00BFC4", "#555555"), plot.labels = NA, plot.subtitle = NA, plot.title = NA, plot.type = "survival",
                     risk.times = NA,
                     seed = NULL, selection.first_trial = FALSE, selection.prob = 0.8, selection.random = FALSE, subgroup = NA, survival.max = Inf,
                     treat.level = c(0, 1), trial.include = TRUE,

diff --git a/SEQTaRget/R/SEQuential.R b/SEQTaRget/R/SEQuential.R
@@ -29,15 +29,37 @@
 #' @examples
 #' \donttest{
 #' data <- SEQdata
-#' model <- SEQuential(data, id.col = "ID", 
-#'                           time.col = "time", 
-#'                           eligible.col = "eligible",
-#'                           treatment.col = "tx_init",
-#'                           outcome.col = "outcome",
-#'                           time_varying.cols = c("N", "L", "P"),
-#'                           fixed.cols = "sex",
-#'                           method = "ITT", 
-#'                           options = SEQopts())
+#'
+#' # Intention-to-treat (ITT) effect: subjects are assigned to the treatment
+#' # arm defined by their baseline treatment and followed regardless of any later
+#' # treatment changes, so no weighting is required.
+#' SEQuential(data, id.col = "ID",
+#'            time.col = "time",
+#'            eligible.col = "eligible",
+#'            treatment.col = "tx_init",
+#'            outcome.col = "outcome",
+#'            time_varying.cols = c("N", "L", "P"),
+#'            fixed.cols = "sex",
+#'            method = "ITT",
+#'            options = SEQopts())
+#'
+#' # Per-protocol effect via artificial censoring: subjects are censored when they
+#' # deviate from their assigned strategy, and inverse-probability-of-censoring
+#' # weights adjust for the resulting selection bias. The denominator models the
+#' # probability of remaining uncensored given the time-varying confounders, while
+#' # the numerator uses only the baseline covariates to stabilize the weights (so
+#' # the two formulas must differ - identical formulas give weights of 1).
+#' SEQuential(data, id.col = "ID",
+#'            time.col = "time",
+#'            eligible.col = "eligible",
+#'            treatment.col = "tx_init",
+#'            outcome.col = "outcome",
+#'            time_varying.cols = c("N", "L", "P"),
+#'            fixed.cols = "sex",
+#'            method = "censoring",
+#'            options = SEQopts(weighted = TRUE,
+#'                              numerator = "sex",
+#'                              denominator = "N + L + P + sex"))
 #' }
 #'
 #' @export
@@ -248,6 +270,8 @@ SEQuential <- function(data, id.col, time.col, eligible.col, treatment.col, outc
       label <- subgroups[[i]]
       models <- lapply(analytic, function(x) x$model[[i]])
       hazard[[label]] <- internal.hazard(models, params, formula_cache)
+      outcome[[label]] <- lapply(models, function(x) clean_fastglm(x$model))
+      weights[[label]] <- lapply(analytic, function(x) x$weighted_stats)
     }
   }
   rm(analytic)

diff --git a/SEQTaRget/R/class_methods.R b/SEQTaRget/R/class_methods.R
@@ -184,8 +184,11 @@ setMethod("show", "SEQoutput", function(object) {
 numerator <- function(object) {
   if (!is(object, "SEQoutput")) stop("Object is not of class SEQoutput")
   if (!object@params@weighted) stop("SEQuential process was not weighted")
-  return(list(numerator0 = lapply(object@weight.statistics, function(x) x[[1]]$n0.coef),
-              numerator1 = lapply(object@weight.statistics, function(x) x[[1]]$n1.coef)))
+  arm <- function(i) lapply(object@weight.statistics, function(x) {
+    coefs <- x[[1]]$coef.numerator
+    if (length(coefs) >= i) coefs[[i]] else NULL
+  })
+  return(list(numerator0 = arm(1L), numerator1 = arm(2L)))
 }
 
 #' Retrieves Denominator Models from SEQuential object
@@ -198,8 +201,11 @@ numerator <- function(object) {
 denominator <- function(object) {
   if (!is(object, "SEQoutput")) stop("Object is not of class SEQoutput")
   if (!object@params@weighted) stop("SEQuential process was not weighted")
-  return(list(denominator0 = lapply(object@weight.statistics, function(x) x[[1]]$d0.coef),
-              denominator1 = lapply(object@weight.statistics, function(x) x[[1]]$d1.coef)))
+  arm <- function(i) lapply(object@weight.statistics, function(x) {
+    coefs <- x[[1]]$coef.denominator
+    if (length(coefs) >= i) coefs[[i]] else NULL
+  })
+  return(list(denominator0 = arm(1L), denominator1 = arm(2L)))
 }
 
 #' Retrieves Outcome Models from SEQuential object

diff --git a/SEQTaRget/R/class_setters.R b/SEQTaRget/R/class_setters.R
@@ -137,6 +137,14 @@ parameter.simplifier <- function(params) {
     warning("Without LTFU or Visit, weighted ITT model is not supported, automatically changed to weighted = FALSE")
     params@weighted <- FALSE
   }
+
+  if (params@weighted && params@method != "ITT" &&
+      !is.na(params@numerator) && !is.na(params@denominator) &&
+      identical(params@numerator, params@denominator)) {
+    warning("Numerator and denominator weight models use identical covariates ('", params@numerator,
+            "'); the stabilized weights will all equal 1 (i.e., no weighting). The denominator should ",
+            "typically include the time-varying confounders that the numerator omits - check for a typo in either or both of 'numerator' and 'denominator'.")
+  }
   if (params@followup.class && params@followup.spline) stop("Followup cannot be both a class and a spline, please select one.")
   if (!params@plot.type %in% c("survival", "risk", "inc")) stop("Supported plot types are 'survival', 'risk', and 'inc' (in the case of censoring), please select one.")