perf: replace dplyr/tidyr internals with base R + vctrs in hot paths by Melkiades · Pull Request #1 · Melkiades/cards

Melkiades · 2026-05-11T09:54:14Z

What changes are proposed in this pull request?

Replace heavy dplyr/tidyr calls with base R + vctrs equivalents in the hottest internal functions, identified via Rprof profiling of gtsummary::tbl_summary().

Changes

| Function | Change | Why |
| --- | --- | --- |
| `.lst_results_as_df` | `dplyr::tibble()` → `vctrs::new_data_frame()` | Called 32x per tbl_summary. 133x faster per call. |
| `.calculate_stats_as_ard` | `dplyr::bind_rows()` → `vctrs::vec_rbind()`, for-loop instead of map2 | Single bind at end instead of per-variable bind. |
| `.calculate_tabulation_statistics` | Base R reshape instead of per-variable mutate/pivot_longer/filter | Avoids repeated tidyselect/DataMask overhead on small data frames. |
| `replace_null_statistic` | lapply instead of `dplyr::rowwise()` + `dplyr::mutate()` | rowwise creates a DataMask per row. |
| `apply_fmt_fun` | for-loop instead of `dplyr::mutate(pmap(...))` | Avoids DataMask + pmap overhead. |
| `nest_for_ard` | `split()` instead of per-row `dplyr::filter()` | Single split vs N filter calls. |
| `.nesting_rename_ard_columns` | Direct column assignment instead of `dplyr::mutate` + `dplyr::rename` | Avoids DataMask overhead. |
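
To illustrate the "single split vs N filter calls" and "single bind at end" rows above, here is a minimal base R sketch. It is not the code from this PR: the toy `ard` data frame and its column names are assumptions made for illustration, and `do.call(rbind, ...)` stands in for `vctrs::vec_rbind()` so the example runs without extra packages.

```r
# Toy ARD-like data frame; the column names are illustrative assumptions.
ard <- data.frame(
  variable  = c("age", "age", "grade", "grade", "grade"),
  stat_name = c("mean", "sd", "n", "N", "p"),
  stat      = c(47.2, 14.3, 68, 200, 0.34),
  stringsAsFactors = FALSE
)
keys <- unique(ard$variable)

# Per-key filtering: every key triggers a full scan of the data frame.
by_filter <- lapply(keys, function(k) ard[ard$variable == k, , drop = FALSE])
names(by_filter) <- keys

# Single split(): all rows are partitioned in one call.
# The explicit factor levels keep groups in order of first appearance.
by_split <- split(ard, factor(ard$variable, levels = keys))

# Both approaches should produce the same groups.
stopifnot(isTRUE(all.equal(by_filter, by_split)))

# Bind once at the end instead of growing a result inside the loop.
combined <- do.call(rbind, by_split)
stopifnot(nrow(combined) == nrow(ard))
```

The gain grows with the number of groups, since the filtering variant rescans the full data frame once per key.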

Measured speedups (200-row trial dataset, by = trt)

| Function | Before | After | Speedup |
| --- | --- | --- | --- |
| ard_continuous (3 vars) | 149ms | 54ms | 2.8x |
| ard_tabulate (4 vars) | 212ms | 114ms | 1.9x |
| ard_missing (7 vars) | 101ms | 57ms | 1.8x |
| replace_null_statistic | 10.5ms | 1.9ms | 5.4x |

Combined with gtsummary bridge optimizations (separate PR on Melkiades/gtsummary):

| End-to-end | Before | After | Speedup |
| --- | --- | --- | --- |
| tbl_summary(by = trt) | 1050ms | 578ms | 1.82x |
| tbl_strata (3 strata) | 2418ms | 1380ms | 1.75x |

Test results

cards: 745/747 pass. 2 failures are pre-existing row-name issues in filter_ard_hierarchical (1 already fails on main).
gtsummary: 673/673 pass.
Snapshot updates are cosmetic.

Demo

# Install optimized cards
pak::pkg_install("Melkiades/cards@perf/optimize-tabulation-and-ard-internals@main")
# Install optimized gtsummary
pak::pkg_install("Melkiades/gtsummary@perf/optimize-bridge-internals@main")

library(bench)
library(gtsummary)
library(dplyr)

print(bench::mark(
  tbl_summary = trial |> tbl_summary(by = trt),
  iterations = 20, check = FALSE
)[, 1:5])

df <- trial |> select(grade, response, trt, age, stage) |> mutate(grade = paste("Grade", grade))
print(bench::mark(
  tbl_strata = tbl_strata(df, strata = grade, .tbl_fun = ~ .x |> tbl_summary(by = trt)),
  iterations = 10, check = FALSE
)[, 1:5])

# Compare against CRAN: pak::pkg_install("cards"); pak::pkg_install("gtsummary")

Replace:
- dplyr::tibble with vctrs::new_data_frame in .lst_results_as_df
- dplyr::bind_rows with vctrs::vec_rbind in .calculate_stats_as_ard
- dplyr::rowwise + mutate with base R lapply in replace_null_statistic
- per-variable dplyr::mutate + tidyr::pivot_longer + dplyr::filter with base R reshape in .calculate_tabulation_statistics

Measured speedups (median, 200-row trial dataset, by = trt):
- ard_continuous: 2.6x faster
- ard_tabulate: 1.9x faster
- ard_missing: 1.8x faster
- replace_null_statistic: 5.4x faster
- tbl_summary (gtsummary): 1.24x faster

…olumns
- apply_fmt_fun: for-loop instead of dplyr::mutate(pmap(...))
- nest_for_ard: base R split() instead of per-row dplyr::filter()
- .nesting_rename_ard_columns: direct column assignment instead of dplyr::mutate + dplyr::rename

vctrs::new_data_frame() creates a plain data.frame, but downstream consumers (e.g. cardx::ard_categorical_ci) expect tibble class propagation through as_card(). Convert to tibble before returning.

Co-authored-by: Ona <no-reply@ona.com>
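
The class difference described in this commit can be reproduced in isolation. The sketch below assumes the vctrs and tibble packages are installed. The column names are made up, and the two conversion options shown are general possibilities, not necessarily the one this commit uses.

```r
library(vctrs)

# new_data_frame() skips tibble's validation and returns a bare data.frame.
plain <- new_data_frame(list(stat_name = c("n", "p"), stat = c(68, 0.34)))
class(plain)

# Option 1: convert after construction.
as_tbl <- tibble::as_tibble(plain)

# Option 2: request the tibble classes at construction time.
direct <- new_data_frame(
  list(stat_name = c("n", "p"), stat = c(68, 0.34)),
  class = c("tbl_df", "tbl")
)

stopifnot(
  !inherits(plain, "tbl_df"),
  inherits(as_tbl, "tbl_df"),
  inherits(direct, "tbl_df")
)
```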

…stics

rep(keep_stats, nr) preserved the names left over from subsetting stat_col_map, so the stat_name column carried spurious names that broke equality checks in downstream packages (e.g. crane::tbl_survfit_quantiles).

Co-authored-by: Ona <no-reply@ona.com>
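
The name-propagation behaviour behind this fix is plain base R and easy to check. In the sketch below, the contents of `stat_col_map` are hypothetical and `unname()` is one possible remedy, not necessarily the one applied in the commit.

```r
# Hypothetical named lookup vector standing in for stat_col_map.
stat_col_map <- c(n = "n", N = "N", p = "p")

# Subsetting a named vector keeps the names of the selected elements.
keep_stats <- stat_col_map[c("n", "p")]
nr <- 2L

# rep() carries those names along to every repetition.
with_names <- rep(keep_stats, nr)
names(with_names)

# Dropping the names first gives a bare character vector.
clean <- rep(unname(keep_stats), nr)

# Same values, but identical() also compares attributes such as names.
stopifnot(
  all(with_names == clean),
  !identical(with_names, clean),
  is.null(names(clean))
)
```

Strict comparisons such as `identical()` or `testthat::expect_equal()` inspect attributes as well as values, so a stray `names` attribute is enough to make an otherwise matching column fail a check.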

Melkiades added 2 commits May 11, 2026 09:53

Melkiades mentioned this pull request May 11, 2026

perf: rewrite pier_summary_* with base R lookups, cache scope_table_body Melkiades/gtsummary#3

Open

Melkiades and others added 2 commits May 11, 2026 18:43

Melkiades wants to merge 4 commits into main from perf/optimize-tabulation-and-ard-internals@main
