Releases: jaredhuling/fastglm
CRAN release 0.1.0
fastglm 0.1.0
New features
- New top-level function fastglm_nb() for negative-binomial
  regression with the dispersion theta estimated jointly with the
  regression coefficients. Plays the same role as MASS::glm.nb(),
  but the IRLS loop, the inner Brent root-find for theta, and the
  outer (beta, theta) alternation all run entirely in C++.
- New top-level function fastglm_hurdle() for two-part count models
  with a binary zero / non-zero component and a zero-truncated Poisson
  or NB count component. Same model as pscl::hurdle(); the joint fit
  runs in a single C++ driver that calls the existing IRLS solver
  twice and runs an inner Brent MLE for theta in the unknown-theta
  NB case.
- New top-level function fastglm_zi() for zero-inflated Poisson and
  NB regression. Same model as pscl::zeroinfl(); the entire EM
  driver runs in C++, including closed-form posterior responsibilities,
  weighted IRLS for both M-steps, the inner Brent MLE for theta, and
  the analytical observed-information vcov.
- New firth = TRUE argument to fastglm() and fastglm_fit() for
  Firth's bias-reducing penalty on the score. Currently supported for
  family = binomial(link = "logit") on dense designs; converges in
  finite steps under separation, where unpenalized IRLS would diverge.
  Coefficients agree with logistf::logistf() to 1e-7 on the standard
  Heinze-Schemper test cases.
- New built-in negbin(theta, link) family for negative-binomial
  regression with known dispersion, dispatched to a native C++ kernel
  on the log, sqrt, and identity links. Drop-in for
  MASS::negative.binomial(theta, link) without a hard dependency on
  MASS.
- Native fast paths for the family$family == "Negative Binomial(K)"
  form produced by MASS::negative.binomial(), the
  quasibinomial() / quasipoisson() families, and statmod::tweedie().
  Detection is automatic.
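As a rough illustration of two of the entry points listed above, the sketch below fits a Firth-penalized logistic regression on perfectly separated toy data and a negative-binomial regression with known dispersion. It relies only on what these notes state (the firth = TRUE argument and the negbin(theta, link) family) together with the long-standing fastglm(x, y, family = ...) calling convention. It assumes that negbin() is exported and accepts theta and link as named arguments. The data and variable names are made up for the example.

```r
library(fastglm)

# Perfectly separated toy data: every negative z has y = 0 and every
# positive z has y = 1, so the unpenalized logistic MLE does not exist.
x_sep <- cbind(intercept = 1, z = c(-3, -2, -1, 1, 2, 3))
y_sep <- c(0, 0, 0, 1, 1, 1)

# Firth-penalized fit (per the notes: logit link, dense design only).
fit_firth <- fastglm(x_sep, y_sep,
                     family = binomial(link = "logit"),
                     firth = TRUE)
coef(fit_firth)

# Negative-binomial regression with the dispersion treated as known.
# Assumption: negbin() takes `theta` and `link` as named arguments.
set.seed(1)
n <- 500
x_nb <- cbind(1, rnorm(n))
y_nb <- rnbinom(n, size = 2, mu = exp(drop(x_nb %*% c(0.5, 0.3))))

fit_nb <- fastglm(x_nb, y_nb, family = negbin(theta = 2, link = "log"))
coef(fit_nb)
```

When theta is unknown, fastglm_nb() is the entry point described above. Its argument list is not given in these notes, so it is left out of the sketch.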
Documentation
- New vignette count-firth-fastglm covering fastglm_nb(),
  fastglm_hurdle(), fastglm_zi(), and the firth = TRUE flag, with
  small inline accuracy and timing comparisons against MASS::glm.nb,
  pscl::hurdle, pscl::zeroinfl, and logistf::logistf.
- New vignette benchmarks-fastglm providing a more comprehensive
  benchmark study at larger sample sizes, covering all six model
  classes. Pre-compiled at maintainer build time so it does not run
  during R CMD check / R CMD build and adds essentially zero time
  to the package check budget.
- fastglm-overview updated with a short section introducing the new
  count-data and Firth entry points.
Other new features
- Real vcov() for "fastglm" objects: the unscaled covariance is now
  computed in C++ from the IRLS factorisation directly and exposed as
  $cov.unscaled. summary.fastglm(), vcov.fastglm(), and
  predict.fastglm(se.fit = TRUE) now all work without a refit hack;
  vcov.fastglmFit() no longer re-runs glm.fit to recover a qr slot.
- S3 methods registered on sandwich::vcovHC() and sandwich::vcovCL()
  for heteroskedasticity-consistent (HC0–HC3) and cluster-robust covariance
  matrices. After library(sandwich), vcovHC(fit) and vcovCL(fit, ...)
  match sandwich::vcovHC.glm() / sandwich::vcovCL.glm() to
  floating-point precision and work for sparse, big.matrix, and
  in-memory fits. (Earlier development versions shipped local
  vcovHC() / vcovCL() generics; those have been removed in favour of
  registering on the canonical sandwich generics.)
- Sparse design matrices (Matrix::dgCMatrix) are now supported directly
  by fastglm() and fastglm_fit() for the LLT (method = 2) and LDLT
  (method = 3) Cholesky paths. Other decompositions are rejected with
  an informative error rather than silently densified.
- bigmemory::big.matrix inputs now stream the design matrix in
  row-blocks of FASTGLM_CHUNK_ROWS rows (default 16384,
  user-configurable via the environment variable). Filebacked
  big.matrix objects no longer have to be materialised in RAM.
- New top-level function
  fastglm_streaming(chunk_callback, n_chunks, family, ...) for fitting
  GLMs on data sources that do not fit in memory: Arrow datasets,
  Parquet files, DuckDB queries, CSV streamers, and any other
  chunk-yielding iterator. The IRLS loop, step-halving, and Cholesky
  solve all run in C++; the closure is invoked only to deliver one
  row-block at a time.
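The sparse and covariance items above can be combined in one short sketch: fit a Poisson GLM on a dgCMatrix design through the LLT path (method = 2), then read off the model-based covariance with vcov() and a heteroskedasticity-consistent one with sandwich::vcovHC(). The simulated data and variable names are illustrative. Only calls named in these notes are used, plus Matrix::sparse.model.matrix() to build the design.

```r
library(fastglm)
library(Matrix)
library(sandwich)

set.seed(1)
n <- 2000
dat <- data.frame(g = factor(sample(letters[1:10], n, replace = TRUE)))

# Sparse dummy-coded design (intercept + 9 contrasts), stored as dgCMatrix.
X_sparse <- sparse.model.matrix(~ g, data = dat)

# Simulated counts whose mean depends mildly on the factor level.
y <- rpois(n, lambda = exp(0.2 + 0.05 * as.integer(dat$g)))

# Per the notes, sparse input is accepted for method = 2 (LLT) and
# method = 3 (LDLT); other decompositions raise an error.
fit_sparse <- fastglm(X_sparse, y, family = poisson(), method = 2)

# Model-based covariance and standard errors.
V  <- vcov(fit_sparse)
se <- sqrt(diag(V))

# Heteroskedasticity-consistent covariance via the sandwich generic.
V_hc <- vcovHC(fit_sparse)
```

The streaming entry point is not shown here because the notes do not spell out what chunk_callback receives or must return.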
Speed
- For standard families (gaussian / binomial / poisson / Gamma /
  inverse.gaussian on their common links) the per-iteration calls to
  family$variance, mu.eta, linkinv, and dev.resids are now
  evaluated in inline C++ rather than via R callbacks. Detection is
  automatic; non-standard families fall back to the previous R-callback
  path with no user-facing change.
- The IRLS solver pre-allocates its working buffers across iterations
  and uses noalias() writes throughout solve_wls(). Eigen's
  parallelism is no longer disabled.
- On large n the combined effect is roughly a 1.5×–2× speed-up over
  fastglm 0.0.4 on the same hardware, on top of the existing 3×–10×
  advantage over stats::glm().
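To make the four family callbacks mentioned above concrete, here is a minimal base-R sketch of an IRLS loop that calls family$linkinv, family$mu.eta, family$variance, and family$dev.resids once per iteration. It is a textbook illustration, not fastglm's C++ implementation: the function name irls_sketch is hypothetical, it starts from beta = 0, solves the normal equations directly, and has no step-halving, prior weights, or offset.

```r
# Textbook IRLS for a GLM with unit prior weights and no offset.
irls_sketch <- function(X, y, family, maxit = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  dev_old <- Inf
  for (iter in seq_len(maxit)) {
    eta    <- drop(X %*% beta)
    mu     <- family$linkinv(eta)         # inverse link
    dmu    <- family$mu.eta(eta)          # d mu / d eta
    z      <- eta + (y - mu) / dmu        # working response
    w      <- dmu^2 / family$variance(mu) # working weights
    # Weighted least squares step: solve (X' W X) beta = X' W z.
    beta   <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    mu_new <- family$linkinv(drop(X %*% beta))
    dev    <- sum(family$dev.resids(y, mu_new, rep(1, length(y))))
    # Same relative-deviance stopping rule that glm.fit() uses.
    if (abs(dev - dev_old) / (abs(dev) + 0.1) < tol) break
    dev_old <- dev
  }
  beta
}

set.seed(42)
n <- 500
X <- cbind(1, rnorm(n), rnorm(n))
y <- rpois(n, lambda = exp(drop(X %*% c(0.5, 0.3, -0.2))))

beta_sketch <- irls_sketch(X, y, poisson())
beta_glm    <- unname(coef(glm.fit(X, y, family = poisson())))
max(abs(beta_sketch - beta_glm))
```

Each pass through the loop makes four R-level calls into the family object. Those are the calls that the native path described above evaluates in inline C++ for the standard families.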
Documentation
- New vignette fastglm-overview providing a single high-level entry
  point covering all of the package's functionality.
- New vignette large-data-fastglm walking through the three
  large-data paths (sparse, big.matrix, streaming callback) end to
  end, including the Arrow / Parquet recipe.
Internal
- New tests/testthat/ suite covering vcov(), native vs callback
  family dispatch, sparse fits, big.matrix streaming, callback-based
  streaming, and robust SE.
CRAN release 0.0.3
- Moves C++ header files to inst/include so the C++ source code can be used externally
CRAN release 0.0.1
First release of the fastglm package to CRAN (version 0.0.1)