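When minsplit is too large, empty splits are created. These lead to a segmentation fault in the native C function gini() of rpart, apparently due to an unchecked invalid memory access. In the gdb trace below, y is non-NULL but edge is -2147483648 (INT_MIN, which is also R's NA_integer_), so the loop condition rtot > edge may never become false and i may run past the end of y.
The fix is likely to skip empty splits or to add a check (a NULL pointer check, or a bounds check on i) before reading "*y[i]". The exact location is: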
209 for (i = 0; rtot > edge; i++) {
210 j = (int) *y[i] - 1; // <-- CRASHES HERE
211 rwt -= aprior[j] * wt[i];
212 lwt += aprior[j] * wt[i];
213 rtot--;
214 ltot++;
215 right[j] -= wt[i];
216 left[j] += wt[i];
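One possible shape for such a check is sketched below. This is only an illustration under stated assumptions: guarded_scan is a hypothetical function, not code from rpart, and it keeps just the counters from the excerpt above (rtot, edge, wt) while dropping the per-class bookkeeping. It rejects a negative edge up front and also bounds the index by n, so either guard alone would stop the scan from reading past the arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, NOT rpart code: moves observations from the right
 * side to the left side until only `edge` remain on the right, mirroring
 * the loop condition `rtot > edge` from the gini.c excerpt.
 * Returns the number of observations moved, or -1 if `edge` is invalid. */
int guarded_scan(int n, int edge, const double *wt, double *lwt_out)
{
    int i;
    int rtot = n;       /* assumption: all n observations start on the right */
    double lwt = 0.0;   /* accumulated weight on the left */

    assert(wt != NULL && lwt_out != NULL);

    /* A negative edge (including INT_MIN, i.e. an overflowed or NA value)
     * can never be reached by decrementing rtot from n, so refuse it. */
    if (edge < 0)
        return -1;

    /* The extra `i < n` test keeps the index inside the arrays even if
     * the stopping condition on rtot is never satisfied. */
    for (i = 0; i < n && rtot > edge; i++) {
        lwt += wt[i];
        rtot--;
    }

    *lwt_out = lwt;
    return i;
}
```

With n = 4, unit weights and edge = 1, the scan moves three observations and stops; with edge = INT_MIN it returns -1 instead of indexing beyond the end of wt.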
When I checked it with a debugger:
(gdb) run
Starting program: /users/user42/R-4.5.0/bin/Rscript /users/user42/exp/adaboostB.R /users/user42/data/adaboostBoutputs_1/default/crashes/id:000000,sig:11,src:000000,time:2914037,execs:268,op:havoc,rep:2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 445733 is executing new program: /usr/bin/bash
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 445736]
process 445733 is executing new program: /users/user42/R-4.5.0/bin/exec/R
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after vfork from child process 445740]
[Detaching after vfork from child process 445742]
Loading required package: rpart
Loading required package: caret
Loading required package: ggplot2
[New Thread 0x7ffff415c640 (LWP 445744)]
Loading required package: lattice
[Detaching after vfork from child process 445745]
[Detaching after vfork from child process 445747]
Loading required package: foreach
Loading required package: doParallel
Loading required package: iterators
Loading required package: parallel
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl.init' failed, will use the null device.
See '?rgl.useNULL' for ways to avoid this warning.
Thread 1 "R" received signal SIGSEGV, Segmentation fault.
0x00007ffff481265c in gini (n=<optimized out>, y=0x55555ee67310, x=0x55555ea79c90, numcat=<optimized out>, edge=-2147483648, improve=0x7fffffffaaa8, split=0x7fffffffaab0, csplit=0x5555607399f0, my_risk=<optimized out>, wt=0x55555ebaae40) at gini.c:210
210 j = (int) *y[i] - 1;
(gdb)
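The edge=-2147483648 in the trace above is INT_MIN, the bit pattern R uses for NA_integer_, which would be consistent with a control value that did not fit into an int before it reached the C code. A minimal sketch of validating such a value before converting it follows. The helper double_to_int_checked is hypothetical and not part of R or rpart; where such a check would belong in the real code is an open question.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT an R or rpart API: converts a double to int
 * only if the result is representable and is not INT_MIN (which R
 * reserves for NA_integer_). Returns false and leaves *out untouched
 * when the value is rejected. */
bool double_to_int_checked(double x, int *out)
{
    assert(out != NULL);

    /* x != x is true only for NaN. */
    if (x != x)
        return false;

    /* Out-of-range values (including +/- infinity) are rejected before
     * the cast, because converting them to int is undefined in C. */
    if (x <= (double) INT_MIN || x > (double) INT_MAX)
        return false;

    *out = (int) x;   /* truncates toward zero */
    return true;
}
```

For the values in this report, 5.086914 converts to 5, while 6.908775e+9 is rejected because it exceeds INT_MAX (2147483647 on platforms with a 32-bit int).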
A minimal example to reproduce the bug is:
library(adabag)
library(datasets)
data(iris)
set.seed(100)
samplesIris <- sample(1:150, 100)
train_data <- iris[samplesIris, ]
test_data <- iris[-samplesIris, ]
nIter <- 100 # Number of boosting iterations
maxdepth <- 5.086914 # Max depth for decision trees
minsplit <- 6.908775e+9 # Minimum number of observations in a node
model <- boosting(Species ~ ., data = train_data, mfinal = nIter,
                  control = rpart.control(maxdepth = maxdepth, minsplit = minsplit))
System:
user42@node0:~/bug14$ ./../R-4.5.0/bin/Rscript --version
Rscript (R) version 4.5.0 (2025-04-11)
user42@node0:~/bug14$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
and rpart 4.1.24, downloaded with:
wget https://cran.r-project.org/src/contrib/rpart_4.1.24.tar.gz