Memory safety bug in rpart::rpart() (gini.c): null pointer dereference when minsplit exceeds dataset size #72

@karineek

Description

When minsplit is too large (larger than the number of observations), empty splits are created. These lead to a segmentation fault in the native C function gini(), caused by an unchecked NULL pointer access.

A likely fix is to skip empty splits, or to add a null-pointer check before reading "*y[i]". The exact location in gini.c is:

   209      for (i = 0; rtot > edge; i++) {
   210          j = (int) *y[i] - 1;    // <-- CRASHES HERE
   211          rwt -= aprior[j] * wt[i];
   212          lwt += aprior[j] * wt[i];
   213          rtot--;
   214          ltot++;
   215          right[j] -= wt[i];
   216          left[j] += wt[i];

Running the script under gdb gives:

(gdb) run
Starting program: /users/user42/R-4.5.0/bin/Rscript /users/user42/exp/adaboostB.R /users/user42/data/adaboostBoutputs_1/default/crashes/id:000000,sig:11,src:000000,time:2914037,execs:268,op:havoc,rep:2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 445733 is executing new program: /usr/bin/bash
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 445736]
process 445733 is executing new program: /users/user42/R-4.5.0/bin/exec/R
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after vfork from child process 445740]
[Detaching after vfork from child process 445742]
Loading required package: rpart
Loading required package: caret
Loading required package: ggplot2
[New Thread 0x7ffff415c640 (LWP 445744)]
Loading required package: lattice
[Detaching after vfork from child process 445745]
[Detaching after vfork from child process 445747]
Loading required package: foreach
Loading required package: doParallel
Loading required package: iterators
Loading required package: parallel
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl.init' failed, will use the null device.
See '?rgl.useNULL' for ways to avoid this warning. 

Thread 1 "R" received signal SIGSEGV, Segmentation fault.
0x00007ffff481265c in gini (n=<optimized out>, y=0x55555ee67310, x=0x55555ea79c90, numcat=<optimized out>, edge=-2147483648, improve=0x7fffffffaaa8, split=0x7fffffffaab0, csplit=0x5555607399f0, my_risk=<optimized out>, wt=0x55555ebaae40) at gini.c:210
210             j = (int) *y[i] - 1;
(gdb) 

A minimal example to reproduce the bug is:

library(adabag)
library(datasets)
data(iris)
set.seed(100)
samplesIris <- sample(1:150, 100)
train_data <- iris[samplesIris, ]
test_data <- iris[-samplesIris, ]
nIter <- 100               # Number of boosting iterations
maxdepth <- 5.086914       # Maximum depth of each decision tree
minsplit <- 6.908775e+9    # Minimum observations in a node before a split is attempted (far more than the 100 training rows)
model <- boosting(Species ~ ., data = train_data, mfinal = nIter,
                  control = rpart.control(maxdepth = maxdepth, minsplit = minsplit))

System:

user42@node0:~/bug14$ ./../R-4.5.0/bin/Rscript --version
Rscript (R) version 4.5.0 (2025-04-11)
user42@node0:~/bug14$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rpart 4.1.24, downloaded with:

wget https://cran.r-project.org/src/contrib/rpart_4.1.24.tar.gz
