[bug] rpart Incorrectly Labelling Splits #71

@emstruong

Hello,

As the reprex below shows, I suspect that when a predictor is an ordered factor (and possibly when its level names are numbers), the split labels that rpart reports are incorrect.

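In rpart(y ~ x_ordered), the level 100 appears in the labels of nodes 9 and 5 (x_ordered=25,50,100 and x_ordered=50,100), where I don't think it should. Both nodes have n = 10 and zero deviance, with yval 2500 and 5000, and tree(y ~ x_ordered) labels the same nodes as 25 and 50.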

library(rpart)
library(rpart.plot)
library(tree)
x <- rep(c(2, 5, 10, 25, 50, 100), each = 10)

x_ordered <- as.ordered(x)
y <- x * 100

rpart(y ~ x)
#> n= 60 
#> 
#> node), split, n, deviance, yval
#>       * denotes terminal node
#> 
#> 1) root 60 711000000  3200.0000  
#>   2) x< 75 50 156120000  1840.0000  
#>     4) x< 37.5 40  31300000  1050.0000  
#>       8) x< 17.5 30   3266667   566.6667 *
#>       9) x>=17.5 10         0  2500.0000 *
#>     5) x>=37.5 10         0  5000.0000 *
#>   3) x>=75 10         0 10000.0000 *

tree(y ~ x)
#> node), split, n, deviance, yval
#>       * denotes terminal node
#> 
#> 1) root 60 711000000  3200.0  
#>   2) x < 75 50 156100000  1840.0  
#>     4) x < 37.5 40  31300000  1050.0  
#>       8) x < 17.5 30   3267000   566.7 *
#>       9) x > 17.5 10         0  2500.0 *
#>     5) x > 37.5 10         0  5000.0 *
#>   3) x > 75 10         0 10000.0 *

# rpart(y ~ x) |> rpart.plot::prp(type = 5)

rpart(y ~ x_ordered)
#> n= 60 
#> 
#> node), split, n, deviance, yval
#>       * denotes terminal node
#> 
#> 1) root 60 711000000  3200.0000  
#>   2) x_ordered=2,5,10,25,50 50 156120000  1840.0000  
#>     4) x_ordered=2,5,10,25 40  31300000  1050.0000  
#>       8) x_ordered=2,5,10 30   3266667   566.6667 *
#>       9) x_ordered=25,50,100 10         0  2500.0000 *
#>     5) x_ordered=50,100 10         0  5000.0000 *
#>   3) x_ordered=100 10         0 10000.0000 *

tree(y ~ x_ordered)
#> node), split, n, deviance, yval
#>       * denotes terminal node
#> 
#> 1) root 60 711000000  3200.0  
#>   2) x_ordered: 2,5,10,25,50 50 156100000  1840.0  
#>     4) x_ordered: 2,5,10,25 40  31300000  1050.0  
#>       8) x_ordered: 2,5,10 30   3267000   566.7 *
#>       9) x_ordered: 25 10         0  2500.0 *
#>     5) x_ordered: 50 10         0  5000.0 *
#>   3) x_ordered: 100 10         0 10000.0 *

# rpart(y ~ x_ordered) |> rpart.plot::prp(type = 5)

Created on 2025-04-18 with reprex v2.1.1
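
One way to tell a labelling problem apart from a wrong partition is to look at which observations actually land in each leaf, rather than at the printed labels. The sketch below is only a suggestion, not part of the original reprex: it rebuilds the same data, uses the `where` and `frame` components of the fitted rpart object to recover each observation's leaf node number, and cross-tabulates that against the levels of `x_ordered`. The helper name `leaf_of` is hypothetical and introduced here for illustration.

```r
library(rpart)

# Same data as in the reprex above.
x <- rep(c(2, 5, 10, 25, 50, 100), each = 10)
x_ordered <- as.ordered(x)
y <- x * 100

fit_num <- rpart(y ~ x)
fit_ord <- rpart(y ~ x_ordered)

# fit$where holds, for each observation, the row of fit$frame (the leaf)
# it falls into; the row names of fit$frame are the node numbers.
# leaf_of() is a hypothetical helper, not an rpart function.
leaf_of <- function(fit) as.integer(rownames(fit$frame))[fit$where]

# Cross-tabulate leaf node number against the level of x_ordered.
membership <- table(node = leaf_of(fit_ord), level = x_ordered)
print(membership)

# Do the numeric and ordered fits send every observation to the same leaf?
same_partition <- identical(leaf_of(fit_num), leaf_of(fit_ord))
print(same_partition)
```

If the table shows a single level per leaf where the printed label lists several, the observations are being partitioned as expected and only the printed label for the ordered factor is misleading.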

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       Ubuntu 22.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_CA:en
#>  collate  en_CA.UTF-8
#>  ctype    en_CA.UTF-8
#>  tz       America/Toronto
#>  date     2025-04-18
#>  pandoc   3.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  evaluate      1.0.1   2024-10-10 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  fs            1.6.5   2024-10-30 [1] CRAN (R 4.4.1)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  knitr         1.49    2024-11-08 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  reprex        2.1.1   2024-07-06 [1] RSPM
#>  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
#>  rmarkdown     2.29    2024-11-04 [1] CRAN (R 4.4.1)
#>  rpart       * 4.1.23  2023-12-05 [2] CRAN (R 4.4.0)
#>  rpart.plot  * 3.1.2   2024-02-26 [1] RSPM
#>  rstudioapi    0.17.1  2024-10-22 [1] RSPM (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] RSPM (R 4.4.0)
#>  tree        * 1.0-44  2024-12-11 [1] RSPM
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun          0.49    2024-10-31 [1] CRAN (R 4.4.1)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] /home/mstruong/R/x86_64-pc-linux-gnu-library/4.4
#>  [2] /opt/R/4.4.0/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
