ENH: Use Repeated Stratified K-Fold as base CV class #573

@andersbogsnes

Is your feature request related to a problem? Please describe.
With Stratified K-fold, results can depend on the random seed. The seed controls how the data is split into folds, so a single split can produce folds that are not representative of the data as a whole, and the resulting scores may not generalize.
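
To illustrate the seed sensitivity, here is a minimal sketch using plain scikit-learn. The synthetic dataset, the LogisticRegression model, and the seed range are assumptions chosen only for demonstration and are not taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical toy data and model, used only to illustrate the effect.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Same data, same model, same number of folds: only the split seed changes.
mean_score_by_seed = {}
for seed in range(5):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv)
    mean_score_by_seed[seed] = scores.mean()

for seed, mean_score in mean_score_by_seed.items():
    print(f"seed={seed}: mean accuracy={mean_score:.3f}")
```

Any spread between the printed means comes purely from how the folds were drawn, which is the variation this issue is about.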

Describe the solution you'd like
Making RepeatedStratifiedKFold the default CV class would repeat StratifiedKFold n times, with a different randomization of the split in each repetition. Averaging over the repetitions reduces the impact of "unlucky" draws when assessing model generalizability. The downside is longer training time: with n_repeats=2, for example, the full set of folds is fitted twice, doubling CV time.
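
A rough sketch of what the proposal amounts to in scikit-learn terms. The dataset, model, and parameter values below are illustrative assumptions; how this project would wire the CV object into its own defaults is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Hypothetical toy data and model, used only for demonstration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5 folds repeated 2 times gives 10 fits, i.e. twice the cost of one 5-fold run.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)

print(f"fits: {len(scores)}, mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The score array has one entry per fold per repetition, so the mean is taken over several independent splits rather than a single one.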

Describe alternatives you've considered
We can also do nothing, since the user can pass any CV object they want.

Additional context
We try to implement best practice out of the box. In general we favour precision of the estimate over training time, though there is a balance to strike.
