Description
Is your feature request related to a problem? Please describe.
With stratified K-fold, results can depend on the random seed and fail to generalize. The seed controls how the data is split into folds, so an unlucky seed can produce folds that don't accurately represent the data as a whole.
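A minimal sketch of the problem, assuming scikit-learn's `StratifiedKFold` and `cross_val_score`. The synthetic dataset and the logistic regression model are placeholders chosen for illustration and are not part of this project:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data and model are illustrative assumptions only.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Same data, same model, same number of folds; only the split seed changes.
seed_means = {}
for seed in range(5):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv)
    seed_means[seed] = scores.mean()
    print(f"seed={seed}: mean accuracy={scores.mean():.3f}")
```

Any spread between the printed means comes from the split alone, which is the seed sensitivity described above.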
Describe the solution you'd like
Make RepeatedStratifiedKFold the default CV class. It repeats StratifiedKFold n times, choosing a new random seed for the split each time, so we control for "unlucky" draws when assessing model generalizability. The downside is potentially longer training times: with repeat=2, for example, CV time doubles for the same number of folds.
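As a rough illustration of the cost, here is a sketch using scikit-learn's `RepeatedStratifiedKFold`, where the number of repeats is set with `n_repeats`. The toy arrays are assumptions made up for the example:

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

# Toy balanced dataset (illustrative assumption): 10 samples per class.
X = np.zeros((20, 2))
y = np.array([0] * 10 + [1] * 10)

single = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
repeated = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

n_single = single.get_n_splits(X, y)      # 5 fits per CV run
n_repeated = repeated.get_n_splits(X, y)  # 5 folds x 2 repeats = 10 fits

# Every test fold stays stratified: 2 samples from each class.
splits = list(repeated.split(X, y))
for train_idx, test_idx in splits:
    assert y[test_idx].sum() == 2 and len(test_idx) == 4

print(n_single, n_repeated)
```

With two repeats the model is fitted twice as many times, which is where the extra training time comes from.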
Describe alternatives you've considered
We could also do nothing, since the user can pass any CV object they want.
Additional context
We try to implement best practice out of the box. In general we favour precision over training time, though there is a balance.