Description
I stumbled across the helpers.testAll implementation recently:
https://rdrr.io/cran/nplr/src/R/helpers.R#sym-.testAll
Lines 140 to 144 in 86296d2
    test2 <- try(nlm(f=.sce, p=.initPars(x, y, 2), x=x, yobs=y, .weight, LPweight, .nPL2), silent=TRUE)
    test3 <- try(nlm(f=.sce, p=.initPars(x, y, 3), x=x, yobs=y, .weight, LPweight, .nPL3), silent=TRUE)
    test4 <- try(nlm(f=.sce, p=.initPars(x, y, 4), x=x, yobs=y, .weight, LPweight, .nPL4), silent=TRUE)
    test5 <- try(nlm(f=.sce, p=.initPars(x, y, 5), x=x, yobs=y, .weight, LPweight, .nPL5), silent=TRUE)
    scores <- sapply(list(test2, test3, test4, test5), function(t){
Basically, it fits one instance each of the 2-, 3-, 4-, and 5-parameter models to the data and chooses the model that gives the best goodness of fit. That is, if the 4-parameter model has a better goodness of fit than the 5-parameter model, it returns 4.
But unless I am mistaken, each n-parameter model is a generalization of the (n-1)-parameter model. So, except for convergence to different local minima resulting from poor initialization, there is no reason why any n-parameter model should ever fit worse than the (n-1)-parameter model. Fortunately, initialization can be fixed easily: if you fit the (n-1)-parameter model and then use its fitted parameters to initialize the n-parameter fit, the n-parameter model cannot end up worse, provided the optimizer never accepts a step that increases the objective. In the worst case it's as good, but generally it will fit better.
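To spell out the nesting argument, here is a short derivation. It assumes the parameterization I recall from the `nplr` documentation (bottom asymptote $B$, top asymptote $T$, inflection point $x_{mid}$, slope $b$, asymmetry $s$); the symbols and the exact constraints are my assumption, not copied from the package source.

$$
f_5(x) = B + \frac{T - B}{\left[1 + 10^{\,b\,(x_{mid} - x)}\right]^{s}}, \qquad
f_4 = f_5\big|_{s=1}, \quad
f_3 = f_4\big|_{B=0}, \quad
f_2 = f_3\big|_{T=1}
$$

Write $\hat\theta_4 = (\hat B, \hat T, \hat x_{mid}, \hat b)$ for the fitted 4-parameter model and $E_n$ for the (possibly weighted) sum of squared errors of the n-parameter model. The point $(\hat B, \hat T, \hat x_{mid}, \hat b, 1)$ is a valid 5-parameter candidate with exactly the same error, so

$$
\min_{\theta_5} E_5(\theta_5) \;\le\; E_5(\hat B, \hat T, \hat x_{mid}, \hat b, 1) \;=\; E_4(\hat\theta_4),
$$

and the same step applies from 2 to 3 and from 3 to 4 parameters.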
(Of course, the n-parameter model may be overfitting, but that should not be a concern for a fitting function that makes its choice based on SSD only. By using npars="all", the user already implies that they consider all 2- to 5-parameter models valid solutions.)
So in summary, to my understanding, there is room for improvement of testAll. I'd propose to either
- use successive initialization, starting with the 2-parameter model, using its fit to initialize a 3-parameter fit, and so forth, then return the 5-parameter model, or
- use the parameters returned by the testAll function to initialize one final fit with the 5-parameter model.
[Note that approach 1 may give worse solutions than approach 2; but approach 2 is guaranteed to not be worse than what `nplr` currently does, and usually better.]
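A minimal sketch of approach 1 (successive initialization), to make the idea concrete. It does not use `nplr` internals: `npl`, `expand`, `promote`, `sse` and `fit_chain` are hypothetical helpers, the model form is the one I assume `nplr` uses, and the objective is an unweighted sum of squares rather than the weighted `.sce`.

```r
# Hypothetical stand-ins, not nplr's internal functions.
# 5-parameter logistic; fixing parameters yields the smaller models.
npl <- function(x, bottom, top, xmid, slope, asym) {
  bottom + (top - bottom) / (1 + 10^(slope * (xmid - x)))^asym
}

# Expand the free parameters of an n-parameter model to the full
# vector (bottom, top, xmid, slope, asym).
expand <- function(p, npars) {
  switch(as.character(npars),
         "2" = c(0, 1, p[1], p[2], 1),
         "3" = c(0, p[1], p[2], p[3], 1),
         "4" = c(p[1], p[2], p[3], p[4], 1),
         "5" = p)
}

# Turn a fitted n-parameter vector into a start for the (n+1)-parameter
# model that describes exactly the same curve.
promote <- function(p, npars) {
  full <- expand(p, npars)
  switch(as.character(npars + 1),
         "3" = full[2:4],
         "4" = full[1:4],
         "5" = full)
}

# Unweighted sum of squared errors (assumption: nplr's .sce is weighted).
sse <- function(p, x, y, npars) {
  f <- expand(p, npars)
  sum((y - npl(x, f[1], f[2], f[3], f[4], f[5]))^2)
}

# Fit 2 -> 3 -> 4 -> 5 parameters, each stage started from the previous fit.
# The start is kept whenever nlm fails or does not improve on it, so the
# error can never increase from one stage to the next.
fit_chain <- function(x, y, start2) {
  p <- start2
  out <- list()
  for (n in 2:5) {
    start_val <- sse(p, x, y, n)
    res <- try(nlm(sse, p, x = x, y = y, npars = n), silent = TRUE)
    if (!inherits(res, "try-error") &&
        is.finite(res$minimum) && res$minimum <= start_val) {
      p <- res$estimate
    }
    out[[as.character(n)]] <- list(par = p, sse = sse(p, x, y, n))
    if (n < 5) p <- promote(p, n)
  }
  out
}

# Usage on simulated data (values chosen arbitrarily for illustration).
set.seed(1)
x <- seq(-3, 3, length.out = 25)
y <- npl(x, 0.05, 0.95, 0.3, 1.2, 0.7) + rnorm(length(x), sd = 0.02)
fits <- fit_chain(x, y, start2 = c(median(x), 1))
sapply(fits, function(f) f$sse)
```

The explicit "keep the start if the optimizer did not improve on it" check is what makes the guarantee hold regardless of how `nlm` behaves. Approach 2 would be the same `promote` step applied once, from whichever model `testAll` picks up to the 5-parameter model.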
In both cases, the result should always be the 5-parameter model, because the 2-, 3-, and 4-parameter models can never beat it in terms of goodness of fit.