What is the reasoning behind the current implementation of testAll?

I stumbled across the `helpers.testAll` implementation recently:
https://rdrr.io/cran/nplr/src/R/helpers.R#sym-.testAll
https://github.com/fredcommo/nplr/blob/86296d20f263948d9fc92f4cde2824edb4ce0fe3/R/helpers.R#L140-L144

Basically, it fits one instance each of the 2-, 3-, 4-, and 5-parameter models to the data and chooses the one that gives the best goodness of fit. That is, if the 4-parameter model has a better goodness of fit than the 5-parameter model, it returns 4.

But unless I am mistaken, each n-parameter model is a generalization of the (n-1)-parameter model, so except for convergence to different local minima resulting from poor initialization, there is no reason why any n-parameter model should ever fit worse than the (n-1)-parameter model. Fortunately, initialization can be fixed easily: if you fit the (n-1)-parameter model and then use its fitted parameters to initialize the fit of the n-parameter model, the optimizer starts from a point that already reproduces the (n-1)-parameter fit. As long as it never accepts a step that increases the objective, the n-parameter model cannot fit worse; in the worst case it is equally good, and generally it will be better.
(Of course, the n-parameter model may be overfitting, but that should not be a concern for a fitting function that makes its choice based on SSD only. By using npars="all", the user already implies that they consider all 2- to 5-parameter models valid solutions.)
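
To make the nesting argument concrete, here is a minimal sketch in plain R. It assumes the usual n-parameter logistic form with parameters (bottom, top, xmid, scal, s), where the 4-parameter model is the 5-parameter model with s fixed at 1; the function names and the synthetic data are made up for illustration and are not nplr's internals.

```r
# Illustrative 5-parameter logistic; names are hypothetical, not nplr internals.
# p = (bottom, top, xmid, scal, s)
npl5 <- function(p, x) p[1] + (p[2] - p[1]) / (1 + 10^(p[4] * (p[3] - x)))^p[5]
# 4-parameter special case: the asymmetry parameter s is fixed at 1.
npl4 <- function(p, x) p[1] + (p[2] - p[1]) / (1 + 10^(p[4] * (p[3] - x)))

# Sum of squared deviations for a given model function.
sse <- function(p, model, x, y) sum((y - model(p, x))^2)

# Synthetic, asymmetric dose-response data (made up for this sketch).
set.seed(1)
x <- seq(-3, 3, length.out = 25)
y <- npl5(c(0.05, 0.95, 0.2, 1.3, 0.5), x) + rnorm(length(x), sd = 0.02)

# Fit the 4-parameter model from a rough starting guess.
fit4 <- suppressWarnings(nlm(sse, p = c(0, 1, 0, 1), model = npl4, x = x, y = y))

# Warm start: the 4-parameter optimum with s = 1 appended.
# At this point the 5-parameter model reproduces the 4-parameter fit.
start5 <- c(fit4$estimate, 1)
sse_start <- sse(start5, npl5, x, y)

fit5 <- suppressWarnings(nlm(sse, p = start5, model = npl5, x = x, y = y))
# Guard: fall back to the starting point if the optimizer did not improve on it.
sse5 <- min(fit5$minimum, sse_start)

print(c(four_par = fit4$minimum, five_par_warm = sse5))
```

The explicit `min()` guard is what turns "usually not worse" into "never worse": even if the optimizer misbehaves, the warm-start point itself is still a valid 5-parameter solution.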

So in summary, to my understanding, there is room for improvement of testAll. I'd propose to either
1. use successive initialization, starting with the 2-parameter model, using its result to initialize a 3-parameter fit, and so forth, then return the 5-parameter model, or
2. use the parameters returned by the testAll function to initialize one final fit with the 5-parameter model.
[Note that approach 1 *may* give worse solutions than 2; but approach 2 is guaranteed to not be worse than what `nplr` currently does, and usually better.]
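
A rough sketch of what option 1 could look like, again in plain R rather than against nplr's actual internals. It assumes the 2-parameter model fixes bottom = 0 and top = 1, the 3-parameter model fixes bottom = 0, and the 4-parameter model fixes s = 1; `npl`, `sse_free` and `fit_successive` are hypothetical names, and the data are synthetic.

```r
# Hypothetical sketch of option 1; these names do not exist in nplr.
# Parameter vector: (bottom, top, xmid, scal, s).
npl <- function(p, x) p[1] + (p[2] - p[1]) / (1 + 10^(p[4] * (p[3] - x)))^p[5]

# SSE as a function of the free parameters; the rest keep their values in `full`.
sse_free <- function(free_vals, free_idx, full, x, y) {
  full[free_idx] <- free_vals
  sum((y - npl(full, x))^2)
}

fit_successive <- function(x, y, start = c(0, 1, median(x), 1, 1)) {
  # Indices of the free parameters for the 2-, 3-, 4- and 5-parameter stages.
  free_sets <- list("2" = 3:4, "3" = 2:4, "4" = 1:4, "5" = 1:5)
  full <- start
  sse <- numeric(0)
  for (k in names(free_sets)) {
    idx <- free_sets[[k]]
    current <- sse_free(full[idx], idx, full, x, y)
    # Each stage starts from the previous stage's solution.
    fit <- try(suppressWarnings(
      nlm(sse_free, p = full[idx], free_idx = idx, full = full, x = x, y = y)
    ), silent = TRUE)
    if (!inherits(fit, "try-error")) {
      candidate <- sse_free(fit$estimate, idx, full, x, y)
      # Accept the new fit only if it is at least as good as the starting point.
      if (isTRUE(candidate <= current)) {
        full[idx] <- fit$estimate
        current <- candidate
      }
    }
    sse[k] <- current
  }
  list(par = full, sse = sse)
}

# Synthetic data, made up for this sketch.
set.seed(42)
x <- seq(-3, 3, length.out = 25)
y <- npl(c(0.05, 0.95, 0.2, 1.3, 0.5), x) + rnorm(length(x), sd = 0.02)
res <- fit_successive(x, y)
print(round(res$sse, 5))
```

Because every stage is seeded with the previous solution and a worse result is rejected, the SSE sequence is non-increasing by construction, so the final 5-parameter fit is never worse than any earlier stage of the same chain.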

In both cases, the result should always be the 5-parameter model, because the 2-, 3-, and 4-parameter models can never achieve a better goodness of fit than a properly initialized 5-parameter model.

	test2 <- try(nlm(f=.sce, p=.initPars(x, y, 2), x=x, yobs=y, .weight, LPweight, .nPL2), silent=TRUE)
	test3 <- try(nlm(f=.sce, p=.initPars(x, y, 3), x=x, yobs=y, .weight, LPweight, .nPL3), silent=TRUE)
	test4 <- try(nlm(f=.sce, p=.initPars(x, y, 4), x=x, yobs=y, .weight, LPweight, .nPL4), silent=TRUE)
	test5 <- try(nlm(f=.sce, p=.initPars(x, y, 5), x=x, yobs=y, .weight, LPweight, .nPL5), silent=TRUE)
	scores <- sapply(list(test2, test3, test4, test5), function(t){
