What is the reasoning behind the current implementation of testAll? #5

@bersbersbers

I stumbled across the helpers.testAll implementation recently:
https://rdrr.io/cran/nplr/src/R/helpers.R#sym-.testAll

nplr/R/helpers.R

Lines 140 to 144 in 86296d2

test2 <- try(nlm(f=.sce, p=.initPars(x, y, 2), x=x, yobs=y, .weight, LPweight, .nPL2), silent=TRUE)
test3 <- try(nlm(f=.sce, p=.initPars(x, y, 3), x=x, yobs=y, .weight, LPweight, .nPL3), silent=TRUE)
test4 <- try(nlm(f=.sce, p=.initPars(x, y, 4), x=x, yobs=y, .weight, LPweight, .nPL4), silent=TRUE)
test5 <- try(nlm(f=.sce, p=.initPars(x, y, 5), x=x, yobs=y, .weight, LPweight, .nPL5), silent=TRUE)
scores <- sapply(list(test2, test3, test4, test5), function(t){

Basically, it fits one instance each of the 2-, 3-, 4-, and 5-parameter models to the data and chooses the model that gives the best goodness of fit. That is, if the 4-parameter model has a better goodness of fit than the 5-parameter model, it returns 4.

But unless I am mistaken, each n-parameter model is a generalization of the (n-1)-parameter model, so except for convergence to different local minima resulting from poor initialization, there is no reason why any n-parameter model should ever perform worse than the (n-1)-parameter model. Fortunately, initialization can be fixed easily: if you fit any (n-1)-parameter model and then use its fitted parameters to initialize the fit of the n-parameter model, the n-parameter fit starts at exactly the goodness of fit of the smaller model. As long as the optimizer never accepts a step that increases the objective, the n-parameter model cannot perform worse; in the worst case it is as good, but generally it will perform better.
(Of course, the n-parameter model may be overfitting, but that should not be a concern for a fitting function that makes its choice based on SSD only. The user, by using npars="all", already implies that they consider all 2- to 5-parameter models valid solutions.)
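A minimal sketch of the nesting argument, in base R only. The functions `npl5`, `npl4`, and `sse` are hypothetical stand-ins written for this example, not nplr's internal `.nPL5`, `.nPL4`, or `.sce`, and the parameterization (asymmetry `s` fixed at 1 in the 4-parameter case) is an assumption. The data are simulated.

```r
# Hypothetical 5-parameter logistic, p = c(bottom, top, xmid, scal, s).
# Written for illustration; not copied from nplr.
npl5 <- function(p, x) {
  p[1] + (p[2] - p[1]) / (1 + 10^(p[4] * (p[3] - x)))^p[5]
}

# Assumed nesting: the 4-parameter model is the 5-parameter model with s = 1.
npl4 <- function(p, x) npl5(c(p, 1), x)

# Plain (unweighted) sum of squared errors for a given model function.
sse <- function(p, x, y, model) sum((y - model(p, x))^2)

# Simulated data from an asymmetric curve plus a little noise.
set.seed(1)
x <- seq(-3, 3, length.out = 25)
y <- npl5(c(0.05, 0.95, 0.2, 1.2, 0.6), x) + rnorm(length(x), sd = 0.02)

# Step 1: fit the 4-parameter model from a generic starting point.
fit4 <- nlm(sse, p = c(min(y), max(y), mean(x), 1), x = x, y = y, model = npl4)

# Step 2: embed the fitted 4-parameter solution into the 5-parameter space.
# At this point both models describe the same curve, so the SSE is the same.
start5 <- c(fit4$estimate, 1)
sse_start5 <- sse(start5, x, y, npl5)

# Step 3: refine with the 5-parameter model, starting from the embedding.
fit5 <- nlm(sse, p = start5, x = x, y = y, model = npl5)

# Guard: keep the starting point if the optimizer ever returned something
# worse, so the "never worse" property holds regardless of optimizer behavior.
sse5 <- min(fit5$minimum, sse_start5)
```

The explicit `min()` guard is a design choice for the sketch: it makes the comparison hold by construction instead of relying on how `nlm` accepts steps.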

So in summary, to my understanding, there is room for improvement of testAll. I'd propose to either

  1. use successive initialization, starting with the 2-parameter model, using its result to initialize a 3-parameter fit, and so forth, then return the 5-parameter model, or
  2. use the parameters returned by the testAll function to initialize one final fit with the 5-parameter model.
    [Note that approach 1 may give worse solutions than approach 2; but approach 2 is guaranteed to be no worse than what `nplr` currently does, and is usually better.]
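Approach 1 could look roughly like the following sketch. Everything here is hypothetical and written in base R: `npl5`, `defaults`, `free`, `sse_full`, and `fit_chain` are illustrative names, the sketch ignores nplr's weighting, and it assumes the smaller models are obtained by fixing bottom = 0, top = 1, and s = 1. Each stage starts from the previous stage's solution and only accepts a result that does not increase the SSE.

```r
# Hypothetical 5-parameter logistic with named parameters; not nplr's .nPL5.
npl5 <- function(p, x) {
  p[["bottom"]] + (p[["top"]] - p[["bottom"]]) /
    (1 + 10^(p[["scal"]] * (p[["xmid"]] - x)))^p[["s"]]
}

# Assumed fixed values that reduce the 5-parameter curve to smaller models.
defaults <- c(bottom = 0, top = 1, xmid = 0, scal = 1, s = 1)

# Assumed free parameters per model, nested as 2 < 3 < 4 < 5.
free <- list(`2` = c("xmid", "scal"),
             `3` = c("top", "xmid", "scal"),
             `4` = c("bottom", "top", "xmid", "scal"),
             `5` = c("bottom", "top", "xmid", "scal", "s"))

# SSE of the full parameter vector.
sse_full <- function(full, x, y) sum((y - npl5(full, x))^2)

# Objective over the free parameters only; the rest stay at their current value.
sse_free <- function(q, full, idx, x, y) {
  full[idx] <- q
  sse_full(full, x, y)
}

fit_chain <- function(x, y, start = defaults) {
  full <- start
  out <- numeric(0)
  for (k in names(free)) {
    idx <- free[[k]]
    before <- sse_full(full, x, y)
    # Warm start: the free parameters begin at the previous stage's values.
    fit <- try(nlm(sse_free, p = full[idx], full = full, idx = idx,
                   x = x, y = y), silent = TRUE)
    if (!inherits(fit, "try-error")) {
      cand <- full
      cand[idx] <- fit$estimate
      # Accept the new fit only if it is not worse than the starting point.
      if (isTRUE(sse_full(cand, x, y) <= before)) full <- cand
    }
    out[k] <- sse_full(full, x, y)
  }
  list(par = full, sse = out)
}

# Usage on simulated data.
set.seed(1)
x <- seq(-3, 3, length.out = 25)
truth <- c(bottom = 0.05, top = 0.95, xmid = 0.2, scal = 1.2, s = 0.6)
y <- npl5(truth, x) + rnorm(length(x), sd = 0.02)
res <- fit_chain(x, y)
```

In this sketch `res$sse` holds one SSE per stage, and by construction the values cannot increase from the 2- to the 5-parameter stage. Approach 2 would replace the loop with a single 5-parameter fit started from whichever model currently scores best.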

In both cases, the result should always be the 5-parameter model, because with such an initialization the 2-, 3-, or 4-parameter model can never achieve a better goodness of fit than the 5-parameter model.
