pn_modselect_step: Backwards Stepwise Selection of Positive-Negative Richards nlsList Models

Description

This function performs backwards stepwise model selection for nlsList models fitted using SSposnegRichards.

Usage

pn_modselect_step(x,
                  y,
                  grp,
                  forcemod = 0,
                  existing = FALSE,
                  penaliz = "1/sqrt(n)")

Arguments

x

a numeric vector of the primary predictor

y

a numeric vector of the response variable

grp

a factor of same length as x and y that distinguishes groups within the dataset

forcemod

optional numeric value to constrain model selection (see Details)

existing

optional logical value specifying whether some of the relevant models have already been fitted

penaliz

optional character value to determine how models are ranked (see Details)

Value

A data.frame containing statistics produced by extraF evaluations at each step, detailing the names of the general model and the best reduced model at each step. The overall best model evaluated by the end of the function is saved globally as pn_bestmodel.lis. The naming convention for models is a concatenation of 'richardsR', the modno and '.lis' (see SSposnegRichards).
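The naming convention can be illustrated with a short base R sketch. The helpers model_name and get_model are hypothetical names introduced here for illustration; they are not part of the package, and the lookup in the global environment simply mirrors the statement that models are saved globally.

```r
# Hypothetical helper (not part of the package): build the object name
# implied by the naming convention 'richardsR' + modno + '.lis'.
model_name <- function(modno) paste0("richardsR", modno, ".lis")

# Hypothetical helper: fetch a fitted model from the global environment,
# or return NULL when no object of that name exists there.
get_model <- function(modno) {
  nm <- model_name(modno)
  if (exists(nm, envir = globalenv(), inherits = FALSE)) {
    get(nm, envir = globalenv())
  } else {
    NULL
  }
}

model_name(22)        # "richardsR22.lis"
fit <- get_model(22)  # NULL unless richardsR22.lis has been fitted
```

After a selection run, the same pattern could be used to retrieve any of the intermediate models named in the returned table.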

Details

First, the function determines whether parameter M should be fixed (see SSposnegRichards) by fitting models 12 and 20 and comparing their performance using extraF. If model 12 provides superior performance (variable values of M), the 16 models that estimate M are run (models 1 through 16); otherwise the models with fixed M are fitted (models 21 through 36).

Model selection then proceeds by fitting the most general model (8-parameter model 1 for variable M; 7-parameter model 21 for fixed M). At each subsequent step, reduced models are created by removing a single parameter from the decreasing section of the curve (i.e. RAsym, Rk, Ri or RM) and fitting the result as an nlsList model. This is repeated until all possible models with one fewer parameter have been fitted; these models are then ranked by modified pooled residual standard error (see below) to determine which reduced model provides the best fit. The best reduced model is then compared with the more general model retained from the previous step using the function extraF, to determine whether the more general model provides a significant improvement over the best reduced model. The most appropriate model is retained as the general model for the next step. This process continues for up to six steps (all steps are attempted even if the general model performs better, so that much more reduced models can also be evaluated). The most reduced model that this function can evaluate contains only parameters for the positive section of the curve (4 parameters for variable M, 3 parameters for fixed M).
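The comparison between a general and a reduced model can be sketched with the textbook extra sum-of-squares F test. This is an assumption about the kind of test involved, not a reproduction of extraF: the function extra_f below is hypothetical, and the residual sums of squares and degrees of freedom are made-up numbers used only for illustration.

```r
# Textbook extra sum-of-squares F test for two nested models.
# rss_g, df_g: residual sum of squares and residual df of the general model
# rss_r, df_r: the same quantities for the reduced model (so df_r > df_g)
# Hypothetical helper; the package's extraF may differ in detail.
extra_f <- function(rss_g, df_g, rss_r, df_r) {
  stopifnot(df_r > df_g, rss_g > 0)
  df_num <- df_r - df_g
  f_stat <- ((rss_r - rss_g) / df_num) / (rss_g / df_g)
  p_val  <- pf(f_stat, df_num, df_g, lower.tail = FALSE)
  c(F = f_stat, df_num = df_num, df_den = df_g, p = p_val)
}

# Made-up values: dropping one parameter raises the RSS from 120 to 150
res <- extra_f(rss_g = 120, df_g = 92, rss_r = 150, df_r = 93)
res[["F"]]         # (30 / 1) / (120 / 92) = 23
res[["p"]] < 0.05  # TRUE here, so the general model would be retained
```

A small p-value indicates that the extra parameter in the general model explains significantly more variation; otherwise the reduced model is preferred for the next step.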
Fitting these nlsList models can be time-consuming (2-4 hours using the dataset posneg_data, which encompasses 100 individuals). If several of the relevant models are already fitted, the option existing=TRUE can be used to avoid refitting models that already exist globally (note that a model object in which no grouping levels were successfully parameterized will be refitted, as will objects that are not of class nlsList).

Specifying forcemod=3 forces model selection to consider only fixed M models, and setting forcemod=4 forces model selection to consider only models with varying values of M. If fitting both models 12 and 20 fails, fixed M models are used by default.

Models are ranked by modified pooled residual standard error. By default, residual standard error is divided by the square root of sample size. This nonlinearly penalizes models for which very few grouping levels (individuals) are successfully parameterized (the few individuals that are parameterized in these models are, unsurprisingly, fitted well), using a function based on the relationship between standard error and sample size. However, different users may have different preferences, and these can be specified in the argument penaliz (by which residual standard error is multiplied). This argument must be a character value that contains the character n (sample size) and must be a valid right-hand side (RHS) of a formula, e.g. 1*(n), (n)^2. It cannot contain more than one n but could be a custom function, e.g. FUN(n).
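The effect of penaliz can be sketched in base R by evaluating the character expression at a given n and multiplying the residual standard error by the result. The function penalised_rse is a hypothetical helper written for this illustration; how the package evaluates the string internally is an assumption.

```r
# Hypothetical helper (not the package's implementation): evaluate a
# penaliz string at sample size n and scale the residual standard error.
penalised_rse <- function(rse, n, penaliz = "1/sqrt(n)") {
  stopifnot(is.character(penaliz), length(penaliz) == 1,
            grepl("n", penaliz, fixed = TRUE))
  # parse the RHS expression and evaluate it with n bound to the sample size
  multiplier <- eval(parse(text = penaliz), list(n = n))
  rse * multiplier
}

# Same residual standard error, different numbers of fitted individuals
penalised_rse(rse = 2, n = 100)                     # 2 * 1/sqrt(100) = 0.2
penalised_rse(rse = 2, n = 4)                       # 2 * 1/sqrt(4)   = 1
penalised_rse(rse = 2, n = 100, penaliz = "1*(n)")  # 2 * 100         = 200
```

With the default, a model that parameterizes only 4 individuals receives a larger modified value than one with the same residual standard error across 100 individuals.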

Examples


library(nlme)            # provides nlsList
library(FlexParamCurve)  # provides pn_modselect_step, SSposnegRichards and posneg_data

#run model selection for posneg_data object (only first 3 group levels for example's sake)
data(posneg_data)
subdata <- subset(posneg_data, as.numeric(row.names(posneg_data)) < 40)
modseltable <- pn_modselect_step(subdata$age, subdata$mass,
   subdata$id, existing = FALSE)

#fit nlsList model initially and then run model selection
#for posneg_data object when at least one model is already fit
#note forcemod is set to 3 so that models 21-36 are evaluated
#(only first 4 group levels for example's sake)
subdata <- subset(posneg_data, as.numeric(row.names(posneg_data)) < 40)
richardsR22.lis <- nlsList(mass ~ SSposnegRichards(age, Asym = Asym, K = K,
   Infl = Infl, RAsym = RAsym, Rk = Rk, Ri = Ri, modno = 22),
   data = subdata)
modseltable <- pn_modselect_step(subdata$age, subdata$mass,
   subdata$id, forcemod = 3, existing = TRUE)

#run model selection ranked by residual standard error*sample size
#(only first 4 group levels for example's sake)
modseltable <- pn_modselect_step(subdata$age, subdata$mass,
   subdata$id, penaliz = '1*(n)', existing = TRUE)
