stepByStep: Analyse and compare stepwise model predictions

Description

This function builds a generalized linear model with stepwise selection of variables (forward inclusion by default; see argument 'direction'), using AIC as the selection criterion. It returns the values predicted at each step, as well as their correlation with the predictions of the final model.
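The selection loop described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the fuzzySim implementation (which passes its arguments to the step function): forward_steps() is a hypothetical name, selection here is forward only, and the data are simulated. At each step it adds the candidate variable that most lowers the penalized AIC, stores the fitted values of the updated model, and finally correlates each step's predictions with those of the last model.

```r
# Hypothetical base-R sketch (NOT the fuzzySim source): forward selection by
# penalized AIC, keeping the fitted values of the model after each inclusion.
forward_steps <- function(data, response, candidates,
                          family = binomial(link = "logit"), k = 2,
                          cor.method = "pearson") {
  fit <- function(terms) {
    glm(reformulate(terms, response), family = family, data = data)
  }
  selected <- character(0)
  current <- fit("1")                       # intercept-only starting model
  preds <- list()
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # penalized AIC of each one-variable extension of the current model
    scores <- sapply(remaining, function(v) AIC(fit(c(selected, v)), k = k))
    if (min(scores) >= AIC(current, k = k)) break   # no improvement: stop
    best <- names(which.min(scores))
    selected <- c(selected, best)
    current <- fit(selected)
    preds[[best]] <- fitted(current)        # predictions at this step
  }
  if (length(selected) == 0) stop("no variable improved the AIC")
  predictions <- as.data.frame(preds)
  final <- predictions[[ncol(predictions)]]
  # correlation of each step's predictions with the final model's predictions
  correlations <- sapply(predictions, cor, y = final, method = cor.method)
  list(predictions = predictions, correlations = correlations,
       variables = setNames(selected, seq_along(selected)), model = current)
}

# Usage on simulated data (assumption: x1 and x2 matter, x3 and x4 are noise)
set.seed(1)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$occ <- rbinom(n, 1, plogis(-0.5 + 2 * dat$x1 + dat$x2))
res <- forward_steps(dat, "occ", c("x1", "x2", "x3", "x4"),
                     cor.method = "spearman")
res$variables      # variables in order of inclusion, named by step
res$correlations   # the last value is always 1
```

A hand-written loop is used here instead of step() only to make the per-step predictions explicit; step() applies the same AIC comparison but does not keep the intermediate fitted values.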

Usage

stepByStep(data, sp.col, var.cols, family = binomial(link = "logit"), 
Favourability = FALSE, trace = 0, direction = "forward", k = 2, 
cor.method = "pearson")

Value

This function returns a list of the following components:

predictions: a data frame with the model's fitted values at each step of the variable selection.
correlations: a numeric vector of the correlations between the predictions at each step and those of the final model.
variables: a character vector of the variables in the final model, named with the step at which each was included.
model: the resulting model object.

Arguments

data: a data frame containing the response and predictor variables.
sp.col: name or index number of the column of 'data' that contains the response variable.
var.cols: names or index numbers of the columns of 'data' that contain the predictor variables.
family: argument to be passed to the glm function indicating the family (and error distribution) to use in modelling. The default is binomial distribution with logit link (for binary response variables).
Favourability: logical, whether to apply the Favourability function (Real et al. 2006) to remove the effect of prevalence from the predicted probabilities. Applicable only to binomial GLMs. Defaults to FALSE.
trace: argument to pass to the step function. If positive, information is printed during the stepwise procedure. Larger values may give more detailed information. The default is 0 (silent).
direction: argument to pass to the step function. Implementation contributed by Alba Estrada. Can be "forward" (the default, for backward compatibility with former versions of 'stepByStep'), "backward", or "both".
k: argument to pass to the step function indicating the multiple of the number of degrees of freedom used for the penalty. The default is 2, which yields the standard AIC. Use larger values for a more stringent selection; e.g., for a critical p-value of 0.05 (for terms with one degree of freedom), use k = qchisq(0.05, 1, lower.tail = FALSE).
cor.method: character string to pass to cor indicating which coefficient to use for correlating predictions at each step with those of the final model. Can be "pearson" (the default), "kendall", or "spearman".
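For the Favourability argument, the favourability function of Real et al. (2006) rescales a predicted probability P using the numbers of presences (n1) and absences (n0) in the data: F = (P / (1 - P)) / (n1 / n0 + P / (1 - P)). The sketch below implements that formula; favourability_sketch() is a hypothetical helper, not the fuzzySim API. It shows that a probability equal to the sample prevalence maps to a favourability of 0.5, and that the ranking of the predictions is preserved.

```r
# Hypothetical helper (NOT the fuzzySim API): favourability from predicted
# probabilities p, given n1 presences and n0 absences (Real et al. 2006).
favourability_sketch <- function(p, n1, n0) {
  odds <- p / (1 - p)          # odds of the predicted probability
  odds / ((n1 / n0) + odds)    # odds relative to the odds of the prevalence
}

# With prevalence 20 / 100, a probability of 0.2 gives 0.5
favourability_sketch(0.2, n1 = 20, n0 = 80)

# The transformation is monotonic, so rankings do not change
p <- c(0.6, 0.05, 0.2)
fav <- favourability_sketch(p, n1 = 20, n0 = 80)
order(fav) == order(p)
```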

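The k = qchisq(0.05, 1, lower.tail = FALSE) rule for the k argument can be checked directly: adding a one-degree-of-freedom term lowers the k-penalized AIC only when the deviance drops by more than k, which is the same decision as a likelihood-ratio test at p < 0.05. The sketch below verifies this on simulated binary data (an assumption; any binomial GLM would do) using only base R.

```r
# k equivalent to a critical p-value of 0.05 for a one-df term
k <- qchisq(0.05, df = 1, lower.tail = FALSE)   # about 3.84

# Simulated binary response with a weak effect of x (illustrative data)
set.seed(2)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.3 * d$x))
m0 <- glm(y ~ 1, family = binomial, data = d)
m1 <- glm(y ~ x, family = binomial, data = d)

# Adding x lowers the k-penalized AIC only if the deviance drop exceeds k ...
aic_prefers_x <- AIC(m1, k = k) < AIC(m0, k = k)
# ... which matches a likelihood-ratio test at alpha = 0.05
lrt_p <- pchisq(deviance(m0) - deviance(m1), df = 1, lower.tail = FALSE)
identical(aic_prefers_x, lrt_p < 0.05)   # TRUE
```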
Author

A. Marcia Barbosa

Details

Stepwise variable selection often includes more variables than a model selected after examining all possible combinations of the variables would (e.g. with package MuMIn or glmulti). The 'stepByStep' function can help assess whether a stepwise model with just the first few variables already provides predictions very close to the final ones (see e.g. Fig. 3 in Munoz et al., 2005). It can also show which variables determine the broader trends in the model predictions, and which ones just add (local) nuances.
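The check described above, whether the first few steps already reproduce the final predictions, can be sketched in base R. This assumes a fixed inclusion order (here simply assumed, strongest first, rather than chosen by step) and an arbitrary cut-off of 0.95 for "very close"; both are illustrative choices, not part of stepByStep. It fits the nested models, correlates each one's fitted values with the final model's, and reports the first step that reaches the cut-off.

```r
# Simulated data: x1 drives the broad trend, x2 and x3 add smaller nuances
set.seed(3)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(2.5 * d$x1 + 0.5 * d$x2 + 0.2 * d$x3))

vars <- c("x1", "x2", "x3")   # assumed order of inclusion, strongest first
# fitted values of the nested models with the first 1, 2, ..., all variables
step_preds <- lapply(seq_along(vars), function(i) {
  fitted(glm(reformulate(vars[seq_len(i)], "y"), family = binomial, data = d))
})
final <- step_preds[[length(step_preds)]]
cors <- setNames(sapply(step_preds, cor, y = final, method = "spearman"), vars)

threshold <- 0.95   # arbitrary cut-off for "very close to the final model"
first_close <- which(cors >= threshold)[1]
cors
names(first_close)  # last variable needed to reach the cut-off
```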

References

Munoz A.R., Real R., Barbosa A.M. & Vargas J.M. (2005) Modelling the distribution of Bonelli's Eagle in Spain: implications for conservation planning. Diversity and Distributions 11: 477-486.

Real R., Barbosa A.M. & Vargas J.M. (2006) Obtaining environmental favourability functions from logistic regression. Environmental and Ecological Statistics 13: 237-245.

Examples


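library(fuzzySim)  # provides stepByStep() and the rotif.env data set
data(rotif.env)

# Binomial GLM (the default family) of column 18 on columns 5 to 17,
# with Spearman correlations between step and final predictions
stepByStep(data = rotif.env, sp.col = 18, var.cols = 5:17, 
cor.method = "spearman")
 
# As above, but with predictions converted to favourability
stepByStep(data = rotif.env, sp.col = 18, var.cols = 5:17, 
cor.method = "spearman", Favourability = TRUE)
 
# Poisson GLM with column 9 as the response and the other
# columns from 5 to 17 as predictors
stepByStep(data = rotif.env, sp.col = 9, var.cols = c(5:8, 10:17), 
family = poisson)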
