This function builds a generalized linear model with forward stepwise inclusion of variables, using AIC as the selection criterion, and provides the values predicted at each step, as well as their correlation with the final model predictions.
stepByStep(data, sp.col, var.cols, family = binomial(link = "logit"),
Favourability = FALSE, trace = 0, cor.method = "pearson")
This function returns a list of the following components:
a data frame with the model's fitted values at each step of the variable selection.
a numeric vector of the correlation between the predictions at each step and those of the final model.
a character vector of the variables in the final model, named with the step at which each was included.
the resulting model object.
data: a data frame containing your target and predictor variables.
sp.col: index number of the column of 'data' that contains the target variable.
var.cols: index numbers of the columns of 'data' that contain the predictor variables.
family: argument to be passed to the glm function indicating the family (and error distribution) to use in modelling. The default is the binomial distribution with a logit link (for binary target variables).
Favourability: logical, whether to apply the Favourability function to remove the effect of prevalence from the predicted probability (Real et al. 2006). Applicable only to binomial GLMs. Defaults to FALSE.
trace: argument to pass to the step function. If positive, information is printed during the stepwise procedure. Larger values may give more detailed information. The default is 0 (silent).
cor.method: character string to pass to the cor function indicating which coefficient should be used for correlating predictions at each step with those of the final model. Can be "pearson" (the default), "kendall", or "spearman".
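The prevalence correction behind the Favourability option can be illustrated with a small base-R sketch of the formula published in Real et al. (2006), F = [P/(1-P)] / [n1/n0 + P/(1-P)], where n1 and n0 are the numbers of presences and absences. The helper name favourability_sketch is hypothetical and is not the package's own function; it only shows the arithmetic.

```r
# Hypothetical helper (not part of the documented API): favourability
# from predicted probability p, given n1 presences and n0 absences.
favourability_sketch <- function(p, n1, n0) {
  odds <- p / (1 - p)          # odds implied by the predicted probability
  odds / (n1 / n0 + odds)      # rescale by the presence/absence ratio
}

# Made-up probabilities for a data set with 30 presences and 70 absences
p <- c(0.1, 0.3, 0.6)
fav <- favourability_sketch(p, n1 = 30, n0 = 70)
print(round(fav, 3))
```

A probability equal to the prevalence (0.3 in this made-up case) maps to a favourability of 0.5, which is what makes values comparable across species with different prevalences.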
A. Marcia Barbosa
Stepwise variable selection often includes more variables than would a model selected after examining all possible combinations of the variables (e.g. with packages MuMIn and glmulti). The 'stepByStep' function can be useful to assess if a stepwise model with just the first few variables could already provide predictions very close to the final ones (see e.g. Fig. 3 in Munoz et al., 2005). It can also be useful to see which variables determine the more general trends in the model predictions, and which just add (local) additional nuances.
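The idea of comparing each step's predictions with the final ones can be sketched in base R alone. The following is a rough illustration on simulated data, not the package's implementation: the data frame dat, its columns, and the object names are all assumptions made up for this example.

```r
# Simulated data (assumption): binary response driven mainly by x1, less by x2
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 + 0.8 * dat$x2))

# Forward stepwise selection by AIC, starting from the intercept-only model
null_mod <- glm(y ~ 1, data = dat, family = binomial)
final_mod <- step(null_mod,
                  scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
                  direction = "forward", trace = 0)

# Variables in order of inclusion, read from the step history ("+ x1", ...)
added <- sub("^\\+ ", "", as.character(final_mod$anova$Step)[-1])

# Refit with the first k selected variables and keep the fitted values
preds <- vapply(seq_along(added), function(k) {
  form <- reformulate(added[seq_len(k)], response = "y")
  fitted(glm(form, data = dat, family = binomial))
}, numeric(n))
colnames(preds) <- paste0("step", seq_along(added))

# Correlation of each step's predictions with those of the final model
cors <- apply(preds, 2, cor, y = fitted(final_mod), method = "pearson")
print(cors)
```

If the correlation is already close to 1 after the first step or two, the later variables add little to the overall pattern of predictions.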
Munoz A.R., Real R., Barbosa A.M. & Vargas J.M. (2005) Modelling the distribution of Bonelli's Eagle in Spain: Implications for conservation planning. Diversity and Distributions 11: 477-486.
Real R., Barbosa A.M. & Vargas J.M. (2006) Obtaining environmental favourability functions from logistic regression. Environmental and Ecological Statistics 13: 237-245.
library(fuzzySim)  # provides stepByStep() and the rotif.env dataset
data(rotif.env)
stepByStep(data = rotif.env, sp.col = 18, var.cols = 5:17,
cor.method = "spearman")
stepByStep(data = rotif.env, sp.col = 18, var.cols = 5:17,
cor.method = "spearman", Favourability = TRUE)
stepByStep(data = rotif.env, sp.col = 9, var.cols = c(5:8, 10:17),
family = poisson)