stepByStep: Analyse and compare stepwise model predictions

Description

This function builds a generalized linear model with forward stepwise inclusion of variables, using AIC as the selection criterion, and provides the values predicted at each step, as well as their correlation with the final model predictions.

Usage

stepByStep(data, sp.col, var.cols, family = binomial(link = "logit"), 
Favourability = FALSE, trace = 0, cor.method = "pearson")

Arguments

data

a data frame containing your target and predictor variables.

sp.col

index number of the column of data that contains the target variable.

var.cols

index numbers of the columns of data that contain the predictor variables.

family

argument to be passed to the glm function indicating the family (error distribution and link function) to use in modelling. The default is the binomial distribution with a logit link (for binary target variables).

Favourability

logical, whether to apply the Favourability function to remove the effect of prevalence from predicted probability (Real et al. 2006). Applicable only to binomial GLMs. Defaults to FALSE.

trace

argument to pass to the step function. If positive, information is printed during the stepwise procedure. Larger values may give more detailed information. The default is 0 (silent).

cor.method

character string to pass to function cor indicating which coefficient should be used for correlating predictions at each step with those of the final model. Can be "pearson" (the default), "kendall", or "spearman".
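
For orientation, the favourability transformation described by Real et al. (2006) rescales a predicted probability P using the numbers of presences (n1) and absences (n0) in the modelled data: F = [P / (1 - P)] / [n1 / n0 + P / (1 - P)]. The sketch below is a minimal base-R illustration of that formula. The name fav_sketch and the toy numbers are assumptions made for this example, and the code is not necessarily how stepByStep computes favourability internally.

```r
# Hypothetical helper (not part of any package): favourability from
# predicted probabilities, following the formula in Real et al. (2006).
fav_sketch <- function(pred, n1, n0) {
  odds <- pred / (1 - pred)   # odds of presence implied by the model
  odds / (n1 / n0 + odds)     # rescale by the presence/absence ratio
}

# Toy example: 30 presences and 70 absences (prevalence = 0.3)
p <- c(0.1, 0.3, 0.6)
f <- fav_sketch(p, n1 = 30, n0 = 70)
round(f, 3)
```

In this toy case, a predicted probability equal to the prevalence (0.3) maps to a favourability of 0.5, which is the sense in which the effect of prevalence is removed.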

Value

This function returns a list of the following components:

predictions

a data frame with the model's fitted values at each step of the variable selection.

correlations

a numeric vector of the correlation between the predictions at each step and those of the final model.

variables

a character vector of the variables in the final model, named with the step at which each was included.

model

the resulting model object.

Details

Stepwise variable selection often includes more variables than would a model selected after examining all possible combinations of the variables (e.g. with packages MuMIn and glmulti). The stepByStep function can be useful to assess whether a stepwise model with just the first few variables already provides predictions very close to the final ones (see e.g. Fig. 3 in Munoz et al., 2005). It can also help to see which variables determine the general trends in the model predictions, and which merely add local nuances.
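
The idea described above can be sketched in base R, without stepByStep itself. The example below is only an illustration under stated assumptions: the data are simulated, the object names (dat, entered, preds, cors) are hypothetical, and the code is not the implementation used by stepByStep. It runs a forward AIC selection with step, refits the model with the first 1, 2, ... selected variables, and correlates each set of fitted values with those of the final model.

```r
set.seed(1)
n <- 200
# Simulated data (an assumption for this sketch): y depends on x1 and x2 only
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 + 0.5 * dat$x2))

# Forward stepwise selection by AIC, starting from the intercept-only model
null_mod <- glm(y ~ 1, family = binomial, data = dat)
final_mod <- step(null_mod, scope = y ~ x1 + x2 + x3,
                  direction = "forward", trace = 0)

# Variables in order of entry: step() records each added term in
# $anova$Step as "+ name"; the first entry (starting model) is empty
entered <- sub("^\\+ ", "", as.character(final_mod$anova$Step)[-1])

# Refit with the first 1, 2, ... variables and keep the fitted values
preds <- sapply(seq_along(entered), function(i) {
  f <- reformulate(entered[seq_len(i)], response = "y")
  fitted(glm(f, family = binomial, data = dat))
})
colnames(preds) <- paste0("step", seq_along(entered))

# Correlation of each step's predictions with those of the final step
cors <- cor(preds, preds[, ncol(preds)], method = "pearson")[, 1]
cors
```

If the correlation is already close to 1 after the first step or two, the later variables change the predictions only slightly, which is the situation the paragraph above describes.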

References

Munoz A.R., Real R., Barbosa A.M. & Vargas J.M. (2005) Modelling the distribution of Bonelli's Eagle in Spain: Implications for conservation planning. Diversity and Distributions 11: 477-486.

Real R., Barbosa A.M. & Vargas J.M. (2006) Obtaining environmental favourability functions from logistic regression. Environmental and Ecological Statistics 13: 237-245.

Examples


# NOT RUN {
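# Load the example dataset
data(rotif.env)

# Binomial GLM (the default family), correlating the predictions at each
# step with those of the final model using Spearman's coefficient
stepByStep(data = rotif.env, sp.col = 18, var.cols = 5:17,
           cor.method = "spearman")

# Same model, with predictions converted to favourability
stepByStep(data = rotif.env, sp.col = 18, var.cols = 5:17,
           cor.method = "spearman", Favourability = TRUE)

# Poisson GLM, with column 9 as the target variable and the
# remaining columns as predictors
stepByStep(data = rotif.env, sp.col = 9, var.cols = c(5:8, 10:17),
           family = poisson)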
# }
