selectEV: Select parsimonious set of explanatory variables.

Description

selectEV selects the parsimonious set of explanatory variables (EVs) which best explains variation in a given response variable (RV). Each EV can be represented by 1 or more derived variables (see deriveVars). The function uses a process of forward selection based on comparison of nested models by the F-test. An EV is selected for inclusion when, during nested model comparison, it accounts for a significant amount of remaining variation, under the alpha value specified by the user.

Usage

selectEV(data, dvdata, alpha = 0.01, interaction = FALSE, dir = NULL,
  trainmax = NULL)

Arguments

data

Data frame containing the response variable in the first column and explanatory variables in subsequent columns. The response variable should represent presence/background data, coded as: 1/NA. See readData.

dvdata

List of data frames, with each data frame containing selected derived variables for a given explanatory variable (e.g. the first item in the list returned by selectDVforEV).

alpha

Alpha-level used in F-test comparison of models. Default is 0.01.

interaction

Logical. Allows interaction terms between pairs of EVs. Default is FALSE.

dir

Directory to which files will be written during subset selection of explanatory variables. Defaults to the working directory.

trainmax

Integer. Maximum number of uninformed background points to be used to train the models. May be used to reduce computation time for data sets with very large numbers of points. Default is no maximum. See Details for more information.

Value

List of 2 items (3 if trainmax reduces the number of uninformed background points):

A list of data frames, with one data frame for each selected EV. This item is recommended as input for dvdata in plotResp.
A data frame showing the trail of forward selection of individual EVs (and interaction terms if necessary).
(If trainmax reduces the number of uninformed background points) a new data object. See Details.

Details

The F-statistic that selectEV uses for nested model comparison is calculated using equation 59 in Halvorsen (2013). See Halvorsen et al. (2015) for a more detailed explanation of the forward selection procedure.

When interaction = TRUE, the forward selection procedure first selects a parsimonious group of individual EVs, and then tests interactions between the EVs included in the model. Therefore, interactions are only explored between terms which individually explain a significant amount of variation. When interaction = FALSE, interactions are not considered.

If trainmax reduces the number of uninformed background points in the training data, a new data object is returned as part of the function output. This data object shows which of the uninformed background points were randomly selected, and should be used together with the selected EVs in plotResp when plotting single-effect model response.

Explanatory variables should be uniquely named, and the names must not contain spaces, underscores, or colons. Underscores and colons are reserved to denote derived variables and interaction terms, respectively.
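
As a rough illustration of forward selection by nested-model F-tests, the sketch below uses ordinary linear models (lm and anova) on simulated data. It is not the MaxEnt-based F-statistic that selectEV computes. The helper forward_select, the toy data, and the rule of adding the candidate with the smallest p-value in each round are assumptions made for illustration only.

```r
# Illustrative forward selection by nested-model F-tests (base R only).
# NOTE: this uses lm()/anova(), not the F-statistic used by selectEV.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 - 1.5 * toy$x2 + rnorm(n)  # x3 is pure noise

# Hypothetical helper: add one variable per round while the best
# candidate is significant at the given alpha.
forward_select <- function(df, response, candidates, alpha = 0.01) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    base_rhs <- if (length(selected) == 0) "1" else paste(selected, collapse = " + ")
    base_fit <- lm(as.formula(paste(response, "~", base_rhs)), data = df)
    # p-value of the F-test comparing the current model with each
    # model that adds exactly one remaining candidate
    pvals <- vapply(remaining, function(v) {
      full_fit <- lm(as.formula(paste(response, "~", base_rhs, "+", v)), data = df)
      anova(base_fit, full_fit)[2, "Pr(>F)"]
    }, numeric(1))
    if (min(pvals) >= alpha) break  # nothing explains significant remaining variation
    selected <- c(selected, names(pvals)[which.min(pvals)])
  }
  selected
}

chosen <- forward_select(toy, "y", c("x1", "x2", "x3"), alpha = 0.01)
print(chosen)
```

Lowering alpha makes the procedure stricter, so fewer variables pass the test in each round and the selected set tends to be smaller.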

References

Halvorsen, R. (2013). A strict maximum likelihood explanation of MaxEnt, and some implications for distribution modelling. Sommerfeltia, 36, 1-132.

Halvorsen, R., Mazzoni, S., Bryn, A., & Bakkestuen, V. (2015). Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt. Ecography, 38(2), 172-183.

Examples

## Not run: ------------------------------------
# selectedevs <- selectEV(dat, selectedderiveddat, alpha = 0.0001,
#    dir = "D:/path/to/modeling/directory", interaction = TRUE)
# 
# # From vignette:
# grasslandEVselect <- selectEV(grasslandPO, grasslandDVselect[[1]], alpha = 0.001,
#    interaction = TRUE)
# summary(grasslandDVselect[[1]])
# length(grasslandDVselect[[1]])
# summary(grasslandEVselect[[1]])
# length(grasslandEVselect[[1]])
# plot(grasslandEVselect$selection$round, grasslandEVselect$selection$addedFVA)
## ---------------------------------------------
