selectDVforEV: Select parsimonious sets of derived variables.

Description

For each explanatory variable (EV), selectDVforEV selects the parsimonious set of derived variables (DV) which best explains variation in a given response variable. The function uses a process of forward selection based on comparison of nested models by the F-test. A DV is selected for inclusion when, during nested model comparison, it accounts for a significant amount of remaining variation, under the alpha value specified by the user.
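The forward selection described above can be sketched in base R. This is only an illustration under stated assumptions: selectDVforEV compares nested MaxEnt models, whereas the sketch substitutes ordinary linear models fitted with lm() so that it runs without extra packages. The helper forward_select and the toy objects toy_dvs and toy_y are hypothetical and not part of the package.

```r
# Illustrative stand-in only: selectDVforEV compares MaxEnt models, whereas
# this sketch uses lm() so that it runs with base R alone.
forward_select <- function(y, candidates, alpha = 0.01) {
  dat <- data.frame(y = y, candidates)
  # Fit a model containing an intercept plus the named candidate columns
  fit_with <- function(vars) {
    lm(reformulate(c("1", vars), response = "y"), data = dat)
  }
  selected <- character(0)
  remaining <- names(candidates)
  while (length(remaining) > 0) {
    base_fit <- fit_with(selected)
    # F-test p-value for adding each remaining candidate to the current model
    pvals <- sapply(remaining, function(v) {
      anova(base_fit, fit_with(c(selected, v)))[2, "Pr(>F)"]
    })
    best <- which.min(pvals)
    # Stop when no candidate explains a significant share of remaining variation
    if (length(best) == 0 || pvals[best] >= alpha) break
    selected <- c(selected, remaining[best])
    remaining <- remaining[-best]
  }
  selected
}

# Hypothetical toy data: only dv1 is related to the response
set.seed(1)
n <- 200
toy_dvs <- data.frame(dv1 = rnorm(n), dv2 = rnorm(n), dv3 = rnorm(n))
toy_y <- 2 * toy_dvs$dv1 + rnorm(n)
chosen <- forward_select(toy_y, toy_dvs, alpha = 0.01)
print(chosen)
```

A smaller alpha makes the stopping rule stricter, so fewer candidates tend to be retained.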

Usage

selectDVforEV(data, dvdata, alpha = 0.01, dir = NULL, trainmax = NULL)

Arguments

data

Data frame containing the response variable in the first column and explanatory variables in subsequent columns. The response variable should represent presence/background data, coded as: 1/NA. See readData.
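As a minimal sketch of the expected layout, the toy data frame below places the response in the first column, coded 1 for presence and NA for background, followed by explanatory variables. The object toy and its column names are made up for illustration; real data would normally come from readData.

```r
# Hypothetical toy data: response in column 1 (1 = presence, NA = background),
# explanatory variables in the remaining columns.
toy <- data.frame(
  RV = c(1, 1, NA, NA, NA, 1),
  elevation = c(120, 340, 560, 90, 410, 275),
  landcover = factor(c("forest", "bog", "forest", "heath", "bog", "heath"))
)

# Check that the response uses only the 1/NA coding
coding_ok <- all(toy[[1]] %in% c(1, NA))
n_presence <- sum(toy[[1]] == 1, na.rm = TRUE)
n_background <- sum(is.na(toy[[1]]))
```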

dvdata

List of data frames, with each data frame containing derived variables for a given explanatory variable (e.g. the first item in the list returned by deriveVars).

alpha

Alpha-level used in F-test comparison of models. Default is 0.01.

dir

Directory to which files will be written during subset selection of derived variables. Defaults to the working directory.

trainmax

Integer. Maximum number of uninformed background points to be used to train the models. May be used to reduce computation time for data sets with very large numbers of points. Default is no maximum. See Details for more information.
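The effect of trainmax can be pictured with the sketch below, which keeps every presence row and draws a random subset of background rows when there are more than trainmax of them. This is an assumption about the general idea, not the package's internal code; subsample_background and toy_po are hypothetical names.

```r
# Hypothetical helper: cap the number of background (NA) rows at trainmax,
# keeping all presence rows. Not the package's internal implementation.
subsample_background <- function(data, trainmax) {
  is_bg <- is.na(data[[1]])
  bg_rows <- which(is_bg)
  if (length(bg_rows) <= trainmax) return(data)
  keep_bg <- sample(bg_rows, trainmax)
  data[sort(c(which(!is_bg), keep_bg)), , drop = FALSE]
}

# Toy data: 5 presences and 100 background points
set.seed(42)
toy_po <- data.frame(RV = c(rep(1, 5), rep(NA, 100)), ev1 = rnorm(105))
reduced <- subsample_background(toy_po, trainmax = 20)
```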

Value

List of 2 items, or 3 items if trainmax reduces the number of uninformed background points:

A list of data frames, with each data frame containing selected DVs for a given EV. This item is recommended as input for dvdata in selectEV.
A list of data frames, where each data frame shows the trail of forward selection of DVs for a given EV.
(Only if trainmax reduces the number of uninformed background points) A new data object. See Details.

Details

The F-statistic that selectDVforEV uses for nested model comparison is calculated using equation 59 in Halvorsen (2013). See Halvorsen et al. (2015) for a more detailed explanation of the forward selection procedure.

If the derived variables were created using deriveVars, the same response variable should be used in selectDVforEV, as the deviation and spline transformations produced by deriveVars are RV-specific.

If trainmax reduces the number of uninformed background points in the training data, a new data object is returned as part of the function output. This data object shows which of the uninformed background points were randomly selected, and should be used together with the selected DVs in selectEV during continued model selection.

Explanatory variables should be uniquely named, and the names must not contain spaces, underscores, or colons. Underscores and colons are reserved to denote derived variables and interaction terms, respectively.
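The naming rules for explanatory variables can be checked before calling the function. The sketch below is a hypothetical helper (check_ev_names is not part of the package) that reports duplicated names and names containing spaces, underscores, or colons, assuming the response sits in the first column.

```r
# Hypothetical helper: flag EV names that are duplicated or that contain
# a space, underscore, or colon. Column 1 is assumed to be the response.
check_ev_names <- function(data) {
  ev_names <- names(data)[-1]
  list(
    duplicated = ev_names[duplicated(ev_names)],
    reserved = ev_names[grepl("[ _:]", ev_names)]
  )
}

# Toy data frame with two problematic names
toy_names <- setNames(
  data.frame(1, 2, 3, 4),
  c("RV", "elev", "land_cover", "a:b")
)
name_report <- check_ev_names(toy_names)
```

Both elements of the returned list are empty when all names are acceptable.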

References

Halvorsen, R. (2013). A strict maximum likelihood explanation of MaxEnt, and some implications for distribution modelling. Sommerfeltia, 36, 1-132.

Halvorsen, R., Mazzoni, S., Bryn, A., & Bakkestuen, V. (2015). Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt. Ecography, 38(2), 172-183.

Examples


## Not run: ------------------------------------
# selecteddvs <- selectDVforEV(dat, deriveddat, alpha = 0.0001,
#    dir = "D:/path/to/modeling/directory")
# 
# # From vignette:
# grasslandDVselect <- selectDVforEV(grasslandPO, grasslandDVs[[1]], alpha = 0.001)
# summary(grasslandDVs$EVDV)
# sum(sapply(grasslandDVs$EVDV, length))
# summary(grasslandDVselect$selectedDV)
# sum(sapply(grasslandDVselect$selectedDV, length))
## ---------------------------------------------
