mem.select: Selection of the best subset of spatial eigenvectors (MEM, Moran's Eigenvector Maps)

Description

mem.select computes the spatial eigenvectors (MEM) of the spatial weighting matrix (SWM) provided (listw) and optimizes the selection of a subset of MEM variables relative to response variable(s) stored in x. The optimization is done either by maximizing the adjusted R-squared (R2) of all (method = "global") or a subset (method = "FWD") of MEM variables or by minimizing the residual spatial autocorrelation (method = "MIR") (see details in Bauman et al. 2018a).

Usage

mem.select(x, listw, MEM.autocor = c("positive", "negative", "all"),
  method = c("FWD", "MIR", "global"), MEM.all = FALSE, nperm = 999,
  nperm.global = 9999, alpha = 0.05, verbose = FALSE, ...)

Arguments

A vector, matrix, or dataframe of response variable(s). The method = "MIR" is only implemented for a vector response. Note that x can also contain the residuals of a model when other (e.g., environmental) variables should be considered in the model.

listw

A spatial weighting matrix of class listw; can be created with functions of the package spdep, or with the user-friendly function listw.candidates. Note that, the function listw.candidates returns a list of listw and subselection by [[]] should be performed in this case (see Example)

MEM.autocor

Sign of the spatial eigenvectors to generate; "positive", "negative", or "all", for positively, negatively autocorrelated eigenvectors, or both, respectively; default is "positive"

method

Criterion to select the best subset of MEM variables. Either forward (default option), "MIR" (for univariate x only), or "global" (see Details)

MEM.all

A logical indicating if the complete set of MEM variables should be returned

nperm

Number of permutations to perform the tests in the selection procedure; Default is 999

nperm.global

Number of permutations to perform the tests in the global test; Default is 9999

alpha

Significance threshold value for the tests; Default is 0.05

verbose

If 'TRUE' more diagnostics are printed. The default setting is FALSE

…

Other parameters (for internal use with listw.select)

Value

The function returns a list with:

global.test: An object of class randtest containing the result of the global test associated to all MEM (adjusted R2 and p-value). If MEM.autocor = "all", a list with two elements (positive and negative) corresponding to the results of the global tests performed on positive and negative MEM respectively.
MEM.all: An object of class orthobasisSp containing the complete set of generated MEM variables (generated by scores.listw). Only returned if MEM.all = TRUE.
summary: A dataframe summarizing the results of the selection procedure
MEM.select: An object of class orthobasisSp containing the subset of significant MEM variables.

Details

The function provides three different methods to select a subset of MEM variables. For all methods, a global test is firstly performed. If MEM.autocor = "all", two global tests are performed and p-values are corrected for multiple comparison (Sidak correction).

If the MEM variables are to be further used in a model including actual predictors (e.g. environmental), then a subset of spatial eigenvectors needs to be selected before proceeding to further analyses to avoid model overfitting and/or a loss of statistical power to detect the contribution of the environment to the variability of the response data (Griffith 2003, Dray et al. 2006, Blanchet et al. 2008, Peres-Neto and Legendre 2010, Diniz-Filho et al. 2012). Although several eigenvector selection approaches have been proposed to select a best subset of eigenvectors, Bauman et al. (2018b) showed that two main procedures should be preferred, depending on the underlying objective: the forward selection with double stopping criterion (Blanchet et al. 2008; method = "FWD") or the minimization of the residual spatial autocorrelation (Griffith and Peres-Neto 2006; MIR selection in Bauman et al. 2018a,b, method = "MIR"). The most powerful and accurate selection method, in terms of R2 estimation, is the forward selection. This method should be preferred when the objective is to capture as accurately as possible the spatial patterns of x. If the objective is to optimize the detection of the spatial patterns in the residuals of a model of the response variable(s) against a set of environmental predictors, for instance, then x should be the model residuals, and method = "FWD". This allows optimizing the detection of residual spatial patterns once the effect of the environmental predictors has been removed. If however the objective is only to remove the spatial autocorrelation from the residuals of a model of x against a set of actual predictors (e.g. environmental) with a small number of spatial predictors, then accuracy is not as important and one should focus mainly on the number of spatial predictors (Bauman et al. 2018b). In this case, method = "MIR" is more adapted, as it has the advantage to maintain the standard errors of the actual predictor coefficients as low as possible. Note that method = "MIR" can only be used for a univariate x, as the Moran's I is a univariate index. If x is multivariate, then the best criterion is the forward selection (see Bauman et al. 2018b). A third option is to not perform any selection of MEM variables (method = "global"). This option may be interesting when the complete set of MEM variables will be used, like in Moran spectral randomizations (Wagner and Dray 2015, Bauman et al. 2018c) or when using smoothed MEM (Munoz 2009).

For method = "MIR", the global test consists in computing the Moran's I of x (e.g. residuals of the model of the response variable against environmental variables) and tests it by permutation (results stored in global.test). If the Moran's I is significant, the function performs a selection procedure that searches among the set of generated spatial predictors the one that best minimizes the value of the Moran's I. A model of x against the selected eigenvector is built, and the significance of the Moran's I of the model residuals is tested again. The procedure goes on until the Moran's I of the model residuals is not significant anymore, hence the name of Minimization of moran's I in the Residuals (MIR).

For method = "global" and method = "FWD", the global test consists in computing the adjusted global R2, that is, the R2 of the model of x against the whole set of generated MEM variables and tests it by permutation (results stored in global.test).

For method = "global", if the adjusted global R2 is significant, the functions returns the whole set of generated MEM variables in MEM.select.

For method = "FWD", if the adjusted global R2 is significant, the function performs a forward selection with double stopping criterion that searches among the set of generated spatial predictors the one that best maximizes the R2 of the model. The procedure is repeated untill one of the two stopping criterion is reached (see Blanchet et al. 2008). Note that in a few cases, the forward selection does not select any variable even though the global model is significant. This can happen for example when a single variable has a strong relation with the response variable(s), because the integration of the variable alone yields an adjusted R2 slightly higher than the global adjusted R2. In this case, we recommend checking that this is indeed the reason why the first selected variable was rejected, and rerun the analysis with a second stopping criterion equal to the global adjusted R2 plus a small amount allowing avoiding this issue (e.g. 5 done through the argument adjR2thresh of function forward.sel, until the solution is implemented in mem.select.

For the method = "FWD" and method = "MIR", the MEM selected by the procedure are returned in MEM.select and a summary of the results is provided in summary. If no MEM are selected, then MEM.select and summary are not returned.

References

Bauman D., Fortin M-J, Drouet T. and Dray S. (2018a) Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology

Bauman D., Drouet T., Dray S. and Vleminckx J. (2018b) Disentangling good from bad practices in the selection of spatial or phylogenetic eigenvectors. Ecography, 41, 1--12

Bauman D., Vleminckx J., Hardy O., Drouet T. (2018c) Testing and interpreting the shared space-environment fraction in variation partitioning analyses of ecological data. Oikos

Blanchet G., Legendre P. and Borcard D. (2008) Forward selection of explanatory variables. Ecology, 89(9), 2623--2632

Diniz-Filho J.A.F., Bini L.M., Rangel T.F., Morales-Castilla I. et al. (2012) On the selection of phylogenetic eigenvectors for ecological analyses. Ecography, 35, 239--249

Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483--493

Griffith D. (2003) Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Springer, Berlin

Griffith D. and Peres-Neto P. (2006) Spatial modeling in Ecology: the flexibility of eigenfunction spatial analyses. Ecology, 87, 2603--2613

Munoz, F. 2009. Distance-based eigenvector maps (DBEM) to analyse metapopulation structure with irregular sampling. Ecological Modelling, 220, 2683<U+2013>-2689

Peres-Neto P. and Legendre P. (2010) Estimating and controlling for spatial structure in the study of ecological communities. Global Ecology and Biogeography, 19, 174--184

Wagner H., Dray S. (2015). Generating spatially constrained null models for irregularly spaced data using Moran spectral randomization methods. Methods in Ecology and Evolution, 6, 1169--1178

Examples

Run this code

# NOT RUN {
if(require(vegan)){ 
# Illustration of the MIR selection on the oribatid mite data
# (Borcard et al. 1992, 1994 for details on the dataset):
# *******************************************************
# Community data (response matrix):
data(mite)
# We will compute the example on a single species:
spe <- mite[, 2]
# Environmental explanatory dataset:
data(mite.env)
# We only use two numerical explanatory variables:
env <- mite.env[, 1:2]
dim(env)
# Coordinates of the 70 sites:
data(mite.xy)
coord <- mite.xy
# We build the model we are interested in:
mod <- lm(spe ~ ., data = env)


# In order to avoid possible type I error rate inflation issues, we check 
# whether the model residuals are independent, and if they are spatially
# autocorrelated, we select a small subset of MEM variables to add to the
# model as covariables with the MIR selection:

# 1) We build a spatial weighting matrix based on Gabriel graph with a
# weighting function decreasing linearly with the distance:
w <- listw.candidates(coord, nb = "gab", weights = "flin")


# 2) We test the spatial autocorrelation of the model residuals and, if
# necessary, select a subset of spatial predictors:
y <- residuals(mod)
MEM <- mem.select(x = y, listw = w[[1]], method = "MIR", MEM.autocor = "positive",
         nperm = 999, alpha = 0.05)
dim(MEM$MEM.select)
# The residuals of the model presented spatial autocorrelation. The selection
# of MEM variables is thus performed to remove residual autocorrelation.

# 3) We can reconstruct our model adding the selected MEM variable as covariables:
env2 <- cbind(env, MEM$MEM.select)
mod_complete <- lm(spe ~ ., data = env2)
summary(mod_complete)$coefficient[, 1]   # Coefficient estimates
summary(mod_complete)$coefficient[, 2]   # Standard errors
}


# }

Run the code above in your browser using DataLab