ENMevaluate : Tuning and evaluation of ENMs with Maxent

Description

ENMevaluate automatically executes Maxent (Phillips et al. 2006; Phillips and Dudik 2008) across a range of settings, returning a data.frame of evaluation metrics to aid in identifying settings that balance model fit and predictive ability. Since version 0.3.0, the default function uses the maxnet function in the maxnet package (Phillips et al. 2017) to implement the Maxent algorithm (see notes).

Usage

ENMevaluate(occ, env, bg.coords = NULL, occ.grp = NULL, 
		bg.grp = NULL, RMvalues = seq(0.5, 4, 0.5), 
		fc = c("L", "LQ", "H", "LQH", "LQHP", "LQHPT"),
		categoricals = NULL, n.bg = 10000, method = NULL, 
		algorithm = 'maxnet', overlap = FALSE, 
		aggregation.factor = c(2, 2), kfolds = NA, 
		bin.output = FALSE, clamp = TRUE, rasterPreds = TRUE, 
		parallel = FALSE, numCores = NULL, progbar = TRUE, 
		updateProgress = FALSE, ...)
tuning(occ, env, bg.coords, occ.grp, bg.grp, method, 
	algorithm, args, args.lab, categoricals, 
	aggregation.factor, kfolds, bin.output, clamp, alg,
	rasterPreds, parallel, numCores, progbar, 
	updateProgress, userArgs)

Arguments

occ

Two-column matrix or data.frame of longitude and latitude (in that order) of occurrence localities.

env

RasterStack of model predictor variables (environmental layers).

bg.coords

Two-column matrix or data.frame of longitude and latitude (in that order) of background localities (required for 'user' method).

occ.grp

Vector of bins of occurrence localities (required for 'user' method).

bg.grp

Vector of bins of background localities (required for 'user' method).

RMvalues

Vector of (non-negative) values to use for the regularization multiplier.

fc

Character vector of feature class combinations to be included in analysis.

algorithm

Character vector. Use 'maxnet' to use the maxnet package [default] or 'maxent.jar' to use the dismo package and the 'maxent.jar' Java program. See details for more information on these different implementations.

alg

categoricals

Vector indicating which (if any) of the input environmental layers are categorical.

n.bg

The number of random background localities to draw from the study extent.

method

Character string designating the method used for data partitioning. Choices are: "jackknife", "randomkfold", "user", "block", "checkerboard1", "checkerboard2". See details and get.evaluation.bins for more information.

overlap

logical; If TRUE, provides pairwise metric of niche overlap (see details and calc.niche.overlap).

aggregation.factor

Numeric vector (e.g., c(2, 2)) giving the factor(s) by which the original input grid should be aggregated for checkerboard partitioning methods (see details and get.evaluation.bins).

kfolds

Number of bins to use in the k-fold random method of data partitioning.

bin.output

logical; If TRUE, appends evaluation metrics for each evaluation bin to results table (i.e., in addition to the average values across bins).

args

Arguments to pass to Maxent that are generated by the make.args function.

args.lab

Character labels describing feature classes and regularization multiplier values for Maxent runs provided by the make.args function.

clamp

logical; If TRUE, 'clamping' is used (see Maxent documentation and tutorial for more details).

rasterPreds

logical; If TRUE, the predict function from dismo is used to predict each full model across the extent of the input environmental variables. Note that AICc (and associated values) are NOT calculated if rasterPreds=FALSE because these calculations require the predicted surfaces. However, setting to FALSE can significantly reduce run time.

parallel

logical; If TRUE, parallel processing is used to execute tuning function.

numCores

numeric; indicates the number of cores to use if running in parallel. If parallel=TRUE and this is not specified, all available cores are used.

progbar

logical; used internally.

updateProgress

logical; used internally.

...

character vector; use this to pass other arguments (e.g., prevalence) to the `maxent` call. Note that not all options are functional or relevant.

userArgs

character vector; use this to pass other arguments (e.g., prevalence) to the `maxent` call. Note that not all options are functional or relevant.

Value

An object of class ENMevaluation with named slots:

@results data.frame of evaluation metrics. If bin.output=TRUE, evaluation metrics calculated separately for each evaluation bin are included in addition to the averages and corrected variances (see corrected.var) across k bins. Note that the names of some columns changed as of Version 0.3.0.

@predictions RasterStack of full model predictions with each layer named as: fc_RM (e.g., L_1). This will be an empty RasterStack if rasterPreds=FALSE.

@models List of objects of class "MaxEnt" from the dismo package. Each of these entries includes slots for lambda values and the original Maxent results table. See Maxent documentation for more information.

@partition.method character vector with the method used for data partitioning.

@occ.pts data.frame of the latitude/longitude of input occurrence localities.

@occ.grp vector identifying the bin for each occurrence locality.

@bg.pts data.frame of the latitude/longitude of input background localities.

@bg.grp vector identifying the bin for each background locality.

@overlap matrix of pairwise niche overlap (blank if overlap = FALSE).

Details

ENMevaluate is the primary function for general use in the ENMeval package; the tuning function is used internally.

Since version 0.3.0, ENMevaluate by default runs the Maxent algorithm by calling the maxnet package (Phillips et al. 2017) instead of the previous implementation (still available), which relies on the 'maxent.jar' Java program called by the dismo package. This choice is controlled by the algorithm argument (default: algorithm='maxnet'). A major advantage of this change is that it removes the reliance on Java and the rJava package, which are useful but can cause confusing problems on some computers. There are some differences between the 'maxnet' and 'maxent.jar' algorithms that may lead to slight numeric differences in the results (at least when hinge feature classes are used). See Phillips et al. (2017) and Phillips (2017) for more details. Additionally, the 'maxnet' algorithm does not provide information on variable importance (from the var.importance() function) because of differences in the underlying models. Users can still choose the 'maxent.jar' implementation by setting algorithm='maxent.jar' in the ENMevaluate function (also see note below). Our team has tested this implementation fairly extensively to ensure it gives the expected results, but the maxnet implementation is relatively new (at the time of writing) and we encourage users to scrutinize their results.

Maxent settings: In the current default implementation of Maxent, the combination of feature classes (fcs) allowed depends on the number of occurrence localities, and the value for the regularization multiplier (RM) is 1.0. ENMevaluate provides an automated way to execute ecological niche models in Maxent across a user-specified range of (RM) values and (fc) combinations, regardless of sample size. Acceptable values for the fc argument include: L=linear, Q=quadratic, P=product, T=threshold, and H=hinge (see Maxent help documentation, Phillips et al. (2006), Phillips and Dudik (2008), Elith et al. (2011), and Merow et al. (2013) for additional details on RM and fcs). Categorical feature classes (C) are specified by the categoricals argument.
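As a rough illustration of the tuning grid described above, the following base R sketch enumerates every combination of the default RMvalues and fc from the Usage section. The object name `settings` and its `label` column are hypothetical and used only for illustration; the labels follow the fc_RM pattern documented for the @predictions layer names. This is not the package's internal code.

```r
# Default tuning values copied from the Usage section.
RMvalues <- seq(0.5, 4, 0.5)
fc <- c("L", "LQ", "H", "LQH", "LQHP", "LQHPT")

# One row per candidate model: every fc combination paired with every RM value.
# `settings` is a hypothetical name used only in this sketch.
settings <- expand.grid(fc = fc, RM = RMvalues, stringsAsFactors = FALSE)

# Labels in the fc_RM style (e.g., L_1).
settings$label <- paste(settings$fc, settings$RM, sep = "_")

nrow(settings)        # 6 feature class combinations x 8 RM values
head(settings$label)
```

Run time grows with the number of rows in this grid, so trimming RMvalues or fc is a simple way to shorten an exploratory run.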

Methods for partitioning data: ENMevaluate includes six methods to partition occurrence and background localities into bins for training and testing ('jackknife', 'randomkfold', 'user', 'block', 'checkerboard1', 'checkerboard2'). The jackknife method is a special case of k-fold cross validation where the number of folds (k) is equal to the number of occurrence localities (n) in the dataset. The randomkfold method partitions occurrence localities randomly into a user-specified number of (k) bins - this is equivalent to the method of k-fold cross validation currently provided by Maxent. The user method enables users to define bins a priori. For this method, the user is required to provide background coordinates (bg.coords) and bin designations for both occurrence localities (occ.grp) and background localities (bg.grp). The block method partitions the data into four bins according to the lines of latitude and longitude that divide the occurrence localities into bins of as equal number as possible. The checkerboard1 (and checkerboard2) methods partition data into two (or four) bins based on one (or two) checkerboard patterns with grain size defined as one (or two) aggregation factor(s) of the original environmental layers. Although the checkerboard1 (and checkerboard2) methods are designed to partition occurrence localities into two (and four) evaluation bins, they may give fewer bins depending on the location of occurrence localities with respect to the checkerboard grid(s) (e.g., all records happen to fall in the "black" squares). A warning is given if the number of bins is < 4 for the checkerboard2 method, and an error is given if all localities fall in a single evaluation bin. Additional details can be found in get.evaluation.bins.
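The block idea can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code used by get.evaluation.bins: it splits simulated occurrences at the median latitude and then splits each half at its own median longitude. The coordinates and the object names `occ_xy` and `grp` are hypothetical.

```r
set.seed(42)
# Hypothetical occurrence coordinates (longitude, latitude).
occ_xy <- cbind(lon = runif(40, -80, -70), lat = runif(40, 0, 10))

# Step 1: split into a northern and a southern half at the median latitude.
north <- occ_xy[, "lat"] > median(occ_xy[, "lat"])

# Step 2: split each half at its own median longitude, giving four bins.
grp <- integer(nrow(occ_xy))
for (half in c(TRUE, FALSE)) {
  idx <- which(north == half)
  east <- occ_xy[idx, "lon"] > median(occ_xy[idx, "lon"])
  grp[idx] <- ifelse(half, 2L, 0L) + ifelse(east, 2L, 1L)
}

table(grp)  # number of occurrence localities per bin
```

Bins built by hand in this way could be supplied through occ.grp with method = "user", together with bg.coords and bg.grp.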

Evaluation metrics: Four evaluation metrics are calculated using the partitioned dataset, and one additional metric is provided based on the full dataset. ENMevaluate uses the same background localities and evaluation bin designations for each of the k iterations (for each unique combination of RM and fc) to facilitate valid comparisons among model settings.

avg.test.AUC is the area under the curve of the receiver operating characteristic plot made based on the testing data (i.e., AUCtest), averaged across k bins. In each iteration, as currently implemented, the AUCtest value is calculated with respect to the full set of background localities to enable comparisons across the k iterations (Radosavljevic and Anderson 2014). As a relative measure for a given study species and region, high values of avg.test.AUC are associated with the degree to which a model can successfully discriminate occurrence from background localities. This rank-based non-parametric metric, however, does not reveal the model goodness-of-fit (Lobo et al. 2008; Peterson et al. 2011).
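Because AUC is rank-based, it can be computed from predicted values alone. The sketch below uses the standard rank-sum (Mann-Whitney) form in base R; the function name `auc_rank` and the predicted values are hypothetical, and the package may compute the metric through a different routine.

```r
# Rank-based AUC: the probability that a randomly chosen testing locality
# receives a higher predicted value than a randomly chosen background
# locality (ties count as one half).
auc_rank <- function(pres, bg) {
  r <- rank(c(pres, bg))          # average ranks handle ties
  n1 <- length(pres)
  n0 <- length(bg)
  (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical predicted values for one evaluation bin.
test_pred <- c(0.9, 0.8, 0.4)        # withheld occurrence localities
bg_pred   <- c(0.1, 0.3, 0.5, 0.7)   # full set of background localities

auc_test <- auc_rank(test_pred, bg_pred)
auc_test
```

Averaging such values over the k bins would give a quantity analogous to avg.test.AUC.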

To quantify the degree of overfitting, ENMevaluate calculates three metrics. The first is the difference between training and testing AUC, averaged across k bins (avg.diff.AUC) (Warren and Seifert 2011). avg.diff.AUC is expected to be high for models overfit to the training data. ENMevaluate also calculates two threshold-dependent omission rates that quantify overfitting when compared with the omission rate expected by the threshold employed: the proportion of testing localities with Maxent output values lower than the value associated with (1) the training locality with the lowest value (i.e., the minimum training presence, MTP; = 0 percent training omission) (avg.test.orMTP) and (2) the value that excludes the 10 percent of training localities with the lowest predicted suitability (avg.test.or10pct) (Pearson et al. 2007). ENMevaluate uses corrected.var to calculate the variance for each of these metrics across k bins (i.e., variances are corrected for non-independence of cross-validation iterations; see Shcheglovitova and Anderson 2013). The value of these metrics for each of the individual k bins is returned if bin.output = TRUE.
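The two omission rates can be illustrated with a few lines of base R. The predicted values below are hypothetical, and the rule used to pick the 10 percent threshold (the value at position ceiling(0.9 * n) when training values are sorted in decreasing order) is one common convention assumed here; the package's exact rounding rule may differ.

```r
# Hypothetical predicted values for one evaluation bin.
train_pred <- c(0.12, 0.25, 0.31, 0.40, 0.48, 0.55, 0.63, 0.70, 0.82, 0.91)
test_pred  <- c(0.05, 0.20, 0.45, 0.60, 0.88)

# Minimum training presence (MTP): lowest value at any training locality.
thr_mtp <- min(train_pred)

# 10 percent threshold (assumed convention): the value that leaves out the
# 10 percent of training localities with the lowest predicted values.
n <- length(train_pred)
thr_10 <- sort(train_pred, decreasing = TRUE)[ceiling(0.9 * n)]

# Omission rate: proportion of testing localities below each threshold.
or_mtp <- mean(test_pred < thr_mtp)
or_10  <- mean(test_pred < thr_10)

c(orMTP = or_mtp, or10pct = or_10)
```

Values well above the expected omission rate (0 for MTP, 0.10 for the 10 percent threshold) suggest overfitting.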

Based on the unpartitioned (full) dataset, ENMevaluate uses calc.aicc to calculate the AICc value for each model run and provides delta.AICc, AICc weights, as well as the number of parameters for each model (Warren and Seifert 2011). Note that AICc (and associated values) are NOT calculated if rasterPreds=FALSE because these calculations require the predicted surfaces. The AUCtrain value for the full model is also returned (train.AUC).
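The sketch below shows the general small-sample AICc formula as it is usually applied to Maxent output following Warren and Seifert (2011): raw predictions are standardized to sum to 1 over the whole surface (which is why predicted surfaces are required), and the log-likelihood is the sum of the log values at the occurrence localities. The function name `aicc_sketch` and all input values are hypothetical, and calc.aicc may handle edge cases differently.

```r
# raw_at_occ: standardized raw predictions at the occurrence localities
#             (the full surface is assumed to have been scaled to sum to 1).
# n_params:   number of model parameters (K).
aicc_sketch <- function(raw_at_occ, n_params) {
  n <- length(raw_at_occ)
  # The small-sample correction is undefined when K >= n - 1.
  if (n_params >= n - 1) return(NA_real_)
  loglik <- sum(log(raw_at_occ))
  aic <- 2 * n_params - 2 * loglik
  aic + (2 * n_params * (n_params + 1)) / (n - n_params - 1)
}

# Hypothetical values for three candidate models fitted to 10 occurrences.
aicc <- c(
  L_1   = aicc_sketch(rep(0.020, 10), n_params = 3),
  LQ_1  = aicc_sketch(rep(0.025, 10), n_params = 5),
  LQH_1 = aicc_sketch(rep(0.030, 10), n_params = 9)  # too many parameters: NA
)

# Delta AICc and Akaike weights among the models with a defined AICc.
delta <- aicc - min(aicc, na.rm = TRUE)
w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta), na.rm = TRUE)

round(cbind(aicc, delta, w), 3)
```

The model with a delta value of 0 has the lowest AICc among the candidates.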

To quantify how resulting predictions differ in geographic space depending on the settings used, ENMevaluate includes an option to compute pairwise niche overlap between all pairs of full models (i.e., using the unpartitioned dataset) with Schoener's D statistic (Schoener 1968; Warren et al. 2009).
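Schoener's D for two prediction surfaces can be sketched in base R as one minus half the summed absolute difference between the surfaces after each is standardized to sum to 1. The function name `schoener_d` and the four-cell surfaces are hypothetical; calc.niche.overlap operates on raster predictions and is not reproduced here.

```r
# Schoener's D between two prediction surfaces given as numeric vectors of
# cell values (same cells, same order). D ranges from 0 (no overlap) to
# 1 (identical surfaces).
schoener_d <- function(p1, p2) {
  p1 <- p1 / sum(p1)   # standardize each surface to sum to 1
  p2 <- p2 / sum(p2)
  1 - 0.5 * sum(abs(p1 - p2))
}

# Hypothetical four-cell surfaces from two model settings.
pred_a <- c(0.1, 0.2, 0.3, 0.4)
pred_b <- c(0.4, 0.3, 0.2, 0.1)

d_same <- schoener_d(pred_a, pred_a)  # identical surfaces
d_diff <- schoener_d(pred_a, pred_b)  # mirrored surfaces
c(d_same, d_diff)
```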

References

Elith, J., Phillips, S. J., Hastie, T., Dudik, M., Chee, Y. E., and Yates, C. J. (2011) A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17: 43-57.

Hijmans, R. J., Phillips, S., Leathwick, J. and Elith, J. (2011) dismo package for R. Available online at: https://cran.r-project.org/package=dismo.

Lobo, J. M., Jimenez-Valverde, A., and Real, R. (2008) AUC: A misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography, 17: 145-151.

Muscarella, R., Galante, P.J., Soley-Guardia, M., Boria, R.A., Kass, J., Uriarte, M. and Anderson, R.P. (2014) ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for ecological niche models. Methods in Ecology and Evolution, 5: 1198-1205.

Pearson, R. G., Raxworthy, C. J., Nakamura, M. and Peterson, A. T. (2007) Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. Journal of Biogeography, 34: 102-117.

Peterson, A. T., Soberon, J., Pearson, R. G., Anderson, R. P., Martinez-Meyer, E., Nakamura, M. and Araujo, M. B. (2011) Ecological Niches and Geographic Distributions. Monographs in Population Biology, 49. Princeton University Press, Princeton, NJ.

Phillips, S. J. (2017) maxnet package for R. Available online at: https://CRAN.R-project.org/package=maxnet.

Phillips, S. J., Anderson, R. P., Dudík, M., Schapire, R. E. and Blair, M. E. (2017) Opening the black box: an open-source release of Maxent. Ecography, 40: 887-893.

Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190: 231-259.

Phillips, S. J. and Dudik, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31: 161-175.

Merow, C., Smith, M., and Silander, J. A. (2013) A practical guide to Maxent: what it does, and why inputs and settings matter. Ecography, 36: 1-12.

Radosavljevic, A. and Anderson, R. P. (2014) Making better Maxent models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography, 41: 629-643.

Schoener, T. W. (1968) The Anolis lizards of Bimini: resource partitioning in a complex fauna. Ecology, 49: 704-726.

Shcheglovitova, M. and Anderson, R. P. (2013) Estimating optimal complexity for ecological niche models: A jackknife approach for species with small sample sizes. Ecological Modelling, 269: 9-17.

Warren, D. L., Glor, R. E., Turelli, M. and Funk, D. (2009) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62: 2868-2883; Erratum: Evolution, 65: 1215.

Warren, D.L. and Seifert, S.N. (2011) Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21: 335-342.

Examples

# NOT RUN {
library(raster)   # raster(), stack(), values()
library(ENMeval)  # ENMevaluate() and the enmeval_results dataset

### Simulated data environmental covariates
set.seed(1)
r1 <- raster(matrix(nrow=50, ncol=50, data=runif(10000, 0, 25)))
r2 <- raster(matrix(nrow=50, ncol=50, data=rep(1:100, each=100), byrow=TRUE))
r3 <- raster(matrix(nrow=50, ncol=50, data=rep(1:100, each=100)))
r4 <- raster(matrix(nrow=50, ncol=50, data=c(rep(1,1000),rep(2,500)),byrow=TRUE))
values(r4) <- as.factor(values(r4))
env <- stack(r1,r2,r3,r4)

### Simulate occurrence localities
nocc <- 50
x <- (rpois(nocc, 2) + abs(rnorm(nocc)))/11
y <- runif(nocc, 0, .99)
occ <- cbind(x,y)

# }
# NOT RUN {
### This call gives the results loaded below
enmeval_results <- ENMevaluate(occ, env, method="block", n.bg=500, 
                                categoricals=4, algorithm='maxent.jar')
# }
# NOT RUN {
data(enmeval_results)
enmeval_results

### See table of evaluation metrics
enmeval_results@results

### Plot prediction with lowest AICc
plot(enmeval_results@predictions[[which (enmeval_results@results$delta.AICc == 0) ]])
points(enmeval_results@occ.pts, pch=21, bg=enmeval_results@occ.grp)

### Niche overlap statistics between model predictions
enmeval_results@overlap
	
# }
