sar_average: Fit a multimodel averaged SAR curve

Description

Construct a multimodel averaged species-area relationship curve using information criterion weights and up to twenty SAR models.

Usage

sar_average(obj = c("power",
  "powerR","epm1","epm2","p1","p2","loga","koba",
  "monod","negexpo","chapman","weibull3","asymp",
  "ratio","gompertz","weibull4","betap","logistic", "heleg", "linear"), data =
  NULL, crit = "Info", normaTest = "none", homoTest = "none", homoCor =
  "spearman", neg_check = FALSE, alpha_normtest = 0.05, alpha_homotest =
  0.05, grid_start = "partial", grid_n = NULL, confInt = FALSE, ciN = 100,
  verb = TRUE, display = TRUE)

Value

If no models have been successfully fitted and passed the model checks, an error is returned. If only a single model is successfully fitted, this individual model fit object (of class 'sars') is returned, given no model averaging can be undertaken. If more than two models have been successfully fitted and passed the model checks, a list of class "multi" and class "sars" with two elements. The first element ('mmi') contains the fitted values of the multimodel sar curve. The second element ('details') is a list with the following components:

mod_names Names of the models that were successfully fitted and passed any model check
fits A fit_collection object containing the successful model fits
ic The information criterion selected
norm_test The residual normality test selected
homo_test The residual homogeneity test selected
alpha_norm_test The alpha value used in the residual normality test
alpha_homo_test The alpha value used in the residual homogeneity test
ics The information criterion values (e.g. AIC values) of the model fits
delta_ics The delta information criterion values
weights_ics The information criterion weights of each model fit
n_points Number of data points
n_mods The number of successfully fitted models
no_fit Names of the models which could not be fitted or did not pass model checks
convergence Logical value indicating whether optim model convergence code = 0, for each model

The summary.sars function returns a more useful summary of the model fit results, and the plot.multi plots the multimodel curve.

Arguments

obj: Either a vector of model names or a fit_collection object created using sar_multi. If a vector of names is provided, sar_average first calls sar_multi before generating the averaged multimodel curve.
data: A dataset in the form of a dataframe with two columns: the first with island/site areas, and the second with the species richness of each island/site. If obj is a fit_collection object, data should be NULL.
crit: The criterion used to compare models and compute the model weights. The default crit = "Info" switches to AIC or AICc depending on the number of data points in the dataset. AIC (crit = "AIC") or AICc (crit = "AICc") can be chosen regardless of the sample size. For BIC, use crit ="Bayes".
normaTest: The test used to test the normality of the residuals of each model. Can be any of "lillie" (Lilliefors Kolmogorov-Smirnov test), "shapiro" (Shapiro-Wilk test of normality), "kolmo" (Kolmogorov-Smirnov test), or "none" (no residuals normality test is undertaken; the default).
homoTest: The test used to check for homogeneity of the residuals of each model. Can be any of "cor.fitted" (a correlation of the squared residuals with the model fitted values), "cor.area" (a correlation of the squared residuals with the area values), or "none" (no residuals homogeneity test is undertaken; the default).
homoCor: The correlation test to be used when homoTest != "none". Can be any of "spearman" (the default), "pearson", or "kendall".
neg_check: Whether or not a check should be undertaken to flag any models that predict negative richness values.
alpha_normtest: The alpha value used in the residual normality test (default = 0.05, i.e. any test with a P value < 0.05 is flagged as failing the test).
alpha_homotest: The alpha value used in the residual homogeneity test (default = 0.05, i.e. any test with a P value < 0.05 is flagged as failing the test).
grid_start: Should a grid search procedure be implemented to test multiple starting parameter values. Can be one of 'none', 'partial' or 'exhaustive' The default is set to 'partial'.
grid_n: If grid_start = exhaustive, the number of points sampled in the starting parameter space (see details).
confInt: A logical argument specifying whether confidence intervals should be calculated for the multimodel curve using bootstrapping.
ciN: The number of bootstrap samples to be drawn to calculate the confidence intervals (if confInt == TRUE).
verb: verbose - Whether or not to print certain warnings (default: verb == TRUE)
display: Show the model fitting output and related messages. (default: display == TRUE).

Details

The multimodel SAR curve is constructed using information criterion weights (see Burnham & Anderson, 2002; Guilhaumon et al. 2010). If obj is a vector of n model names the function fits the n models to the dataset provided using the sar_multi function. A dataset must have four or more datapoints to fit the multimodel curve. If any models cannot be fitted they are removed from the multimodel SAR. If obj is a fit_collection object (created using the sar_multi function), any model fits in the collection which are NA are removed. In addition, if any other model checks have been selected (i.e. residual normality and heterogeneity tests, and checks for negative predicted richness values), these are undertaken and any model that fails the selected test(s) is removed from the multimodel SAR. The order of the additional checks inside the function is (if all are turned on): normality of residuals, homogeneity of residuals, and a check for negative fitted values. Once a model fails one test it is removed and thus is not available for further tests. Thus, a model may fail multiple tests but the returned warning will only provide information on a single test. We have now changed the defaults so that no checks are undertaken, so it is up to the user to select any checks if appropriate.

The resultant models are then used to construct the multimodel SAR curve. For each model in turn, the model fitted values are multiplied by the information criterion weight of that model, and the resultant values are summed across all models (Burnham & Anderson, 2002). Confidence intervals can be calculated (using confInt) around the multimodel averaged curve using the bootstrap procedure outlined in Guilhaumon et al (2010).The procedure transforms the residuals from the individual model fits and occasionally NAs / Inf values can be produced - in these cases, the model is removed from the confidence interval calculation (but not the multimodel curve itself). There is also a constraint within the procedure to remove any transformed residuals that result in negative richness values. When several SAR models are used, when grid_start is turned on and when the number of bootstraps (ciN) is large, generating the confidence intervals can take a (very) long time. Parallel processing will be added to future versions.

Choosing starting parameter values for non-linear regression optimisation algorithms is not always straight forward, depending on the data at hand. In the package, we use various approaches to choose default starting parameters. However, we also use a grid search process which creates a large array of different possible starting parameter values (within certain bounds) and then randomly selects a proportion of these to test. There are three options for the grid_start argument to control this process. The default (grid_start = "partial") randomly samples 500 different sets of starting parameter values for each model, adds these to the model's default starting values and tests all of these. A more comprehensive set of starting parameter estimates can be used (grid_start = "exhaustive") - this option allows the user to choose the number of starting parameter sets to be tested (using the grid_n argument) and includes a range of additional starting parameter estimates, e.g. very small values and particular values we have found to be useful for individual models. Using grid_start = "exhaustive" in combination with a large grid_n can be very time consuming; however, we would recommend it as it makes it more likely that the optimal model fit will be found, particularly for the more complex models. This is particularly true if any of the model fits does not converge, returns a singular gradient at parameter estimates, or the plot of the model fit does not look optimum. The grid start procedure can also be turned off (grid_start = "none"), meaning just the default starting parameter estimates are used. Note that grid_start has been disabled for a small number of models (e.g. Weibull 3 par.). See the vignette for more information. Remember that, as grid_start has a random component, when grid_start != "none", you can get slightly different results each time you fit a model or run sar_average.

Even with grid_start, occasionally a model fit will be able to be fitted and pass the model fitting checks (e.g. residual normality) but the resulting fit is nonsensical (e.g. a horizontal line with intercept at zero). Thus, it can be useful to plot the resultant 'multi' object to check the individual model fits. To re-run the sar_average function without a particular model, simply remove it from the obj argument.

The sar_models() function can be used to bring up a list of the 20 model names. display_sars_models() generates a table of the 20 models with model information.

References

Burnham, K. P., & Anderson, D. R. (2002). Model selection and multi-model inference: a practical information-theoretic approach (2nd ed.). New-York: Springer.

Guilhaumon, F., Mouillot, D., & Gimenez, O. (2010). mmSAR: an R-package for multimodel species-area relationship inference. Ecography, 33, 420-424.

Matthews, T. J., K. A. Triantis, R. J. Whittaker, & F. Guilhaumon. (2019). sars: an R package for fitting, evaluating and comparing species–area relationship models. Ecography, 42, 1446–55.

Examples

Run this code

data(galap)
#attempt to construct a multimodel SAR curve using all twenty sar models
#using no grid_start just for speed here (not recommended generally)
fit <- sar_average(data = galap, grid_start = "none")
summary(fit)
plot(fit)

# construct a multimodel SAR curve using a fit_collection object
ff <- sar_multi(galap, obj = c("power", "loga", "monod", "weibull3"))
fit2 <- sar_average(obj = ff, data = NULL)
summary(fit2)

if (FALSE) { 
# construct a multimodel SAR curve using a more exhaustive set of starting 
# parameter values 
fit3 <- sar_average(data = galap, grid_start = "exhaustive", grid_n = 1000) 
}

Run the code above in your browser using DataLab