The function computes vectors of marginality and specialization
according to Rinnan & Lawler (2019) via Environmental Niche Factor
Analysis (ENFA) and phylogenetic imputation (Garland & Ives, 2000).
It takes a list of Simple Features (sf) objects and a
phylogenetic tree to train ENFA and/or ENphylo models. Both modeling techniques
are calibrated and evaluated while accounting for phylogenetic uncertainty.
Calibrations are made on a random subset of the data under a bootstrap
cross-validation scheme, and the predictive power of the different models is
estimated using five evaluation metrics.
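The calibration/evaluation scheme described above can be sketched in base R as repeated random splitting of the occurrence rows (the resampling the documentation calls bootstrap cross-validation). This is a minimal sketch, not ENphylo's internal code: the helper boot_splits and its argument calib_perc are hypothetical, calib_perc plays the role of the calibration percentage as described for boot_test_perc, reps plays the role of boot_reps, and each run is assumed to draw a fresh split without replacement.

```r
# Hypothetical sketch of repeated random calibration/evaluation splits.
# calib_perc mirrors the calibration percentage; reps mirrors boot_reps.
boot_splits <- function(n, calib_perc, reps, seed = 1) {
  set.seed(seed)                       # reproducible draws (resets the global RNG)
  n_calib <- round(n * calib_perc / 100)
  lapply(seq_len(reps), function(i) {
    calib <- sample.int(n, n_calib)    # random calibration subset, no replacement
    list(calib = calib,
         eval = setdiff(seq_len(n), calib))  # remaining rows used for evaluation
  })
}

splits <- boot_splits(n = 50, calib_perc = 20, reps = 10)
length(splits)              # 10 runs
length(splits[[1]]$calib)   # 10 rows
length(splits[[1]]$eval)    # 40 rows
```

With reps = 0 the sketch returns an empty list, which loosely matches the documented behaviour of skipping evaluation when boot_reps is 0.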
ENphylo_modeling(input_data, tree, input_mask, obs_col, time_col = NULL,
                 min_occ_enfa = 30, boot_test_perc = 20, boot_reps = 10,
                 swap.args = list(nsim = 10, si = 0.2, si2 = 0.2),
                 eval.args = list(eval_metric_for_imputation = "AUC",
                                  eval_threshold = 0.7,
                                  output_options = "best"),
                 clust = 0.5, output.dir)
The function does not return any output to .GlobalEnv. Use
getENphylo_results to collect the results from the local folders.
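Since results live on disk, they must be read back from output.dir. getENphylo_results is the supported way to do this; purely to illustrate the folder layout described under Outputs (output.dir/ENphylo_enfa_models/species name/model_outputs.RData), the base-R sketch below loads every model_outputs.RData file found in one of the two folders. The helper read_model_outputs, the object name model_outputs and the toy folder are hypothetical, and the sketch assumes each file holds a single list object.

```r
# Hypothetical reader for the documented folder layout; not getENphylo_results().
read_model_outputs <- function(output.dir, folder = "ENphylo_enfa_models") {
  # one subfolder per modeled species, named after the species
  sp_dirs <- list.dirs(file.path(output.dir, folder), recursive = FALSE)
  res <- lapply(sp_dirs, function(d) {
    env <- new.env()
    obj <- load(file.path(d, "model_outputs.RData"), envir = env)  # names of loaded objects
    env[[obj[1]]]                     # assumption: the first object is the output list
  })
  setNames(res, basename(sp_dirs))
}

# Toy layout with a dummy three-element list for one species
out <- file.path(tempdir(), "enphylo_demo")
sp_dir <- file.path(out, "ENphylo_enfa_models", "Species a")
dir.create(sp_dir, recursive = TRUE, showWarnings = FALSE)
model_outputs <- list(call = "ENFA", formatted_data = list(), calibrated_model = list())
save(model_outputs, file = file.path(sp_dir, "model_outputs.RData"))

res <- read_model_outputs(out)
names(res)                 # "Species a"
res[["Species a"]]$call    # "ENFA"
```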
input_data: a list of sf objects (data.frames of class sf) containing species
occurrence data in binary format (1 for presence, 0 for background
points) along with the continuous explanatory variables to be used in ENFA
or ENphylo. Each element of the list must be named after the
target species. Alternatively, ENFA model outputs previously generated by
ENphylo_modeling can be supplied as named elements of the
input_data list.
tree: an object of class phylo including all the species listed
in input_data. The tree does not need to be ultrametric or fully
dichotomous. Tips that do not match any species in
input_data are automatically dropped from the tree.
input_mask: a SpatRaster object representing the geographical
mask that defines the spatial domain (the background area) enclosing
all the species in input_data.
obs_col: character. Name of the input_data column containing
species occurrence data in binary format.
time_col: character (optional). Name of the input_data column containing the
time interval associated with each presence and background point.
min_occ_enfa: numeric. The minimum number of occurrences required for a species to be modeled with ENFA.
boot_test_perc: numeric. Percentage of data (between 0 and 100)
used to calibrate ENFA and/or ENphylo models within a bootstrap
cross-validation scheme. The remaining percentage
(100 - boot_test_perc) is used to evaluate model performance.
boot_reps: numeric. Number of evaluation runs performed within the
bootstrap cross-validation scheme to evaluate ENFA and/or ENphylo models. If
set to 0, model evaluation is skipped and the $evaluation element of the
output is NULL.
swap.args: a list of ENphylo parameters, including:
nsim = the number of alternative phylogenies generated by altering the topology and
branch lengths of the reference tree by means of RRphylo::swapONE.
nsim must be greater than or equal to 1 (see details);
si, si2 = arguments passed to RRphylo::swapONE.
eval.args: a list of model evaluation parameters, including:
eval_metric_for_imputation = the evaluation metric (e.g. "AUC", the default)
used to compare ENFA and ENphylo models and to rank the swapped trees;
eval_threshold = the minimum value of eval_metric_for_imputation a model
must reach. ENFA models scoring below eval_threshold are compared with
ENphylo models, and the better-fitting one is kept. Additionally, within
ENphylo, models derived from swapped trees scoring below eval_threshold
are excluded from the output;
output_options = the strategy adopted to return ENphylo results (see
details). Valid options are "full", "weighted.mean", and "best".
clust: numeric. The proportion of available cores used to train ENFA and ENphylo
models. If NULL, parallel computing is disabled. Defaults to 0.5.
output.dir: the path to the folder where ENphylo_modeling creates the
"ENphylo_enfa_models" and "ENphylo_imputed_models" subfolders storing the modeling
outputs (see details).
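The tree argument is matched to input_data by species name. A minimal base-R sketch of that matching, assuming tip labels and list names are compared as exact strings (which is why the example below replaces underscores in tip labels with spaces); the species names are hypothetical. The resulting to_drop vector is what one would pass to ape::drop.tip(tree, to_drop) to prune the tree.

```r
# Hypothetical illustration of matching tree tips to the names of input_data.
tips <- c("Species a", "Species b", "Species c", "Species d")   # tree$tip.label
input_names <- c("Species a", "Species c")                      # names(input_data)

# Tips without a matching element in input_data would be pruned
to_drop <- setdiff(tips, input_names)
# Species in input_data that are absent from the tree cannot be placed on it
missing_in_tree <- setdiff(input_names, tips)

to_drop           # "Species b" "Species d"
missing_in_tree   # character(0)
```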
Alessandro Mondanaro, Mirko Di Febbraro, Silvia Castiglione, Carmela Serio, Marina Melchionna, Pasquale Raia
ENphylo_modeling automatically arranges input_data in a
suitable format to run ENFA or ENphylo. Internally, the function calls
"calibrated_enfa" for ENFA and "calibrated_imputed" for ENphylo.
Phylogenetic uncertainty
The function does not work with nsim < 1, since one of the strongest
points of ENphylo_modeling is testing alternative phylogenies to
provide the most accurate reconstruction of species environmental
preferences. Likewise, setting nsim = 1 limits the power of the
function, as only the original tree is used and no alternative
phylogenies are generated.
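The role of nsim can be illustrated with a toy base-R stand-in for tree swapping. This sketch assumes the original tree counts as the first of the nsim trees, as the note on nsim = 1 suggests, and only rescales branch lengths; RRphylo::swapONE also rearranges the topology, which is not attempted here. The helper make_tree_set and the jitter argument are hypothetical.

```r
# Hypothetical stand-in for generating alternative phylogenies (branch lengths only).
make_tree_set <- function(tree, nsim, jitter = 0.2) {
  if (nsim < 1) stop("nsim must be >= 1")
  trees <- list(tree)                     # nsim = 1: the original tree only
  for (i in seq_len(nsim - 1)) {
    alt <- tree
    # rescale each branch by a random factor in [1 - jitter, 1 + jitter]
    alt$edge.length <- tree$edge.length *
      runif(length(tree$edge.length), 1 - jitter, 1 + jitter)
    trees[[i + 1]] <- alt
  }
  trees
}

toy <- list(edge.length = c(1, 2, 0.5, 1.5))   # minimal phylo-like object
length(make_tree_set(toy, nsim = 1))   # 1
length(make_tree_set(toy, nsim = 5))   # 5
```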
Phylogenetic imputation
ENphylo_modeling automatically switches from the ENFA to the ENphylo
algorithm for any species having fewer than min_occ_enfa occurrences
or an ENFA model accuracy below eval_threshold. In the latter case, the
function fits both models and retains the one performing better according
to eval_metric_for_imputation. Phylogenetic imputation is allowed for
up to 30% of the species on the tree. If the number of species to impute
exceeds 30%, ENphylo_modeling automatically splits the original tree
into smaller subtrees so that this limit is respected. Each subtree is
designed to impute phylogenetically distant species and to retain species
phylogenetically close to the taxa being imputed, so that imputation is
robust. In this case, the function prints the number of phylogenies used.
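The switching rule and the 30% limit described above can be sketched in base R. This is a minimal illustration, not the package's code: route_species is a hypothetical helper, the occurrence counts and ENFA scores are toy values, and species routed to "compare" (ENFA below threshold, so both models are fitted) are assumed to count toward the imputation share.

```r
# Hypothetical routing rule following the description above.
route_species <- function(n_occ, enfa_score, min_occ_enfa = 30, eval_threshold = 0.7) {
  ifelse(n_occ < min_occ_enfa, "ENphylo",          # too few occurrences for ENFA
         ifelse(enfa_score < eval_threshold,
                "compare",                          # fit both, keep the better one
                "ENFA"))
}

n_occ <- c(sp1 = 120, sp2 = 12, sp3 = 45, sp4 = 8)
enfa_score <- c(sp1 = 0.85, sp2 = NA, sp3 = 0.62, sp4 = NA)  # toy AUC; NA = not fitted
route <- route_species(n_occ, enfa_score)
route   # sp1 "ENFA", sp2 "ENphylo", sp3 "compare", sp4 "ENphylo"

# Share of species needing imputation versus the 30% limit
share <- mean(route != "ENFA")
share > 0.3   # TRUE: the tree would be split into subtrees
```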
Outputs
If ENphylo_modeling runs the ENphylo algorithm, the outputs depend on
the strategy selected through the output_options argument.
If output_options="full", the 'co' matrices and evaluation metrics of
all the swapped trees tested are returned. Under
output_options="weighted.mean", the output consists of the subset of 'co'
matrices and evaluation metrics for those tree swapping iterations whose
eval_metric_for_imputation exceeds eval_threshold. Finally, if
output_options="best", a single 'co' matrix and evaluation score list
corresponding to the most accurate swapped
tree is returned. If none of the tree swapping iterations reaches the
threshold under either "best" or "weighted.mean", the function
automatically switches to the "full" strategy.
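The three output strategies can be summarised as a selection over the per-tree scores. In this base-R sketch, select_swaps is a hypothetical helper returning the indices of the swapped trees that would be kept; it assumes "above eval_threshold" means strictly greater, and it reads the fallback to "full" as applying when no swapped tree reaches the threshold.

```r
# Hypothetical selection of swapped-tree results; not ENphylo internals.
select_swaps <- function(scores, eval_threshold,
                         output_options = c("full", "weighted.mean", "best")) {
  output_options <- match.arg(output_options)
  above <- which(scores > eval_threshold)
  # "full", or fallback to "full" when no swapped tree reaches the threshold
  if (output_options == "full" || length(above) == 0) return(seq_along(scores))
  if (output_options == "weighted.mean") return(above)
  which.max(scores)                       # "best": the single most accurate tree
}

scores <- c(0.72, 0.65, 0.81, 0.78, 0.69)   # toy AUC values for five swapped trees
select_swaps(scores, 0.7, "full")            # 1 2 3 4 5
select_swaps(scores, 0.7, "weighted.mean")   # 1 3 4
select_swaps(scores, 0.7, "best")            # 3
select_swaps(c(0.6, 0.5), 0.7, "best")       # 1 2 (fallback to "full")
```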
In addition, the function creates two folders, "ENphylo_enfa_models" and
"ENphylo_imputed_models", in output.dir. Each of these folders contains one
subfolder per modeled species, named after the species, where the model
outputs and the background area are saved as model_outputs.RData and
study_area.tif, respectively.
model_outputs.RData includes a list of three elements, regardless of
whether ENFA or ENphylo is used:
$call: a character string specifying the algorithm used to model the species (i.e. ENFA or ENphylo).
$formatted_data: a list of input data formatted to run either the ENFA or the
ENphylo algorithm. Specifically, the list reports the presence data points
($input_ones), the background points ($input_back), the names of the
columns associated with the obs_col and time_col arguments (if specified),
the name of the column containing the cell numbers (geoID_col), and the
coordinates of the presence data only ($one_coords).
$calibrated_model: a list whose elements differ depending on whether ENFA or ENphylo is used to model the species:
ENFA
$full_model: a list containing the marginality and specialization factors, the 'co' matrix, the number of significant axes, and all the other objects generated by applying ENFA to the entire occurrence dataset (see Rinnan & Lawler 2019 for additional details).
$evaluation: a matrix containing, for each evaluation run, the evaluation scores of the ENFA model according to all the available metrics (i.e. Area Under the Curve (AUC), True Skill Statistic (TSS), Boyce Index (CBI), Sorensen Index, and Omission Rate (OMR)).
ENphylo
$co: a list of 'co' matrices, at most one per alternative phylogeny
tested (i.e. the nsim argument); the actual number of 'co' matrices
depends on the selected output_options strategy.
$evaluation: a data.frame containing, for each alternative phylogeny, the
evaluation scores of the ENphylo model according to all the available
metrics. The content of this object depends on the strategy selected
through the output_options argument. Specifically, the function
internally selects the model (or models) with the highest evaluation score
according to eval_metric_for_imputation.
$output_options: a character vector including the output_options and
eval_metric_for_imputation values used to run the ENphylo model.
Rinnan, D. S., & Lawler, J. (2019). Climate-niche factor analysis: a spatial approach to quantifying species vulnerability to climate change. Ecography, 42(9), 1494–1503. doi:10.1111/ecog.03937
Garland, T., & Ives, A. R. (2000). Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist, 155(3), 346–364. doi:10.1086/303327
Mondanaro, A., Di Febbraro, M., Castiglione, S., Melchionna, M., Serio, C., Girardi, G., Belfiore, A. M., & Raia, P. (2023). ENphylo: A new method to model the distribution of extremely rare species. Methods in Ecology and Evolution, 14, 911–922. doi:10.1111/2041-210X.14066
getENphylo_results; ENphylo vignette
# \donttest{
library(ape)
library(terra)
library(sf)
library(RRgeo)
newwd<-tempdir()
# newwd<-"YOUR_DIRECTORY"
# Download the example occurrence data (a list of sf objects named dat)
latesturl<-RRgeo:::get_latest_version("12734585")
curl::curl_download(url = paste0(latesturl,"/files/dat.Rda?download=1"),
                    destfile = file.path(newwd,"dat.Rda"), quiet = FALSE)
load(file.path(newwd,"dat.Rda"))
# Load the example tree; replace underscores so tip labels match the names of dat
read.tree(system.file("exdata/Eucopdata_tree.txt", package="RRgeo"))->tree
tree$tip.label<-gsub("_"," ",tree$tip.label)
# Download the X35kya raster and project it onto the CRS of the occurrence data
curl::curl_download(paste0(latesturl,"/files/X35kya.tif?download=1"),
                    destfile = file.path(newwd,"X35kya.tif"), quiet = FALSE)
rast(file.path(newwd,"X35kya.tif"))->map35
project(map35,st_crs(dat[[1]])$proj4string,res = 50000)->map
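# Model two species (elements 1 and 11 of dat), using the first raster layer as mask;
# clust = NULL disables parallel computing
ENphylo_modeling(input_data=dat[c(1,11)],
tree=tree,
input_mask=map[[1]],
obs_col="OBS",
time_col="age",
min_occ_enfa=15,
boot_test_perc=20,
boot_reps=10,
swap.args=list(nsim=5,si=0.2,si2=0.2),
eval.args=list(eval_metric_for_imputation="AUC",
eval_threshold=0.7,
output_options="best"),
clust=NULL,
output.dir=newwd)
# Outputs are written to newwd; collect them with getENphylo_results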
# }