This function computes metrics that quantify how species and sites relate to clusters (bioregions or chorotypes). Depending on the type of clustering, metrics can measure how species are distributed across bioregions (site clusters), how sites relate to chorotypes (species clusters), or both.
site_species_metrics(
bioregionalization,
bioregion_metrics = c("Specificity", "NSpecificity", "Fidelity", "IndVal", "NIndVal",
"Rho"),
bioregionalization_metrics = "P",
data_type = "auto",
cluster_on = "site",
comat,
similarity = NULL,
include_cluster = FALSE,
index = names(similarity)[3],
verbose = TRUE
)A list containing one or more data.frame elements, depending on the
selected metrics and clustering type:
When sites are clustered (cluster_on = "site"):
species_bioregions: Metrics for each species x bioregion combination (e.g., Specificity, IndVal). One row per species x bioregion pair.
species_bioregionalization: Summary metrics for each species across all bioregions (e.g., Participation coefficient). One row per species.
site_bioregions: Metrics for each site x bioregion combination (e.g., MeanSim, Richness). One row per site x bioregion pair.
site_bioregionalization: Summary metrics for each site (e.g., Silhouette). One row per site.
When species are clustered (cluster_on = "species"):
site_chorotypes: Metrics for each site x chorotype combination (e.g., Specificity, IndVal). One row per site x chorotype pair.
site_chorological: Summary metrics for each site across all chorotypes (e.g., Participation coefficient). One row per site.
Note that if bioregionalization contains multiple partitions
(i.e., if dim(bioregionalization$clusters) > 2), a nested list will be
returned, with one sublist per partition.
A bioregion.clusters object.
A character vector or a single character string
specifying the metrics to compute for each cluster. Available metrics depend
on the type of clustering (see arg cluster_on):
When sites are clustered into bioregions (default case):
species-level metrics include "Specificity", "NSpecificity",
"Fidelity", "IndVal", "NIndVal", "Rho", and "CoreTerms".
Site-level metrics include "Richness", "Rich_Endemics",
"Prop_Endemics", "MeanSim", and "SdSim".
When species are clustered into chorotypes (e.g., bipartite
network clustering): site-level metrics include "Specificity",
"NSpecificity", "Fidelity", "IndVal", "NIndVal", "Rho",
and "CoreTerms".
Use "all" to compute all available metrics. See Details for metric
descriptions.
A character vector or a single character
string specifying summary metrics computed across all clusters. These
metrics assess how an entity (species or site) is distributed across the
entire bioregionalization, rather than relative to each individual cluster:
"P": Participation coefficient measuring how evenly a species or
site is distributed across clusters (0 = restricted to one cluster,
1 = evenly spread).
"Silhouette": How well a site fits its assigned bioregion compared
to the nearest alternative bioregion (requires similarity data).
Use "all" to compute all available metrics.
A character string specifying whether metrics should be
computed based on presence/absence ("occurrence") or abundance values
("abundance"). This affects how Specificity, Fidelity, IndVal, Rho and
CoreTerms are calculated:
"auto" (default): Automatically detected from input data (bioregionalization and/or comat).
"occurrence": Metrics based on presence/absence only.
"abundance": Metrics weighted by abundance values.
"both": Compute both versions of the metrics.
A character string specifying what was clustered in the
bioregionalization, which determines what types of metrics can be computed:
"site" (default): Sites were clustered into bioregions. Metrics
describe how each species is distributed across bioregions.
"species": Species were clustered into chorotypes. Metrics describe
how each site relates to chorotypes. Only available when species have
been assigned to clusters (e.g., bipartite network clustering).
"both": Compute metrics for both perspectives. Only available
when both sites and species have cluster assignments.
A site-species matrix with sites as rows and species as
columns. Values can be occurrence (1/0) or abundance. Required for most
metrics.
A site-by-site similarity object from similarity() or
dissimilarity_to_similarity(). Required only for similarity-based metrics
("MeanSim", "SdSim", "Silhouette").
A boolean indicating whether to add an Assigned
column in the output, marking TRUE for rows where the site belongs to the
bioregion being evaluated. Useful for quickly identifying a site's own
bioregion. Default is FALSE.
The name or number of the column to use as similarity.
By default, the third column name of similarity is used.
A boolean indicating whether to
display progress messages. Set to FALSE to suppress these messages.
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
This function computes metrics that characterize the relationship between species, sites, and clusters. The available metrics depend on whether you clustered sites (into bioregions) or species (into chorotypes).
Bioregions are clusters of sites with similar species composition.
Chorotypes are clusters of species with similar distributions.
In general, the package is designed to cluster sites into bioregions. However, it is possible to group species into clusters. We call these species clusters 'chorotypes', following conceptual definitions in the biogeographical literature, to avoid any confusion in the calculation of metrics.
In some cases, such as bipartite network clustering, both species and sites
receive the same clusters. We maintain the name distinction in the
calculation of metrics - but remember that in this case
BIOREGION IDs = CHOROTYPE IDs.
The cluster_on argument determines
which perspective to use.
cluster_on = "site" or cluster_on = "both") ---
Species-per-bioregion metrics quantify how each species is distributed across bioregions.
These metrics are derived from three core terms (see the online vignette for a visual diagram):
n_sb: Number of sites in bioregion b where species s is present
n_s: Total number of sites in which species s is present.
n_b: Total number of sites in bioregion b.
Abundance version of these core terms can also be calculated when
data_type = "abundance" (or data_type = "auto" and
bioregionalization was based on abundance):
w_sb: Sum of abundances of species s in sites of bioregion b.
w_s: Total abundance of species s.
w_b: Total abundance of all species present in sites of bioregion b.
The species-per-bioregion metrics are (click on metric names to access formulas):
Specificity: Fraction of a species' occurrences found in a given bioregion (De Cáceres & Legendre 2009). A value of 1 means the species occurs only in that bioregion.
NSpecificity: Normalized specificity that accounts for differences in bioregion size (De Cáceres & Legendre 2009).
Fidelity: Fraction of sites in a bioregion where the species occurs (De Cáceres & Legendre 2009). A value of 1 means the species is present in all sites of that bioregion.
IndVal: Indicator Value = Specificity × Fidelity (De Cáceres & Legendre 2009). High values identify species that are both restricted to and frequent within a bioregion.
NIndVal: Normalized IndVal accounting for bioregion size (De Cáceres & Legendre 2009).
Rho: Standardized contribution index comparing observed vs. expected co-occurrence under random association (Lenormand 2019).
CoreTerms: Raw counts (n, n_b, n_s, n_sb) for custom calculations.
These metrics can be found in the output slot species_bioregions.
Site-per-bioregion metrics characterize sites relative to bioregions:
Richness: Number of species in the site.
Rich_Endemics: Number of species in the site that are endemic to one bioregion.
Prop_Endemics: Proportion of endemic species in the site.
MeanSim: Mean similarity of a site to all sites in each bioregion.
SdSim: Standard deviation of similarity values.
These metrics can be found in the output slot site_bioregions.
Summary metrics across the whole bioregionalization:
These metrics summarize how an entity (species or site) is distributed across all clusters, rather than in relation to each individual cluster.
Species-level summary metric:
P
(Participation): Evenness of species distribution across bioregions
(Denelle et al. 2020). Found in output slot species_bioregionalization.
Site-level summary metric:
Silhouette:
How well a site fits its assigned bioregion vs. the nearest alternative
(Rousseeuw 1987). Found in output slot site_bioregionalization.
cluster_on = "species" or cluster_on = "both") ---
Site-per-chorotype metrics quantify how each site relates to species clusters (chorotypes).
The same metrics as above (Specificity, Fidelity, IndVal, etc.) can be computed, but their interpretation is inverted. These metrics are based on the following core terms:
n_gc: Number of species belonging to chorotype c that are present in site g.
n_g: Total number of species present in site g.
n_c: Total number of species belonging to chorotype c.
Abundance version of these core terms can also be calculated when
data_type = "abundance" (or data_type = "auto" and
bioregionalization was based on abundance).
Their interpretation changes, for example:
Specificity: Fraction of a site's species belonging to a chorotype.
Fidelity: Fraction of a chorotype's species present in the site.
IndVal: Indicator value for site-chorotype associations.
P: Evenness of sites across chorotypes
De Cáceres M & Legendre P (2009) Associations between species and groups of sites: indices and statistical inference. Ecology 90, 3566--3574.
Denelle P, Violle C & Munoz F (2020) Generalist plants are more competitive and more functionally similar to each other than specialist plants: insights from network analyses. Journal of Biogeography 47, 1922–-1933.
Lenormand M, Papuga G, Argagnon O, Soubeyrand M, Alleaume S & Luque S (2019) Biogeographical network analysis of plant species distribution in the Mediterranean region. Ecology and Evolution 9, 237--250.
Rousseeuw PJ (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, 53--65.
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html.
Associated functions: bioregion_metrics bioregionalization_metrics
data(fishmat)
fishsim <- similarity(fishmat, metric = "Jaccard")
bioregionalization <- hclu_hierarclust(similarity_to_dissimilarity(fishsim),
index = "Jaccard",
method = "average",
randomize = TRUE,
optimal_tree_method = "best",
n_clust = c(1,2,3),
verbose = FALSE)
ind <- site_species_metrics(bioregionalization = bioregionalization,
bioregion_metrics = "all",
bioregionalization_metrics = "all",
data_type = "auto",
cluster_on = "site",
comat = fishmat,
similarity = fishsim,
include_cluster = TRUE,
index = 3,
verbose = TRUE)
Run the code above in your browser using DataLab