As k-means algorithms use Euclidean distance to estimate clusters, the input covariates should be quantitative variables. Since
variables with wider ranges of values might dominate the clusters and bias the environmental clustering (Hastie et al., 2009),
all the input rasters are first standardized within the function. This is done either by normalizing each raster, that is,
subtracting its mean and dividing by its standard deviation (the default), or optionally by linear scaling that constrains
all raster values between 0 and 1.
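The two standardization options described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the function's own code: the helper names `zscore` and `minmax` and the sample values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics

def zscore(values):
    """Normalize: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]

def minmax(values):
    """Linear scaling: map the minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Two hypothetical covariates on very different scales.
temperature = [10.0, 15.0, 20.0]      # degrees
rainfall = [200.0, 1200.0, 2200.0]    # millimetres

# After either transformation both covariates span the same range,
# so neither dominates the Euclidean distance used by k-means.
temperature_z, rainfall_z = zscore(temperature), zscore(rainfall)
temperature_01, rainfall_01 = minmax(temperature), minmax(rainfall)
```

Without this step, a difference of one unit of rainfall would count as much as a difference of one degree of temperature, and the clusters would largely follow rainfall alone.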
By default, the clustering is done in the raster space. In this approach the clusters will be consistent throughout the region
and across species (in the same region). However, this may result in one or more clusters that cover none of the species
records (the spatial locations of the response samples), especially when the species data are not dispersed throughout the
region or the number of clusters (k or folds) is high. In this case, the number of folds is less than the specified k.
If rasterBlock = FALSE, the clustering will be done on the species points and the number of folds will be the same as k.
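The difference between the two modes can be illustrated with a toy example. The sketch below is not the function's implementation: it uses a plain Lloyd's k-means with a deterministic start, and the names `kmeans`, `nearest`, `cells` and `records` are hypothetical, as are the already standardized covariate values.

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to p in Euclidean distance."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic start (illustrative only, k >= 2)."""
    pts = sorted(points)
    # Initial centroids: evenly spaced picks from the sorted points.
    centroids = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # New centroid = mean of its members; an empty cluster keeps its centroid.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids

k = 3
# Standardized values of two covariates for every raster cell (hypothetical).
cells = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1),
         (1.0, 1.0), (1.1, 1.0), (1.2, 1.1),
         (2.0, 2.0), (2.1, 2.0), (2.2, 2.1)]
# Covariate values at the species records, which sample only part of the region.
records = [(0.05, 0.0), (0.15, 0.05), (1.05, 1.0)]

# Raster-space clustering: fit on all cells, then label the records.
raster_centroids = kmeans(cells, k)
raster_folds = len({nearest(r, raster_centroids) for r in records})

# Point-space clustering: fit directly on the records.
point_centroids = kmeans(records, k)
point_folds = len({nearest(r, point_centroids) for r in records})
```

Here the raster-space clusters leave one environmental cluster without any record, so only two folds result from k = 3, whereas clustering the records themselves yields three folds.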
Note that the input raster layer should cover all the species points, otherwise an error will be raised. Records with no raster
value should be deleted prior to the analysis, or another raster layer should be provided.
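Removing such records beforehand amounts to a simple filter. The following is a minimal sketch under stated assumptions: the record structure, the `has_all_values` helper and the use of NaN to mark a missing raster value are all hypothetical and only illustrate the idea.

```python
import math

# Hypothetical species records with the raster values extracted at each location;
# NaN marks a location that falls outside the raster coverage.
records = [
    {"id": 1, "covariates": (0.4, 1.2)},
    {"id": 2, "covariates": (math.nan, 0.7)},
    {"id": 3, "covariates": (0.9, 0.3)},
]

def has_all_values(record):
    """True when every extracted raster value is present (not NaN)."""
    return not any(math.isnan(v) for v in record["covariates"])

# Keep only records that are fully covered by the raster layers.
kept = [r for r in records if has_all_values(r)]
dropped_ids = [r["id"] for r in records if not has_all_values(r)]
```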