This method finds a subset of variables that have low collinearity. It provides
three methods: cor_caret, a stepwise approach that removes variables with a pairwise
correlation above a given cutoff, choosing the variable with the greatest mean
correlation (based on the algorithm in caret::findCorrelation); vif_step, a stepwise
approach that removes variables with a variance inflation factor above a given cutoff
(based on the algorithm in usdm::vifstep); and vif_cor, a stepwise approach that, at
each step, finds the pair of variables with the highest correlation above the cutoff
and removes the one with the largest VIF, until all remaining variables have a
correlation below the cutoff. There are methods for terra::SpatRaster, data.frame
and matrix. For terra::SpatRaster and data.frame, only numeric variables will be
considered.
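The correlation-based idea can be illustrated with a short base R sketch. This is a simplified illustration, not the tidysdm or caret source: the function name `filter_by_correlation` and the simulated data are hypothetical, and the sketch assumes that `x` has column names and only numeric columns.

```r
# Simplified sketch of stepwise correlation filtering (hypothetical helper,
# not the tidysdm implementation). Assumes x has named, numeric columns.
filter_by_correlation <- function(x, cutoff = 0.7, cor_type = "pearson") {
  cor_mat <- abs(stats::cor(x, method = cor_type))
  diag(cor_mat) <- 0 # ignore self-correlations
  keep <- colnames(cor_mat)
  repeat {
    sub <- cor_mat[keep, keep, drop = FALSE]
    # Stop when no remaining pair exceeds the cutoff
    if (length(keep) < 2 || max(sub) <= cutoff) break
    # Locate the most strongly correlated pair still in the set
    pair <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    candidates <- keep[pair]
    # Drop the member of the pair with the greater mean absolute correlation
    mean_cor <- rowMeans(sub[candidates, , drop = FALSE])
    keep <- setdiff(keep, candidates[which.max(mean_cor)])
  }
  keep
}

# Simulated data: b is a noisy copy of a, c is independent
set.seed(42)
n <- 500
a <- rnorm(n)
env <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n))
kept_cor <- filter_by_correlation(env, cutoff = 0.7)
print(kept_cor)
```

In this sketch the mean correlations are recomputed on the remaining variables at every step, which may differ in detail from the algorithm used by the package.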
filter_collinear(
x,
cutoff = NULL,
verbose = FALSE,
names = TRUE,
to_keep = NULL,
method = "cor_caret",
cor_type = "pearson",
max_cells = Inf,
...
)
# S3 method for default
filter_collinear(
x,
cutoff = NULL,
verbose = FALSE,
names = TRUE,
to_keep = NULL,
method = "cor_caret",
cor_type = "pearson",
max_cells = Inf,
...
)
# S3 method for SpatRaster
filter_collinear(
x,
cutoff = NULL,
verbose = FALSE,
names = TRUE,
to_keep = NULL,
method = "cor_caret",
cor_type = "pearson",
max_cells = Inf,
exhaustive = FALSE,
...
)
# S3 method for data.frame
filter_collinear(
x,
cutoff = NULL,
verbose = FALSE,
names = TRUE,
to_keep = NULL,
method = "cor_caret",
cor_type = "pearson",
max_cells = Inf,
...
)
# S3 method for matrix
filter_collinear(
x,
cutoff = NULL,
verbose = FALSE,
names = TRUE,
to_keep = NULL,
method = "cor_caret",
cor_type = "pearson",
max_cells = Inf,
...
)
A vector of names of columns that are below the correlation threshold
(when names = TRUE), otherwise a vector of indices. Note that the indices
are only for numeric variables (i.e. if factors are present, the indices do
not take them into account).
A terra::SpatRaster object, a data.frame (with only numeric variables), or a matrix
A numeric value used as a threshold to remove variables. For "cor_caret" and "vif_cor", it is the pair-wise absolute correlation cutoff, which defaults to 0.7. For "vif_step", it is the variance inflation factor, which defaults to 10.
A boolean indicating whether additional information should be printed on the screen
a logical; should the column names be returned (TRUE) or the column indices (FALSE)?
A vector of variable names that we want to force in the set (note that the function will return an error if the correlation among any of those variables is higher than the cutoff).
character. One of "cor_caret", "vif_cor" or "vif_step".
character. For methods that use correlation, which type of correlation: "pearson", "kendall", or "spearman". Defaults to "pearson"
positive integer. The maximum number of cells to be used. If this is smaller than ncell(x), a regular sample of x is used
additional arguments specific to a given object type
boolean. Used only for terra::SpatRaster when downsampling to max_cells, if we
require the exhaustive approach in terra::spatSample(). This is only needed for
rasters that are very sparse and not too large, see the help page of
terra::spatSample() for details.
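To make the variance inflation factor cutoff used by "vif_step" more concrete, the following base R sketch computes VIFs and removes the variable with the largest VIF until all values fall at or below the cutoff. It is an illustration under stated assumptions, not the tidysdm or usdm source: the helper names `vif_values` and `filter_by_vif` and the simulated data are hypothetical. The sketch relies on the identity that the VIF of variable j, 1 / (1 - R²_j), equals the j-th diagonal element of the inverse correlation matrix.

```r
# Hypothetical helpers illustrating VIF-based filtering (not the package code).
# Assumes x has named, numeric columns and no perfectly collinear variables
# (a singular correlation matrix would make solve() fail).
vif_values <- function(x) {
  # Diagonal of the inverse correlation matrix equals 1 / (1 - R^2_j)
  diag(solve(stats::cor(x)))
}

filter_by_vif <- function(x, cutoff = 10) {
  keep <- colnames(x)
  while (length(keep) > 1) {
    v <- vif_values(x[, keep, drop = FALSE])
    # Stop once every remaining variable is at or below the cutoff
    if (max(v) <= cutoff) break
    # Otherwise drop the variable with the largest VIF and recompute
    keep <- keep[-which.max(v)]
  }
  keep
}

# Simulated data: b is a noisy copy of a, c is independent
set.seed(42)
n <- 500
a <- rnorm(n)
env_vif <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n))
print(vif_values(env_vif))
kept_vif <- filter_by_vif(env_vif, cutoff = 10)
print(kept_vif)
```

The cutoff of 10 mirrors the default stated for "vif_step"; a VIF of 10 corresponds to an R² of 0.9 when a variable is regressed on all the others.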
for cor_caret: Original R code by Dong Li, modified by Max Kuhn and Andrea Manica;
for vif_step and vif_cor, original algorithm by Babak Naimi, rewritten by Andrea
Manica for tidysdm
Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling?, Ecography 37 (2): 191-203.
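A minimal usage sketch on a data.frame, assuming the tidysdm package is installed. The simulated data and variable names are hypothetical; the call uses only the arguments listed in the usage above. Here `to_keep` forces variable `a` into the returned set, so its near-duplicate `b` is expected to be the one removed.

```r
# Usage sketch; runs only if tidysdm is available
if (requireNamespace("tidysdm", quietly = TRUE)) {
  # Simulated data: b is a noisy copy of a, c is independent
  set.seed(1)
  n <- 200
  a <- rnorm(n)
  sim_df <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n))
  # Keep variables with pairwise correlation below 0.7, forcing "a" to stay
  kept <- tidysdm::filter_collinear(
    sim_df,
    cutoff = 0.7,
    method = "cor_caret",
    to_keep = "a"
  )
  print(kept)
}
```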