A variable selection algorithm based on the directed dependence coefficient (didec).
mfoci(
X,
Y,
trans = FALSE,
trans.method = c("standardization"),
estim.method = c("copula"),
perm = FALSE,
perm.method = c("decreasing"),
pre.selected = NULL,
select.method = c("forward"),
autostop = TRUE,
max.num = NULL
)
A list containing:
A vector listing the pre-selected features of X, if pre.selected is not NULL;
a data.frame listing the selected and ranked variables together with the corresponding values of the directed dependence coefficient, if select.method == "forward";
a vector listing the selected features, if select.method == "subset";
the values of the directed dependence coefficient, if select.method == "subset".
X: A numeric matrix or data.frame/data.table containing the predictor vector X.
Y: A numeric matrix or data.frame/data.table containing the response vector Y.
trans: A logical. If TRUE, the columns of X are transformed (standardized) with the method given by trans.method before the variable selection.
trans.method: An optional character string specifying a method for data standardization. This must be one of the strings "standardization" (default), "rank" or "rescaling".
estim.method: An optional character string specifying a method for estimating the directed dependence coefficient didec. This must be one of the strings "codec" or "copula" (default).
perm: A logical. If TRUE, a version of didec that takes into account permutations of the response variables is used in the variable selection algorithm.
perm.method: An optional character string specifying a method for permuting the response variables. This must be one of the strings "sample", "increasing", "decreasing" (default) or "full".
pre.selected: An integer vector of column indices of X identifying pre-selected predictors.
select.method: An optional character string specifying a feature selection method. This must be one of the strings "forward" (default) or "subset".
autostop: A logical. If TRUE (default), the forward feature selection algorithm stops at the first non-increasing value of didec.
max.num: An integer limiting the maximum number of selected variables when select.method == "subset".
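The three trans.method options are not defined further on this page. The sketch below assumes their usual meanings (z-score standardization, column-wise ranks, and min-max rescaling to [0, 1]); `transform_columns()` is a hypothetical helper, and the exact transformations applied inside mfoci may differ.

```r
# Hypothetical helper: column-wise transformations under ASSUMED definitions
# of "standardization", "rank" and "rescaling" (not taken from didec itself).
transform_columns <- function(X, method = c("standardization", "rank", "rescaling")) {
  method <- match.arg(method)
  X <- as.matrix(X)
  f <- switch(method,
    standardization = function(x) (x - mean(x)) / sd(x),            # zero mean, unit sd
    rank            = function(x) rank(x),                          # ranks 1..n (ties averaged)
    rescaling       = function(x) (x - min(x)) / (max(x) - min(x))  # min-max to [0, 1]
  )
  apply(X, 2, f)  # apply f to each column, keeping column names
}
```

For example, `transform_columns(data.frame(a = c(1, 2, 3)), "rescaling")` maps the column to 0, 0.5 and 1.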
Sebastian Fuchs, Jonathan Ansari, Yuping Wang
mfoci implements a forward feature selection algorithm for multiple-outcome data that evaluates the directed dependence coefficient (didec) at each step.
If autostop == TRUE, the algorithm stops at the first non-increasing value of didec, thereby selecting a subset of variables.
Otherwise, all predictor variables are ranked according to their predictive strength as measured by didec.
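The forward procedure described above can be sketched generically in base R. This is an illustration under assumptions, not the package's implementation: `forward_select()` and `adj_r2_score()` are hypothetical, and the stand-in score is a pooled adjusted R² from a multivariate linear regression rather than an estimate of didec. Any function `score(Xs, Y)` returning a single number could be plugged in instead.

```r
# Stand-in score (NOT didec): pooled adjusted R^2 of regressing all columns
# of Y on the predictor columns in Xs.
adj_r2_score <- function(Xs, Y) {
  Y <- as.matrix(Y)
  fit <- lm(Y ~ Xs)
  res <- as.matrix(residuals(fit))
  tot <- sweep(Y, 2, colMeans(Y))        # centred responses
  r2 <- 1 - sum(res^2) / sum(tot^2)
  n <- nrow(Y); p <- ncol(Xs)
  1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalise extra predictors
}

# Hypothetical greedy forward selection with an autostop rule.
forward_select <- function(X, Y, score, pre.selected = integer(0), autostop = TRUE) {
  X <- as.matrix(X)
  selected <- as.integer(pre.selected)
  remaining <- setdiff(seq_len(ncol(X)), selected)
  current <- if (length(selected) > 0) score(X[, selected, drop = FALSE], Y) else -Inf
  added <- integer(0); values <- numeric(0)
  while (length(remaining) > 0) {
    # Score each remaining candidate when added to the current selection
    cand <- vapply(remaining,
                   function(j) score(X[, c(selected, j), drop = FALSE], Y),
                   numeric(1))
    k <- which.max(cand)
    # autostop: quit at the first step where the best score does not increase
    if (autostop && cand[k] <= current) break
    selected <- c(selected, remaining[k])
    added <- c(added, remaining[k])
    values <- c(values, cand[k])
    current <- cand[k]
    remaining <- remaining[-k]
  }
  data.frame(variable = added, value = values)  # newly added variables, in order
}
```

With autostop = TRUE the loop ends at the first step whose best candidate fails to increase the score; with autostop = FALSE every remaining predictor is ranked by the order in which it was added.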
In addition to the forward feature selection algorithm, this function also provides a best subset selection, which is used when select.method == "subset".
This method selects features by calculating the directed dependence coefficient for all possible feature combinations.
Note that the features selected by this method are not ordered.
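Best subset selection can be illustrated by exhaustive enumeration with `combn()`. The sketch below is hypothetical: `best_subset()` and `adj_r2_score()` are illustrative names, the score is a pooled adjusted R² stand-in rather than didec, and treating max.num as an upper bound on the subset size is an assumption based on the argument description. As on this page, the returned subset is unordered.

```r
# Stand-in score (NOT didec): pooled adjusted R^2 of a multivariate regression.
adj_r2_score <- function(Xs, Y) {
  Y <- as.matrix(Y)
  res <- as.matrix(residuals(lm(Y ~ Xs)))
  r2 <- 1 - sum(res^2) / sum(sweep(Y, 2, colMeans(Y))^2)
  n <- nrow(Y); p <- ncol(Xs)
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Hypothetical exhaustive search over all feature combinations of size <= max.num.
best_subset <- function(X, Y, score, max.num = NULL) {
  X <- as.matrix(X)
  p <- ncol(X)
  m_max <- if (is.null(max.num)) p else min(max.num, p)
  best <- list(variables = integer(0), value = -Inf)
  for (m in seq_len(m_max)) {
    # combn(p, m) enumerates every m-element subset of 1..p
    for (s in combn(p, m, simplify = FALSE)) {
      v <- score(X[, s, drop = FALSE], Y)
      if (v > best$value) best <- list(variables = s, value = v)
    }
  }
  best  # an unordered set of column indices and its score
}
```

The number of candidate subsets grows as 2^p, which is why a cap such as max.num matters for larger predictor sets.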
J. Ansari, S. Fuchs, A direct extension of Azadkia & Chatterjee's rank correlation to multi-response vectors, Available at https://arxiv.org/abs/2212.01621, 2025.
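library(didec)
df <- as.data.frame(bioclimatic)
X <- df[, 9:12]      # predictors: columns 9 to 12
Y <- df[, c(1, 8)]   # responses: columns 1 and 8
# forward selection with columns 1 and 3 of X pre-selected
mfoci(X, Y, pre.selected = c(1, 3))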