
didec (version 1.1.0)

mfoci: Multivariate feature ordering by conditional independence.

Description

A variable selection algorithm based on the directed dependence coefficient (didec).

Usage

mfoci(
  X,
  Y,
  trans = FALSE,
  trans.method = c("standardization"),
  estim.method = c("copula"),
  perm = FALSE,
  perm.method = c("decreasing"),
  pre.selected = NULL,
  select.method = c("forward"),
  autostop = TRUE,
  max.num = NULL
)

Value

A list containing:

pre.selected.features

A vector listing the pre-selected features in X (returned if pre.selected is not NULL);

selected.features

A data.frame listing the selected and ranked variables together with the corresponding values of the directed dependence coefficient if select.method == "forward"; a vector listing the selected features if select.method == "subset";

valueT

The values of the directed dependence coefficient if select.method == "subset".

Arguments

X

A numeric matrix or data.frame/data.table. Contains the predictor vector X.

Y

A numeric matrix or data.frame/data.table. Contains the response vector Y.

trans

A logical. If TRUE, the inputs of X are transformed (standardized by default) before the variable selection, using the method given by trans.method.

trans.method

An optional character string specifying a method for data standardization. This must be one of the strings "standardization" (default), "rank" or "rescaling".

estim.method

An optional character string specifying a method for estimating the directed dependence coefficient didec. This must be one of the strings "codec" or "copula" (default).

perm

A logical. If TRUE, a version of didec that takes permutations of the response variables into account is used in the variable selection algorithm.

perm.method

An optional character string specifying a method for permuting the response variables. This must be one of the strings "sample", "increasing", "decreasing" (default) or "full".

pre.selected

An integer vector indexing the components of the predictor X that are pre-selected.

select.method

An optional character string specifying a feature selection method. This must be one of the strings "forward" (default) or "subset".

autostop

A logical. If TRUE (default), the forward feature selection algorithm stops at the first non-increasing value of didec.

max.num

An integer limiting the maximum number of selected variables if select.method == "subset".
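The trans.method options listed above ("standardization", "rank", "rescaling") are not defined further on this page. Assuming they carry their usual meanings (z-score standardization, rank transformation, and min-max rescaling onto [0, 1]), they can be sketched column-wise in base R as follows. The helper transform_columns is hypothetical and not part of didec; the package's internal transformation may differ, for example in how ties or ranks are scaled.

```r
# Hypothetical helper, NOT part of didec: a sketch of what the three
# trans.method options are assumed to do, applied to each column of X.
transform_columns <- function(X, method = c("standardization", "rank", "rescaling")) {
  method <- match.arg(method)
  X <- as.data.frame(X)
  f <- switch(method,
    # assumed: center each column and divide by its standard deviation
    standardization = function(x) (x - mean(x)) / sd(x),
    # assumed: replace values by their ranks (ties get average ranks)
    rank = function(x) rank(x, ties.method = "average"),
    # assumed: min-max rescaling of each column onto [0, 1]
    rescaling = function(x) (x - min(x)) / (max(x) - min(x))
  )
  as.data.frame(lapply(X, f))
}

X <- data.frame(a = c(1, 2, 3, 4), b = c(10, 30, 20, 40))
transform_columns(X, "rescaling")
```

Under these assumptions, rescaling maps column a to 0, 1/3, 2/3, 1 and column b to 0, 2/3, 1/3, 1.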

Author

Sebastian Fuchs, Jonathan Ansari, Yuping Wang

Details

mfoci implements a forward feature selection algorithm for multiple-outcome data that evaluates the directed dependence coefficient (didec) at each step.

If autostop == TRUE, the algorithm stops at the first non-increasing value of didec, thereby selecting a subset of variables. Otherwise, all predictor variables are ranked according to their predictive strength as measured by didec.
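The forward procedure and the autostop rule described above can be illustrated with a generic greedy loop. In this sketch the dependence measure is passed in as a function score(S) returning a value for a set S of predictor indices; in mfoci that role is played by didec, which is not reproduced here. The names forward_select and toy_score are hypothetical, the start argument is only assumed to play a role similar to pre.selected, and toy_score is an arbitrary additive function chosen solely so the loop has something to maximize (it has none of didec's properties).

```r
# Hypothetical sketch (not didec's code): greedy forward selection.
# score(S) returns the dependence value of predictor index set S; in
# mfoci this role is played by didec, which is NOT implemented here.
forward_select <- function(score, p, start = integer(0), autostop = TRUE) {
  selected <- start                 # assumed analogue of pre.selected
  added <- integer(0)
  values <- numeric(0)
  current <- if (length(start) > 0) score(start) else -Inf
  remaining <- setdiff(seq_len(p), start)
  while (length(remaining) > 0) {
    # value of the measure after adding each remaining candidate
    cand <- vapply(remaining, function(j) score(c(selected, j)), numeric(1))
    best <- which.max(cand)
    # autostop: stop at the first non-increasing value
    if (autostop && cand[best] <= current) break
    selected <- c(selected, remaining[best])
    added <- c(added, remaining[best])
    values <- c(values, cand[best])
    current <- cand[best]
    remaining <- remaining[-best]
  }
  data.frame(feature = added, value = values)
}

# Toy stand-in for didec: an arbitrary additive score with a size penalty
w <- c(0.5, 0.1, 0.3, 0.0)
toy_score <- function(S) sum(w[S]) - 0.05 * length(S)

forward_select(toy_score, p = 4)                    # features 1, 3, 2
forward_select(toy_score, p = 4, autostop = FALSE)  # ranks all 4 features
```

With autostop = FALSE the loop never breaks early, so every predictor ends up in the ranking, which mirrors the behaviour described for mfoci.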

In addition to the forward feature selection algorithm, this function provides best subset selection, enabled by select.method == "subset". This method selects features by evaluating the directed dependence coefficient for all possible feature combinations. Note that the features selected by this method are not ordered.
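An exhaustive subset search of this kind can be sketched as below, assuming that the subset with the largest coefficient is chosen and that max.num caps the subset size. As in the forward sketch, score(S) is a placeholder for didec, and best_subset and toy_score are hypothetical names used for illustration only. Because every subset is scored, the number of evaluations grows exponentially in the number of predictors, which is one reason a size limit can be useful.

```r
# Hypothetical sketch (not didec's code): exhaustive best subset search.
# Every non-empty subset of at most max_num predictors is scored, and the
# subset with the largest value is returned (unordered, as in mfoci).
best_subset <- function(score, p, max_num = p) {
  best_set <- integer(0)
  best_val <- -Inf
  for (k in seq_len(min(max_num, p))) {
    subsets <- combn(p, k, simplify = FALSE)   # all subsets of size k
    for (S in subsets) {
      v <- score(S)
      if (v > best_val) {
        best_val <- v
        best_set <- S
      }
    }
  }
  list(selected = best_set, value = best_val)
}

# Toy stand-in for didec: an arbitrary additive score with a size penalty
w <- c(0.5, 0.1, 0.3, 0.0)
toy_score <- function(S) sum(w[S]) - 0.05 * length(S)

best_subset(toy_score, p = 4)               # selects {1, 2, 3}
best_subset(toy_score, p = 4, max_num = 2)  # selects {1, 3}
```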

References

J. Ansari, S. Fuchs, A direct extension of Azadkia & Chatterjee's rank correlation to multi-response vectors, Available at https://arxiv.org/abs/2212.01621, 2025.

Examples

library(didec)
df <- as.data.frame(bioclimatic)
X <- df[, c(9:12)]
Y <- df[, c(1,8)]
mfoci(X, Y, pre.selected = c(1, 3))
