UFS: Unsupervised Feature Selection

Description

Performs unsupervised feature selection for mixed-type data. Two algorithms are available, cell-wise and row-wise; both are based on the heterogeneous correlation matrix.

Usage

UFS(
  data = NULL,
  alpha = 0.05,
  missing = FALSE,
  pv_adj = "none",
  smooth.tol = 10^-12,
  method = "c"
)

Value

A list with the following elements:

rearranged.data.set: The original data frame, with the numerical features first
selected.features: A data frame of the selected features
feature.indices: The indices of the selected features from the original data frame
original.corr.matrix: The \(p\) by \(p\) extended correlation matrix of all the input features
corr.matrix: The \(d\) by \(d\) extended correlation matrix of the selected features
original.p.value.matrix: The \(p\) by \(p\) matrix of p-values for all the input features
p.value.matrix: The \(d\) by \(d\) matrix of p-values for the selected features

Arguments

data: A data frame. Values of type 'numeric' or 'integer' are treated as numerical, factors as ordinal categorical.
alpha: Significance level to be used for testing, default = 0.05.
missing: Handling of missing values. FALSE (default) uses pairwise complete observations; set to TRUE for complete deletion.
pv_adj: Correction method for p-value, "none" by default. For options see p.adjust.
smooth.tol: Minimum acceptable eigenvalue for the smoothing, default = 10^-12.
method: Algorithm used. "c" (cell-wise) by default, "r" (row-wise) as the alternative.
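
The column types expected by data and the values accepted by pv_adj can be checked with base R before calling UFS. The following is a minimal sketch, not part of the package: the data frame toy, its column names, and the p-values are made up for illustration, and the classification only mirrors the description of the data argument above.

```r
# Hypothetical toy data; names and values are made up for illustration.
toy <- data.frame(
  income = c(32.5, 41.0, 28.7, 55.2, 47.9),            # numeric: numerical
  visits = c(2L, 5L, 1L, 7L, 4L),                      # integer: numerical
  rating = c("low", "high", "medium", "high", "low"),  # character: needs conversion
  stringsAsFactors = FALSE
)

# Convert the character column to a factor with an explicit level order,
# since factors are treated as ordinal categorical.
toy$rating <- factor(toy$rating, levels = c("low", "medium", "high"))

# Classify each column following the description of the `data` argument.
col_kind <- vapply(
  toy,
  function(x) {
    if (is.factor(x)) "ordinal" else if (is.numeric(x)) "numerical" else "unsupported"
  },
  character(1)
)
print(col_kind)

# Correction methods accepted by stats::p.adjust(), hence candidates for `pv_adj`.
print(p.adjust.methods)

# Effect of one correction on a small vector of made-up p-values.
p_raw <- c(0.01, 0.02, 0.03, 0.04)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(p_bonf)
```

Setting the factor levels explicitly matters here because the order of the levels defines the ordinal scale; without it, R orders the levels alphabetically.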

References

Tortora C., Madhvani S., Punzo A. (2025). Designing unsupervised mixed-type feature selection techniques using the heterogeneous correlation matrix. International Statistical Review. https://doi.org/10.1111/insr.70016

Examples

data(ESI)  # load the data
data <- ESI[, -c(1, 3, 4, 6, 9)]  # remove categorical features
res <- UFS(data)

# show the names of the selected features
colnames(res$selected.features)
