do.mcfs: Multi-Cluster Feature Selection

Description

Multi-Cluster Feature Selection (MCFS) is an unsupervised feature selection method. Under a multi-cluster assumption, it finds meaningful features by sparsely reconstructing a spectral basis with LASSO and keeping the features that contribute most to that reconstruction.
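The scoring idea can be sketched as follows, using this page's convention that \(X\) is \((n\times p)\). The symbols \(y_k\) (the \(k\)-th spectral embedding vector), \(\hat{a}_k\) (its sparse coefficient vector) and \(\mathrm{score}(j)\) are notation introduced here for illustration. The penalized form is an assumption about how lambda enters; the solver used internally may differ.

```latex
\[
\hat{a}_k \;=\; \arg\min_{a \in \mathbb{R}^p} \; \lVert y_k - X a \rVert_2^2 + \lambda \lVert a \rVert_1,
\qquad k = 1, \dots, K,
\]
\[
\mathrm{score}(j) \;=\; \max_{1 \le k \le K} \lvert \hat{a}_{k,j} \rvert,
\qquad j = 1, \dots, p.
\]
```

The ndim features with the largest scores are the ones retained.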

Usage

do.mcfs(X, ndim = 2, type = c("proportion", 0.1), preprocess = c("null",
  "center", "scale", "cscale", "whiten", "decorrelate"),
  K = max(round(nrow(X)/5), 2), lambda = 1, t = 10)

Arguments

X

an \((n\times p)\) matrix or data frame whose rows are observations and columns represent independent variables.

ndim

an integer-valued target dimension.

type

a vector specifying how the neighborhood graph is constructed. The following types are supported: c("knn",k), c("enn",radius), and c("proportion",ratio). Default is c("proportion",0.1), which connects each point to about 1/10 of all data points, namely its nearest ones. See also aux.graphnbd for more details.
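To make the three options concrete, here is a minimal base-R sketch of what each one means for a small random dataset. It is not the aux.graphnbd implementation; the variable names (knn_adj, enn_adj, k_prop) are hypothetical, and reading "proportion" as a k-nearest-neighbor rule with k = round(ratio * n) is an assumption based on the description above.

```r
set.seed(42)
X <- matrix(rnorm(30 * 3), nrow = 30)   # toy data: n = 30, p = 3
D <- as.matrix(dist(X))                 # pairwise Euclidean distances
n <- nrow(X)

# "knn": connect each point to its k nearest neighbors
k <- 5
knn_adj <- matrix(FALSE, n, n)
for (i in seq_len(n)) {
  nbrs <- order(D[i, ])[2:(k + 1)]      # position 1 is the point itself
  knn_adj[i, nbrs] <- TRUE
}

# "enn": connect every pair of points within a fixed radius
radius <- 1.5
enn_adj <- D <= radius
diag(enn_adj) <- FALSE                  # no self-loops

# "proportion": assumed here to mean knn with k set from a ratio of n
ratio  <- 0.1
k_prop <- max(round(ratio * n), 1)
```

Note that the knn graph built this way is directed (i may list j without j listing i), while the enn graph is symmetric by construction.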

preprocess

an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.

K

assumed number of clusters in the original dataset.

lambda

\(\ell_1\) regularization parameter in \((0,\infty)\).

t

bandwidth parameter for heat kernel in \((0,\infty)\).
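As a rough illustration of what the bandwidth controls, the base-R sketch below computes heat kernel weights of the commonly used form exp(-d^2 / t). The exact expression used inside do.mcfs is an assumption here, and heat_weights is a hypothetical helper, not part of the package.

```r
set.seed(7)
X  <- matrix(rnorm(10 * 2), nrow = 10)  # toy data: n = 10, p = 2
D2 <- as.matrix(dist(X))^2              # squared pairwise distances

# Hypothetical helper: heat kernel weights for a given bandwidth
heat_weights <- function(bandwidth) exp(-D2 / bandwidth)

W_small <- heat_weights(0.5)            # weights decay quickly with distance
W_large <- heat_weights(10)             # weights stay closer to 1
```

A larger bandwidth flattens the weights, so distant neighbors count almost as much as close ones; a smaller bandwidth concentrates weight on the nearest points.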

Value

a named list containing

Y: an \((n\times ndim)\) matrix whose rows are embedded observations.
featidx: a length-\(ndim\) vector of indices of the features (columns of X) with the highest scores.
trfinfo: a list containing information for out-of-sample prediction.
projection: a \((p\times ndim)\) matrix whose columns are basis vectors for projection.
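The relationship between featidx, projection and Y can be sketched in base R. This assumes that, for a feature selection method with no preprocessing, projection is a 0/1 selector matrix; the indices below are made up and do not come from an actual do.mcfs run.

```r
set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)  # toy data: n = 20, p = 5
featidx <- c(4, 2)                               # hypothetical selected features
ndim <- length(featidx)

# (p x ndim) selector: column j has a single 1 in row featidx[j]
projection <- matrix(0, nrow = ncol(X), ncol = ndim)
projection[cbind(featidx, seq_len(ndim))] <- 1

# Projecting with the selector is the same as subsetting columns
Y <- X %*% projection
all.equal(Y, X[, featidx])
```

Under that assumption, new observations can be embedded the same way, by keeping only the columns listed in featidx.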

References

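Cai D, Zhang C, He X (2010). "Unsupervised Feature Selection for Multi-Cluster Data." In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10).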

Examples

# NOT RUN {
## generate data of 3 types with clear difference
dt1  = aux.gensamples(n=33)-100
dt2  = aux.gensamples(n=33)
dt3  = aux.gensamples(n=33)+100

## merge the data and create a label correspondingly
X      = rbind(dt1,dt2,dt3)
label  = c(rep(1,33), rep(2,33), rep(3,33))

## try different regularization parameters
out1 = do.mcfs(X, lambda=0.01)
out2 = do.mcfs(X, lambda=0.1)
out3 = do.mcfs(X, lambda=1)

## visualize, coloring points by the label created above
par(mfrow=c(1,3))
plot(out1$Y[,1], out1$Y[,2], col=label, main="lambda=0.01")
plot(out2$Y[,1], out2$Y[,2], col=label, main="lambda=0.1")
plot(out3$Y[,1], out3$Y[,2], col=label, main="lambda=1")
# }