Rdimtools (version 0.3.2)

do.mcfs: Multi-Cluster Feature Selection

Description

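Multi-Cluster Feature Selection (MCFS) is an unsupervised feature selection method. Under a multi-cluster assumption, it looks for the features that best preserve the cluster structure of the data: the spectral basis of a neighborhood graph is reconstructed from the features by sparse (LASSO) regression, and features are ranked by the resulting coefficients.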
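The steps named in the description can be sketched in base R. This is a simplified illustration and not the Rdimtools implementation: the helper names `soft_threshold`, `lasso_cd`, and `mcfs_sketch`, the plain k-nearest-neighbor graph, the heat-kernel form `exp(-d^2 / t)`, the column centering, and the hand-written coordinate-descent LASSO are all assumptions made for this example.

```r
# Illustrative MCFS sketch in base R; NOT the Rdimtools implementation.
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for min_b (1/(2n)) * ||y - X b||^2 + lambda * ||b||_1
lasso_cd <- function(X, y, lambda, iters = 100) {
  n   <- nrow(X)
  b   <- numeric(ncol(X))
  csq <- colSums(X^2) / n
  r   <- y                                  # residual y - X b, starting from b = 0
  for (it in seq_len(iters)) {
    for (j in seq_along(b)) {
      if (csq[j] == 0) next                 # skip all-zero columns
      rho  <- sum(X[, j] * r) / n + csq[j] * b[j]
      bj   <- soft_threshold(rho, lambda) / csq[j]
      r    <- r - X[, j] * (bj - b[j])      # keep the residual in sync
      b[j] <- bj
    }
  }
  b
}

mcfs_sketch <- function(X, ndim = 2, K = 3, k = 5, t = 10, lambda = 0.01) {
  X  <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # assumed: centering only
  n  <- nrow(X)
  D2 <- as.matrix(dist(X))^2

  # 1. symmetric kNN graph with heat-kernel weights (assumes no duplicate rows)
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nb <- order(D2[i, ])[2:(k + 1)]         # position 1 is the point itself
    W[i, nb] <- exp(-D2[i, nb] / t)
  }
  W <- pmax(W, t(W))

  # 2. spectral basis: K smallest eigenvectors of L y = mu D y,
  #    obtained through the symmetric normalized Laplacian
  d     <- rowSums(W)
  Ln    <- diag(n) - W / sqrt(outer(d, d))
  ev    <- eigen(Ln, symmetric = TRUE)
  keep  <- order(ev$values)[seq_len(K)]
  Yspec <- ev$vectors[, keep, drop = FALSE] / sqrt(d)

  # 3. one sparse regression per eigenvector; columns of A are coefficient vectors
  A <- apply(Yspec, 2, function(y) lasso_cd(X, y, lambda))

  # 4. score of a feature = largest absolute coefficient across the K regressions
  score   <- apply(abs(A), 1, max)
  featidx <- order(score, decreasing = TRUE)[seq_len(ndim)]
  list(Y = X[, featidx, drop = FALSE], featidx = featidx, score = score)
}

# Toy data: two features carry a three-group structure, three are pure noise.
set.seed(1)
Xtoy <- cbind(matrix(rnorm(120), 60) + rep(c(-5, 0, 5), each = 20),
              matrix(rnorm(180), 60))
res <- mcfs_sketch(Xtoy, ndim = 2, K = 3, k = 5)
res$featidx
```

Because the scores are raw regression coefficients, they depend on the scale of each feature, which is one reason a preprocessing option can matter in practice.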

Usage

do.mcfs(X, ndim = 2, type = c("proportion", 0.1), preprocess = c("null",
  "center", "scale", "cscale", "whiten", "decorrelate"),
  K = max(round(nrow(X)/5), 2), lambda = 1, t = 10)

Arguments

X

an \((n\times p)\) matrix or data frame whose rows are observations and columns represent independent variables.

ndim

an integer-valued target dimension.

type

a vector specifying how the neighborhood graph is constructed. The supported types are c("knn",k), c("enn",radius), and c("proportion",ratio). The default, c("proportion",0.1), connects each point to roughly the nearest 1/10 of all data points. See also aux.graphnbd for more details.
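To make the three graph types concrete, here is a rough base-R sketch of what each specification could mean. It is not the code behind aux.graphnbd: the function name `nbd_sketch` is hypothetical, and reading a proportion as `k = round(ratio * n)` neighbors is an assumption. Note that R coerces a vector such as c("knn", 5) to character, so the numeric part has to be converted back.

```r
# Hypothetical helper; illustrates the three type specifications only.
nbd_sketch <- function(X, type = c("proportion", 0.1)) {
  D    <- as.matrix(dist(X))
  n    <- nrow(D)
  kind <- type[1]
  val  <- as.numeric(type[2])              # c("knn", 5) is stored as c("knn", "5")
  stopifnot(kind %in% c("knn", "enn", "proportion"))

  A <- matrix(FALSE, n, n)
  if (kind == "enn") {
    A <- D <= val                          # connect points within the given radius
  } else {
    # assumed reading of "proportion": a fixed fraction of all n points
    k <- if (kind == "knn") val else max(1, round(val * n))
    for (i in seq_len(n)) {
      A[i, order(D[i, ])[2:(k + 1)]] <- TRUE   # position 1 is the point itself
    }
  }
  diag(A) <- FALSE
  A
}

set.seed(1)
Xtoy <- matrix(rnorm(40), 20)
table(rowSums(nbd_sketch(Xtoy, c("knn", 3))))         # every point has 3 neighbors
table(rowSums(nbd_sketch(Xtoy, c("proportion", 0.1))))
table(rowSums(nbd_sketch(Xtoy, c("enn", 1))))         # neighbor counts vary by density
```

The k-nearest-neighbor adjacency built this way is directed; whether and how it is symmetrized is a separate design choice that this sketch leaves out.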

preprocess

an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.

K

assumed number of clusters in the original dataset.

lambda

\(\ell_1\) regularization parameter in \((0,\infty)\).

t

bandwidth parameter for heat kernel in \((0,\infty)\).
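A small sketch of how a heat-kernel bandwidth acts on pairwise weights. The form `exp(-d^2 / t)` is a common convention and is assumed here; the exact scaling used inside Rdimtools may differ, and `heat_kernel` is a hypothetical helper name.

```r
# Assumed heat-kernel form: w_ij = exp(-||x_i - x_j||^2 / t)
heat_kernel <- function(X, t = 10) {
  D2 <- as.matrix(dist(X))^2
  exp(-D2 / t)
}

set.seed(1)
Xtoy <- matrix(rnorm(20), 10)
W_narrow <- heat_kernel(Xtoy, t = 1)    # weights decay quickly with distance
W_wide   <- heat_kernel(Xtoy, t = 10)   # weights stay closer to 1
range(W_narrow[upper.tri(W_narrow)])
range(W_wide[upper.tri(W_wide)])
```

A larger t flattens the weights toward 1, so the graph behaves more like an unweighted one; a smaller t emphasizes only the closest neighbors.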

Value

a named list containing

Y

an \((n\times ndim)\) matrix whose rows are embedded observations.

featidx

a length-\(ndim\) vector of indices of the features with the highest scores.

trfinfo

a list containing information for out-of-sample prediction.

projection

a \((p\times ndim)\) matrix whose columns form a basis for projection.
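For a feature selection method, one natural way to relate featidx, projection, and Y is a 0/1 indicator matrix that picks out the selected columns. The sketch below assumes that reading and ignores any preprocessing; `selection_projection` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: build a (p x ndim) indicator matrix from selected indices.
selection_projection <- function(p, featidx) {
  P <- matrix(0, nrow = p, ncol = length(featidx))
  P[cbind(featidx, seq_along(featidx))] <- 1   # one 1 per column, at the chosen row
  P
}

Xtoy    <- matrix(as.numeric(1:12), nrow = 3)  # 3 observations, 4 features
featidx <- c(4, 2)
P       <- selection_projection(ncol(Xtoy), featidx)

# Projecting with P is the same as subsetting the selected columns.
all.equal(Xtoy %*% P, Xtoy[, featidx])
```

Under this assumption, new observations can be embedded by applying the same preprocessing and then multiplying by the projection matrix.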

References

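Cai D, Zhang C, He X (2010). "Unsupervised Feature Selection for Multi-Cluster Data." In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.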

Examples

# NOT RUN {
library(Rdimtools)

## generate three well-separated groups of 33 samples each
dt1  = aux.gensamples(n=33)-100
dt2  = aux.gensamples(n=33)
dt3  = aux.gensamples(n=33)+100

## merge the data and create the corresponding group labels
X      = rbind(dt1,dt2,dt3)
label  = c(rep(1,33), rep(2,33), rep(3,33))

## try different regularization parameters
out1 = do.mcfs(X, lambda=0.01)
out2 = do.mcfs(X, lambda=0.1)
out3 = do.mcfs(X, lambda=1)

## visualize the selected features, coloring points by group label
opar = par(mfrow=c(1,3))
plot(out1$Y[,1], out1$Y[,2], col=label, main="lambda=0.01")
plot(out2$Y[,1], out2$Y[,2], col=label, main="lambda=0.1")
plot(out3$Y[,1], out3$Y[,2], col=label, main="lambda=1")
par(opar)  # restore the previous graphics settings
# }