Multi-Cluster Feature Selection (MCFS) is an unsupervised feature selection method. Based on a multi-cluster assumption, it aims to find meaningful features by sparsely reconstructing a spectral basis with LASSO.
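The idea in the description can be sketched in base R as follows. This is a rough illustration, not the Rdimtools implementation: the function names `lasso_cd` and `mcfs_sketch`, the argument `k`, the symmetric kNN graph, the exact heat-kernel form `exp(-d^2/t)`, and the `0.5 * RSS + lambda * L1` scaling of the LASSO objective are all assumptions made for this sketch.

```r
# Soft-thresholding operator used by coordinate descent.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# LASSO by cyclic coordinate descent (hypothetical helper):
# minimizes 0.5 * ||y - X a||^2 + lambda * ||a||_1.
lasso_cd <- function(X, y, lambda, iters = 200) {
  p <- ncol(X)
  a <- numeric(p)
  col_ss <- colSums(X^2)
  r <- y                                   # residual y - X a (a starts at 0)
  for (it in seq_len(iters)) {
    for (j in seq_len(p)) {
      if (col_ss[j] == 0) next
      rho <- sum(X[, j] * r) + col_ss[j] * a[j]
      a_new <- soft(rho, lambda) / col_ss[j]
      r <- r - X[, j] * (a_new - a[j])     # keep the residual in sync
      a[j] <- a_new
    }
  }
  a
}

# Hypothetical sketch of MCFS: returns indices of the ndim top-scoring features.
mcfs_sketch <- function(X, ndim = 2, k = 5, K = 3, lambda = 1, t = 10) {
  X <- scale(X, center = TRUE, scale = FALSE)
  n <- nrow(X)
  D2 <- as.matrix(dist(X))^2
  # kNN graph with heat-kernel weights (assumed form), then symmetrized.
  # Assumes no weight underflows to zero for every neighbor of a point.
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nb <- order(D2[i, ])[2:(k + 1)]        # skip the point itself
    W[i, nb] <- exp(-D2[i, nb] / t)
  }
  W <- pmax(W, t(W))
  d <- rowSums(W)
  # Spectral basis: solve L y = mu D y through the normalized Laplacian.
  Dm <- diag(1 / sqrt(d))
  Ln <- diag(n) - Dm %*% W %*% Dm
  ev <- eigen(Ln, symmetric = TRUE)
  idx <- order(ev$values)[seq_len(K)]      # K smallest eigenvalues
  Y <- Dm %*% ev$vectors[, idx, drop = FALSE]
  # Sparse reconstruction of each basis vector from the features.
  A <- sapply(seq_len(K), function(j) lasso_cd(X, Y[, j], lambda))
  # MCFS score of a feature: largest absolute coefficient across the K fits.
  score <- apply(abs(A), 1, max)
  order(score, decreasing = TRUE)[seq_len(ndim)]
}

# Usage on toy data: two informative columns followed by four noise columns.
set.seed(1)
Xtoy <- cbind(rep(c(0, 5, 0), each = 20) + rnorm(60, sd = 0.5),
              rep(c(0, 0, 5), each = 20) + rnorm(60, sd = 0.5),
              matrix(rnorm(240), 60, 4))
selected <- mcfs_sketch(Xtoy, ndim = 2)
```

Larger values of `lambda` drive more coefficients to zero, so fewer features receive a nonzero score; this is the trade-off the examples below explore with the actual `do.mcfs`.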
do.mcfs(
  X,
  ndim = 2,
  type = c("proportion", 0.1),
  preprocess = c("null", "center", "scale", "cscale", "whiten", "decorrelate"),
  K = max(round(nrow(X)/5), 2),
  lambda = 1,
  t = 10
)
a named list containing
an
a length-
a list containing information for out-of-sample prediction.
a
an
an integer-valued target dimension.
a vector specifying how the neighborhood graph is constructed. The following types are supported: c("knn",k), c("enn",radius), and c("proportion",ratio). Default is c("proportion",0.1), which connects each point to about 1/10 of all data points, chosen as its nearest neighbors. See also aux.graphnbd for more details.
an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.
assumed number of clusters in the original dataset.
bandwidth parameter for the heat kernel; must be positive, i.e., in (0, ∞).
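To make the three neighborhood specifications concrete, here is a small base-R sketch of how each one could translate into a neighbor mask. It is only an illustration under stated assumptions: `nbd_sketch` is a hypothetical helper, not aux.graphnbd, and reading "proportion" as "the round(ratio * n) nearest points" is an interpretation of the description above rather than a confirmed detail of the package.

```r
# Hypothetical helper: logical n-by-n mask, TRUE where j is a neighbor of i.
nbd_sketch <- function(X, type = c("proportion", 0.1)) {
  D <- as.matrix(dist(X))
  n <- nrow(D)
  kind <- type[1]
  val <- as.numeric(type[2])               # c("knn", 5) stores "5" as a string
  if (!kind %in% c("knn", "enn", "proportion")) stop("unknown type")
  mask <- matrix(FALSE, n, n)
  if (kind == "enn") {
    mask <- D <= val                       # everything within the given radius
  } else {
    # "knn" uses k directly; "proportion" is assumed to mean k = ratio * n.
    k <- if (kind == "knn") round(val) else max(1, round(val * n))
    for (i in seq_len(n)) {
      mask[i, order(D[i, ])[2:(k + 1)]] <- TRUE   # skip the point itself
    }
  }
  diag(mask) <- FALSE                      # a point is not its own neighbor
  mask
}

# Usage: ten equally spaced points on a line.
Xline <- matrix(1:10, ncol = 1)
m_knn  <- nbd_sketch(Xline, c("knn", 2))
m_enn  <- nbd_sketch(Xline, c("enn", 1.5))
m_prop <- nbd_sketch(Xline, c("proportion", 0.2))
```

Note that a kNN mask built this way is generally not symmetric, while the radius-based mask always is.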
Kisung You
cai_unsupervised_2010Rdimtools
## generate data of 3 types with clear difference
dt1 = aux.gensamples(n=20)-100
dt2 = aux.gensamples(n=20)
dt3 = aux.gensamples(n=20)+100
## merge the data and create a label correspondingly
X = rbind(dt1,dt2,dt3)
label = rep(1:3, each=20)
## try different regularization parameters
out1 = do.mcfs(X, lambda=0.01)
out2 = do.mcfs(X, lambda=0.1)
out3 = do.mcfs(X, lambda=1)
## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3))
plot(out1$Y, pch=19, col=label, main="lambda=0.01")
plot(out2$Y, pch=19, col=label, main="lambda=0.1")
plot(out3$Y, pch=19, col=label, main="lambda=1")
par(opar)