mt (version 2.0-1.20)

fs.pca: Feature Selection by PCA

Description

Feature selection using PCA loadings.

Usage

fs.pca(x, thres = 0.8, ...)

Value

A list with components:

fs.rank

A vector of feature ranking scores.

fs.order

A vector of feature order from best to worst.

stats

A vector of measurements.

Arguments

x

A data frame or matrix containing the data set.

thres

The threshold on the cumulative proportion of variance explained by the PCs.

...

Additional arguments to prcomp.
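
To illustrate what a cumulative explained-variance threshold means, the following base R sketch computes the proportion of variance per PC with prcomp and finds the first PC at which the cumulative proportion reaches the threshold. The built-in iris data and the variable names (prop, cum_prop, n_pc) are chosen for illustration only, and the ">= thres" rule is an assumption about how such a threshold is typically applied, not a statement of what fs.pca does internally.

```r
## Sketch only: how a cumulative-variance threshold can pick the number of PCs.
x <- as.matrix(iris[, 1:4])            # illustrative data, not from the mt package

pca <- prcomp(x)                       # extra arguments (e.g. scale. = TRUE) would go here

prop     <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance explained by each PC
cum_prop <- cumsum(prop)                   # cumulative proportion

thres <- 0.8
## Assumption: keep PCs up to the first one whose cumulative proportion reaches thres
n_pc <- which(cum_prop >= thres)[1]

cum_prop
n_pc
```

A larger threshold retains more PCs, so more columns of the loading matrix would take part in any ranking based on it.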

Author

Wanchang Lin

Details

Because the PCA loadings form a matrix (one row per feature, one column per PC), the Mahalanobis distance of the loadings is used to rank the features. (Other summaries, for example the sum of the absolute values of the loadings or the square root of the loadings, could be used instead.)

Note that this feature selection method is unsupervised.
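
The idea described above can be sketched in base R with prcomp and mahalanobis. This is a minimal illustration under stated assumptions, not the package's implementation: the mtcars data, the fixed number of retained PCs (k), the use of the column means and sample covariance of the loadings as centre and covariance, and the convention that a larger distance means a higher rank are all choices made for this example.

```r
## Sketch only: rank features by the Mahalanobis distance of their PCA loadings.
x <- as.matrix(mtcars)                 # illustrative data, not from the mt package

pca <- prcomp(x, scale. = TRUE)

k   <- 3                               # hypothetical number of retained PCs
lod <- pca$rotation[, 1:k, drop = FALSE]   # loadings: features in rows, PCs in columns

## Distance of each feature's loading vector from the centre of all loadings
## (assumption: centre = column means, covariance = sample covariance of the loadings)
score <- mahalanobis(lod, center = colMeans(lod), cov = cov(lod))

## Assumption: larger distance = more influential feature
ord    <- order(score, decreasing = TRUE)
ranked <- colnames(x)[ord]

## One of the alternatives mentioned above: sum of absolute loadings per feature
alt <- rowSums(abs(lod))

ranked
```

No class labels appear anywhere in the computation, which is what makes the ranking unsupervised.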

See Also

feat.rank.re

Examples

## prepare data set
data(abr1)
cls <- factor(abr1$fact$class)
dat <- abr1$pos
## dat <- abr1$pos[,110:1930]

## fill zeros with NAs
dat <- mv.zene(dat)

## missing values summary
mv <- mv.stats(dat, grp=cls) 
mv    ## View the missing value pattern

## filter missing value variables
## dim(dat)
dat <- dat[,mv$mv.var < 0.15]
## dim(dat)

## fill NAs with mean
dat <- mv.fill(dat,method="mean")

## log transformation
dat <- preproc(dat, method="log10")

## select class "1" and "2" for feature ranking
ind <- grepl("1|2", cls)
mat <- dat[ind,,drop=FALSE] 
mat <- as.matrix(mat)
grp <- cls[ind, drop=TRUE]   

## feature selection by PCA (unsupervised, so the class labels are not needed)
res <- fs.pca(dat)
names(res)
