
mt (version 2.0-1.20)

fs.rf: Feature Selection Using Random Forests (RF)

Description

Feature selection using Random Forests (RF).

Usage

fs.rf(x, y, ...)
fs.rf.1(x, y, fs.len = "power2", ...)

Value

A list with components:

fs.rank

A vector of feature ranking scores.

fs.order

A vector of features ordered from best to worst.

stats

A vector of measurements: for fs.rf, the random forest importance scores; for fs.rf.1, a dummy variable (currently ignored).
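The relationship between stats, fs.rank and fs.order can be sketched in base R. The importance values below are invented for illustration, and breaking ties by position (ties.method = "first") is an assumption; fs.rf may handle ties or non-positive scores differently.

```r
## Hypothetical importance scores for four features (illustrative values only)
stats <- c(V1 = 0.02, V2 = 0.15, V3 = 0.07, V4 = 0.11)

## Rank 1 = most important; ties broken by position (an assumption)
fs.rank <- rank(-stats, ties.method = "first")

## Feature names ordered from best to worst
fs.order <- names(stats)[order(fs.rank)]

fs.rank   # V1 = 4, V2 = 1, V3 = 3, V4 = 2
fs.order  # "V2" "V4" "V3" "V1"
```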

Arguments

x

A data frame or matrix containing the data set.

y

A factor or vector of class labels.

fs.len

Method or numeric sequence for feature lengths. For details, see get.fs.len.

...

Arguments to pass to randomForest.
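The fs.len argument controls which feature-set sizes fs.rf.1 steps through. The helper below is hypothetical (feature_lengths is not part of mt) and shows one plausible reading: a numeric sequence is trimmed to sizes between 1 and the number of features p, while "power2" is assumed to mean p followed by every power of two below it. The actual sequence is defined by get.fs.len.

```r
## Hypothetical helper, NOT mt's get.fs.len: one plausible reading of fs.len
feature_lengths <- function(p, fs.len = "power2") {
  if (is.numeric(fs.len)) {
    ## numeric input: keep sizes between 1 and p
    len <- fs.len[fs.len >= 1 & fs.len <= p]
  } else if (identical(fs.len, "power2")) {
    ## assumed meaning: p itself plus every power of two below p
    k <- floor(log2(p))
    len <- c(p, 2^(k:0))
  } else {
    stop("unsupported fs.len: ", fs.len)
  }
  ## largest feature set first, duplicates removed
  sort(unique(len), decreasing = TRUE)
}

feature_lengths(20)                 # 20 16 8 4 2 1
feature_lengths(8)                  # 8 4 2 1
feature_lengths(20, c(5, 30, 10))   # 10 5
```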

Author

Wanchang Lin

Details

fs.rf.1 selects features by successively eliminating the least important variables.
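The elimination idea can be sketched in base R. This is not the fs.rf.1 source: eliminate_features and mean_diff_score are hypothetical names, and a simple two-class mean-difference score is swapped in for random forest importance so the example runs without the randomForest package. At each target size the surviving features are re-scored and the weakest are dropped, so features removed last rank just behind the final survivors.

```r
## Stand-in score (assumption): |difference of class means| / pooled SD.
## Assumes exactly two classes in y.
mean_diff_score <- function(x, y) {
  y <- factor(y)
  a <- x[y == levels(y)[1], , drop = FALSE]
  b <- x[y == levels(y)[2], , drop = FALSE]
  pooled_sd <- sqrt((apply(a, 2, var) + apply(b, 2, var)) / 2)
  abs(colMeans(a) - colMeans(b)) / pooled_sd
}

## Hypothetical helper: successive elimination of the least important features.
## Returns feature names ordered from best to worst.
eliminate_features <- function(x, y, score_fun, sizes) {
  keep <- colnames(x)
  dropped <- character(0)                  # removed features, in order of removal
  for (n in sort(sizes, decreasing = TRUE)) {
    if (n >= length(keep)) next            # nothing to remove at this size
    s <- score_fun(x[, keep, drop = FALSE], y)   # re-score the survivors
    worst <- keep[order(s)][seq_len(length(keep) - n)]
    dropped <- c(dropped, worst)
    keep <- setdiff(keep, worst)
  }
  ## rank the final survivors, then append removed features, last removed first
  s <- score_fun(x[, keep, drop = FALSE], y)
  c(keep[order(s, decreasing = TRUE)], rev(dropped))
}

## Toy two-class data: f1 separates the classes best, f3 worst
x <- cbind(f1 = c(1, 2, 3, 11, 12, 13),
           f2 = c(1, 2, 3,  3,  4,  5),
           f3 = c(1, 2, 3,  2,  3,  4),
           f4 = c(1, 2, 3,  6,  7,  8))
y <- factor(c(1, 1, 1, 2, 2, 2))

eliminate_features(x, y, mean_diff_score, sizes = c(4, 2, 1))
# "f1" "f4" "f2" "f3"
```

Any scoring function with the same shape (one score per column of x, larger meaning more important) can be plugged in as score_fun.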

Examples

library(mt)
data(abr1)
cls <- factor(abr1$fact$class)
dat <- abr1$pos

## fill zeros with NAs
dat <- mv.zene(dat)

## missing values summary
mv <- mv.stats(dat, grp=cls) 
mv    ## View the missing value pattern

## filter missing value variables
dat <- dat[,mv$mv.var < 0.15]

## fill NAs with mean
dat <- mv.fill(dat,method="mean")

## log transformation
dat <- preproc(dat, method="log10")

## select class "1" and "2" for feature ranking
ind <- grepl("1|2", cls)
mat <- dat[ind,,drop=FALSE] 
mat <- as.matrix(mat)
grp <- cls[ind, drop=TRUE]   

## apply random forests for feature selection/ranking
res   <- fs.rf(mat,grp)
res.1 <- fs.rf.1(mat,grp)

## compare the results
fs <- cbind(fs.rf=res$fs.order, fs.rf.1=res.1$fs.order)

## plot the importance scores from 'fs.rf' (not 'fs.rf.1')
score <- res$stats
score <- sort(score, decreasing = TRUE)
plot(score)
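The preprocessing steps in the example can be approximated in base R on a small invented matrix. These are assumed equivalents of mv.zene, mv.stats (taking mv.var to be the proportion of missing values per variable), mv.fill(method = "mean") and preproc(method = "log10"), not their actual implementations. The filter threshold is 0.5 rather than 0.15 only because the toy matrix has four rows.

```r
## Toy matrix standing in for abr1$pos (values invented for illustration)
dat <- cbind(a = c(10, 0, 30, 50),
             b = c( 5, 5,  0,  5),
             c = c( 0, 0,  0, 20))

## zeros -> NA (assumed equivalent of mv.zene)
dat[dat == 0] <- NA

## proportion of missing values per variable, then filter
mv.var <- colMeans(is.na(dat))            # a = 0.25, b = 0.25, c = 0.75
dat <- dat[, mv.var < 0.5, drop = FALSE]  # drops column c

## replace each remaining NA with its column mean
for (j in seq_len(ncol(dat))) {
  v <- dat[, j]
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  dat[, j] <- v
}

## log10 transformation (assumes all values are now positive)
dat <- log10(dat)
dat[, "a"]   # log10 of 10, 30, 30, 50
```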
