roughrf (version 1.0)

rrfa: Roughened Random Forests - A (RRFA)

Description

The RRFA algorithm proceeds as follows:

1. Impose missing values, under a missing completely at random (MCAR) mechanism, on all covariates of both the training and testing datasets.

2. Impute the missing data, using median imputation for continuous variables and mode imputation for categorical variables.

3. Build one random forest tree on the imputed training dataset, and use it to predict the binary outcomes in the imputed testing dataset.

4. Repeat steps 1 to 3 number.trees times.
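
Steps 1 and 2 can be sketched in base R as shown here. This is only an illustration under stated assumptions, not the roughrf source: the helper names impose_mcar and impute_simple are hypothetical, and the sketch assumes each covariate cell is dropped independently with probability mispct, which may differ from how the package selects cells.

```r
# Hypothetical helper (not part of roughrf): step 1, impose MCAR missingness.
# Assumption: every cell is set to NA independently with probability mispct.
impose_mcar <- function(x, mispct) {
  for (j in seq_along(x)) {
    drop <- runif(nrow(x)) < mispct
    x[[j]][drop] <- NA
  }
  x
}

# Hypothetical helper (not part of roughrf): step 2, median/mode imputation.
impute_simple <- function(x) {
  for (j in seq_along(x)) {
    miss <- is.na(x[[j]])
    # Skip columns with nothing to impute or nothing observed to impute from
    if (!any(miss) || all(miss)) next
    if (is.numeric(x[[j]])) {
      # Continuous covariate: fill with the median of the observed values
      x[[j]][miss] <- median(x[[j]], na.rm = TRUE)
    } else {
      # Categorical covariate: fill with the most frequent observed level
      counts <- table(x[[j]])
      x[[j]][miss] <- names(counts)[which.max(counts)]
    }
  }
  x
}

# Toy covariates; the column names are made up for this illustration
covs <- data.frame(age = c(21, 35, 50, 62),
                   grp = factor(c("a", "a", "b", "a")))
set.seed(1)
rough <- impute_simple(impose_mcar(covs, mispct = 0.4))
```

Because the missingness is redrawn on every repetition, each tree would see a differently perturbed copy of the covariates.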

Usage

rrfa(dat, yvar = ncol(dat), tr, te, mispct, number.trees)

Arguments

dat
A data frame containing both training and testing datasets
yvar
The column number of the binary outcome variable, a factor variable. The default value is set as ncol(dat)
tr
Row numbers of all training data
te
Row numbers of all testing data
mispct
Proportion of covariate values to set to missing, between 0 and 1
number.trees
Number of trees used in roughened random forests
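
As an illustration of how these arguments fit together, the following sketch prepares them from a single data frame. The toy data and its column names are invented for this example, and the 70/30 random split is an arbitrary choice rather than a package recommendation.

```r
# Toy data standing in for `dat`; column names are made up for illustration
set.seed(42)
dat <- data.frame(
  x1 = rnorm(100),
  x2 = factor(sample(c("a", "b"), 100, replace = TRUE)),
  y  = rbinom(100, 1, 0.5)
)

# The outcome must be a factor; here it sits in the last column
dat$y <- factor(dat$y, levels = c(0, 1))
yvar <- ncol(dat)

# Disjoint row numbers for the training and testing datasets
tr <- sort(sample(nrow(dat), 70))
te <- setdiff(seq_len(nrow(dat)), tr)

mispct <- 0.4        # proportion of covariate values to roughen
number.trees <- 50   # number of trees to grow
```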

Value

A prediction matrix, which the example below reads from the pred component of the returned object. Each column holds the predictions from a single tree. Each row corresponds, in order, to an observation in the testing dataset. Each cell value is either 0 or 1.
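
A matrix of this shape can be collapsed into one score per testing observation by averaging across trees. The sketch below uses a small hand-written matrix in place of real output. The 0.5 cut-off for a hard class label is an assumption made for illustration, not something the package prescribes.

```r
# Toy prediction matrix: 4 testing observations (rows) by 3 trees (columns)
pred <- matrix(c(1, 0, 1,
                 0, 0, 1,
                 1, 1, 1,
                 0, 0, 0), nrow = 4, byrow = TRUE)

# Share of trees voting 1 for each observation; usable as a score for AUC
vote_share <- rowMeans(pred)

# Assumed majority-vote rule: predict 1 when more than half the trees vote 1
pred_class <- as.integer(vote_share > 0.5)
```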

References

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.

Xiong, K. (2014). Roughened Random Forests for Binary Classification. PhD dissertation, State University of New York at Albany.

See Also

rrfb, rrfc1, rrfc2, rrfc3, rrfc4, rrfc5, rrfc6, rrfc7, rrfd, rrfe

Examples

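# MASS supplies the Pima data, caTools supplies colAUC(),
# randomForest supplies randomForest(), and roughrf supplies rrfa()
if (require(MASS) && require(caTools) && require(randomForest) && require(roughrf)) {

  # Stack the Pima training and testing sets into one data frame
  dat <- rbind(Pima.tr, Pima.te)
  number.trees <- 50
  # number.trees <- 500
  tr <- 1:200      # rows that came from Pima.tr
  te <- 201:532    # rows that came from Pima.te
  mispct <- 0.4
  yvar <- ncol(dat)

  # AUC on the testing dataset for the original random forest
  rf <- randomForest(dat[tr, -yvar], dat[tr, yvar], dat[te, -yvar], ntree = number.trees)
  print(colAUC(rf$test$votes[, 2], dat[te, yvar]))

  # AUC on the testing dataset for RRFA, averaging each row across trees
  pred.rrfa <- rrfa(dat, yvar, tr, te, mispct, number.trees)
  print(colAUC(apply(pred.rrfa$pred, 1, mean), dat[te, yvar]))
}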
