
1. Impose missing values, under the missing-completely-at-random (MCAR) mechanism, on all covariates of the training dataset.
2. Impute the missing values: a continuous variable by its mean and a categorical variable by its mode (mean/mode imputation).
3. Build one tree of a random forest on the imputed training dataset, and use it to predict the binary outcome in the original testing dataset.
4. Repeat steps 1 to 3 number.trees times.
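Steps 1 and 2 above can be sketched as follows. This is not code from the package, just a minimal illustration on a toy data frame, assuming mispct is the fraction of entries in each covariate that is set to missing:

```r
set.seed(1)
dat <- data.frame(x = rnorm(10), g = factor(sample(c("a", "b"), 10, TRUE)))

# Step 1: impose MCAR missingness on every covariate independently
mispct <- 0.4
for (j in seq_along(dat)) {
  miss <- sample(nrow(dat), floor(mispct * nrow(dat)))
  dat[miss, j] <- NA
}

# Step 2: mean imputation for numeric columns, mode imputation for factors
for (j in seq_along(dat)) {
  na <- is.na(dat[[j]])
  if (is.numeric(dat[[j]])) {
    dat[na, j] <- mean(dat[[j]], na.rm = TRUE)
  } else {
    tab <- table(dat[[j]])
    dat[na, j] <- names(tab)[which.max(tab)]  # most frequent level
  }
}
```

Repeating this roughening-and-imputing cycle before growing each tree is what distinguishes RRFC1 from a random forest grown once on the complete data.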
rrfc1(dat, yvar = ncol(dat), tr, te, mispct, number.trees)
Liaw, A. & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.
Xiong, K. (2014). Roughened Random Forests for Binary Classification. PhD dissertation, State University of New York at Albany.
rrfa, rrfb, rrfc2, rrfc3, rrfc4, rrfc5, rrfc6, rrfc7, rrfd, rrfe
if (require(MASS) && require(caTools) && require(randomForest)) {
  dat <- rbind(Pima.tr, Pima.te)
  number.trees <- 50
  # number.trees <- 500
  tr <- 1:200
  te <- 201:532
  mispct <- 0.4
  yvar <- ncol(dat)

  # AUC on the testing dataset for the original random forest
  rf <- randomForest(dat[tr, -yvar], dat[tr, yvar], dat[te, -yvar],
                     ntree = number.trees)
  print(colAUC(rf$test$votes[, 2], dat[te, yvar]))

  # AUC on the testing dataset for RRFC1
  pred.rrfc1 <- rrfc1(dat, yvar, tr, te, mispct, number.trees)
  print(colAUC(apply(pred.rrfc1$pred, 1, mean), dat[te, yvar]))
}