Learn R Programming

KnowGRRF (version 1.0)

rf.repeat: Build random forest multiple times and return AUC for both training and test set if available

Description

Due to the randomness of random forest, RF models can be built multiple times to get a better estimation of model performance by assessing AUC on both out-of-bag prediction of training and test predictions on test set. Work for classification only.

Usage

rf.repeat(X.train, Y.train, X.test, Y.test, fea, times = 10)

Arguments

X.train

a data frame or matrix (like x) containing predictors for the training set.

Y.train

response for the training set. If a factor, classification is assumed, otherwise regression is assumed. If omitted, will run in unsupervised mode.

X.test

a data frame or matrix (like x) containing predictors for the test set.

Y.test

response for the test set.

fea

feature index or feature names used to train the model.

times

the number of RF models built. The more repeats, the less standard error.

Value

return a list, including

AUC

AUC calculated from out-of-bag prediction from random forest classification model

Test.AUC

AUC calculated from test prediction from random forest classification model. Only available when test set is given

References

Guan, X., & Liu, L. (2018). Know-GRRF: Domain-Knowledge Informed Biomarker Discovery with Random Forests.

Examples

Run this code
# NOT RUN {
##---- Example: classification  ----

set.seed(1)
X<-data.frame(matrix(rnorm(100*100), nrow=100))
b=seq(0.1, 2.2, 0.2) 
##y has a linear relationship with first 10 variables
y=b[5]*X$X4+b[6]*X$X5+b[7]*X$X6+b[8]*X$X7+b[9]*X$X8+b[10]*X$X9+b[11]*X$X10 
y=as.factor(ifelse(y>0, 1, 0)) ##classification

##split training and test set
X.train=X[1:70,]
X.test=X[71:100,]
y.train=y[1:70]
y.test=y[71:100]

rf.repeat(X.train, y.train, fea=1:20)  ##no test set
rf.repeat(X.train, y.train, X.test, y.test, 1:10)  ##relevant feature set
rf.repeat(X.train, y.train, X.test, y.test, 11:20)  ##irrelevant feature set

# }

Run the code above in your browser using DataLab