
This function selects optimal trees for classification from a total of t.initial
trees grown by random forest. The number of trees in the initial set, t.initial,
is specified by the user; if not specified, the default t.initial = 1000
is used.
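The selection principle can be sketched with the randomForest package: grow t.initial trees, score each tree on its own out-of-bag observations, and keep the best p proportion. The sketch below only illustrates this idea on example data; it is not the OTE implementation, and the object names oob.acc and best.trees are illustrative.
#Illustrative sketch of per-tree out-of-bag assessment (not OTE internals)
library(randomForest)
data(iris)
keep <- iris$Species != "setosa"
X0 <- iris[keep, 1:4]
Y0 <- droplevels(iris$Species[keep])
t.initial <- 200
p <- 0.1
rf <- randomForest(X0, Y0, ntree = t.initial, keep.inbag = TRUE)
#Predictions of every individual tree on the training data
all.pred <- predict(rf, X0, predict.all = TRUE)$individual
#Out-of-bag accuracy of each tree
oob.acc <- sapply(seq_len(t.initial), function(k) {
  oob <- rf$inbag[, k] == 0
  mean(all.pred[oob, k] == Y0[oob])
})
#Indices of the best p*100 percent of trees
best.trees <- order(oob.acc, decreasing = TRUE)[seq_len(ceiling(p * t.initial))]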
OTClass(XTraining, YTraining, method = c("oob+independent", "oob", "sub-sampling"),
        p = 0.1, t.initial = NULL, nf = NULL, ns = NULL, info = TRUE)
XTraining: An n x d dimensional training data matrix or data frame consisting of training observations, where n is the number of observations and d is the number of features.
YTraining: A vector of length n consisting of class labels for the training data. Should be binary (0, 1).
method: Method used in the selection of optimal trees. method = "oob+independent" uses the out-of-bag observations of the bootstrap sample on which each tree is grown for individual tree assessment, and an independent part of the training data for the trees' collective assessment. method = "oob" uses the out-of-bag observations for both individual and collective assessment. method = "sub-sampling" uses a sub-sample of the training data for individual tree assessment as well as for assessing its contribution to the ensemble (see the sketch after the argument descriptions).
p: Percent of the best t.initial trees to be selected on the basis of performance on out-of-bag observations.
t.initial: Size of the initial set of classification trees.
nf: Number of features to be sampled for splitting the nodes of the trees. If NULL, the default sqrt(number of features) is used.
ns: Node size; the minimum number of samples in the nodes. If NULL, the default of 1 is used.
info: If TRUE, displays processing information.
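The three settings of method differ only in which part of the data assesses an individual tree and which part assesses its contribution to the ensemble. The lines below sketch the data roles implied by method = "oob+independent"; the index names and the size of the held-out part are illustrative assumptions, not objects or defaults of OTClass.
#Illustrative data roles for method = "oob+independent" (not OTE internals)
set.seed(1)
n <- 100
#Independent part of the training data, held out for collective assessment
valid.id <- sample(seq_len(n), round(n/10))
grow.id <- setdiff(seq_len(n), valid.id)
#Bootstrap sample on which a single tree is grown
boot.id <- sample(grow.id, length(grow.id), replace = TRUE)
#Out-of-bag observations used to assess that individual tree
oob.id <- setdiff(grow.id, boot.id)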
A trained object consisting of the selected trees.
Large values of t.initial are recommended for better performance, subject to the available computational resources.
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 1-20.
Liaw, A. and Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.
#Load the package and the data
library(OTE)
data(Body)
data <- Body
#Divide the data into training and test parts
set.seed(9123)
n <- nrow(data)
training <- sample(1:n,round(2*n/3))
testing <- (1:n)[-training]
X <- data[,1:24]
Y <- data[,25]
#Train OTClass on the training data
Opt.Trees <- OTClass(XTraining=X[training,],YTraining = Y[training],
t.initial=200,method="oob+independent")
#Predict on test data
Prediction <- Predict.OTClass(Opt.Trees, X[testing,],YTesting=Y[testing])
#Objects returned
names(Prediction)
Prediction$Confusion.Matrix
Prediction$Predicted.Class.Labels
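#A hedged follow-up, not part of the original example: overall test accuracy,
#assuming Predicted.Class.Labels is ordered as the rows of the test data and
#uses the same coding as Y
mean(Prediction$Predicted.Class.Labels == Y[testing])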