PPforest
Implements a random forest using the projection pursuit trees algorithm (based on the PPtreeViz package).
PPforest(data, class, std = TRUE, size.tr, m, PPmethod, size.p,
lambda = .1, parallel = FALSE, cores = 2)
data: data frame with the complete data set.
class: a character string with the name of the class variable.
std: if TRUE, standardize the data set; this is needed to compute the global importance measure.
size.tr: proportion of the data used for training when the data are split into training and test sets.
m: number of bootstrap replicates, which is also the number of trees to grow. To ensure that each observation is predicted out of bag several times, m should not be too small. By default, m = 500.
PPmethod: the projection pursuit index to optimize in each classification tree. The options are LDA and PDA (linear discriminant and penalized linear discriminant). The default is LDA.
size.p: proportion of variables randomly sampled at each split.
lambda: penalty parameter for the PDA index, between 0 and 1. If lambda = 0, no penalty is added and the PDA index is the same as the LDA index. If lambda = 1, all variables are treated as uncorrelated. The default value is lambda = 0.1.
parallel: logical; if TRUE, the function runs in parallel.
cores: number of cores used in the parallelization.
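The PPmethod and lambda arguments select the projection pursuit index that each tree optimizes. As a rough illustration, the sketch below computes an LDA-type index, 1 - |A'WA| / |A'(W + B)A|, where W and B are the within-group and between-group sums of squares, A is a projection matrix, and |.| denotes the determinant, together with a penalized variant. The penalty used here, replacing W by (1 - lambda) W + lambda diag(W), is an assumption chosen only because it reproduces the two limiting cases stated for lambda (0 gives the LDA index, 1 ignores correlations between variables); the exact PDA index in PPtreeViz may be scaled differently. The helpers scatter_matrices and pp_index are hypothetical and are not part of PPforest or PPtreeViz.

```r
# Hypothetical helper: between-group (B) and within-group (W) sums of squares
scatter_matrices <- function(X, cls) {
  X <- as.matrix(X)
  cls <- as.factor(cls)
  overall <- colMeans(X)
  p <- ncol(X)
  B <- matrix(0, p, p)
  W <- matrix(0, p, p)
  for (g in levels(cls)) {
    Xg <- X[cls == g, , drop = FALSE]
    mg <- colMeans(Xg)
    # n_g (mean_g - mean)(mean_g - mean)'
    B <- B + nrow(Xg) * tcrossprod(mg - overall)
    # cross-products of class-g rows centered at the class mean
    W <- W + crossprod(sweep(Xg, 2, mg))
  }
  list(B = B, W = W)
}

# Hypothetical index for a p x d projection matrix A.
# lambda = 0 gives the LDA index; lambda > 0 shrinks W toward its diagonal
# (an ASSUMED form of the PDA penalty, for illustration only).
pp_index <- function(X, cls, A, lambda = 0) {
  A <- as.matrix(A)
  s <- scatter_matrices(X, cls)
  W <- (1 - lambda) * s$W + lambda * diag(diag(s$W), nrow = nrow(s$W))
  1 - det(t(A) %*% W %*% A) / det(t(A) %*% (W + s$B) %*% A)
}

# Usage on the standardized iris data, projecting onto the first two variables
X <- scale(iris[, 1:4])
A <- diag(4)[, 1:2]
pp_index(X, iris$Species, A)                # LDA index
pp_index(X, iris$Species, A, lambda = 0.1)  # penalized (PDA-style) index
```

Because det(M'A'WAM) = det(M)^2 det(A'WA) for any invertible d x d matrix M, the index depends only on the subspace spanned by the columns of A, not on the particular basis chosen for it.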
An object of class PPforest with the following components:
predicted values for the training data set.
error on the training data set.
predicted values for the test data set (available only when size.tr < 1, so that a test set is held out).
error on the test data set (available only when size.tr < 1).
out-of-bag (OOB) error of the forest.
out-of-bag error of each tree in the forest.
information on the bootstrap samples.
output from trees_pp for each bootstrap sample.
proximity matrix: if two cases are classified in the same terminal node, their proximity is increased by one. In PPforest there is one terminal node per class.
a matrix with one row for each input data point and one column for each class, giving the fraction of out-of-bag (OOB) votes from the PPforest.
number of trees grown in PPforest.
number of predictor variables selected for splitting at each node.
type of forest: classification.
confusion matrix of the prediction (based on OOB data).
the original call to PPforest.
training data, sampled according to the size.tr proportion.
test data, the remaining 1 - size.tr proportion.
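Several of these components (bootstrap samples, OOB votes and errors, the confusion matrix, the proximity matrix, and the training/test split) follow standard random-forest bookkeeping. The R sketch below illustrates that bookkeeping under stated assumptions: a nearest-centroid rule fit on a random subset of variables is a hypothetical stand-in for a projection pursuit tree, the variable subset is drawn once per tree rather than at every split, the data are standardized with base R scale() when std is TRUE, the training/test split is a simple random sample, and proximities are raw counts over all training cases. It relies on the statement that a PP tree has one terminal node per class, so two cases share a terminal node exactly when a tree predicts the same class for both. None of the helper or variable names are PPforest's own.

```r
set.seed(1)

# HYPOTHETICAL stand-in for one PP tree: class centroids on selected variables
fit_centroids <- function(X, y, vars) {
  Xs <- X[, vars, drop = FALSE]
  y <- droplevels(y)
  cents <- do.call(rbind, lapply(split(as.data.frame(Xs), y), colMeans))
  list(vars = vars, cents = cents)
}

# Assign each row of newdata to the class with the nearest centroid
predict_centroids <- function(fit, newdata) {
  Xs <- newdata[, fit$vars, drop = FALSE]
  d <- apply(fit$cents, 1, function(mu) colSums((t(Xs) - mu)^2))
  d <- matrix(d, nrow = nrow(Xs))
  rownames(fit$cents)[apply(d, 1, which.min)]
}

X <- as.matrix(iris[, 1:4])
y <- iris$Species
classes <- levels(y)
std <- TRUE; size.tr <- 2 / 3; m <- 50; size.p <- 0.5

if (std) X <- scale(X)                            # standardize all variables
tr <- sample(nrow(X), round(size.tr * nrow(X)))   # simple random training split
Xtr <- X[tr, , drop = FALSE]; ytr <- y[tr]
Xte <- X[-tr, , drop = FALSE]; yte <- y[-tr]
n <- nrow(Xtr)
n_vars <- max(1, round(size.p * ncol(X)))         # variables sampled per tree

fits <- vector("list", m)
pred <- matrix(NA_character_, n, m)  # class predicted by tree b for each training case
inbag <- matrix(FALSE, n, m)         # TRUE if the case is in tree b's bootstrap sample
for (b in seq_len(m)) {
  idx <- sample(n, n, replace = TRUE)
  inbag[unique(idx), b] <- TRUE
  fits[[b]] <- fit_centroids(Xtr[idx, , drop = FALSE], ytr[idx],
                             sample(ncol(X), n_vars))
  pred[, b] <- predict_centroids(fits[[b]], Xtr)
}

# OOB votes: fraction of the trees not trained on case i that vote for each class
votes <- t(sapply(seq_len(n), function(i) {
  oob_pred <- pred[i, !inbag[i, ]]
  as.vector(table(factor(oob_pred, levels = classes))) / max(1, length(oob_pred))
}))
colnames(votes) <- classes
oob_class <- ifelse(rowSums(votes) > 0,
                    classes[max.col(votes, ties.method = "first")], NA)

oob_error_forest <- mean(oob_class != as.character(ytr), na.rm = TRUE)
oob_error_tree <- sapply(seq_len(m), function(b) {
  oob <- !inbag[, b]
  mean(pred[oob, b] != as.character(ytr[oob]))
})
confusion <- table(observed = ytr, predicted = factor(oob_class, levels = classes))

# Proximity: with one terminal node per class, two cases share a node
# exactly when a tree predicts the same class for both
prox <- matrix(0, n, n)
for (b in seq_len(m)) prox <- prox + outer(pred[, b], pred[, b], "==")

# Test set: majority vote over all trees
test_votes <- sapply(fits, predict_centroids, newdata = Xte)
test_class <- apply(test_votes, 1, function(r)
  names(which.max(table(factor(r, levels = classes)))))
error_test <- mean(test_class != as.character(yte))
```

Here m = 50 keeps the sketch fast; as the description of m notes, a larger number of trees gives each observation more out-of-bag predictions.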
library(PPforest)
# crab example with all the observations used as training data
pprf.crab <- PPforest(data = crab, class = 'Type', std = FALSE, size.tr = 1,
  m = 200, size.p = .5, PPmethod = 'LDA', parallel = TRUE, cores = 2)
pprf.crab