PPforest
Implements a random forest built from projection pursuit trees (based on the PPtreeViz package).
PPforest(data, class, std = TRUE, size.tr, m, PPmethod, size.p,
lambda = .1, parallel = FALSE, cores = 2, rule = 1)
An object of class PPforest with components.
predicted values for training data set.
error of the training data set.
predicted values for the test data set if testap = TRUE (default).
error of the test data set if testap = TRUE (default).
out of bag error in the forest.
out of bag error for each tree in the forest.
information on the bootstrap samples.
output from trees_pp for each bootstrap sample.
Proximity matrix. If two cases are classified in the same terminal node, their proximity is increased by one. In PPforest there is one terminal node per class.
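As a rough illustration of the proximity definition above, the sketch below counts, for each pair of cases, the number of trees that place both cases in the same terminal node. The matrix `node` holds made-up labels (rows are cases, columns are trees) and is not output from PPforest; because each tree has one terminal node per class, the node label plays the same role as the predicted class.

```r
# Hypothetical terminal-node labels: 4 cases (rows) x 3 trees (columns).
# These values are invented for illustration, not produced by PPforest.
node <- matrix(c("A", "A", "B", "B",
                 "A", "B", "B", "B",
                 "A", "A", "A", "B"),
               nrow = 4, ncol = 3)

n <- nrow(node)
proximity <- matrix(0, n, n)
for (t in seq_len(ncol(node))) {
  # outer() flags every pair of cases that share a terminal node in tree t
  proximity <- proximity + outer(node[, t], node[, t], "==")
}
proximity
```

The diagonal equals the number of trees, since every case always shares a node with itself.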
a matrix with one row for each input data point and one column for each class, giving the fraction of (OOB) votes from the PPforest.
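To make the vote fractions concrete, here is a minimal sketch that turns per-tree predictions into a cases-by-classes matrix, counting only the trees for which a case was out of bag. The matrix `pred` and the vector `classes` are invented for illustration; they are not PPforest objects.

```r
# Hypothetical per-tree predictions: 3 cases (rows) x 4 trees (columns).
# NA marks trees where the case was in the bootstrap sample (not OOB).
pred <- matrix(c("A", NA,  "B",
                 "A", "B", NA,
                 NA,  "B", "B",
                 "B", "B", "A"),
               nrow = 3, ncol = 4)
classes <- c("A", "B")

# Fraction of OOB votes that each class receives, one row per case
votes <- t(sapply(seq_len(nrow(pred)), function(i) {
  p <- pred[i, !is.na(pred[i, ])]          # keep OOB predictions only
  sapply(classes, function(k) mean(p == k))
}))
votes
```

Each row sums to one, and the class with the largest fraction would be the OOB prediction for that case.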
number of trees grown in PPforest.
number of predictor variables selected for splitting at each node.
classification.
confusion matrix of the prediction (based on OOB data).
the original call to PPforest.
the training data, based on the size.tr sample proportion.
the test data, based on the 1-size.tr sample proportion.
Data frame with the complete data set.
A character with the name of the class variable.
if TRUE, standardize the data set; this is needed to compute the global importance measure.
proportion of the data used for training when the data are split into training and test sets.
number of bootstrap replicates, which corresponds to the number of trees to grow. To ensure that each observation is predicted a few times, this number should not be too small. m = 500 by default.
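The point about not choosing m too small can be checked with a short simulation. The sketch below draws plain bootstrap samples and counts in how many of them each observation is left out, which is when it gets an OOB prediction. The values of `n` and the seed are arbitrary, and plain (unstratified) resampling is used as a simplifying assumption; the package's own resampling scheme may differ.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n <- 200             # hypothetical number of observations
m <- 500             # number of bootstrap replicates (trees)

# For each replicate, record which observations are out of bag
oob <- replicate(m, !(seq_len(n) %in% sample.int(n, n, replace = TRUE)))

# Number of trees in which each observation is OOB (and so gets predicted)
oob_count <- rowSums(oob)
summary(oob_count)
(1 - 1 / n)^n        # theoretical OOB probability, close to exp(-1)
```

Each observation is out of bag in a little over a third of the trees, so with only a handful of trees some observations may receive very few OOB predictions, or none.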
projection pursuit index to optimize in each classification tree. The options are LDA (linear discriminant) and PDA (penalized linear discriminant). The default is LDA.
proportion of variables randomly sampled in each split.
penalty parameter in the PDA index, between 0 and 1. If lambda = 0, no penalty is added and the PDA index is the same as the LDA index. If lambda = 1, all variables are treated as uncorrelated. The default value is lambda = 0.1.
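One way to picture the effect of lambda is as shrinkage of the off-diagonal entries of the within-class sum-of-squares matrix W. The sketch below assumes the form diag(W) + (1 - lambda) * offdiag(W), which matches the two limiting cases described above; this form and the helper `shrink_within` are assumptions made for illustration, not the package's internal code.

```r
# Hypothetical helper (not part of PPforest): shrink the off-diagonal
# entries of a within-class sum-of-squares matrix W by a factor (1 - lambda).
shrink_within <- function(W, lambda) {
  D <- diag(diag(W), nrow = nrow(W))   # diagonal part of W
  D + (1 - lambda) * (W - D)
}

# Toy within-class matrix, invented for illustration
W <- matrix(c(4, 1,   2,
              1, 3,   0.5,
              2, 0.5, 5), nrow = 3, byrow = TRUE)

shrink_within(W, 0)     # unchanged: PDA coincides with LDA
shrink_within(W, 0.1)   # mild shrinkage of the off-diagonal entries
shrink_within(W, 1)     # diagonal only: variables treated as uncorrelated
```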
logical; if TRUE, the function is run in parallel.
number of cores used in the parallelization
split rule
1: mean of two group means
2: weighted mean of two group means, weighted by group size
3: weighted mean of two group means, weighted by group sd
4: weighted mean of two group means, weighted by group se
5: mean of two group medians
6: weighted mean of two group medians, weighted by group size
7: weighted mean of two group medians, weighted by group IQR
8: weighted mean of two group medians, weighted by group IQR and size
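The two unweighted rules are simple enough to sketch directly. The example below computes the cut point on a one-dimensional projection under rule 1 and rule 5, using made-up projected values `g1` and `g2`. The weighted rules are left out because the exact weighting formulas are not given here.

```r
# Hypothetical projected values for two groups (invented for illustration)
g1 <- c(-2.1, -1.8, -1.5, -1.2, 3.0)   # note the outlier at 3.0
g2 <- c(0.9, 1.1, 1.4, 1.6)

# Rule 1: mean of the two group means
cut_rule1 <- (mean(g1) + mean(g2)) / 2
# Rule 5: mean of the two group medians
cut_rule5 <- (median(g1) + median(g2)) / 2

c(rule1 = cut_rule1, rule5 = cut_rule5)
```

With the outlier in `g1`, the median-based cut point sits closer to the bulk of the first group than the mean-based one, which is the usual reason to prefer medians when groups are skewed.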
Natalia da Silva, Dianne Cook & Eun-Kyung Lee (2021) A Projection Pursuit Forest Algorithm for Supervised Classification, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2020.1870480
# crab example with all the observations used as training
library(PPforest)
pprf.crab <- PPforest(data = crab, class = 'Type',
  std = FALSE, size.tr = 1, m = 200, size.p = .5,
  PPmethod = 'LDA', parallel = TRUE, cores = 2, rule = 1)
pprf.crab