trimTrees(xtrain, ytrain, xtest, ytest, ntree = 500,
          mtry = if (!is.null(ytrain) && !is.factor(ytrain))
            max(floor(ncol(xtrain)/3), 1) else floor(sqrt(ncol(xtrain))),
          nodesize = if (!is.null(ytrain) && !is.factor(ytrain)) 5 else 1,
          trim = 0, trimIsExterior = TRUE,
          uQuantiles = seq(0.05, 0.95, 0.05), methodIsCDF = TRUE)

trimIsExterior: If TRUE, the trimming is done exteriorly, that is, from the ends of the ordered vector. If FALSE, the trimming is done interiorly, that is, from the middle of the ordered vector.
uQuantiles: ... if uQuantiles=c(0.25,0.75), then the 0.25-quantile and the 0.75-quantile of the trimmed and untrimmed ensembles are scored.
methodIsCDF: If TRUE, the trimmed opinion pool is formed according to the cdf approach in Jose et al. (2014). If FALSE, the moment approach is used.

An object of class trimTrees, which is a list with the following components:
... ytrain values (not necessarily unique) that are both inbag and in the xtest's terminal node. This component is an ntrain-by-ntree matrix, where ntrain is the number of rows in the training set.
... treeValues and lists them by their unique values. This component is an nSupport-by-ntree matrix, where nSupport is the number of unique ytrain values, or support points of the forest.
... treeCounts of dimension (nSupport+1)-by-ntree.
... treeCumCounts for the last testing set row only. This component is an (nSupport+1)-by-ntree matrix. Note that the first row in this matrix is all zeros.
... nSupport-by-ntree matrix.
... ntest-by-ntree matrix, where ntest is the number of rows in the testing set.
... ntest-by-ntree matrix.
... ytest value. This component is an ntest-by-ntree matrix.
... uQuantiles, the empirical cdf evaluated at the realized ytest value. This component is an ntree-by-nQuantile matrix, where nQuantile is the number of elements in uQuantiles.
... ntest-by-ntree. It is useful for generating calibration curves (stated probabilities in bins vs. their observed frequencies) for binary classification.
... 2*p*(1-p), where p is the fraction of trees' means above the ytest value.
... ntree-by-ntree matrix.
... predict.randomForest with type="prob".
... rfClassEnsembleCDFs for each element in uQuantiles.
... rfClassEnsembleCDFs for each element in uQuantiles.
... rfClassEnsembleCDFs. See Jose and Winkler (2009) for a description of the linear and log quantile scores.
... rfClassEnsembleCDFs. See Gneiting and Raftery (2007) for a description of the ranked probability score.
... ytest in the form of a cdf.
... uQuantiles.
... ytest in the form of a cdf.
... uQuantiles.

See also: hitRate, cinbag

library(mlbench)    # provides mlbench.friedman1
library(trimTrees)
# Load the data
set.seed(201) # Can be removed; useful for replication
data <- as.data.frame(mlbench.friedman1(500, sd=1))
summary(data)
# Prepare data for trimming
train <- data[1:400, ]
test <- data[401:500, ]
xtrain <- train[,-11]
ytrain <- train[,11]
xtest <- test[,-11]
ytest <- test[,11]
# Run trimTrees
set.seed(201) # Can be removed; useful for replication
trimming <- trimTrees(xtrain, ytrain, xtest, ytest, trim = 0.15)
# Outputs from trimTrees
colMeans(trimming$trimmedEnsembleScores)
colMeans(trimming$untrimmedEnsembleScores)
mean(hitRate(trimming$treePITs))
hitRate(trimming$trimmedEnsemblePITs)
hitRate(trimming$untrimmedEnsemblePITs)
hist(trimming$trimmedEnsemblePITs, prob=TRUE)
hist(trimming$untrimmedEnsemblePITs, prob=TRUE)
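
As a rough illustration of two ideas described above, exterior versus interior trimming of an ordered vector and the bracketing rate 2*p*(1-p), here is a minimal base-R sketch. It does not call trimTrees or reproduce its internals. The helper names trimmed_average and bracketing_rate are hypothetical, and the sketch assumes that trim is the fraction of values dropped from each side and that interior trimming drops the 2k values nearest the middle.

```r
# Hypothetical helper, not part of trimTrees: average a vector of tree-level
# values (e.g. each tree's cdf value at one point) after symmetric trimming.
trimmed_average <- function(values, trim = 0, exterior = TRUE) {
  stopifnot(trim >= 0, trim < 0.5)
  ordered <- sort(values)
  n <- length(ordered)
  k <- floor(trim * n)  # assumption: k values are dropped from each side
  if (k == 0) return(mean(ordered))
  if (exterior) {
    # Exterior trimming: drop the k smallest and the k largest values
    keep <- ordered[(k + 1):(n - k)]
  } else {
    # Interior trimming (assumed form): drop the 2k values nearest the middle
    mid_start <- floor((n - 2 * k) / 2) + 1
    keep <- ordered[-(mid_start:(mid_start + 2 * k - 1))]
  }
  mean(keep)
}

# Hypothetical helper: bracketing rate 2*p*(1-p), where p is the fraction
# of trees' means that lie above the realized value y.
bracketing_rate <- function(tree_means, y) {
  p <- mean(tree_means > y)
  2 * p * (1 - p)
}

# Made-up tree-level cdf values, for illustration only
cdf_values <- c(0.05, 0.10, 0.30, 0.40, 0.50, 0.55, 0.80, 0.90)
trimmed_average(cdf_values, trim = 0.25, exterior = TRUE)   # keeps the middle four
trimmed_average(cdf_values, trim = 0.25, exterior = FALSE)  # keeps the outer four
bracketing_rate(c(1, 2, 3, 4), y = 2.5)                     # p = 0.5
```

With exterior trimming the average is pulled toward the central trees, while interior trimming gives the extreme trees more weight. Which is preferable depends on whether the untrimmed ensemble is under- or overconfident, which is what the PIT histograms in the example above help diagnose.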