## S3 method for class 'default':
VSURF(x, y, ntree = 2000, mtry = max(floor(ncol(x)/3), 1),
nfor.thres = 50, nmin = 1, nfor.interp = 25, nsd = 1,
nfor.pred = 25, nmj = 1, ...)
## S3 method for class 'formula':
VSURF(formula, data, ..., na.action = na.fail)
## S3 method for class 'default':
VSURF.parallel(x, y, ntree = 2000,
mtry = max(floor(ncol(x)/3), 1), nfor.thres = 50, nmin = 1,
nfor.interp = 25, nsd = 1, nfor.pred = 25, nmj = 1,
clusterType = "PSOCK", ncores = detectCores() - 1, ...)
## S3 method for class 'formula':
VSURF.parallel(formula, data, ..., na.action = na.fail)
randomForest it is only used with the formula-type call.)
randomForest.
randomForest.
err.interp is multiplied.
randomForest function (see ?randomForest for further information)

VSURF, which is a list with the following components:
varselect.thres
variables.
$x contains the mean importances sorted in decreasing order. $ix contains indexes of the variables.
ord.imp.
VSURF.parallel (only if parallel version of VSURF is used).
VSURF.parallel (only if parallel version of VSURF is used).
VSURF.

nfor.thres random forests are computed using the function randomForest with
arguments importance=TRUE. Then variables are sorted according to
their mean variable importance (VI), in decreasing order. This order is
kept throughout the procedure. Next, a threshold is computed: min.thres,
the minimum predicted value of a pruned CART tree fitted
to the curve of the standard deviations of VI. Finally, the actual
"thresholding step" is performed: only variables with a mean VI larger than
nmin*min.thres are kept.

nfor.interp embedded random forest models
are grown, starting with the random forest built with only the most
important variable and ending with all variables selected in the first step.
Then, err.min, the minimum mean out-of-bag (OOB) error of these models,
and its associated standard deviation sd.min are computed. Finally,
the smallest model (and hence its corresponding variables) having a mean OOB
error less than err.min + nsd*sd.min is selected.

mean.jump, the mean jump value, is calculated using
variables that have been left out by the second step, and is set as the mean
absolute difference between the mean OOB errors of one model and the model
immediately following it. Hence a variable is included in the model if the mean OOB
error decrease is larger than nmj*mean.jump.

VSURF.parallel is able to run VSURF using multiple cores in parallel
(see clusterType and ncores arguments).
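
The three selection rules described above (thresholding, interpretation, prediction) can be sketched in base R on made-up numbers. This is only an illustration of the decision rules, not VSURF's implementation: all values are invented, min_thres is assumed to be given rather than derived from a CART fit, oob_error() is a hypothetical stand-in for growing forests and averaging their OOB error, and the one-at-a-time stepwise loop in the last part is an assumption about how candidates are tried. Underscored names mirror the dotted names used in the text (e.g. min_thres for min.thres).

```r
# Step 1 (thresholding): keep variables with mean VI > nmin * min.thres.
mean_vi   <- c(0.30, 0.12, 0.05, 0.01, 0.002)  # invented, sorted decreasingly
min_thres <- 0.008                             # assumed given for this sketch
nmin      <- 1
kept <- which(mean_vi > nmin * min_thres)

# Step 2 (interpretation): nested model k uses the first k kept variables.
# Pick the smallest model with mean OOB error < err.min + nsd * sd.min.
err_interp <- c(0.20, 0.10, 0.06, 0.05)    # invented mean OOB errors
sd_interp  <- c(0.03, 0.02, 0.015, 0.02)   # invented standard deviations
nsd      <- 1
k_min    <- which.min(err_interp)
err_min  <- err_interp[k_min]
sd_min   <- sd_interp[k_min]
k_interp <- min(which(err_interp < err_min + nsd * sd_min))
varselect_interp <- kept[seq_len(k_interp)]

# Step 3 (prediction): mean.jump is the mean absolute difference between
# consecutive mean OOB errors over the models left out by step 2.
nmj       <- 1
mean_jump <- mean(abs(diff(err_interp[k_interp:length(err_interp)])))

# Hypothetical stand-in for "grow forests on vars, return mean OOB error".
oob_error <- function(vars) {
  errs <- c("1" = 0.20, "1,2" = 0.10, "1,2,3" = 0.095)
  unname(errs[paste(vars, collapse = ",")])
}

# Assumed stepwise rule: add a candidate only if it lowers the mean OOB
# error by more than nmj * mean.jump.
varselect_pred <- varselect_interp[1]
err_current    <- oob_error(varselect_pred)
for (v in varselect_interp[-1]) {
  err_candidate <- oob_error(c(varselect_pred, v))
  if (err_current - err_candidate > nmj * mean_jump) {
    varselect_pred <- c(varselect_pred, v)
    err_current    <- err_candidate
  }
}

print(kept)
print(varselect_interp)
print(varselect_pred)
```

With these numbers, raising nmin, lowering nsd, or raising nmj each makes the corresponding step more selective, which matches the role the text gives to these multipliers.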
plot.VSURF, summary.VSURF,
VSURF.thres, VSURF.interp,
VSURF.pred, tune

data(iris)
iris.vsurf <- VSURF(x=iris[,1:4], y=iris[,5], ntree=100, nfor.thres=20,
nfor.interp=10, nfor.pred=10)
iris.vsurf
# A more interesting example with toys data (see ?toys)
# (a few minutes to execute)
data(toys)
toys.vsurf <- VSURF(x=toys$x, y=toys$y)
toys.vsurf
# VSURF run on 2 cores in parallel (using a SOCKET cluster):
data(toys)
toys.vsurf.para <- VSURF.parallel(x=toys$x, y=toys$y, ncores=2)