## S3 method for class 'default':
VSURF(x, y, ntree = 2000, mtry = max(floor(ncol(x)/3), 1),
      nfor.thres = 50, nmin = 1, nfor.interp = 25, nsd = 1,
      nfor.pred = 25, nmj = 1, ...)

## S3 method for class 'formula':
VSURF(formula, data, ..., na.action = na.fail)

## S3 method for class 'default':
VSURF.parallel(x, y, ntree = 2000,
               mtry = max(floor(ncol(x)/3), 1), nfor.thres = 50, nmin = 1,
               nfor.interp = 25, nsd = 1, nfor.pred = 25, nmj = 1,
               clusterType = "PSOCK", ncores = detectCores() - 1, ...)

## S3 method for class 'formula':
VSURF.parallel(formula, data, ..., na.action = na.fail)

randomForest it is only used with the formula-type call.)
randomForest.
randomForest.
err.interp is multiplied.
randomForest function (see ?randomForest for further information)

VSURF, which is a list with the following components:
varselect.thres
variables.
$x contains the mean importances sorted in decreasing order.
$ix contains indexes of the variables.
ord.imp.
VSURF.parallel (only if parallel version of VSURF is used).
VSURF.parallel (only if parallel version of VSURF is used).
VSURF.

nfor.thres
random forests are computed using the function randomForest with the
argument importance=TRUE. Then variables are sorted according to their mean
variable importance (VI), in decreasing order. This order is kept throughout
the procedure. Next, a threshold is computed: min.thres, the minimum
predicted value of a pruned CART tree fitted to the curve of the standard
deviations of VI. Finally, the actual "thresholding step" is performed: only
variables with a mean VI larger than nmin * min.thres are kept.

nfor.interp embedded random forest models are grown, starting with the
random forest built with only the most important variable and ending with all
variables selected in the first step. Then, err.min, the minimum mean
out-of-bag (OOB) error of these models, and its associated standard deviation
sd.min are computed. Finally, the smallest model (and hence its corresponding
variables) having a mean OOB error less than err.min + nsd * sd.min is
selected.

mean.jump, the mean jump value, is calculated using variables that have been
left out by the second step, and is set as the mean absolute difference
between mean OOB errors of one model and its first following model. Hence a
variable is included in the model if the mean OOB error decrease is larger
than nmj * mean.jump.

VSURF.parallel is able to run VSURF using multiple cores in parallel
(see clusterType and ncores arguments).
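
The three selection rules described above can be sketched on toy numbers in base R. This is only an illustration of the inequalities, not the package's implementation: the importance values, the value of min_thres, the standard deviations and the helper toy_oob_error() are all made up for the example, and starting the last step from the first retained variable is an assumption.

```r
# Toy illustration of the three selection rules; all numbers are made up.
nmin <- 1; nsd <- 1; nmj <- 1   # default multipliers shown in the usage

# Step 1 (thresholding): keep variables whose mean VI exceeds nmin * min.thres.
mean_vi   <- c(0.30, 0.22, 0.10, 0.04, 0.03, 0.025, 0.005)  # sorted, decreasing
min_thres <- 0.02   # assumed value; VSURF derives it from a pruned CART tree
keep_thres <- which(mean_vi > nmin * min_thres)

# Hypothetical stand-in for the mean OOB error of a forest built on `vars`.
gain <- c(0.20, 0.003, 0.15, 0.005, 0.005, 0.002, 0)
toy_oob_error <- function(vars) 0.5 - sum(gain[vars])

# Step 2 (interpretation): nested models on the first k retained variables.
err_interp <- sapply(seq_along(keep_thres),
                     function(k) toy_oob_error(keep_thres[seq_len(k)]))
sd_interp  <- rep(0.015, length(err_interp))   # assumed standard deviations
best    <- which.min(err_interp)
err_min <- err_interp[best]
sd_min  <- sd_interp[best]
# Smallest nested model whose error is below err.min + nsd * sd.min.
size_interp <- min(which(err_interp < err_min + nsd * sd_min))
keep_interp <- keep_thres[seq_len(size_interp)]

# Step 3 (prediction): mean jump over the models left out by step 2,
# then add a variable only if the error decrease exceeds nmj * mean.jump.
mean_jump <- mean(abs(diff(err_interp[size_interp:length(err_interp)])))
keep_pred <- keep_interp[1]
for (v in keep_interp[-1]) {
  decrease <- toy_oob_error(keep_pred) - toy_oob_error(c(keep_pred, v))
  if (decrease > nmj * mean_jump) keep_pred <- c(keep_pred, v)
}
```

With these numbers, step 1 keeps variables 1 to 6, step 2 keeps the three most important ones, and step 3 drops variable 2 because adding it lowers the toy error by less than the mean jump.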

plot.VSURF, summary.VSURF, VSURF.thres, VSURF.interp, VSURF.pred, tune

library(VSURF)

data(iris)
iris.vsurf <- VSURF(x = iris[, 1:4], y = iris[, 5], ntree = 100,
                    nfor.thres = 20, nfor.interp = 10, nfor.pred = 10)
iris.vsurf

# A more interesting example with toys data (see ?toys)
# (a few minutes to execute)
data(toys)
toys.vsurf <- VSURF(x = toys$x, y = toys$y)
toys.vsurf

# VSURF run on 2 cores in parallel (using a socket cluster):
data(toys)
toys.vsurf.para <- VSURF.parallel(x = toys$x, y = toys$y, ncores = 2)
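
# For readers unfamiliar with the clusterType and ncores arguments, the lines
# below are a minimal sketch of what a "PSOCK" cluster with 2 workers looks like
# in base R's parallel package. It does not reproduce VSURF.parallel's internals;
# the squared-number workload is a hypothetical stand-in for independent forests.

```r
library(parallel)

# detectCores() may return NA on some platforms, so fall back to 1 worker.
n_detected <- detectCores()
ncores <- if (is.na(n_detected)) 1 else max(n_detected - 1, 1)

# Start a socket ("PSOCK") cluster with 2 workers, as in the call above.
cl <- makeCluster(2, type = "PSOCK")

# Hypothetical workload: each task is sent to one of the workers.
res <- parLapply(cl, 1:4, function(i) i^2)

# Always release the workers when done.
stopCluster(cl)
unlist(res)
```

# The ncores value computed here mirrors the default ncores = detectCores() - 1
# from the usage, with a guard for machines where the core count is unknown.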