VSURF (version 0.8.2)

VSURF.thres.default: Thresholding step of VSURF

Description

The thresholding step roughly eliminates irrelevant variables from the dataset. This is the first step of the VSURF function. For refined variable selection, see the other VSURF steps: VSURF.interp and VSURF.pred.

Usage

## S3 method for class 'default':
VSURF.thres(x, y, ntree = 2000,
  mtry = max(floor(ncol(x)/3), 1), nfor.thres = 50, nmin = 1, ...)

## S3 method for class 'formula':
VSURF.thres(formula, data, ..., na.action = na.fail)

## S3 method for class 'default':
VSURF.thres.parallel(x, y, ntree = 2000,
  mtry = max(floor(ncol(x)/3), 1), nfor.thres = 50, nmin = 1,
  clusterType = "PSOCK", ncores = detectCores() - 1, ...)

## S3 method for class 'formula':
VSURF.thres.parallel(formula, data, ..., na.action = na.fail)

Arguments

data
A data frame containing the variables in the model.
na.action
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named, and, as in randomForest, it is only used with the formula-type call.)
x,formula
A data frame or a matrix of predictors, whose columns represent the variables, or a formula describing the model to be fitted.
y
A response vector (must be a factor for classification problems and numeric for regression ones).
ntree
Number of trees in each forest grown. Standard randomForest parameter.
mtry
Number of variables randomly sampled as candidates at each split. Standard randomForest parameter.
nfor.thres
Number of forests grown.
nmin
Number of times the "minimum value" is multiplied to set the threshold value. See Details below.
clusterType
Type of the multiple cores cluster used to run VSURF in parallel. Must be chosen among "PSOCK" (default: SOCKET cluster available locally on all OS), "FORK" (local too, only available for Linux and Mac OS) and "MPI" (can be used on a remote cluster, which
ncores
Number of cores to use. Default is set to the number of cores detected by R minus 1.
...
Other parameters to be passed on to the randomForest function (see ?randomForest for further information).
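
As a small illustration of the two computed defaults in the usage above, the snippet below evaluates the default mtry for the four iris predictors and the value that the ncores default would take on the current machine. It uses only base R and the parallel package; the names x, default_mtry and default_ncores are illustrative and not part of VSURF.

```r
library(parallel)  # provides detectCores()

x <- iris[, 1:4]   # four predictor columns

# Default mtry: one third of the number of predictors, rounded down, at least 1
default_mtry <- max(floor(ncol(x) / 3), 1)

# Default ncores: detected cores minus one.
# detectCores() can return NA on some platforms, in which case this is NA too.
default_ncores <- detectCores() - 1

print(default_mtry)
print(default_ncores)
```

With few predictors the default mtry is small, so it may be worth setting mtry explicitly when the default does not suit the data.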

Value

An object of class VSURF.thres, which is a list with the following components:

  • varselect.thres: A vector of indices of selected variables, sorted according to their mean VI, in decreasing order.
  • imp.varselect.thres: A vector of importances of the varselect.thres variables.
  • min.thres: The minimum predicted value of a pruned CART tree fitted to the curve of the standard deviations of VI.
  • num.varselect.thres: The number of selected variables.
  • ord.imp: A list containing the order of all variables' mean importance. $x contains the mean importances in decreasing order. $ix contains indices of the variables.
  • ord.sd: A vector of standard deviations of all variables' importances. The order is given by ord.imp.
  • mean.perf: The mean OOB error rate, obtained by random forests built with all variables.
  • pred.pruned.tree: The predictions of the CART tree fitted to the curve of the standard deviations of VI.
  • comput.time: Computation time.
  • clusterType: The type of the cluster used to run VSURF.parallel (only if the parallel version of VSURF is used).
  • ncores: The number of cores used to run VSURF.parallel (only if the parallel version of VSURF is used).
  • call: The original call to VSURF.
  • terms: Terms associated with the formula (only if a formula-type call was used).
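
The $x / $ix layout of ord.imp matches what base R's sort() returns with index.return = TRUE. The sketch below uses made-up mean importances for the four iris predictors (hypothetical values, not VSURF output) to show how indices such as those in ord.imp$ix or varselect.thres can be mapped back to column names.

```r
# Hypothetical mean importances, one per iris predictor (illustrative values only)
mean_vi <- c(Sepal.Length = 0.03, Sepal.Width = 0.01,
             Petal.Length = 0.30, Petal.Width = 0.28)

# Same layout as ord.imp: $x holds the sorted values, $ix the original positions
ord_imp <- sort(mean_vi, decreasing = TRUE, index.return = TRUE)

# Map the indices back to the predictor names
ranked_names <- names(mean_vi)[ord_imp$ix]
print(ranked_names)
```

The same indexing pattern should apply to a real result, for example colnames(x)[res$varselect.thres], assuming res is the returned object and x has column names.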

Details

First, nfor.thres random forests are computed using the function randomForest with the argument importance=TRUE. Then variables are sorted according to their mean variable importance (VI), in decreasing order. This order is kept throughout the procedure. Next, a threshold is computed: min.thres, the minimum predicted value of a pruned CART tree fitted to the curve of the standard deviations of VI. Finally, the actual thresholding is performed: only variables with a mean VI larger than nmin * min.thres are kept.
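
The steps above can be sketched on simulated importance values, without growing any forest. This is a rough illustration and not the package's implementation: the VI matrix is random data standing in for randomForest output, all variable names are made up, and the pruning rule (keeping the cp value with the smallest cross-validated error) is an assumption, since the text above only says the tree is pruned. It relies on the rpart package for CART.

```r
library(rpart)  # CART implementation; a recommended package shipped with R

set.seed(42)

# Simulated VI values (NOT real randomForest output): 20 forests x 30 variables.
# Variables 1-2 are strong, 3-5 weak, the remaining 25 are pure noise.
n_for     <- 20
true_mean <- c(0.30, 0.20, 0.05, 0.04, 0.03, rep(0, 25))
true_sd   <- c(0.03, 0.02, 0.01, 0.01, 0.01, rep(0.002, 25))
vi <- sapply(seq_along(true_mean),
             function(j) rnorm(n_for, mean = true_mean[j], sd = true_sd[j]))

# Mean VI and its standard deviation across forests, per variable
mean_vi <- colMeans(vi)
sd_vi   <- apply(vi, 2, sd)

# Sort variables by decreasing mean VI; keep the SDs in the same order
ord    <- order(mean_vi, decreasing = TRUE)
ord_sd <- sd_vi[ord]

# Fit a CART tree to the curve of SDs, then prune it.
# Assumption: prune at the cp value with the smallest cross-validated error.
sd_curve <- data.frame(position = seq_along(ord_sd), vi_sd = ord_sd)
tree     <- rpart(vi_sd ~ position, data = sd_curve,
                  control = rpart.control(cp = 0, minsplit = 2))
cp_table <- tree$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned   <- prune(tree, cp = best_cp)

# Threshold: minimum predicted value of the pruned tree
pred_pruned <- predict(pruned)
min_thres   <- min(pred_pruned)

# Keep variables whose mean VI exceeds nmin * min_thres
nmin     <- 1
selected <- ord[mean_vi[ord] > nmin * min_thres]
print(min_thres)
print(selected)
```

Raising nmin in this sketch raises the cutoff and tends to keep fewer variables, which mirrors the role the nmin argument plays in the description above.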

References

Genuer, R., Poggi, J.M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236.

See Also

VSURF, tune

Examples

library(VSURF)
data(iris)
iris.thres <- VSURF.thres(x=iris[,1:4], y=iris[,5], ntree=100, nfor.thres=20)
iris.thres

# A more interesting example with toys data (see ?toys)
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF.thres(x=toys$x, y=toys$y)
toys.thres