VSURF (version 0.8.2)

VSURF.interp.default: Interpretation step of VSURF

Description

The interpretation step aims to select all variables related to the response, for interpretation purposes. This is the second step of the VSURF function. It is designed to be executed after the thresholding step, VSURF.thres.

Usage

## S3 method for class 'default':
VSURF.interp(x, y, vars, nfor.interp = 25, nsd = 1, ...)

## S3 method for class 'formula':
VSURF.interp(formula, data, ..., na.action = na.fail)

## S3 method for class 'default':
VSURF.interp.parallel(x, y, vars, nfor.interp = 25, nsd = 1,
                      clusterType = "PSOCK", ncores = detectCores() - 1, ...)

## S3 method for class 'formula':
VSURF.interp.parallel(formula, data, ..., na.action = na.fail)

Arguments

data
A data frame containing the variables in the model.
na.action
A function specifying the action to take if NAs are found. (NOTE: If given, this argument must be named and, as in randomForest, it is only used with the formula-type call.)
x,formula
A data frame or a matrix of predictors, whose columns represent the variables; or a formula describing the model to be fitted.
y
A response vector (must be a factor for classification problems and numeric for regression ones).
vars
A vector of variable indices. Typically, the indices of the variables selected by the thresholding step (see the value varselect.thres of the VSURF.thres function).
nfor.interp
Number of forests grown.
nsd
Number of times the standard deviation of the minimum value of err.interp is multiplied. See details below.
clusterType
Type of the multiple cores cluster used to run VSURF in parallel. Must be chosen among "PSOCK" (default: SOCKET cluster available locally on all OS), "FORK" (local too, only available for Linux and Mac OS) and "MPI" (can be used on a remote cluster, which
ncores
Number of cores to use. Default is set to the number of cores detected by R minus 1.
...
Other parameters to be passed on to the randomForest function (see ?randomForest for further information).

Value

An object of class VSURF.interp, which is a list with the following components:

  • varselect.interp: A vector of indices of selected variables.
  • err.interp: A vector of the mean OOB error rates of the embedded random forests models.
  • sd.min: The standard deviation of OOB error rates associated to the random forests model attaining the minimum mean OOB error rate.
  • num.varselect.interp: The number of selected variables.
  • varselect.thres: A vector of indexes of variables selected after "thresholding step", sorted according to their mean VI, in decreasing order.
  • comput.time: Computation time.
  • clusterType: The type of the cluster used to run VSURF.parallel (only if parallel version of VSURF is used).
  • ncores: The number of cores used to run VSURF.parallel (only if parallel version of VSURF is used).
  • call: The original call to VSURF.
  • terms: Terms associated to the formula (only if formula-type call was used).

Details

nfor.interp embedded random forests models are grown, starting with the random forest built with only the most important variable and ending with all variables. Then err.min, the minimum mean out-of-bag (OOB) error rate of these models, and its associated standard deviation, sd.min, are computed. Finally, the smallest model (and hence its corresponding variables) having a mean OOB error less than err.min + nsd * sd.min is selected.
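
The selection rule above can be sketched in base R. This is only an illustration under stated assumptions, not the VSURF source code: the matrix err_runs, its values, and all object names are hypothetical. It assumes one row per forest run and one column per nested model, where model k uses the k most important variables.

```r
# Illustrative sketch of the selection rule; NOT the VSURF implementation.
# Hypothetical OOB error rates: rows = forest runs, columns = nested models
# (column k = model built with the k most important variables).
err_runs <- rbind(
  c(0.30, 0.12, 0.060, 0.05, 0.06),
  c(0.28, 0.10, 0.050, 0.04, 0.05),
  c(0.32, 0.11, 0.055, 0.06, 0.07)
)
vars <- c(4, 3, 1, 2, 5)  # hypothetical indices, sorted by decreasing importance
nsd  <- 1

err_interp <- colMeans(err_runs)      # mean OOB error of each nested model
sd_interp  <- apply(err_runs, 2, sd)  # SD of the OOB error across runs
best       <- which.min(err_interp)   # model attaining the minimum mean error
err_min    <- err_interp[best]
sd_min     <- sd_interp[best]

# Smallest model whose mean error is within nsd * sd.min of the minimum.
# Assumption: "<=" is used here (the text says "less than") so that the
# minimum-error model itself always qualifies, even when sd_min is 0.
n_selected       <- min(which(err_interp <= err_min + nsd * sd_min))
varselect_interp <- vars[seq_len(n_selected)]
```

With these made-up numbers the minimum mean error is reached by the four-variable model, but the three-variable model already falls under the threshold, so the smaller model is kept. Increasing nsd loosens the threshold and tends to select fewer variables.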

References

Genuer, R. and Poggi, J.M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236

See Also

VSURF, tune

Examples

library(VSURF)  # provides VSURF.thres, VSURF.interp and the toys data
data(iris)
iris.thres <- VSURF.thres(x=iris[,1:4], y=iris[,5], ntree=100, nfor.thres=20)
iris.interp <- VSURF.interp(x=iris[,1:4], y=iris[,5], vars=iris.thres$varselect.thres,
                            nfor.interp=10)
iris.interp

# A more interesting example with toys data (see ?toys)
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF.thres(x=toys$x, y=toys$y)
toys.interp <- VSURF.interp(x=toys$x, y=toys$y, vars=toys.thres$varselect.thres)
toys.interp