VSURF (version 0.8.2)

VSURF.pred.default: Prediction step of VSURF

Description

The prediction step refines the selection made by the interpretation step (VSURF.interp) by eliminating redundancy in the set of selected variables, for prediction purposes. This is the third step of the VSURF function.

Usage

## S3 method for class 'default':
VSURF.pred(x, y, err.interp, varselect.interp,
  nfor.pred = 25, nmj = 1, ...)

## S3 method for class 'formula':
VSURF.pred(formula, data, ..., na.action = na.fail)

Arguments

data
a data frame containing the variables in the model.
na.action
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named, and, as in randomForest, it is only used with the formula-type call.)
x,formula
A data frame or a matrix of predictors, whose columns represent the variables, or a formula describing the model to be fitted.
y
A response vector (must be a factor for classification problems and numeric for regression ones).
err.interp
A vector of the mean OOB error rates of the embedded random forest models built during the interpretation step (value err.interp of function VSURF.interp).
varselect.interp
A vector of indices of the variables selected after the interpretation step.
nfor.pred
Number of forests grown.
nmj
Number of times the mean jump is multiplied. See details below.
...
Other parameters to be passed on to the VSURF function.

Value

An object of class VSURF.pred, which is a list with the following components:

  • varselect.pred: A vector of indices of the variables selected after the "prediction step".
  • err.pred: A vector of the mean OOB error rates of the random forest models built during the "prediction step".
  • mean.jump: The mean jump value computed during the "prediction step".
  • num.varselect.pred: The number of selected variables.
  • comput.time: Computation time.
  • call: The original call to VSURF.
  • terms: Terms associated with the formula (only if a formula-type call was used).
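
Because varselect.pred holds column indices rather than names, it can help to map them back to the predictor names. The snippet below is a minimal sketch in base R: the index vector is a hypothetical stand-in for the varselect.pred component of a fitted object, not actual VSURF output.

```r
# Predictors as passed to VSURF.pred in the iris example
x <- iris[, 1:4]

# Hypothetical stand-in for iris.pred$varselect.pred (illustrative values only)
varselect.pred <- c(3, 4)

# Map the selected column indices back to variable names
selected.names <- colnames(x)[varselect.pred]
selected.names
```

With a real fit, the hypothetical vector would be replaced by the varselect.pred component of the returned object.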

Details

nfor.pred embedded random forest models are grown, starting with the random forest built with only the most important variable. Variables are then added to the model in a stepwise manner. The mean jump value mean.jump is calculated using the variables left out by the interpretation step, and is set to the mean absolute difference between the mean OOB errors of one model and the next. A variable is therefore included in the model only if the decrease in mean OOB error is larger than nmj * mean.jump.
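
The threshold rule described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the error values, the number of variables kept by the interpretation step (n.interp) and the candidate error decreases (err.decrease) are all made up, and no forests are grown.

```r
# Hypothetical mean OOB errors of the nested models from the interpretation step
# (model k uses the k most important variables); values are made up
err.interp <- c(0.30, 0.18, 0.10, 0.09, 0.095, 0.092, 0.094)

# Hypothetical: the interpretation step kept the first 4 variables
n.interp <- 4

# Mean jump: mean absolute difference between consecutive mean OOB errors,
# taken over the models involving the variables left out by interpretation
left.out.err <- err.interp[n.interp:length(err.interp)]
mean.jump <- mean(abs(diff(left.out.err)))

# A variable is kept only if it lowers the error by more than nmj * mean.jump
nmj <- 1
threshold <- nmj * mean.jump

# Hypothetical error decreases observed when candidates are tried in turn
err.decrease <- c(v2 = 0.12, v3 = 0.08, v4 = 0.002)
keep <- err.decrease > threshold
keep
```

Raising nmj raises the threshold, so fewer variables pass the test and the resulting prediction model is more parsimonious.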

References

Genuer, R. and Poggi, J.M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236

See Also

VSURF

Examples

library(VSURF)
data(iris)
iris.thres <- VSURF.thres(x=iris[,1:4], y=iris[,5], ntree=100, nfor.thres=20)
iris.interp <- VSURF.interp(x=iris[,1:4], y=iris[,5], vars=iris.thres$varselect.thres,
                            nfor.interp=10)
iris.pred <- VSURF.pred(x=iris[,1:4], y=iris[,5], err.interp=iris.interp$err.interp,
                        varselect.interp=iris.interp$varselect.interp, nfor.pred=10)
iris.pred

# A more interesting example with toys data (see toys)
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF.thres(x=toys$x, y=toys$y)
toys.interp <- VSURF.interp(x=toys$x, y=toys$y, vars=toys.thres$varselect.thres)
toys.pred <- VSURF.pred(x=toys$x, y=toys$y, err.interp=toys.interp$err.interp,
                        varselect.interp=toys.interp$varselect.interp)
toys.pred