MachineShop (version 3.7.0)

rfe: Recursive Feature Elimination

Description

A wrapper method of backward feature selection in which a given model is fit to nested subsets of the most important predictor variables in order to select the subset whose resampled predictive performance is optimal.

Usage

rfe(...)

# S3 method for formula
rfe(formula, data, model, ...)

# S3 method for matrix
rfe(x, y, model, ...)

# S3 method for ModelFrame
rfe(input, model, ...)

# S3 method for recipe
rfe(input, model, ...)

# S3 method for ModelSpecification
rfe(
  object,
  select = NULL,
  control = MachineShop::settings("control"),
  props = 4,
  sizes = integer(),
  random = FALSE,
  recompute = TRUE,
  optimize = c("global", "local"),
  samples = c(rfe = 1, varimp = 1),
  metrics = NULL,
  stat = c(
    resample = MachineShop::settings("stat.Resample"),
    permute = MachineShop::settings("stat.TrainingParams")
  ),
  progress = FALSE,
  ...
)

# S3 method for MLModel
rfe(model, ...)

# S3 method for MLModelFunction
rfe(model, ...)
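
The methods above allow the same analysis to be requested in several ways. The following is a minimal sketch of three calling forms, assuming that MachineShop is installed and that LMModel (a linear model, chosen here only because it needs no suggested packages) is an acceptable stand-in for the model of interest. The object names res1, res2, res3, and spec are illustrative.

```r
library(MachineShop)

## Variables first, model supplied as a named argument
res1 <- rfe(sale_amount ~ ., data = ICHomes, model = LMModel)

## Model first, followed by the variable specification
res2 <- rfe(LMModel, sale_amount ~ ., data = ICHomes)

## Inputs and model bundled in a ModelSpecification object
spec <- ModelSpecification(sale_amount ~ ., data = ICHomes, model = LMModel)
res3 <- rfe(spec)
```

Each call should return a TrainingStep object. Because resampling partitions are drawn at random, the three results are not expected to match exactly.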

Value

TrainingStep class object containing a summary of the numbers of predictor variables retained (size), their names (terms), logical indicators for the optimal model selected (selected), and associated performance metrics (metrics).

Arguments

...

arguments passed from the generic function to its methods, from the MLModel and MLModelFunction methods to first arguments of others, and from others to the ModelSpecification method. The first argument of each rfe method is positional and, as such, must be given first in calls to it.

formula, data

formula defining the model predictor and response variables and a data frame containing them.

model

model function, function name, or object; or another object that can be coerced to a model. A model can be given first followed by any of the variable specifications.

x, y

matrix of predictor variables and object containing the response variable.

input

input object defining and containing the model predictor and response variables.

object

model input or specification.

select

expression indicating predictor variables that can be eliminated (see subset for syntax) [default: all].

control

control function, function name, or object defining the resampling method to be employed.

props

numeric vector of the proportions of most important predictor variables to retain in fitted models, or an integer number of equally spaced proportions to generate automatically; ignored if sizes are given.

sizes

integer vector of the set sizes of most important predictor variables to retain.

random

logical indicating whether to eliminate variables at random with probabilities proportional to their importance.

recompute

logical indicating whether to recompute variable importance after eliminating each set of variables.

optimize

character string specifying a search through all props to identify the globally optimal model ("global") or a search that stops after identifying the first locally optimal model ("local").

samples

numeric vector or list giving the number of permutation samples for each of the rfe and varimp algorithms. One or both of the values may be specified as named arguments or in the order in which their defaults appear. Larger numbers of samples decrease variability in estimated model performances and variable importances at the expense of increased computation time. Samples are more expensive computationally for rfe than for varimp.

metrics

metric function, function name, or vector of these with which to calculate performance. If not specified, default metrics defined in the performance functions are used.

stat

functions or character strings naming functions to compute summary statistics on resampled metric values and permuted samples. One or both of the values may be specified as named arguments or in the order in which their defaults appear.

progress

logical indicating whether to display iterative progress during elimination.
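
Several of the arguments above are typically set together. The following is a hedged sketch, not a recommended configuration, assuming that MachineShop is installed. The choice of LMModel, 5 folds, the listed proportions, the rmse metric, and the seed value are all illustrative assumptions.

```r
library(MachineShop)

res <- rfe(
  sale_amount ~ ., data = ICHomes, model = LMModel,
  ## 5-fold cross-validation in place of the package default control
  control = CVControl(folds = 5, seed = 123),
  ## Retain 25%, 50%, 75%, and 100% of the most important predictors
  props = c(0.25, 0.5, 0.75, 1),
  ## Judge subsets by root mean squared error
  metrics = rmse,
  ## Stop at the first locally optimal subset rather than searching all props
  optimize = "local",
  ## More permutation samples for variable importance only
  samples = c(varimp = 5)
)
summary(res)
```

Supplying sizes, for example sizes = c(2, 4, 8), would take the place of props, since props is ignored when sizes are given.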

See Also

performance, plot, summary, varimp

Examples

# \donttest{
## Requires prior installation of suggested package gbm to run

(res <- rfe(sale_amount ~ ., data = ICHomes, model = GBMModel))
summary(res)
summary(performance(res))
plot(res, type = "line")
# }
