klaR (version 0.6-2)

stepclass: Stepwise variable selection for classification

Description

Forward/backward variable selection for classification using any specified classification function, with variables selected by an estimated classification performance measure from ucpm.

Usage

stepclass(x, ...)

## S3 method for class 'default':
stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf, 
    start.vars = NULL, direction = c("both", "forward", "backward"), 
    criterion = "CR",  fold = 10, cv.groups = NULL, output = TRUE, 
    min1var = TRUE, ...)
## S3 method for class 'formula':
stepclass(formula, data, method, ...)

Arguments

x
matrix or data frame containing the explanatory variables (required, if formula is not given).
formula
A formula of the form groups ~ x1 + x2 + .... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators. Interaction terms are not supported.
data
data matrix (rows=cases, columns=variables)
grouping
class indicator vector (a factor)
method
character, name of classification function (e.g. lda).
improvement
least improvement of performance measure desired to include or exclude any variable (
maxvar
maximum number of variables in model
start.vars
set variables to start with (indices or names). Default is no variables if direction is forward or both, and all variables if direction is backward.
direction
forward, backward or both (default)
criterion
performance measure taken from ucpm.
fold
number of folds for cross-validation; ignored if cv.groups is specified.
cv.groups
vector of group indicators for cross-validation. By default assigned automatically.
output
indicator (logical) for text output during computation (slows down computation!)
min1var
logical, whether to include at least one variable in the model, even if the prior itself already is a reasonable model.
...
further parameters passed to classification function (method), e.g. priors etc.
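
As a hedged illustration of how several of these arguments fit together, the sketch below builds a cv.groups vector by hand and passes it to stepclass together with direction, improvement and maxvar. The fold count, seed and argument values are arbitrary assumptions chosen for the example, and the call is guarded so the snippet still runs when klaR is not installed.

```r
set.seed(42)  # the fold assignment below is random

x <- iris[, 1:4]          # explanatory variables
grouping <- iris$Species  # class factor

# One fold label per case: 5 folds of equal size, in shuffled order.
cv.groups <- sample(rep(1:5, length.out = nrow(x)))

if (requireNamespace("klaR", quietly = TRUE)) {
  library(MASS)  # lda()
  library(klaR)  # stepclass()
  # Forward selection only, at most 2 variables, using the folds defined above.
  sc <- stepclass(x, grouping, method = "lda",
                  direction = "forward", improvement = 0.01,
                  maxvar = 2, cv.groups = cv.groups, output = FALSE)
  print(sc)
}
```

Supplying cv.groups explicitly keeps the folds fixed across calls, which makes runs with different methods or criteria easier to compare.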

Value

An object of class stepclass containing the following components:

  • call: the (matched) function call.
  • method: name of classification function used (e.g. lda).
  • start.variables: vector of starting variables.
  • process: data frame showing selection process (included/excluded variables and performance measure).
  • model: the final model: data frame with 2 columns; indices and names of variables.
  • perfomance.measure: value of the criterion used by ucpm
  • formula: formula of the form response ~ list + of + selected + variables
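
A short sketch of how the returned components might be inspected and reused, assuming klaR and MASS are installed. The object name sc and the choice of lda are illustrative only; the component names are the ones listed above.

```r
if (requireNamespace("klaR", quietly = TRUE)) {
  library(MASS)  # lda()
  library(klaR)  # stepclass()

  sc <- stepclass(Species ~ ., data = iris, method = "lda", output = FALSE)

  print(sc$process)  # selection history: variables added/removed and the measure
  print(sc$model)    # indices and names of the selected variables
  print(sc$formula)  # response ~ selected variables

  # Refit the classifier on the selected variables only.
  fit <- lda(sc$formula, data = iris)
  print(fit)
}
```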

Details

The classification method (e.g. lda) must have its own predict method (like predict.lda for lda) that returns either a matrix of posterior probabilities or a list with an element posterior containing that matrix. It must also accept matrices, as in method(x, grouping, ...).

A stepwise variable selection is then performed. The initial model is defined by the provided starting variables. In every step, new models are generated by including each single variable that is not in the model and by excluding each single variable that is in the model. The performance measures of these models are estimated by cross-validation. If the best value of the chosen criterion is better than the value so far by more than improvement, the corresponding variable is included or excluded. The procedure stops if the new best value is not good enough or if the specified maximum number of variables is reached.

If direction is forward, the model is only extended (by including further variables); if direction is backward, the model is only reduced (by excluding variables from the model).
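
The forward part of the procedure described above can be sketched in a few lines of R. This is a simplified illustration, not klaR's implementation: cv_rate is a hypothetical helper, the criterion is a plain cross-validated correctness rate rather than a ucpm measure, the starting value of 0 for the empty model is an assumption, and backward steps are left out.

```r
library(MASS)  # lda() and its predict method

# Hypothetical helper (not part of klaR): cross-validated correctness rate
# of lda() restricted to the columns named in vars.
cv_rate <- function(x, grouping, vars, cv.groups) {
  correct <- 0
  for (k in unique(cv.groups)) {
    test <- cv.groups == k
    fit <- lda(x[!test, vars, drop = FALSE], grouping[!test])
    pred <- predict(fit, x[test, vars, drop = FALSE])$class
    correct <- correct + sum(pred == grouping[test])
  }
  correct / length(grouping)
}

set.seed(1)
x <- as.matrix(iris[, 1:4])
grouping <- iris$Species
cv.groups <- sample(rep(1:10, length.out = nrow(x)))  # 10 random folds

improvement <- 0.05
selected <- character(0)  # start with no variables (forward direction)
best <- 0                 # assumed baseline for the empty model

repeat {
  candidates <- setdiff(colnames(x), selected)
  if (length(candidates) == 0) break
  # Estimate the criterion for every model that adds one more variable.
  rates <- sapply(candidates, function(v)
    cv_rate(x, grouping, c(selected, v), cv.groups))
  # Stop unless the best candidate beats the current value by more than improvement.
  if (max(rates) <= best + improvement) break
  selected <- c(selected, names(which.max(rates)))
  best <- max(rates)
}

print(selected)
print(best)
```

With a large improvement threshold such as 0.05, the loop tends to stop early, because each additional variable has to raise the estimated rate by at least five percentage points.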

See Also

step, stepAIC, and greedy.wilks for stepwise variable selection according to Wilks' lambda

Examples

data(iris)
library(MASS)   # lda() and qda()
library(klaR)   # stepclass()
iris.d <- iris[,1:4]  # the data    
iris.c <- iris[,5]    # the classes 
sc_obj <- stepclass(iris.d, iris.c, "lda", start.vars = "Sepal.Width")
sc_obj
plot(sc_obj)

## or using formulas:
sc_obj <- stepclass(Species ~ ., data = iris, method = "qda", 
    start.vars = "Sepal.Width", criterion = "AS")  # formula interface, here with qda and criterion "AS"
sc_obj
## now you can say stuff like
## qda(sc_obj$formula, data = iris)
