klaR (version 0.5-5)

rda: Regularized Discriminant Analysis (RDA)

Description

Builds a classification rule using regularized group covariance matrices that are intended to be more robust against multicollinearity in the data.

Usage

rda(x, ...)

## S3 method for class 'default':
rda(x, grouping = NULL, prior = NULL, gamma = NA, 
    lambda = NA, regularization = c(gamma = gamma, lambda = lambda), 
    crossval = TRUE, fold = 10, train.fraction = 0.5, 
    estimate.error = TRUE, output = FALSE, startsimplex = NULL, 
    max.iter = 100, trafo = TRUE, simAnn = FALSE, schedule = 2, 
    T.start = 0.1, halflife = 50, zero.temp = 0.01, alpha = 2, 
    K = 100, ...)
## S3 method for class 'formula':
rda(formula, data, ...)

Arguments

x
Matrix or data frame containing the explanatory variables (required, if formula is not given).
formula
Formula of the form groups ~ x1 + x2 + ....
data
A data frame (or matrix) containing the explanatory variables.
grouping
(Optional) a vector specifying the class for each observation; if not specified, the first column of data is taken.
prior
(Optional) prior probabilities for the classes. Default: proportional to training sample sizes. prior=1 indicates equally likely classes.
gamma, lambda, regularization
One or both of the rda-parameters may be fixed manually. Unspecified parameters are determined by minimizing the estimated error rate (see below).
crossval
Logical. If TRUE, the error rate in the optimization step is estimated by Cross-Validation; otherwise it is estimated by drawing several training and test samples.
fold
The number of Cross-Validation or Bootstrap samples to be drawn.
train.fraction
In case of Bootstrapping: the fraction of the data to be used for training in each Bootstrap sample; the remainder is used to estimate the misclassification rate.
estimate.error
Logical. If TRUE, the apparent error rate for the final parameter set is estimated.
output
Logical flag to indicate whether text output during computation is desired.
startsimplex
(Optional) a starting simplex for the Nelder-Mead-minimization.
max.iter
Maximum number of iterations for Nelder-Mead.
trafo
Logical; indicates whether minimization is carried out using transformed parameters.
simAnn
Logical; indicates whether Simulated Annealing shall be used.
schedule
Annealing schedule 1 or 2 (exponential or polynomial).
T.start
Starting temperature for Simulated Annealing.
halflife
Number of iterations until temperature is reduced to a half (schedule 1).
zero.temp
Temperature at which it is set to zero (schedule 1).
alpha
Power of temperature reduction (linear, quadratic, cubic,...) (schedule 2).
K
Number of iterations until temperature = 0 (schedule 2).
...
Further arguments passed to or from other methods.

Value

A list of class rda containing the following components:

  • call: the (matched) function call.
  • regularization: vector containing the two regularization parameters (gamma, lambda).
  • classes: the names of the classes.
  • prior: the prior probabilities for the classes.
  • error.rate: apparent error rate (if its computation was not suppressed), and, if any optimization took place, the final (cross-validated or bootstrapped) error rate estimate as well.
  • means: group means.
  • covariances: array of group covariances.
  • covpooled: pooled covariance.
  • converged: (logical) indicator of convergence (only for Nelder-Mead).
  • iter: number of iterations actually performed (only for Nelder-Mead).
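For orientation, a small sketch of inspecting some of these components after a fit; the call mirrors the Examples section below, and the printed values are only illustrative.

library(klaR)
data(iris)
fit <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
fit$regularization    # the (gamma, lambda) pair used for the rule
fit$error.rate        # apparent error rate (estimate.error = TRUE by default)
fit$means             # group means
dim(fit$covariances)  # one regularized group covariance matrix per class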

More details

The explicit definition of $\gamma$, $\lambda$ and the resulting covariance estimates is as follows: the pooled covariance estimate $\hat{\Sigma}$ is given as well as the individual covariance estimates $\hat{\Sigma}_k$ for each group. First, using $\lambda$, a convex combination of these two is computed: $$\hat{\Sigma}_k (\lambda) := (1-\lambda) \hat{\Sigma}_k + \lambda \hat{\Sigma}.$$ Then, another convex combination is constructed from this estimate and a (scaled) identity matrix: $$\hat{\Sigma}_k (\lambda,\gamma) = (1-\gamma)\hat{\Sigma}_k(\lambda) + \gamma\frac{1}{d}\mathrm{tr}[\hat{\Sigma}_k(\lambda)]\,\mathrm{I}.$$ The factor $\frac{1}{d}\mathrm{tr}[\hat{\Sigma}_k(\lambda)]$ in front of the identity matrix I is the mean of the diagonal elements of $\hat{\Sigma}_k(\lambda)$, i.e. the mean variance of all $d$ variables under the group covariance $\hat{\Sigma}_k(\lambda)$. For the four extremes of ($\gamma$,$\lambda$) the covariance structure reduces to special cases (an illustrative computation follows the list below):
  • ($\gamma=0$,$\lambda=0$): QDA - individual covariance for each group.
  • ($\gamma=0$,$\lambda=1$): LDA - a common covariance matrix.
  • ($\gamma=1$,$\lambda=0$): Conditionally independent variables - similar to Naive Bayes, but variable variances within each group (main diagonal elements) are equal.
  • ($\gamma=1$,$\lambda=1$): Classification using Euclidean distance - as in the previous case, but variances are the same for all groups. Objects are assigned to the group with the nearest mean.
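The following is a minimal base-R sketch of this two-step shrinkage for a single group; regularized_cov, Sigma_k and Sigma_pooled are illustrative names, not objects created or returned by rda.

regularized_cov <- function(Sigma_k, Sigma_pooled, lambda, gamma) {
  d <- nrow(Sigma_k)
  # step 1: convex combination of the group and pooled covariance estimates
  S_lambda <- (1 - lambda) * Sigma_k + lambda * Sigma_pooled
  # step 2: shrink towards the identity scaled by the mean diagonal element (mean variance)
  (1 - gamma) * S_lambda + gamma * (sum(diag(S_lambda)) / d) * diag(d)
}
# gamma = 0, lambda = 0 returns Sigma_k (QDA); gamma = 0, lambda = 1 returns Sigma_pooled (LDA)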

Details

J.H. Friedman (see references below) suggested a method to fix almost singular covariance matrices in discriminant analysis. Basically, individual covariances as in QDA are used, but depending on two parameters ($\gamma$ and $\lambda$), these can be shifted towards a diagonal matrix and/or the pooled covariance matrix. For ($\gamma=0$, $\lambda=0$) it equals QDA, for ($\gamma=0$, $\lambda=1$) it equals LDA.

You may fix these parameters at certain values or leave it to the function to try to find optimal values. If one parameter is given, the other one is determined using the R function optimize. If no parameter is given, both are determined numerically by a Nelder-Mead (simplex) algorithm, with the option of using Simulated Annealing. The goal function to be minimized is the (estimated) misclassification rate; the misclassification rate is estimated either by Cross-Validation or by repeatedly dividing the data into training and test sets (Bootstrapping).

Warning: If these sets are small, optimization is expected to produce almost random results. We recommend adjusting the parameters manually in such a case. In all other cases it is recommended to run the optimization several times in order to see whether stable results are obtained.

Since the Nelder-Mead algorithm is actually intended for continuous functions while the observed error rate is by its nature discrete, a greater number of Bootstrap samples might improve the optimization by increasing the smoothness of the response surface (and, of course, by reducing variance and bias). If a set of parameters leads to singular covariance matrices, a penalty term is added to the misclassification rate, which will hopefully help to maneuver back out of singularity (so do not worry about error rates greater than one during optimization).
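A hedged illustration of these choices using the arguments documented above (the parameter values are arbitrary examples, not recommendations):

library(klaR)
data(iris)
# fix gamma manually; lambda is then chosen by minimizing the estimated error rate (via optimize)
fit1 <- rda(Species ~ ., data = iris, gamma = 0.1)
# optimize both parameters, estimating the error rate from repeated training/test splits
# (Bootstrapping) instead of Cross-Validation
fit2 <- rda(Species ~ ., data = iris, crossval = FALSE, fold = 25, train.fraction = 0.5)
fit2$regularization  # rerun the fit to check that the optimized parameters are stable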

References

Friedman, J. H. (1989): Regularized Discriminant Analysis. Journal of the American Statistical Association 84, 165-175.

Press, W. H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T. (1992): Numerical Recipes in C. Cambridge: Cambridge University Press.

See Also

predict.rda, lda, qda

Examples

library(klaR)  # provides rda() and predict.rda()
data(iris)
# fit an RDA rule with both regularization parameters fixed manually
x <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
predict(x, iris)
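A possible follow-up, assuming predict.rda returns the predicted classes in a class component (as predict does for other discriminant methods such as lda):

# apparent (training-data) confusion matrix
table(predicted = predict(x, iris)$class, true = iris$Species)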
