Builds a classification rule using regularized group covariance matrices that are supposed to be more robust against multicollinearity in the data.

`rda(x, ...)`
# S3 method for default
rda(x, grouping = NULL, prior = NULL, gamma = NA,
lambda = NA, regularization = c(gamma = gamma, lambda = lambda),
crossval = TRUE, fold = 10, train.fraction = 0.5,
estimate.error = TRUE, output = FALSE, startsimplex = NULL,
max.iter = 100, trafo = TRUE, simAnn = FALSE, schedule = 2,
T.start = 0.1, halflife = 50, zero.temp = 0.01, alpha = 2,
K = 100, ...)
# S3 method for formula
rda(formula, data, ...)

x

Matrix or data frame containing the explanatory variables
(required if `formula` is not given).

formula

Formula of the form ‘`groups ~ x1 + x2 + ...`’.

data

A data frame (or matrix) containing the explanatory variables.

grouping

(Optional) a vector specifying the class for
each observation; if not specified, the first column of
‘`data`’ is taken.

prior

(Optional) prior probabilities for the classes.
Default: proportional to training sample sizes.
“`prior=1`” indicates equally likely classes.

gamma, lambda, regularization

One or both of the rda-parameters may be fixed manually. Unspecified parameters are determined by minimizing the estimated error rate (see below).

crossval

Logical. If `TRUE`, in the optimization
step the error rate is estimated by Cross-Validation,
otherwise by drawing several training- and test-samples.

fold

The number of Cross-Validation- or Bootstrap-samples to be drawn.

train.fraction

In case of Bootstrapping: the fraction of the data to be used for training in each Bootstrap-sample; the remainder is used to estimate the misclassification rate.

estimate.error

Logical. If `TRUE`, the apparent
error rate for the final parameter set is estimated.

output

Logical flag to indicate whether text output during computation is desired.

startsimplex

(Optional) a starting simplex for the Nelder-Mead-minimization.

max.iter

Maximum number of iterations for Nelder-Mead.

trafo

Logical; indicates whether minimization is carried out using transformed parameters.

simAnn

Logical; indicates whether Simulated Annealing shall be used.

schedule

Annealing schedule 1 or 2 (exponential or polynomial).

T.start

Starting temperature for Simulated Annealing.

halflife

Number of iterations until temperature is reduced to a half (schedule 1).

zero.temp

Temperature at which it is set to zero (schedule 1).

alpha

Power of temperature reduction (linear, quadratic, cubic,...) (schedule 2).

K

Number of iterations until temperature = 0 (schedule 2).

...

currently unused
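To picture how the annealing arguments (`schedule`, `T.start`, `halflife`, `zero.temp`, `alpha`, `K`) might interact, here is a minimal sketch of the two temperature schedules. The function `anneal_temp` and both formulas are assumptions inferred from the argument descriptions above; they are not taken from the package source and may differ from the actual implementation.

```r
# Hypothetical temperature schedules; the formulas are assumptions inferred
# from the argument descriptions, not the package's own code.
anneal_temp <- function(i, schedule = 2, T.start = 0.1, halflife = 50,
                        zero.temp = 0.01, alpha = 2, K = 100) {
  if (schedule == 1) {
    # Exponential decay: the temperature halves every `halflife` iterations
    # and is set to zero once it falls below `zero.temp`.
    temp <- T.start * 0.5^(i / halflife)
    ifelse(temp < zero.temp, 0, temp)
  } else {
    # Polynomial decay of power `alpha`, reaching zero at iteration `K`.
    T.start * pmax(0, 1 - i / K)^alpha
  }
}

iters <- c(0, 50, 100, 200)
temp_exp  <- anneal_temp(iters, schedule = 1)
temp_poly <- anneal_temp(iters, schedule = 2)
print(rbind(iteration = iters, exponential = temp_exp, polynomial = temp_poly))
```

Under these assumed formulas and the default argument values, schedule 2 reaches zero temperature at iteration 100, while schedule 1 is still positive there and is only cut to zero once it drops below `zero.temp`.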

A list of class `rda` containing the following
components:

The (matched) function call.

vector containing the two regularization parameters (gamma, lambda)

the names of the classes

the prior probabilities for the classes

apparent error rate (if computation was not suppressed), and, if any optimization took place, the final (cross-validated or bootstrapped) error rate estimate as well.

Group means.

Array of group covariances.

Pooled covariance.

(Logical) indicator of convergence (only for Nelder-Mead).

Number of iterations actually performed (only for Nelder-Mead).

The explicit definition of \(\gamma\), \(\lambda\) and the resulting covariance estimates is as follows:

The pooled covariance estimate \(\hat{\Sigma}\) is given as well as the individual covariance estimates \(\hat{\Sigma}_k\) for each group.

First, using \(\lambda\), a convex combination of these two is computed:

$$\hat{\Sigma}_k (\lambda) := (1-\lambda) \hat{\Sigma}_k + \lambda \hat{\Sigma}.$$

Then, another convex combination is constructed using the above estimate and a (scaled) identity matrix:

$$\hat{\Sigma}_k (\lambda,\gamma) = (1-\gamma)\hat{\Sigma}_k(\lambda)+ \gamma\frac{1}{d}\mathrm{tr}[\hat{\Sigma}_k(\lambda)]\mathrm{I}.$$

The factor \(\frac{1}{d}\mathrm{tr}[\hat{\Sigma}_k(\lambda)]\) in front of the identity matrix I is the mean of the diagonal elements of \(\hat{\Sigma}_k(\lambda)\), so it is the mean variance of all \(d\) variables assuming the group covariance \(\hat{\Sigma}_k(\lambda)\).
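The two convex combinations can be written down directly in base R. The following is a minimal sketch, not the package's internal code: the helper name `rda_cov` and the toy matrices are made up for illustration.

```r
# Regularized covariance for one group, following the two formulas above.
# `rda_cov` is a hypothetical helper, not part of the package.
rda_cov <- function(S_k, S_pooled, gamma, lambda) {
  d <- ncol(S_k)
  # Step 1: shrink the group covariance towards the pooled covariance.
  S_lambda <- (1 - lambda) * S_k + lambda * S_pooled
  # Step 2: shrink towards a scaled identity; the scale is the mean variance.
  (1 - gamma) * S_lambda + gamma * (sum(diag(S_lambda)) / d) * diag(d)
}

# Toy input: a nearly singular group covariance and a well-behaved pooled one.
S_k      <- matrix(c(1, 0.999, 0.999, 1), nrow = 2)
S_pooled <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)

S_reg <- rda_cov(S_k, S_pooled, gamma = 0.1, lambda = 0.2)

# The second step leaves the trace of the lambda-combination unchanged.
S_lambda   <- 0.8 * S_k + 0.2 * S_pooled
trace_kept <- isTRUE(all.equal(sum(diag(S_reg)), sum(diag(S_lambda))))

# Condition numbers before and after regularization.
cond_before <- kappa(S_k, exact = TRUE)
cond_after  <- kappa(S_reg, exact = TRUE)
print(c(before = cond_before, after = cond_after))
```

In this toy case even small values of \(\gamma\) and \(\lambda\) reduce the condition number considerably, which is the sense in which the regularized estimate is easier to invert.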

For the four extremes of (\(\gamma\),\(\lambda\)) the covariance structure reduces to special cases:

(\(\gamma=0\), \(\lambda=0\)): QDA - individual covariance for each group.

(\(\gamma=0\), \(\lambda=1\)): LDA - a common covariance matrix.

(\(\gamma=1\), \(\lambda=0\)): Conditionally independent variables - similar to Naive Bayes, but variable variances within group (main diagonal elements) are equal.

(\(\gamma=1\), \(\lambda=1\)): Classification using Euclidean distance - as in previous case, but variances are the same for all groups. Objects are assigned to group with nearest mean.
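The four corner cases can be checked numerically. The sketch below uses the `iris` data and a hypothetical helper `rda_cov` that implements the two convex combinations defined above. The pooled covariance is computed with weights \(n_k - 1\), which is an assumption; the package may normalize it differently.

```r
# Hypothetical helper implementing the two convex combinations described above.
rda_cov <- function(S_k, S_pooled, gamma, lambda) {
  d <- ncol(S_k)
  S_lambda <- (1 - lambda) * S_k + lambda * S_pooled
  (1 - gamma) * S_lambda + gamma * (sum(diag(S_lambda)) / d) * diag(d)
}

data(iris)
X      <- iris[, 1:4]
groups <- iris$Species

# Individual covariance per group.
S_list <- lapply(split(X, groups), cov)

# Pooled covariance, weighted by (n_k - 1); this weighting is an assumption.
n_k      <- as.vector(table(groups))
weighted <- Map(function(S, n) (n - 1) * S, S_list, n_k)
S_pooled <- Reduce(`+`, weighted) / (nrow(X) - nlevels(groups))

S_setosa <- S_list[["setosa"]]
qda   <- rda_cov(S_setosa, S_pooled, gamma = 0, lambda = 0)  # group covariance
lda   <- rda_cov(S_setosa, S_pooled, gamma = 0, lambda = 1)  # pooled covariance
indep <- rda_cov(S_setosa, S_pooled, gamma = 1, lambda = 0)  # mean group variance * I
eucl  <- rda_cov(S_setosa, S_pooled, gamma = 1, lambda = 1)  # mean pooled variance * I
```

With \(\gamma=1\) the result is a multiple of the identity matrix, so the Mahalanobis distance to each group mean becomes a rescaled Euclidean distance.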

J.H. Friedman (see references below) suggested a method to fix almost singular covariance matrices in discriminant analysis. Basically, individual covariances as in QDA are used, but depending on two parameters (\(\gamma\) and \(\lambda\)), these can be shifted towards a diagonal matrix and/or the pooled covariance matrix. For (\(\gamma=0\), \(\lambda=0\)) it equals QDA, for (\(\gamma=0\), \(\lambda=1\)) it equals LDA.

You may fix these parameters at certain values or leave it to
the function to try to find “optimal” values. If one
parameter is given, the other one is determined using the
R-function ‘`optimize`’. If no parameter is
given, both are determined numerically by a
Nelder-Mead-(Simplex-)algorithm with the option of using
Simulated Annealing.
The goal function to be minimized is the (estimated)
misclassification rate; the misclassification rate is estimated
either by Cross-Validation or by repeatedly dividing the data
into training- and test-sets (Bootstrapping).
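The difference between fixing both parameters and fixing only one can be tried out as follows. This sketch assumes that the function documented here is `rda` from the klaR package; it only runs the fits when that package is installed, and it inspects the result with `str()` rather than relying on particular component names.

```r
# Sketch only: assumes `rda` is provided by the klaR package.
have_klaR <- requireNamespace("klaR", quietly = TRUE)
fit_fixed <- NULL
fit_gamma <- NULL

if (have_klaR) {
  data(iris)
  set.seed(1)  # the estimated error rates depend on random splits

  # Both parameters fixed: no optimization takes place.
  fit_fixed <- klaR::rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)

  # Only gamma fixed: lambda is chosen by minimizing the estimated error rate.
  fit_gamma <- klaR::rda(Species ~ ., data = iris, gamma = 0.05)

  # Look at the top-level components of the returned list.
  str(fit_gamma, max.level = 1)
}
```

Calling the function with the `klaR::` prefix avoids picking up a function of the same name from another attached package.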

*Warning*: If these sets are small, optimization is expected
to produce almost random results. We recommend adjusting the
parameters manually in such a case.
In all other cases it is recommended to run the optimization
several times in order to see whether the results are stable.

Since the Nelder-Mead-algorithm is actually intended for
*continuous* functions while the observed error rate
by its nature is *discrete*, a greater number of
Bootstrap-samples might improve the optimization by increasing
the smoothness of the response surface (and, of course, by
reducing variance and bias).
If a set of parameters leads to singular covariance
matrices, a penalty term is added to the misclassification rate
which will hopefully help to maneuver back out of singularity
(so do not worry about error rates greater than one during
optimization).

Friedman, J.H. (1989): Regularized Discriminant Analysis.
In: *Journal of the American Statistical Association* 84,
165-175.

Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. (1992):
*Numerical Recipes in C*. Cambridge: Cambridge University Press.

# NOT RUN {
data(iris)
x <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
predict(x, iris)
# }