cv.msda: Cross-validation for DSDA/MSDA through function `msda`

Description

Performs K-fold cross-validation for msda and returns the best tuning parameter \(\lambda\) from a user-specified or automatically generated sequence.

Usage

cv.msda(x, y, model = NULL, nfolds = 5, lambda = NULL,
 lambda.opt = "min", ...)

Arguments

x

Input matrix of predictors. x has dimension \(N \times p\); each row is an observation vector.

y

Class label. For K-class problems, y takes values in \(\{1, \ldots, K\}\).
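If the raw labels are strings or arbitrary codes, they need to be mapped to the integers 1, ..., K first. The snippet below is a minimal base-R sketch of one way to do that with factor(); the label values are hypothetical and this step is not part of the msda package.

```r
# Hypothetical preprocessing step (not part of msda): recode arbitrary
# class labels to the integers 1, ..., K expected by cv.msda.
labels <- c("tumor", "normal", "tumor", "benign", "normal")
y <- as.integer(factor(labels))
# factor() sorts the levels: benign = 1, normal = 2, tumor = 3
print(levels(factor(labels)))
print(y)
```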

model

Method type. The model argument can be one of 'binary', 'multi.original', or 'multi.modified'; the default is NULL. Specifying the method type lets the function fit either a DSDA or an MSDA model. Without specification, the function chooses the method automatically: if the response variable is binary, it fits a DSDA model; if the response variable is multi-class, it fits the original MSDA model when the dimension \(p \le 2000\) and the modified MSDA model when \(p > 2000\).
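The automatic choice for model = NULL can be restated as a small decision rule. The sketch below only mirrors the rule as documented above; choose_model() is a hypothetical helper, not a function exported by msda, and the package may implement the check differently (for example, in how it counts classes).

```r
# Hypothetical helper restating the documented default rule for model = NULL:
# binary response -> DSDA ("binary"); multi-class response -> original MSDA
# when p <= 2000, modified MSDA when p > 2000. Not the package's own code.
choose_model <- function(y, p) {
  K <- length(unique(y))  # number of distinct classes
  if (K == 2) {
    "binary"
  } else if (p <= 2000) {
    "multi.original"
  } else {
    "multi.modified"
  }
}

choose_model(c(1, 2, 1, 2), p = 5000)  # "binary"
choose_model(c(1, 2, 3, 1), p = 2000)  # "multi.original"
choose_model(c(1, 2, 3, 1), p = 2001)  # "multi.modified"
```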

nfolds

Number of folds. The default value is 5. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds = 3 for multi.original and multi.modified.

lambda

User-specified lambda sequence for cross-validation. If not specified, the algorithm generates a lambda sequence from the full data and cross-validates over that sequence.

lambda.opt

The tie-breaking criterion used when multiple values in lambda attain the same minimum classification error. "min" returns the smallest lambda with the minimum cross-validation error; "max" returns the largest lambda with the minimum cross-validation error.
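To make the tie-breaking concrete, here is a base-R sketch assuming a lambda sequence and its mean cross-validation errors are already available. pick_lambda_min() is a hypothetical name used only for illustration; it is not the package source.

```r
# Sketch (assumption, not the package source): choose lambda.min from a
# lambda sequence and its mean CV errors, breaking ties with lambda.opt.
pick_lambda_min <- function(lambda, cvm, lambda.opt = c("min", "max")) {
  lambda.opt <- match.arg(lambda.opt)
  tied <- lambda[cvm == min(cvm)]  # every lambda attaining the minimum error
  if (lambda.opt == "min") min(tied) else max(tied)
}

lambda <- c(0.5, 0.4, 0.3, 0.2, 0.1)
cvm    <- c(0.30, 0.10, 0.10, 0.10, 0.20)
pick_lambda_min(lambda, cvm, "min")  # 0.2
pick_lambda_min(lambda, cvm, "max")  # 0.4
```

Three values (0.4, 0.3, 0.2) share the minimum error of 0.10, so "min" and "max" pick opposite ends of that tied set.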

…

Other arguments that can be passed to msda.

Value

An object of class cv.dsda, cv.msda.original, or cv.msda.modified is returned. It is a list containing the following components of the cross-validation fit.

lambda

The lambda sequence actually used. A user-specified or automatically generated sequence may be truncated by the constraints on dfmax and pmax.

cvm

The mean cross-validation error for each lambda.

cvsd

The standard error of the cross-validation errors for each lambda.
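As an illustration of how cvm and cvsd relate to per-fold errors, the sketch below assumes a matrix errs with one row per held-out fold and one column per lambda. It takes cvsd as the standard deviation across folds divided by sqrt(nfolds); that formula is an assumption, and msda may compute the standard error differently (for example, weighting folds by size).

```r
# Sketch, assuming errs is an nfolds x nlambda matrix of misclassification
# rates (row = held-out fold, column = lambda). The cvsd formula used here,
# sd across folds / sqrt(nfolds), is an assumption about the package.
errs <- rbind(c(0.20, 0.10, 0.15),
              c(0.30, 0.10, 0.05),
              c(0.25, 0.20, 0.10))
cvm  <- colMeans(errs)                         # mean CV error per lambda
cvsd <- apply(errs, 2, sd) / sqrt(nrow(errs))  # standard error per lambda
cvm   # 0.25, 0.1333, 0.10 (rounded)
cvsd
```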

lambda.min

The lambda with the minimum cross-validation error. If lambda.opt is "min", this is the smallest lambda with the minimum cross-validation error; if lambda.opt is "max", it is the largest such lambda.

lambda.1se

The largest value of lambda such that the error is within one standard error of the minimum. This component is only available for objects of class cv.msda.original and cv.msda.modified.
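The one-standard-error rule described above can be sketched in a few lines of base R. This is an assumption about the computation, not the package source: the threshold is the minimum mean error plus its standard error, and lambda_1se() is a hypothetical name.

```r
# Sketch of the one-standard-error rule (assumed computation, not msda's
# source): among lambdas whose mean CV error is within one standard error of
# the minimum, return the largest.
lambda_1se <- function(lambda, cvm, cvsd) {
  i.min <- which.min(cvm)                 # first index attaining the minimum
  threshold <- cvm[i.min] + cvsd[i.min]   # minimum error plus its std. error
  max(lambda[cvm <= threshold])
}

lambda <- c(0.5, 0.4, 0.3, 0.2, 0.1)
cvm    <- c(0.30, 0.14, 0.12, 0.10, 0.16)
cvsd   <- c(0.03, 0.03, 0.03, 0.05, 0.03)
lambda_1se(lambda, cvm, cvsd)  # 0.4
```

Here the minimum error 0.10 occurs at lambda = 0.2 with standard error 0.05, so any lambda with error at most 0.15 qualifies, and the largest of those is 0.4. A larger lambda gives a sparser, more regularized fit, which is why the rule prefers it.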

model.fit

A fitted cv.dsda, cv.msda.original, or cv.msda.modified object for the full data.

Details

The function cv.msda runs msda nfolds + 1 times. The first run fits the model on all data. If lambda is specified, it checks whether every value satisfies the constraints of dfmax and pmax in msda; if lambda is not specified, a lambda sequence is generated according to lambda.factor in msda. Each of the remaining nfolds runs fits the model on nfolds - 1 folds and predicts on the omitted fold. The function returns the lambda with the minimum average cross-validation error and the largest lambda within one standard error of that minimum.
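The fold loop described above can be sketched generically in base R. This is a hedged illustration, not the msda implementation: the classifier is a hypothetical stand-in (a nearest-centroid rule whose class centroids are soft-thresholded toward the overall mean by lambda), and soft(), fit_centroids(), predict_centroids(), and cv_sketch() are illustrative names. Unlike msda, which fits a whole lambda path in each run, the sketch refits once per lambda value, and it assumes lambda is supplied rather than generated from the full data.

```r
# Hypothetical stand-in for msda: nearest-centroid rule with class centroids
# soft-thresholded toward the overall mean by lambda. Not the msda method.
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

fit_centroids <- function(x, y, lambda) {
  overall <- colMeans(x)
  classes <- sort(unique(y))
  cents <- do.call(rbind, lapply(classes, function(k)
    overall + soft(colMeans(x[y == k, , drop = FALSE]) - overall, lambda)))
  list(classes = classes, cents = cents)  # cents: one row per class
}

predict_centroids <- function(fit, newx) {
  # Squared Euclidean distance from each new row to each centroid.
  d <- sapply(seq_len(nrow(fit$cents)), function(k)
    rowSums(sweep(newx, 2, fit$cents[k, ])^2))
  d <- matrix(d, nrow = nrow(newx))       # keep n x K shape even when n = 1
  fit$classes[apply(d, 1, which.min)]
}

# K-fold CV: fit on nfolds - 1 folds, predict the held-out fold, average.
cv_sketch <- function(x, y, lambda, nfolds = 5, seed = 1) {
  set.seed(seed)
  foldid <- sample(rep(seq_len(nfolds), length.out = nrow(x)))
  errs <- matrix(NA_real_, nfolds, length(lambda))
  for (f in seq_len(nfolds)) {
    train <- foldid != f
    for (j in seq_along(lambda)) {
      fit <- fit_centroids(x[train, , drop = FALSE], y[train], lambda[j])
      pred <- predict_centroids(fit, x[!train, , drop = FALSE])
      errs[f, j] <- mean(pred != y[!train])  # misclassification rate
    }
  }
  cvm <- colMeans(errs)
  list(lambda = lambda, cvm = cvm,
       cvsd = apply(errs, 2, sd) / sqrt(nfolds),
       lambda.min = min(lambda[cvm == min(cvm)]))  # ties -> smallest ("min")
}

# Toy data: three classes separated only along the first variable.
set.seed(42)
y <- rep(1:3, each = 20)
x <- matrix(rnorm(60 * 10), 60, 10)
x[, 1] <- x[, 1] + 3 * y
res <- cv_sketch(x, y, lambda = c(5, 1, 0.5, 0.1), nfolds = 5)
res$cvm
res$lambda.min
```

With lambda = 5 every centroid collapses to the overall mean, so the stand-in always predicts class 1 and its error is 2/3; smaller values keep the signal in the first variable. Because fold assignment is random, results depend on the seed.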

As in msda, the user can specify which method to use through the model argument. Without specification, the function decides the method automatically from the number of classes and variables.

References

Mai, Q., Zou, H., and Yuan, M. (2012), "A direct approach to sparse discriminant analysis in ultra-high dimensions." Biometrika, 99, 29-42.

Mai, Q., Yang, Y., and Zou, H. (2017), "Multiclass sparse discriminant analysis." Statistica Sinica, in press.

URL: https://github.com/emeryyi/msda

Examples


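# NOT RUN {
library(msda)

# Load the example data: predictor matrix x and class labels y
data(GDS1615)
x <- GDS1615$x
y <- GDS1615$y

# 5-fold cross-validation; ties are broken toward the largest lambda
obj.cv <- cv.msda(x=x, y=y, nfolds=5, lambda.opt="max")
lambda.min <- obj.cv$lambda.min

# Refit on the full data at the selected lambda and predict class labels
obj <- msda(x=x, y=y, lambda=lambda.min)
pred <- predict(obj, x)
# }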
