FADA: Factor Adjusted Discriminant Analysis 2: Supervised classification on decorrelated data

Description

This function performs supervised classification on factor-adjusted data.

Usage

FADA(faobject, nfold.cv, nbf.cv = NULL, method = c("glmnet", "sda", "sparseLDA"),
    sda.method = c("lfdr", "HC"), stop.par = 10, lambda, lambda.var,
    lambda.freqs, diagonal = FALSE, alpha = 0.1, nfolds = 10)

Arguments

faobject

An object returned by function FA.

nfold.cv

Number of folds used to estimate the classification error rate; used only when no testing data is provided. If nfold.cv equals the sample size, the function computes leave-one-out cross-validation; otherwise it computes balanced cross-validation.
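
As a rough illustration of the distinction above, the following base R sketch assigns samples to folds so that each fold keeps approximately the class proportions of the full sample, and falls back to leave-one-out when the number of folds reaches the sample size. The helper name make_folds is hypothetical, and reading "balanced" as "stratified by class" is an assumption; the fold construction used internally by FADA may differ.

```r
# Hypothetical helper (not part of FADA): assign each sample to a fold.
# Assumption: "balanced" means folds stratified by class label.
make_folds <- function(y, nfold) {
  n <- length(y)
  if (nfold >= n) {
    # Leave-one-out: every sample is its own fold
    return(seq_len(n))
  }
  folds <- integer(n)
  for (cl in unique(y)) {
    idx <- which(y == cl)
    # Shuffle within the class (only when it has more than one sample),
    # then deal the samples out to the folds in turn
    if (length(idx) > 1) idx <- sample(idx)
    folds[idx] <- rep_len(seq_len(nfold), length(idx))
  }
  folds
}

set.seed(1)
y <- rep(c(1, 2), times = c(30, 20))   # toy labels: 30 of class 1, 20 of class 2
folds <- make_folds(y, nfold = 5)
table(folds, y)                        # class counts per fold
```

With these toy labels, every fold receives 6 samples of class 1 and 4 of class 2, so the 60/40 class ratio is preserved in each held-out set.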

nbf.cv

Number of factors used in cross-validation to compute the error rate, only when no testing data is provided. By default, nbf.cv = NULL and the number of factors is estimated for each fold of the cross-validation. nbf.cv can also be set to a po

method

The method used to fit the supervised classification model. Three options are available. If method = "glmnet", a Lasso penalized logistic regression is performed using the glmnet R package. If method = "sda", an LDA or DDA (see

sda.method

The method used for variable selection, only if method="sda". If sda.method="lfdr", variables are selected through CAT scores and False Non Discovery Rate control. If sda.method="HC", the variable selection method is Higher Crist

stop.par

This parameter controls the number of variables to include in the model, only if method="sparseLDA".

lambda

The shrinkage intensity for the correlation matrix, if method="sda".

lambda.var

The shrinkage intensity of variances, if method="sda".

lambda.freqs

The shrinkage intensity of frequencies, if method="sda".

diagonal

If diagonal = TRUE, an assumption of independence is made and a shrunken diagonal discriminant analysis is performed using the sda R package. If diagonal = FALSE, FADA performs shrunken linear discriminant analysis and t

alpha

The proportion of the HC objective to be observed, only if method="sda" and sda.method="HC". Default is 0.1.

nfolds

Number of folds used to estimate the lambda parameter of the Lasso, which is used to estimate individual probabilities. Default is nfolds = 10. The smallest value is nfolds = 3. To perform leave-one-out cross-validation, nfolds can be set

Value

Returns a list with the following elements:

method

Recall of the classification method.

selected

A vector containing the indices of the selected variables.

proba.test

A matrix containing predicted group frequencies of the testing data, if a testing data set has been provided.

predict.test

A matrix containing predicted classes of the testing data, if a testing data set has been provided.

cv.error

A numeric value containing the average classification error, computed by cross-validation, if no testing data set has been provided.

beta0

A vector containing the intercept parameters of the classification model.

beta

A matrix containing the shape coefficients of the classification model.
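
To make the relationship between predicted group frequencies, predicted classes and a classification error concrete, here is a minimal base R sketch on invented numbers. The matrix layout (one row per test sample, one column per class) and the objects proba, pred and truth are assumptions made for illustration; they are not taken from FADA output.

```r
# Toy stand-in for a matrix of predicted group frequencies:
# one row per test sample, one column per class (values invented)
proba <- matrix(c(0.9, 0.1,
                  0.3, 0.7,
                  0.6, 0.4),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("1", "2")))

# Predicted class = column with the largest frequency in each row
pred <- colnames(proba)[apply(proba, 1, which.max)]

# Hypothetical true labels, used only to compute an error rate
truth <- c("1", "2", "2")
err <- mean(pred != truth)   # proportion of misclassified samples
```

Here the third sample is assigned to class "1" although its hypothetical true label is "2", so one sample out of three is misclassified.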

References

Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.

Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

Perthame, E., Friguet, C. and Causeur, D. (2014), Stability of feature selection in classification issues for high-dimensional correlated data, Submitted.

Examples

library(FADA)

data(data.train)
data(data.test)

# When a testing data set is provided
res = FA(data.train, data.test)
classif = FADA(res, method = "sda", sda.method = "lfdr")

### Not run
# When no testing data set is provided
# res = FA(data.train)
# classif = FADA(res, nfold.cv = 30, method = "sda", sda.method = "lfdr")
