
FADA (version 1.1)

FADA: Factor Adjusted Discriminant Analysis 2: Supervised classification on decorrelated data

Description

This function performs supervised classification on factor-adjusted data.

Usage

FADA(faobject, nfold.cv, nbf.cv = NULL, method = c("glmnet", "sda", "sparseLDA"),
    sda.method = c("lfdr", "HC"), stop.par = 10, lambda, lambda.var,
    lambda.freqs, diagonal = FALSE, alpha = 0.1, nfolds = 10)

Arguments

faobject
An object returned by function FA.
nfold.cv
Number of folds to estimate classification error rate, only when no testing data is provided. This function computes leave-one-out CV if nfold.cv is as large as the sample size and computes balanced cross validation otherwise.
nbf.cv
Number of factors for cross validation to compute error rate, only when no testing data is provided. By default, nbf = NULL and the number of factors is estimated for each fold of the cross validation. nbf can also be set to a po
method
The method used to fit the supervised classification model. Three options are available. If method = "glmnet", a Lasso penalized logistic regression is performed using the glmnet R package. If method = "sda", an LDA or DDA (see
sda.method
The method used for variable selection, only if method="sda". If sda.method="lfdr", variables are selected through CAT scores and False Non Discovery Rate control. If sda.method="HC", the variable selection method is Higher Crist
stop.par
This parameter controls the number of variables to include in the model, only if method="sparseLDA".
lambda
The shrinkage intensity of correlation matrix, if method="sda".
lambda.var
The shrinkage intensity of variances, if method="sda".
lambda.freqs
The shrinkage intensity of frequencies, if method="sda".
diagonal
If diagonal = TRUE, an assumption of independence is made and a shrunken diagonal discriminant analysis is performed using sda R package. If diagonal = FALSE, FADA performs shrunken linear discriminant analysis and t
alpha
The proportion of the HC objective to be observed, only if method="sda" and sda.method="HC". Default is 0.1.
nfolds
Number of folds used to estimate the lambda parameter of the Lasso, which is used to estimate individual probabilities. Default is nfolds = 10. The smallest value is nfolds = 3. To perform leave-one-out cross-validation, nfolds can be set
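The description of nfold.cv above distinguishes leave-one-out from balanced cross-validation. The following base R sketch illustrates one way such folds could be formed. The helper balanced_folds is hypothetical and is not part of the FADA package; the way FADA builds its folds internally may differ.

```r
# Hypothetical helper (not part of FADA): assign observations to folds so
# that each class is spread as evenly as possible across the folds.
balanced_folds <- function(y, nfold) {
  y <- as.factor(y)
  n <- length(y)
  if (nfold >= n) {
    # Leave-one-out: every observation is its own fold
    return(seq_len(n))
  }
  fold <- integer(n)
  for (cl in levels(y)) {
    idx <- which(y == cl)
    # Shuffle within the class, then deal fold labels round-robin
    if (length(idx) > 1) idx <- sample(idx)
    fold[idx] <- rep(seq_len(nfold), length.out = length(idx))
  }
  fold
}

set.seed(1)
y <- rep(c("A", "B"), times = c(12, 8))
folds <- balanced_folds(y, nfold = 4)
print(table(folds, y))  # each fold holds 3 "A" and 2 "B" observations
```

With nfold equal to the sample size, the helper returns one fold per observation, which corresponds to the leave-one-out case.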

Value

Returns a list with the following elements:

  • method: Recall of the classification method
  • selected: A vector containing the indices of the selected variables
  • proba.test: A matrix containing predicted group frequencies of testing data, if a testing data set has been provided
  • predict.test: A matrix containing predicted classes of testing data, if a testing data set has been provided
  • cv.error: A numeric value containing the average classification error, computed by cross validation, if no testing data set has been provided
  • beta0: A vector containing intercept parameters of the classification model
  • beta: A matrix containing shape coefficients of the classification model

References

Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.

Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

Perthame, E., Friguet, C. and Causeur, D. (2014), Stability of feature selection in classification issues for high-dimensional correlated data, Submitted.

See Also

FADA, FA, sda, sda-package, glmnet-package

Examples

library(FADA)

# Example data sets shipped with the package
data(data.train)
data(data.test)

# When a testing data set is provided
res <- FA(data.train, data.test)
classif <- FADA(res, method = "sda", sda.method = "lfdr")

### Not run
# When no testing data set is provided
# res <- FA(data.train)
# classif <- FADA(res, nfold.cv = 30, method = "sda", sda.method = "lfdr")
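
As a follow-up to the example, the elements described in the Value section can be inspected once a fit is available. This is a minimal sketch that assumes FADA version 1.1 is installed and that the returned list uses the element names documented above; it does nothing if the package is missing.

```r
# Sketch only: requires the FADA package (version 1.1) and its example data.
if (requireNamespace("FADA", quietly = TRUE)) {
  library(FADA)
  data(data.train)
  data(data.test)

  res <- FA(data.train, data.test)
  classif <- FADA(res, method = "sda", sda.method = "lfdr")

  print(classif$method)              # recall of the classification method
  print(length(classif$selected))    # number of selected variables
  print(head(classif$predict.test))  # predicted classes for the testing data
  print(head(classif$proba.test))    # predicted group frequencies
}
```

Because a testing data set is supplied here, cv.error is not expected to be filled in; per the Value section it is computed only when no testing data set is provided.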
