
FADA (version 1.1)

FADA-package: Variable selection for supervised classification in high dimension

Description

The functions provided in the FADA (Factor Adjusted Discriminant Analysis) package perform supervised classification of high-dimensional and correlated profiles. The procedure combines a decorrelation step, based on a factor model of the dependence among covariates, with a classification method. The available methods are the Lasso regularized logistic model (see Friedman et al. (2010)), sparse linear discriminant analysis (see Clemmensen et al. (2011)), and shrinkage linear and diagonal discriminant analysis (see Ahdesmaki and Strimmer (2010)).
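As an illustration of how a classification method is chosen, here is a minimal sketch using the sda option shown in the Examples at the end of this page; the other method values in the comments ("glmnet" for the Lasso logistic model, "sparseLDA" for sparse LDA) are assumptions and should be checked against the manual of the FADA function:

 library(FADA)
 data(data.train)                      # example training data shipped with the package
 fa  <- FA(data.train)                 # decorrelation step (see Details below)
 fit <- FADA(fa, method = "sda", sda.method = "lfdr")  # shrinkage discriminant analysis

 ## Assumed method values for the two other approaches -- check ?FADA before use:
 # FADA(fa, method = "glmnet")     # Lasso regularized logistic model (Friedman et al., 2010)
 # FADA(fa, method = "sparseLDA")  # sparse linear discriminant analysis (Clemmensen et al., 2011)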


Details

Package: FADA
Type: Package
Version: 1.0
Date: 2014-04-30
License: GPL (>= 2)

The functions available in this package are used in this order:
  • Step 1: Decorrelation of the training (and testing) data set, using a factor model of the covariance fitted by the FA function. The number of factors in the model can be estimated or fixed. The training and testing data sets can be decorrelated together by the FA function, or one after the other, using the FA function first to decorrelate the training data set and then the decorrelate function to decorrelate the testing data set.
  • Step 2: Estimation of a supervised classification model on the decorrelated training data set with the FADA function. The user can choose among several classification methods (see the manual of the FADA function for details).
  • Step 3: If needed, computation of the error rate with the FADA function, either on a supplementary test data set or by k-fold cross-validation. A minimal sketch of these three steps is given after this list.
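For orientation, a compact sketch of the three steps, assuming the example data sets data.train and data.test shipped with the package and the sda classification method used in the Examples section below (see that section for the complete calls):

 library(FADA)
 data(data.train)
 data(data.test)

 ## Step 1: decorrelation, either of both data sets together ...
 fa <- FA(data.train, data.test)
 ## ... or of the training data set first, then the testing data set
 fa.train <- FA(data.train)
 fa.test  <- decorrelate(fa.train, data.test)

 ## Step 2: classification on the decorrelated training data set
 fit <- FADA(fa, method = "sda", sda.method = "lfdr")

 ## Step 3: error rate by cross-validation when no test data set is available
 ## (leave-one-out, as in the Examples below)
 fit.cv <- FADA(fa.train, nfold.cv = length(data.train$y), method = "sda", sda.method = "lfdr")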

References

Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.

Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

Perthame, E., Friguet, C. and Causeur, D. (2014), Stability of feature selection in classification issues for high-dimensional correlated data, Submitted.

Examples

## Not run:
### Example of an entire analysis with the FADA package when a testing data set is available

### Loading data
data(data.train)
data(data.test)

dim(data.train$x) # 30 250
dim(data.test$x) # 1000 250

### Decorrelation step: training and testing data sets decorrelated together
res = FA(data.train, data.test) # Optimal number of factors is 2

### Alternatively, decorrelation of the training data set only ...
res = FA(data.train)
### ... then decorrelation of the testing data set afterward
res2 = decorrelate(res, data.test)

### Classification step with sda, using local false discovery rate for variable selection
### Linear discriminant analysis
FADA.LDA = FADA(res, method = "sda", sda.method = "lfdr")

### Diagonal discriminant analysis
FADA.DDA = FADA(res, method = "sda", sda.method = "lfdr", diagonal = TRUE)


### Example of an entire analysis with the FADA package when no testing data set is available

### Loading data
data(data.train)

### Decorrelation step
res = FA(data.train) # Optimal number of factors is 2

### Classification step with sda, using local false discovery rate for variable selection
### Linear discriminant analysis; the error rate is computed by leave-one-out cross-validation
FADA.LDA = FADA(res, nfold.cv = length(data.train$y), method = "sda", sda.method = "lfdr")
## End(Not run)
