FADA (version 1.1)

FA: Factor Adjusted Discriminant Analysis 1: Decorrelation of the data

Description

This function decorrelates the training dataset and, optionally, the test dataset by adjusting the data for the effects of latent factors of dependence.
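To give an intuition for what adjusting for latent factors does, here is a minimal sketch in base R. It is not the estimation procedure used by FA(): it simulates data with one shared latent factor, fits a one-factor model with stats::factanal, and subtracts the estimated common part. The simulation settings and all object names (x, fit, x_adjusted, mean_offdiag) are illustrative assumptions.

```r
# Illustrative sketch in base R; this is NOT the algorithm implemented in FA().
set.seed(1)
n <- 200
p <- 6

# Simulate one latent factor z shared by all variables, plus independent noise.
z <- rnorm(n)
loadings_true <- rep(0.8, p)
x <- outer(z, loadings_true) + matrix(rnorm(n * p, sd = 0.6), n, p)

# Fit a one-factor model and request regression-type factor scores.
fit <- factanal(x, factors = 1, scores = "regression")

# factanal() works on standardized variables, so adjust the scaled data:
# remove the part of each variable explained by the estimated factor.
common_part <- fit$scores %*% t(unclass(fit$loadings))
x_adjusted <- scale(x) - common_part

# Mean absolute off-diagonal correlation, before and after the adjustment.
mean_offdiag <- function(m) {
  r <- cor(m)
  mean(abs(r[upper.tri(r)]))
}
before <- mean_offdiag(x)
after <- mean_offdiag(x_adjusted)
print(c(before = before, after = after))
```

In this toy setting the adjusted variables are much less correlated than the raw ones, although the residual correlation is reduced rather than exactly zero.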

Usage

FA(dta, test=NULL, nbf=NULL, maxnbfactors=12, nfolds=10, grouped=FALSE,
   plot.diagnostic=FALSE, min.err=0.001, verbose=TRUE)

Arguments

dta
A list containing the training dataset with the following components: x is the n x p matrix of explanatory variables, where n stands for the training sample size and p for the number of explanatory variables ; y is a numeric vect
test
A list containing the test dataset, with the same list structure as dta.
nbf
Number of factors. If nbf = NULL, the number of factors is estimated. nbf can also be set to a positive integer value. If nbf = 0, the data are not factor-adjusted.
maxnbfactors
The maximum number of factors. Default is maxnbfactors=12.
nfolds
Number of folds for estimation of lambda parameter in Lasso, which is used to estimate individual probabilities of group membership. Default is nfolds=10. The smallest value is nfolds = 3. To perform Leave-One-Out cross-validatio
grouped
If grouped=TRUE, a group Lasso penalty is applied in the multinomial case, so that a selected variable is either in the model for all groups or excluded for all groups. Default is grouped=FALSE.
plot.diagnostic
If plot.diagnostic=TRUE, the values of the variance inflation criterion are plotted for each number of factors. Default is plot.diagnostic=FALSE. This option can help when choosing the optimal number of factors manually.
min.err
Threshold of convergence of the algorithm criterion. Default is min.err=0.001.
verbose
If TRUE, prints the number of factors and the values of the objective criterion along the iterations. Default is verbose=TRUE.
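As a sketch of how the nbf argument changes the call, the code below uses the example datasets data.train and data.test shipped with FADA. The requireNamespace guard, the choice maxnbfactors = 5, and the container name fits are illustrative additions, not part of the package documentation.

```r
# Sketch only: the calls run if the FADA package is installed.
fits <- list()
if (requireNamespace("FADA", quietly = TRUE)) {
  data(data.train, package = "FADA")
  data(data.test, package = "FADA")

  # nbf = NULL (default): the number of factors is estimated,
  # here searching up to maxnbfactors = 5 instead of the default 12.
  fits$auto <- FADA::FA(data.train, data.test, maxnbfactors = 5, verbose = FALSE)

  # nbf = 2: the number of factors is forced to 2.
  fits$fixed <- FADA::FA(data.train, data.test, nbf = 2, verbose = FALSE)

  # nbf = 0: the data are not factor-adjusted.
  fits$none <- FADA::FA(data.train, data.test, nbf = 0, verbose = FALSE)
}
```

Setting verbose = FALSE only silences the progress messages; it does not change the fitted result.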

Value

Returns a list with the following elements:

  • meanclass: Group means estimated after iterative decorrelation
  • fadta: Decorrelated training data
  • fatest: Decorrelated testing data
  • Psi: Estimation of the factor model parameters: specific variance
  • B: Estimation of the factor model parameters: loadings
  • groups: Recall of group variable of training data
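One way to inspect the returned list is sketched below. The element names in expected are taken from the list above; the requireNamespace guard and the object names res and expected are illustrative, and the internal structure of each element is not assumed here, only displayed.

```r
# Element names documented in the Value section.
expected <- c("meanclass", "fadta", "fatest", "Psi", "B", "groups")

# Sketch only: the call runs if the FADA package is installed.
if (requireNamespace("FADA", quietly = TRUE)) {
  data(data.train, package = "FADA")
  data(data.test, package = "FADA")
  res <- FADA::FA(data.train, data.test, nbf = 2, verbose = FALSE)

  # Top-level overview of the result, one line per element.
  str(res, max.level = 1)

  # Report any documented element missing from the result.
  print(setdiff(expected, names(res)))
}
```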

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

Perthame, E., Friguet, C. and Causeur, D. (2014), Stability of feature selection in classification issues for high-dimensional correlated data, Submitted.

See Also

FADA-package, FADA, glmnet-package

Examples

library(FADA)
data(data.train)
data(data.test)
# When the optimal number of factors is unknown, it is estimated (nbf = NULL)
res <- FA(data.train, data.test)

### Not run
# When the number of factors is forced to 2
# res0 <- FA(data.train, data.test, nbf = 2)
