missMDA (version 1.9)

estim_ncpFAMD: Estimate the number of dimensions for the Factorial Analysis of Mixed Data by cross-validation

Description

Estimates the number of dimensions to retain for the Factorial Analysis of Mixed Data (FAMD) by cross-validation.

Usage

estim_ncpFAMD(don, ncp.min=0, ncp.max=5,  method = c("Regularized","EM"), 
     method.cv = c("Kfold","loo"), nbsim=100, pNA=0.05, threshold=1e-4)

Arguments

don
a data.frame with continuous and categorical variables, with or without missing entries
ncp.min
integer corresponding to the minimum number of components to test
ncp.max
integer corresponding to the maximum number of components to test
method
imputation algorithm: "Regularized" (default) or "EM"
method.cv
"Kfold" (default) for cross-validation on randomly removed entries, or "loo" for leave-one-out cross-validation
nbsim
number of simulations; used only if method.cv="Kfold"
pNA
proportion of missing values added to the data set (the default 0.05 means 5%); used only if method.cv="Kfold"
threshold
the threshold for assessing convergence

Value

  • ncp: the number of components retained for the FAMD
  • criterion: the criterion (the MSEP) calculated for each number of components

Details

With leave-one-out cross-validation (method.cv="loo"), each cell of the data matrix is removed in turn and predicted with a FAMD model using ncp.min to ncp.max dimensions. The number of components that gives the smallest mean square error of prediction (MSEP) is retained.

With Kfold cross-validation (method.cv="Kfold"), a proportion pNA of missing values is inserted at random in the data matrix and predicted with a FAMD model using ncp.min to ncp.max dimensions. This process is repeated nbsim times, and the number of components that gives the smallest MSEP is retained.

For both cross-validation methods, the missing entries are predicted with the imputeFAMD function, that is, with the regularized iterative FAMD algorithm (method="Regularized") or the iterative FAMD algorithm (method="EM"). The regularized version is more appropriate to avoid overfitting.
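The Kfold procedure described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: impute_lowrank and cv_ncp are hypothetical names, the data are simulated and purely numeric, and a plain iterative truncated-SVD imputer (unregularized, unscaled) stands in for imputeFAMD, which handles mixed data.

```r
# Hypothetical stand-in for imputeFAMD on purely numeric data:
# iterative low-rank imputation with a truncated SVD (no regularization).
impute_lowrank <- function(X, ncp, maxiter = 200, threshold = 1e-6) {
  miss <- is.na(X)
  col_means <- colMeans(X, na.rm = TRUE)
  Xhat <- X
  Xhat[miss] <- col_means[col(X)[miss]]   # start from column-mean imputation
  if (ncp == 0) return(Xhat)              # ncp = 0: mean imputation only
  for (iter in seq_len(maxiter)) {
    mu <- colMeans(Xhat)
    Xc <- sweep(Xhat, 2, mu)              # center the completed matrix
    s <- svd(Xc, nu = ncp, nv = ncp)
    fitted <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    fitted <- sweep(fitted, 2, mu, "+")
    Xnew <- Xhat
    Xnew[miss] <- fitted[miss]            # update only the missing cells
    change <- sum((Xnew - Xhat)^2)
    Xhat <- Xnew
    if (change < threshold) break         # stop once the imputations stabilize
  }
  Xhat
}

# Hypothetical Kfold-style loop: hide a proportion pNA of cells, predict them
# for each candidate number of dimensions, and average the error over nbsim runs.
cv_ncp <- function(X, ncp.min = 0, ncp.max = 3, nbsim = 20, pNA = 0.05) {
  ncps <- ncp.min:ncp.max
  err <- matrix(NA_real_, length(ncps), nbsim)
  for (sim in seq_len(nbsim)) {
    hidden <- sample(length(X), size = round(pNA * length(X)))
    XNA <- X
    XNA[hidden] <- NA
    for (k in seq_along(ncps)) {
      Xhat <- impute_lowrank(XNA, ncps[k])
      err[k, sim] <- mean((Xhat[hidden] - X[hidden])^2)  # MSEP on hidden cells
    }
  }
  crit <- rowMeans(err)
  names(crit) <- ncps
  list(ncp = ncps[which.min(crit)], criterion = crit)
}

# Simulated complete data with two underlying dimensions plus noise
set.seed(42)
n <- 100; p <- 8
scores <- matrix(rnorm(n * 2), n, 2)
loadings <- matrix(rnorm(2 * p), 2, p)
X <- scores %*% loadings + matrix(rnorm(n * p, sd = 0.3), n, p)

res <- cv_ncp(X)
res$ncp        # number of dimensions with the smallest average MSEP
res$criterion  # average MSEP for each candidate number of dimensions
```

The returned list mirrors the ncp and criterion elements documented under Value, so the two can be read the same way.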

References

Audigier, V., Husson, F. & Josse, J. (2014). A principal components method to impute mixed data. Advances in Data Analysis and Classification.

See Also

imputeFAMD

Examples

library(missMDA)
data(ozone)
# Estimate the number of FAMD dimensions with the default settings
result <- estim_ncpFAMD(ozone)
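A slightly fuller call might look like the sketch below. It uses only the arguments listed under Usage; the reduced ncp.max and nbsim values are arbitrary choices to shorten the run, and the completeObs element of the imputeFAMD result is assumed from that function's own documentation.

```r
library(missMDA)
data(ozone)

# Kfold cross-validation over 0 to 3 dimensions, with fewer simulations
# than the default (nbsim = 100) to keep the run short
res <- estim_ncpFAMD(ozone, ncp.min = 0, ncp.max = 3,
                     method = "Regularized", method.cv = "Kfold",
                     nbsim = 10, pNA = 0.05)

res$ncp        # number of components retained
res$criterion  # MSEP for each number of components tested

# Use the estimated number of components to impute the data set
imp <- imputeFAMD(ozone, ncp = res$ncp, method = "Regularized")
head(imp$completeObs)
```

Because the Kfold method removes entries at random, the criterion values can vary between runs, and a small nbsim makes that variation larger.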