
missMDA (version 1.6)

estim_ncpMCA: Estimate the number of dimensions for Multiple Correspondence Analysis by cross-validation

Description

Estimate the number of dimensions for Multiple Correspondence Analysis (MCA) by cross-validation or generalised cross-validation.

Usage

estim_ncpMCA(don, ncp.min=0, ncp.max=5, method.cv="gcv",
     nbsim=100, pNA=0.05, threshold=1e-4)

Arguments

don
a data.frame of categorical variables, with or without missing entries
ncp.min
integer corresponding to the minimum number of components to test
ncp.max
integer corresponding to the maximum number of components to test
method.cv
a character string: "gcv" for generalised cross-validation (the default) or "cv" for cross-validation
nbsim
number of simulations; used only if method.cv = "cv"
pNA
proportion of values removed at random from the data set in each simulation (0.05 means 5%); used only if method.cv = "cv"
threshold
the threshold for assessing convergence

Value

  • ncp: the number of components retained for the MCA
  • criterion: the criterion (the mean squared error of prediction, MSEP) computed for each number of components from ncp.min to ncp.max

Details

For cross-validation ("cv"), a proportion pNA of the entries is removed at random and then predicted with an MCA model using each number of dimensions from ncp.min to ncp.max. This process is repeated nbsim times, and the number of components that gives the smallest MSEP is retained. Each removed cell is predicted with the imputeMCA function, that is, with the regularized iterative MCA algorithm.

Cross-validation is time-consuming. Generalised cross-validation ("gcv") instead approximates the cross-validation criterion, so no values need to be removed and re-imputed repeatedly.
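The cross-validation loop described above can be sketched in base R as follows. This is not the missMDA implementation: `cv_ncp_sketch`, `disj` and `mean_impute` are hypothetical names, the imputer is passed in as a function (in practice imputeMCA would fill that role), and measuring the squared error on the 0/1 indicator coding of each hidden cell is an assumption about how the MSEP is defined.

```r
# Disjunctive (0/1 indicator) coding: one column per category of each factor.
# A missing entry gives NA in every indicator column of its variable.
disj <- function(don) {
  do.call(cbind, lapply(don, function(f)
    sapply(levels(f), function(l) as.numeric(f == l))))
}

# Hypothetical sketch of the cross-validation loop. `impute_fun(don, ncp)` is a
# stand-in for an imputer such as imputeMCA: here it must return a complete
# matrix with the same layout as disj(don).
cv_ncp_sketch <- function(don, impute_fun, ncp.min = 0, ncp.max = 5,
                          nbsim = 100, pNA = 0.05) {
  truth <- disj(don)
  # Which variable each indicator column belongs to
  var_of_col <- rep(seq_along(don), vapply(don, nlevels, integer(1)))
  ncps <- ncp.min:ncp.max
  msep <- matrix(NA_real_, nbsim, length(ncps))
  obs <- which(!is.na(as.matrix(don)))  # only observed cells can be hidden
  for (s in seq_len(nbsim)) {
    # Hide a proportion pNA of the observed cells at random
    n_hide <- max(1, round(pNA * length(obs)))
    hide <- arrayInd(obs[sample.int(length(obs), n_hide)], dim(don))
    d <- don
    for (k in seq_len(nrow(hide))) d[hide[k, 1], hide[k, 2]] <- NA
    # Predict the hidden cells with every candidate number of dimensions
    for (m in seq_along(ncps)) {
      pred <- impute_fun(d, ncps[m])
      err <- vapply(seq_len(nrow(hide)), function(k) {
        cols <- which(var_of_col == hide[k, 2])
        sum((pred[hide[k, 1], cols] - truth[hide[k, 1], cols])^2)
      }, numeric(1))
      msep[s, m] <- mean(err)
    }
  }
  criterion <- colMeans(msep)  # average MSEP over the nbsim simulations
  list(ncp = ncps[which.min(criterion)], criterion = criterion)
}

# Toy imputer for illustration only: fills missing indicator values with the
# observed category proportions and ignores ncp.
mean_impute <- function(don, ncp) {
  X <- disj(don)
  for (j in seq_len(ncol(X))) X[is.na(X[, j]), j] <- mean(X[, j], na.rm = TRUE)
  X
}

set.seed(1)
toy <- data.frame(a = factor(sample(c("x", "y"), 50, TRUE)),
                  b = factor(sample(c("u", "v", "w"), 50, TRUE)))
res <- cv_ncp_sketch(toy, mean_impute, ncp.min = 0, ncp.max = 2, nbsim = 10)
res$ncp  # 0: the toy imputer ignores ncp, so all criteria tie and which.min picks the first
```

The same hidden cells are reused for every candidate ncp within a simulation, so the criteria are compared on identical deletions; a real imputer whose predictions depend on ncp is what makes the comparison informative.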

References

Josse, J., Chavent, M., Liquet, B. and Husson, F. (2010). Handling missing values with Regularized Iterative Multiple Correspondence Analysis.

See Also

imputeMCA

Examples

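library(missMDA)
data(vnf)  # example data set shipped with missMDA
result <- estim_ncpMCA(vnf, ncp.min = 0, ncp.max = 3)
result$ncp  # number of dimensions retained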
