missMDA (version 1.6)

imputePCA: Impute dataset with PCA

Description

Imputes the missing values of a dataset using a principal component analysis (PCA) model. It can be used as a preliminary step before performing PCA on an incomplete dataset.

Usage

imputePCA(X, ncp = 2, scale = TRUE, method = "Regularized",
          row.w = NULL, coeff.ridge = 1, threshold = 1e-06, seed = NULL,
          nb.init = 1, maxiter = 1000, ...)

Arguments

X
a data.frame with continuous variables containing missing values
ncp
integer corresponding to the number of components used to reconstruct data with the PCA reconstruction formulae
scale
boolean. TRUE by default, which gives every variable the same weight
method
either "Regularized" (the default) or "EM"
row.w
optional row weights (by default, a vector of 1 over the number of rows, i.e. uniform row weights)
coeff.ridge
a positive coefficient that makes it possible to shrink the eigenvalues by more than the mean of the last eigenvalues (with the default of 1, the eigenvalues are shrunk by the mean of the last eigenvalues; a coefficient between 1 and 2 is required)
threshold
the threshold for assessing convergence
seed
a single value, interpreted as an integer for the set.seed function (if seed = NULL, missing values are initially imputed by the mean of each variable)
nb.init
integer corresponding to the number of random initializations; the first initialization is the mean of each variable
maxiter
integer, maximum number of iterations for the algorithm
...
further arguments passed to or from other methods
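
The role of coeff.ridge can be illustrated with a small base-R sketch. This is not missMDA's code: shrink_singular_values is a hypothetical helper, and it assumes the simplest reading of the description above, namely that the noise level is the plain mean of the eigenvalues beyond ncp and that each retained singular value d is replaced by (d^2 - coeff.ridge * sigma2) / d. The package's actual estimator may differ in its details.

```r
# Hypothetical helper illustrating eigenvalue shrinkage; not part of missMDA.
shrink_singular_values <- function(d, ncp, coeff.ridge = 1) {
  stopifnot(ncp >= 1, ncp < length(d))
  eig <- d^2                               # eigenvalues, up to a constant factor
  sigma2 <- mean(eig[-seq_len(ncp)])       # assumed noise level: mean of the last eigenvalues
  shrunk <- (eig[seq_len(ncp)] - coeff.ridge * sigma2) / d[seq_len(ncp)]
  pmax(shrunk, 0)                          # never let a singular value go negative
}

d <- c(5, 3, 1, 1)
shrink_singular_values(d, ncp = 2)                   # 4.8 and 8/3
shrink_singular_values(d, ncp = 2, coeff.ridge = 2)  # 4.6 and 7/3
```

Under these assumptions, raising coeff.ridge from 1 towards 2 pulls the retained singular values further towards zero, and the smaller ones are affected proportionally more.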

Value

  • completeObs: the imputed dataset, containing the observed values for non-missing entries and the imputed values for missing entries
  • recon: the reconstructed data

Details

Imputes the missing entries of a data frame using either the iterative PCA algorithm (EM) or a regularized iterative PCA algorithm. The iterative PCA algorithm first imputes the missing values with initial values (the mean of each variable), then performs PCA on the completed dataset, imputes the missing values with the reconstruction formulae of order ncp, and iterates until convergence. The regularized version helps to avoid overfitting, which is especially important when there are many missing values.
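
The loop described above can be sketched in base R as follows. This is a minimal illustration of the non-regularized (EM) variant only, not the package's implementation: iterative_pca_sketch is a hypothetical helper, it leaves out scaling, row weights, regularization and random initializations, and it assumes convergence is judged by the squared change between successive completed matrices.

```r
# Hypothetical base-R sketch of non-regularized iterative PCA imputation.
# Not missMDA's code: no scaling, row weights, or regularization.
iterative_pca_sketch <- function(X, ncp = 2, threshold = 1e-6, maxiter = 1000) {
  X <- as.matrix(X)
  miss <- is.na(X)

  # Step 1: initialise missing cells with the mean of each variable
  col_means <- colMeans(X, na.rm = TRUE)
  Xhat <- X
  Xhat[miss] <- col_means[col(X)[miss]]

  for (iter in seq_len(maxiter)) {
    # Step 2: PCA of the completed data via SVD of the centred matrix
    mu <- colMeans(Xhat)
    Xc <- sweep(Xhat, 2, mu)
    s <- svd(Xc, nu = ncp, nv = ncp)

    # Step 3: reconstruction of order ncp, moved back to the original centre
    recon <- s$u %*% diag(s$d[seq_len(ncp)], nrow = ncp) %*% t(s$v)
    recon <- sweep(recon, 2, mu, "+")

    # Step 4: overwrite the missing cells only, then check convergence
    Xnew <- Xhat
    Xnew[miss] <- recon[miss]
    change <- sum((Xnew - Xhat)^2)   # assumed convergence criterion
    Xhat <- Xnew
    if (change < threshold) break
  }
  list(completeObs = Xhat, recon = recon, iterations = iter)
}

# Usage on simulated rank-2 data with three cells removed
set.seed(1)
full <- matrix(rnorm(40), 20, 2) %*% matrix(rnorm(10), 2, 5)
holed <- full
holed[cbind(c(1, 5, 9), c(2, 3, 4))] <- NA
res <- iterative_pca_sketch(holed, ncp = 2)
anyNA(res$completeObs)   # FALSE: every missing cell has been filled
```

The returned list mirrors the completeObs and recon components listed under Value; the iterations element is an addition of this sketch.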

References

Josse, J., Husson, F. and Pagès, J. (2009). Gestion des données manquantes en Analyse en Composantes Principales. Journal de la SFdS, 150 (2), pp. 28-51.

Josse, J. and Husson, F. (2010). Multiple Imputation in PCA.

See Also

estim_ncpPCA, MIPCA

Examples

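library(missMDA)     # provides imputePCA, estim_ncpPCA and the orange data
library(FactoMineR)  # provides PCA
data(orange)

## First, the number of components has to be chosen
##   (for the reconstruction step)
## nb <- estim_ncpPCA(orange, ncp.max = 5) ## Time consuming, nb = 2

## Imputation with the regularized iterative PCA algorithm (the default)
res.comp <- imputePCA(orange, ncp = 2)

## A PCA can then be performed on the completed dataset
res.pca <- PCA(res.comp$completeObs)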