mice.impute.2l.pls2: Imputation using Partial Least Squares for Dimension Reduction

Description

These functions impute a variable with missing values, using PLS regression (Mevik & Wehrens, 2007) to reduce the dimension of the predictor space.
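The general idea can be sketched in two steps: extract a few PLS scores from the predictors using the observed cases, then impute from a regression on those scores. The following is only a minimal illustration, assuming the pls package is installed. The simulated data, the object names, the choice of 3 components, and the simplified imputation step (prediction plus residual noise, without drawing regression parameters) are assumptions made for this sketch and are not the package's internal code.

```r
library(pls)  # assumes the pls package is installed

# Simulated data (hypothetical): 20 predictors, y depends on the first three
set.seed(4)
n <- 150; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.vector(x[, 1:3] %*% c(1, -0.5, 0.5)) + rnorm(n)
y[sample(n, 30)] <- NA
ry <- !is.na(y)              # TRUE = observed, FALSE = missing

# Step 1: fit a PLS regression on the observed cases only
obs <- data.frame(y = y[ry], x = I(x[ry, , drop = FALSE]))
fit <- plsr(y ~ x, ncomp = 3, data = obs, method = "kernelpls")

# PLS scores for all cases (observed and missing)
scores_all <- predict(fit, newdata = data.frame(x = I(x)), type = "scores")
scores_df <- as.data.frame(unclass(scores_all))
names(scores_df) <- paste0("F", 1:3)

# Step 2: regress y on the PLS scores and impute the missing entries
dat <- data.frame(y = y, scores_df)
lm_fit <- lm(y ~ ., data = dat[ry, ])
pred_mis <- predict(lm_fit, newdata = dat[!ry, ])
imp <- pred_mis + rnorm(sum(!ry), sd = summary(lm_fit)$sigma)

length(imp) == sum(!ry)
```

The 20 predictors are replaced by 3 score columns before the imputation model is fitted, which is the dimension reduction referred to above.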

Usage

mice.impute.2l.pls2(y, ry, x, type, pls.facs = NULL, 
   pls.impMethod = "pmm", pls.print.progress = TRUE, 
   imputationWeights = rep(1, length(y)), pcamaxcols = 1E+09, 
   tricube.pmm.scale = NULL, min.int.cor = 0, min.all.cor=0, 
   N.largest = 0, pls.title = NULL, print.dims = TRUE,
   pls.maxcols=5000 , ...)

mice.impute.2l.pls(y, ry, x, type, pls.facs = NULL, 
   pls.impMethod = "tricube.pmm2", pls.method = NULL, 
   pls.print.progress = TRUE, imputationWeights = rep(1, length(y)), 
   pcamaxcols = 1E+09, tricube.pmm.scale = NULL, min.int.cor = 0, min.all.cor=0, 
   N.largest = 0, pls.title = NULL, print.dims = TRUE, ...)

Arguments

y

Incomplete data vector of length n.

ry

Vector of missing data pattern (FALSE -- missing, TRUE -- observed).

x

Matrix (n x p) of complete covariates.

type

type=1 -- variable is used as a predictor
type=4 -- create interactions of the specified variable with all other predictors
type=5 -- create a quadratic term of the specified variable
type=6

pls.facs

Number of factors used in PLS regression. This argument can also be specified as a list defining different numbers of factors for all variables to be imputed.

pls.impMethod

Imputation method based on the PLS regression model:
norm -- normal linear regression
pmm -- predictive mean matching (pmm method from mice)
pmm5 -- predictive mean matching (pmm

pls.method

Calculation method of the PLS regression. See pls::plsr for more details.

pls.print.progress

Print progress during PLS regression.

imputationWeights

Vector of sample weights to be used in imputation models.

pcamaxcols

Maximum number of principal components.

tricube.pmm.scale

Scale factor for tricube predictive mean matching.

min.int.cor

Minimum absolute correlation for an interaction of two predictors to be included in the PLS regression model.

min.all.cor

Minimum absolute correlation for inclusion in the PLS regression model.

N.largest

Number of variables with the largest absolute correlations to be included.

pls.title

Title for progress print in console output.

print.dims

An optional logical indicating whether dimensions of inputs should be printed.

pls.maxcols

Maximum number of interactions to be created.

...

Further arguments to be passed.
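To make the roles of type, min.all.cor and N.largest more concrete, the following base R sketch builds interaction and quadratic columns by hand and then screens the expanded predictors by their absolute correlation with y. It is an illustration under stated assumptions, not the package's internal code: the simulated data, all object names, the threshold values, and the choice to compute correlations on the observed cases are hypothetical.

```r
# Simulated data (hypothetical): four predictors, y with 40 missing values
set.seed(2)
n <- 200
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("X", 1:4)))
y <- 0.8 * x[, "X1"] + 0.3 * x[, "X2"] + rnorm(n)
y[sample(n, 40)] <- NA
ry <- !is.na(y)

# Suppose X1 is coded type = 4: interactions of X1 with all other predictors
others <- setdiff(colnames(x), "X1")
x_int <- x[, others, drop = FALSE] * x[, "X1"]
colnames(x_int) <- paste0("X1:", others)

# Suppose X2 is coded type = 5: quadratic term in addition to the linear term
x_quad <- cbind("X2^2" = x[, "X2"]^2)

# Expanded predictor space: 4 linear + 3 interaction + 1 quadratic column
x_expanded <- cbind(x, x_int, x_quad)

# Absolute correlations with y, computed on observed cases (an assumption)
abs_cor <- abs(cor(x_expanded[ry, ], y[ry]))[, 1]

# Role of min.all.cor: keep columns at or above a threshold
min_all_cor <- 0.1
keep_threshold <- names(abs_cor)[abs_cor >= min_all_cor]

# Role of N.largest: keep the N columns with the largest absolute correlations
N_largest <- 3
keep_top <- names(sort(abs_cor, decreasing = TRUE))[seq_len(N_largest)]

keep_threshold
keep_top
```

Screening of this kind shrinks the set of columns that enter the PLS regression, which matters when interaction terms make the expanded predictor matrix very wide.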

Value

A vector of length nmis=sum(!ry) with imputations if pls.impMethod != "xplsfacs". If pls.impMethod == "xplsfacs", a matrix with PLS factors is computed instead.
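The relationship between ry and the length of the returned vector can be shown with a toy example in base R. The data and the imputation rule here (random draws from the observed values) are hypothetical and stand in for the PLS-based method only to show the shapes involved.

```r
# Hypothetical incomplete variable
y <- c(2.1, NA, 3.5, NA, 1.8)
ry <- !is.na(y)          # TRUE = observed, FALSE = missing
nmis <- sum(!ry)         # number of values to impute

# Toy imputation (not the PLS method): draw from the observed values
set.seed(3)
imp <- sample(y[ry], nmis, replace = TRUE)

# One imputed value per missing entry, filled in at the missing positions
y_completed <- y
y_completed[!ry] <- imp
length(imp) == nmis
```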

Details

The function mice.impute.2l.pls2 uses kernelpls.fit2 instead of kernelpls.fit from the pls package and is a bit faster.

References

Mevik, B. H., & Wehrens, R. (2007). The pls package: Principal component and partial least squares regression in R. Journal of Statistical Software, 18, 1-24.

Examples


#############################################################################
# EXAMPLE 1: PLS imputation method for internet data
#############################################################################	

# load mice and miceadds (data.internet is shipped with miceadds)
library(mice)
library(miceadds)

data(data.internet)
dat <- data.internet

# specify predictor matrix
predictorMatrix <- matrix( 1 , ncol(dat) , ncol(dat) )
rownames(predictorMatrix) <- colnames(predictorMatrix) <- colnames(dat)
diag( predictorMatrix) <- 0

# use PLS imputation method for all variables
impMethod <- rep( "2l.pls2" , ncol(dat) )
names(impMethod) <- colnames(dat)

# define predictors for interactions (entries with type 4 in predictorMatrix)
predictorMatrix[c("IN1","IN15","IN16"),c("IN1","IN3","IN10","IN13")] <- 4
# define predictors which should appear as linear and quadratic terms (type 5)
predictorMatrix[c("IN1","IN8","IN9","IN10","IN11"),c("IN1","IN2","IN7","IN5")] <- 5

# use 9 PLS factors for all variables
pls.facs <- as.list( rep( 9 , length(impMethod) ) )
names(pls.facs) <- names(impMethod)
pls.facs$IN1 <- 15   # use 15 PLS factors for variable IN1

# choose norm or pmm imputation method
pls.impMethod <- as.list( rep("norm" , length(impMethod) ) )
names(pls.impMethod) <- names(impMethod)
pls.impMethod[ c("IN1","IN6")] <- "pmm5"   

# Model 1: Three parallel chains
imp1 <- mice(data = dat , imputationMethod = impMethod ,  
     m=3 , maxit=5 , predictorMatrix = predictorMatrix ,
     pls.facs = pls.facs , # number of PLS factors
     pls.impMethod = pls.impMethod ,  # Imputation Method in PLS imputation
     pls.print.progress = TRUE )
summary(imp1)

# Model 2: One long chain
imp2 <- mice.1chain(data = dat , imputationMethod = impMethod ,  
     burnin=10 , iter=21 , Nimp=3 , predictorMatrix = predictorMatrix ,
     pls.facs = pls.facs , pls.impMethod = pls.impMethod )
summary(imp2)
