jomoImpute: Impute multilevel missing data using `jomo`

Description

This function provides an interface to the jomo package using the MCMC algorithms presented in Carpenter & Kenward (2013). Using this wrapper function, jomo supports imputation of (mixed) categorical and continuous variables (Goldstein et al., 2009) as well as imputation using random (residual) covariance matrices (Yucel, 2011). Imputations can be generated using either the type or the formula argument, which offer different options for model specification.

Usage

jomoImpute(data, type, formula, random.L1=c("none","mean","full"), n.burn=5000,
  n.iter=100, m=10, group=NULL, prior=NULL, seed=NULL, save.pred=FALSE,
  silent=FALSE)

Arguments

data

A data frame containing incomplete and auxiliary variables, the cluster indicator variable, and any other variables that should be present in the imputed datasets.

type

An integer vector specifying the role of each variable in the imputation model (see details).

formula

A formula specifying the role of each variable in the imputation model. The basic model is constructed by model.matrix, so derived variables can be included in the imputation model using I() (see details and examples).

random.L1

A character string denoting whether the covariance matrix of residuals should vary across clusters and how the values of these matrices are stored (see details). Default is "none", assuming a common covariance matrix across clusters.

n.burn

The number of burn-in iterations before any imputations are drawn. Default is 5,000.

n.iter

The number of iterations between imputations. Default is 100.

m

The number of imputed data sets to generate. Default is 10.

group

(optional) A character string denoting the name of an additional grouping variable to be used with the formula argument. When specified, the imputation model is run separately within each of these groups.

prior

(optional) A list with components Binv, Dinv, and a for specifying prior distributions for the covariance matrix of random effects and the covariance matrix of residuals (see details). Default is to use least-informative priors.

seed

(optional) An integer value initializing R's random number generator for reproducible results. By default, the global seed is used.

save.pred

(optional) Logical flag indicating whether variables derived using formula should be included in the imputed data sets. Default is FALSE.

silent

(optional) Logical flag indicating whether console output should be suppressed. Default is FALSE.
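The formula description above says the basic model is constructed by model.matrix, so I() terms enter the model as derived predictor columns. A minimal base-R sketch of that behavior, assuming a small hypothetical data set (d, with variables id and x); ave() is used here as a base-R stand-in for the clusterMeans() helper from Example 2.4, and how jomoImpute handles the random-effects part of the formula is not shown.

```r
# Hypothetical toy data: predictor x observed in three clusters (id)
d <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  x  = c(2, 4, 1, 3, 5, 7)
)

# model.matrix() evaluates I() terms, so derived predictors become
# ordinary columns of the design matrix. ave() returns the cluster
# mean of x for each row (a base-R stand-in for clusterMeans()).
X <- model.matrix(~ x + I(x^2) + I(ave(x, id)), data = d)
print(X)
```

Each I() term yields one column named after its expression, next to the intercept and the untransformed predictor.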

Value

Returns an object of class mitml, which is a list containing the following components:
data: The original (incomplete) data set, sorted according to the cluster variable and (if given) the grouping variable. An attribute "sort" contains the original row order. An attribute "group" contains the optional grouping variable.
replacement.mat: A matrix containing the multiple replacements (i.e., imputations) for each missing value. The replacement matrix contains one row for each missing value and one column for each imputed data set.
index.mat: A matrix containing the row and column index for each missing value. The index matrix is used to link the missing values in the data set with their corresponding rows in the replacement matrix.
call: The matched function call.
model: A list containing the names of the cluster variable, the target variables, and the predictor variables with fixed and random effects, respectively.
random.L1: A character string denoting the handling of random residual covariance matrices (see details).
prior: The prior parameters used in the imputation model.
iter: A list containing the number of burn-in iterations, the number of iterations between imputations, and the number of imputed data sets.
par.burnin: A multi-dimensional array containing the parameters of the imputation model from the burn-in phase.
par.imputation: A multi-dimensional array containing the parameters of the imputation model from the imputation phase.
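The link between index.mat and replacement.mat can be illustrated with a minimal base-R sketch. It assumes a hypothetical incomplete data set, made-up imputed values, and an index matrix ordered as returned by which(..., arr.ind = TRUE); the helper complete_k() is hypothetical and not part of mitml. In practice, completed data sets are extracted with mitmlComplete(), as in the examples.

```r
# Hypothetical incomplete data set with three missing values
dat <- data.frame(
  y1 = c(1.2, NA, 0.7, 2.1),
  y2 = c(NA, 3.3, 2.9, NA)
)

# index.mat-like matrix: one row per missing value, with its
# row and column position in the data (assumed ordering)
idx <- which(is.na(dat), arr.ind = TRUE)

# replacement.mat-like matrix: one row per missing value and
# one column per imputed data set (m = 2, made-up draws)
m <- 2
rep_mat <- matrix(c(1.5, 3.0, 3.1,    # imputation 1
                    1.4, 2.8, 3.4),   # imputation 2
                  nrow = nrow(idx), ncol = m)

# Hypothetical helper: fill in the k-th imputed data set by
# writing row i of rep_mat into the cell given by row i of idx
complete_k <- function(data, idx, rep_mat, k) {
  for (i in seq_len(nrow(idx))) {
    data[idx[i, "row"], idx[i, "col"]] <- rep_mat[i, k]
  }
  data
}

imp1 <- complete_k(dat, idx, rep_mat, 1)
imp2 <- complete_k(dat, idx, rep_mat, 2)
```

Storing only the missing cells keeps the object compact: the m completed data sets are reconstructed on demand instead of being stored as m full copies.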

Details

This function serves as an interface to the jomo package. The function supports imputation of multilevel continuous and categorical data. In order for categorical variables to be detected correctly, they must be formatted as factors (see examples). The imputation model can be specified using either the type or the formula argument.

The type interface is designed to provide quick-and-easy imputations using jomo. The type argument must be an integer vector denoting the role of each variable in the imputation model:

1: target variables containing missing data
2: predictors with fixed effect on all targets (completely observed)
3: predictors with random effect on all targets (completely observed)
-1: grouping variable within which the imputation is run separately
-2: cluster indicator variable
0: variables not featured in the model

In the formula interface, the imputation model is specified using the following operators:

+: adds target or predictor variables to the model
*: adds an interaction term of two or more predictors
|: denotes cluster-specific random effects and specifies the cluster indicator (e.g., 1|ID)
I(): defines functions to be interpreted by model.matrix

The prior argument is a list with the following components:

Binv: scale matrix for the residual covariance matrix
Dinv: scale matrix for the covariance matrix of random effects
a: starting value for the degrees of freedom of random covariance matrices of residuals (only used with random.L1="mean" or random.L1="full")
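Because each code in the type vector is matched to a variable, it can help to list the variables by role before calling jomoImpute(). A minimal base-R sketch, assuming a hypothetical subset of variables named after those in the examples (not the full column set of studentratings); the role labels paraphrase the codes listed above.

```r
# Hypothetical subset of variables with their type codes
type <- c(ID = -2, FedState = 0, ReadDis = 1, SES = 1,
          ReadAchiev = 3, CognAbility = 2)

# Human-readable labels for each type code
roles <- c("-2" = "cluster", "-1" = "group", "0" = "not used",
           "1" = "target", "2" = "fixed-effect predictor",
           "3" = "random-effect predictor")

# Group the variable names by their role
by_role <- split(names(type), roles[as.character(type)])
print(by_role)
```

Setting names(type) <- colnames(data), as in the examples, makes such a check straightforward because each code is then labeled with its column.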

References

Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. Hoboken, NJ: Wiley.

Goldstein, H., Carpenter, J., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173-197.

Yucel, R. M. (2011). Random covariances and mixed-effects models for imputing multivariate multilevel continuous data. Statistical Modelling, 11, 351-370.

Examples


# NOTE: The number of iterations in these examples is much lower than it
# should be! This is done in order to comply with CRAN policies, and more
# iterations are recommended for applications in practice!

data(studentratings)

# *** ................................
# the 'type' interface
# 

# * Example 1.1: 'ReadDis' and 'SES', predicted by 'ReadAchiev' and 
# 'CognAbility', with random slope for 'ReadAchiev'

type <- c(-2,0,0,0,0,1,3,1,2,0)
names(type) <- colnames(studentratings)
type

imp <- jomoImpute(studentratings, type=type, n.burn=100, n.iter=10, m=5)

# * Example 1.2: 'ReadDis' and 'SES' groupwise for 'FedState',
# and predicted by 'ReadAchiev'

type <- c(-2,-1,0,0,0,1,2,1,0,0)
names(type) <- colnames(studentratings)
type

imp <- jomoImpute(studentratings, type=type, n.burn=100, n.iter=10, m=5)

# *** ................................
# the 'formula' interface
# 

# * Example 2.1: imputation of 'ReadDis', predicted by 'ReadAchiev'
# (random intercept)

fml <- ReadDis ~ ReadAchiev + (1|ID)
imp <- jomoImpute(studentratings, formula=fml, n.burn=100, n.iter=10, m=5)

# ... the intercept can be suppressed using '0' or '-1' (here for fixed intercept)
fml <- ReadDis ~ 0 + ReadAchiev + (1|ID)
imp <- jomoImpute(studentratings, formula=fml, n.burn=100, n.iter=10, m=5)

# * Example 2.2: imputation of 'ReadDis', predicted by 'ReadAchiev'
# (random slope)

fml <- ReadDis ~ ReadAchiev + (1+ReadAchiev|ID)
imp <- jomoImpute(studentratings, formula=fml, n.burn=100, n.iter=10, m=5)

# * Example 2.3: imputation of 'ReadDis', predicted by 'ReadAchiev',
# groupwise for 'FedState'

fml <- ReadDis ~ ReadAchiev + (1|ID)
imp <- jomoImpute(studentratings, formula=fml, group="FedState", n.burn=100,
                  n.iter=10, m=5)

# * Example 2.4: imputation of 'ReadDis', predicted by 'ReadAchiev'
# including the cluster mean of 'ReadAchiev' as an additional predictor

fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev,ID)) + (1|ID)
imp <- jomoImpute(studentratings, formula=fml, n.burn=100, n.iter=10, m=5)

# ... using 'save.pred' to save the calculated cluster means in the data set
fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev,ID)) + (1|ID)
imp <- jomoImpute(studentratings, formula=fml, n.burn=100, n.iter=10, m=5,
                  save.pred=TRUE)

head(mitmlComplete(imp,1))

# * Example 2.5: imputation of 'ReadAchiev' and 'MathAchiev' using random
# covariance matrices at level 1 (residuals)

fml <- ReadAchiev + MathAchiev ~ (1|ID)
imp <- jomoImpute(studentratings, formula=fml, random.L1="full", n.burn=100,
                  n.iter=10, m=5)

# * Example 2.6: imputation of 'Sex' (categorical) and 'MathAchiev' (continuous),
# predicted by 'ReadAchiev' (random slopes)

# induce some artificial missing data for 'Sex'
studentratings <- within(studentratings, {
  Sex[!duplicated(ID)] <- NA
  Sex <- as.factor(Sex)
})

fml <- Sex + MathAchiev ~ ReadAchiev + (1+ReadAchiev|ID)
imp <- jomoImpute(studentratings, formula=fml, n.burn=100, n.iter=10, m=5)
