jomoImpute: Impute multilevel missing data using `jomo`

Description

This function provides an interface to the jomo package, which uses the MCMC algorithms presented in Carpenter & Kenward (2013). Through this wrapper function, imputations can be generated for (mixed) categorical and continuous variables (Goldstein et al., 2009) at level 1 and level 2, and imputations can also be based on random (residual) covariance matrices (Yucel, 2011). The imputation model can be specified with either the type or the formula argument, which offer different levels of flexibility.

Usage

jomoImpute(data, type, formula, random.L1=c("none","mean","full"), n.burn=5000, n.iter=100, m=10, group=NULL, prior=NULL, seed=NULL, save.pred=FALSE, silent=FALSE)

Arguments

data

A data frame containing incomplete and auxiliary variables, the cluster indicator variable, and any other variables that should be present in the imputed datasets.

type

An integer vector specifying the role of each variable in the imputation model or a list of two vectors specifying a two-level model (see details).

formula

A formula specifying the role of each variable in the imputation model or a list of two formulas specifying a two-level model. The basic model is constructed by model.matrix, which allows derived variables to be included in the imputation model using I() (see details and examples).

random.L1

A character string denoting whether the covariance matrix of residuals should vary across clusters and, if so, how the values of these matrices are stored (see details). Default is "none", which assumes a common covariance matrix across clusters.

n.burn

The number of burn-in iterations before any imputations are drawn. Default is 5,000.

n.iter

The number of iterations between imputations. Default is 100.

m

The number of imputed data sets to generate. Default is 10.

group

(optional) A character string denoting the name of an additional grouping variable to be used with the formula argument. When specified, the imputation model is run separately within each of these groups.

prior

(optional) A list with components Binv, Dinv, and a for specifying prior distributions for the covariance matrix of random effects and the covariance matrix of residuals (see details). By default, least-informative priors are used.

seed

(optional) An integer value initializing R's random number generator for reproducible results. By default, the global seed is used.

save.pred

(optional) Logical flag indicating if variables derived using formula should be included in the imputed data sets. Default is FALSE.

silent

(optional) Logical flag indicating if console output should be suppressed. Default is FALSE.

Value

Returns an object of class mitml, containing the following components:

Details

This function serves as an interface to the jomo package and supports imputation of multilevel continuous and categorical data. In order for categorical variables to be detected correctly, they must be formatted as factor variables (see examples). The imputation model can be specified using either the type or the formula argument.
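
As a quick check before calling jomoImpute, the class of each column can be inspected in base R. The sketch below uses a small, made-up data frame (`dat`, with hypothetical columns `ID`, `y`, and `grade`) in which a categorical variable is stored as numeric codes and is then converted with factor(). Only the conversion matters here; the toy values are arbitrary.

```r
# Hypothetical toy data: 'ID' is the cluster indicator, 'y' is continuous,
# and 'grade' is categorical but stored as numeric codes
dat <- data.frame(
  ID    = rep(1:3, each = 4),
  y     = c(2.1, NA, 1.7, 3.0, 2.5, 2.2, NA, 1.9, 3.1, 2.8, 2.0, NA),
  grade = c(1, 2, NA, 3, 2, 2, 1, NA, 3, 1, 2, 3)
)

# Numeric codes would not be detected as categorical; converting to a
# factor marks 'grade' as categorical (missing values stay NA)
dat$grade <- factor(dat$grade)

# Which columns are now stored as factors?
is_categorical <- vapply(dat, is.factor, logical(1))
print(is_categorical)
```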

The type interface is designed to provide quick-and-easy imputations using jomo. The type argument must be an integer vector denoting the role of each variable in the imputation model:

1: target variables containing missing data
2: predictors with fixed effect on all targets (completely observed)
3: predictors with random effect on all targets (completely observed)
-1: grouping variable within which the imputation is run separately
-2: cluster indicator variable
0: variables not featured in the model

At least one target variable and the cluster indicator must be specified. The intercept is automatically included both as a fixed and random effect. If a variable of type -1 is found, then separate imputations are performed within each level of that variable.
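
A minimal sketch of the type interface, assuming a made-up data frame `dat` (the columns `ID`, `y1`, `y2`, `x`, and `w` are hypothetical). The named vector follows the codes listed above, a few base-R checks mirror the stated requirements, and the call to jomoImpute only runs if both mitml and jomo are installed. The small n.burn and n.iter values are for illustration only.

```r
# Hypothetical toy data: 10 clusters of 5 observations each
set.seed(123)
dat <- data.frame(
  ID = rep(1:10, each = 5),  # cluster indicator
  y1 = rnorm(50),            # target with missing values
  y2 = rnorm(50),            # target with missing values
  x  = rnorm(50),            # completely observed predictor
  w  = rnorm(50)             # not used in the model
)
dat$y1[c(3, 17, 42)] <- NA
dat$y2[c(8, 25)] <- NA

# One code per column, in column order:
# -2 = cluster indicator, 1 = target, 3 = predictor with random effect, 0 = unused
type <- c(-2, 1, 1, 3, 0)
names(type) <- colnames(dat)

# Checks mirroring the rules above: one code per column, exactly one
# cluster indicator, and at least one target variable
stopifnot(
  length(type) == ncol(dat),
  sum(type == -2) == 1,
  any(type == 1)
)

# Run the imputation only if both packages are available
if (requireNamespace("mitml", quietly = TRUE) &&
    requireNamespace("jomo", quietly = TRUE)) {
  imp <- mitml::jomoImpute(dat, type = type, n.burn = 100, n.iter = 10,
                           m = 2, seed = 123, silent = TRUE)
}
```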

The formula argument is intended as a more flexible and feature-rich interface to jomo. Specifying the formula argument is similar to specifying other formulae in R. Given below is a list of operators that jomoImpute currently understands:

~: separates the target (left-hand) and predictor (right-hand) side of the model
+: adds target or predictor variables to the model
*: adds an interaction term of two or more predictors
|: denotes cluster-specific random effects and specifies the cluster indicator (e.g., 1|ID)
I(): defines functions to be interpreted by model.matrix

Predictors are allowed to have fixed effects, random effects, or both on all target variables. The intercept is automatically included both as a fixed and a random effect, but it can be constrained if necessary (see panImpute). Note that, when specifying random effects other than the intercept, these will not be automatically added as fixed effects and must be included explicitly. Any predictors defined by I() will be used for imputation but not included in the data set unless save.pred=TRUE.
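
Because the basic model is built by model.matrix, base R alone can show how `*` and I() expand into predictor columns. The sketch below uses a hypothetical data frame with columns `x` and `z`. The `|` term for random effects is processed separately by jomoImpute and is not a model.matrix operator, so it is left out here.

```r
# Hypothetical toy data; only base R is needed
dat <- data.frame(x = c(1, 2, 3, 4), z = c(0, 1, 0, 1))

# '*' expands to both main effects plus their interaction
mm1 <- model.matrix(~ x * z, data = dat)
print(colnames(mm1))

# I() protects arithmetic, so x^2 becomes a derived predictor column
mm2 <- model.matrix(~ x + I(x^2), data = dat)
print(colnames(mm2))
print(mm2[, "I(x^2)"])
```

A derived column such as I(x^2) is used during imputation but is only kept in the imputed data sets when save.pred=TRUE.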

If missing data occur at both levels of the sample (level 1 and level 2), then a list of two formulas or types may be provided. The first element of this list denotes the imputation model for variables at level 1. The second element denotes the imputation model for variables at level 2. In such a case, missing values are imputed jointly at both levels (see examples, see also Carpenter and Kenward, 2013; Goldstein et al., 2009).
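
A sketch of a two-level specification, assuming hypothetical variables `y1` and `y2` at level 1, `w` at level 2, and the cluster indicator `ID`. Besides building the list of formulas, it checks in base R that `w` takes a single value (or is entirely missing) within each cluster, which is what a level-2 variable should look like before it is placed in the second formula.

```r
# Hypothetical toy data: 3 clusters, 'w' measured at level 2
dat <- data.frame(
  ID = rep(1:3, each = 3),
  y1 = c(1.2, NA, 0.8, 2.1, 1.9, NA, 0.5, 0.7, 1.1),
  y2 = c(3.3, 2.9, NA, 4.0, 3.8, 4.1, NA, 2.6, 2.7),
  w  = c(5, 5, 5, NA, NA, NA, 7, 7, 7)  # missing for all of cluster 2
)

# First element: level-1 model; second element: level-2 model
fml <- list(
  y1 + y2 ~ 1 + (1 | ID),
  w ~ 1
)

# A level-2 variable should take one value (or be entirely missing) per cluster
w_is_level2 <- all(tapply(dat$w, dat$ID, function(v) length(unique(v)) == 1))
print(w_is_level2)

# Variables referenced by each formula
print(all.vars(fml[[1]]))
print(all.vars(fml[[2]]))
```

The list would then be passed as formula=fml, in the same way as in Example 2.4.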

It is possible to model the covariance matrix of residuals at level 1 as random across clusters (Yucel, 2011; Carpenter & Kenward, 2013). The random.L1 argument determines this behavior and how the values of these matrices are stored. If set to "none", a common covariance matrix is assumed across clusters (similar to panImpute). If set to "mean", the covariance matrices are random, but only the average covariance matrix is stored at each iteration. If set to "full", the covariance matrices are random, and all variances and covariances from all clusters are stored.
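
A back-of-the-envelope count shows why "full" can be much more demanding than "mean". The sketch counts only the distinct variances and covariances of a p x p matrix, p(p+1)/2, per stored iteration; p and J are hypothetical, and how mitml actually lays out the stored values in memory is not modeled here.

```r
# Number of distinct variances and covariances in one p x p covariance matrix
n_unique <- function(p) p * (p + 1) / 2

p <- 3   # hypothetical number of level-1 target variables
J <- 50  # hypothetical number of clusters

# Distinct (co)variances stored per iteration under each setting
per_iter <- c(
  none = n_unique(p),      # one common matrix
  mean = n_unique(p),      # only the average of the random matrices
  full = n_unique(p) * J   # one matrix per cluster
)
print(per_iter)
```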

In order to run separate imputations for each level of an additional grouping variable, the group argument may be used. The name of the grouping variable must be given in quotes.

As a default prior, jomoImpute uses "least informative" inverse-Wishart priors for the covariance matrix of random effects (and residuals at level 2) and the covariance matrix of residuals at level 1, that is, with minimum degrees of freedom (largest dispersion) and identity matrices for scale. For better control, the prior argument may be used for specifying alternative prior distributions. These must be supplied as a list containing the following components:

Binv: scale matrix for the covariance matrix of residuals at level 1
Dinv: scale matrix for the covariance matrix of random effects and residuals at level 2
a: starting value for the degrees of freedom of random covariance matrices of residuals (only used with random.L1="mean" or random.L1="full")

Note that jomo does not allow for the degrees of freedom for the inverse-Wishart prior to be specified by the user. These are always set to the lowest value possible (largest dispersion) or determined iteratively if the residuals at level 1 are modeled as random (see above).
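
A sketch of a custom prior list, assuming a single-level model (a single type vector or formula) with two continuous targets and only a random intercept. The dimensions are an assumption based on jomo's convention that the random-effects scale matrix has one row per combination of target and random effect; for two-level models, Dinv would also have to cover the level-2 targets. The scaled identity matrices are arbitrary illustrations, and the component a is omitted because it is only used with random.L1="mean" or random.L1="full".

```r
p <- 2  # hypothetical number of (continuous) level-1 targets
q <- 1  # hypothetical number of random effects (intercept only)

# Illustrative scale matrices; values are arbitrary, not recommendations
prior <- list(
  Binv = diag(0.5, p),      # level-1 residual covariance: p x p
  Dinv = diag(0.1, p * q)   # random-effects covariance: (p*q) x (p*q)
)
str(prior)

# The list would be passed as prior = prior in a call to jomoImpute()
```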

References

Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. Hoboken, NJ: Wiley.

Goldstein, H., Carpenter, J., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173-197.

Yucel, R. M. (2011). Random covariances and mixed-effects models for imputing multivariate multilevel continuous data. Statistical Modelling, 11, 351-370.

Examples


# NOTE: The number of iterations in these examples is much lower than it
# should be! This is done in order to comply with CRAN policies, and more
# iterations are recommended for applications in practice!

data(studentratings)
data(leadership)

# ***
# for further examples, see "panImpute"
#

?panImpute

# *** ................................
# the 'type' interface
# 

# * Example 1.1 (studentratings): 'ReadDis' and 'SES' predicted by 'ReadAchiev'
# (random slope)

type <- c(-2,0,0,0,0,1,3,1,0,0)
names(type) <- colnames(studentratings)
type

imp <- jomoImpute(studentratings, type=type, n.burn=100, n.iter=10, m=5)

# * Example 1.2 (leadership): all variables (mixed continuous and categorical
# data with missing values at level 1 and level 2)

type.L1 <- c(-2,1,0,1,1)   # imputation model at level 1
type.L2 <- c(-2,0,1,0,0)   # imputation model at level 2
names(type.L1) <- names(type.L2) <- colnames(leadership)

type <- list(type.L1, type.L2)
type

imp <- jomoImpute(leadership, type=type, n.burn=100, n.iter=10, m=5)


# *** ................................
# the 'formula' interface
# 

# * Example 2.1 (studentratings): 'ReadDis' and 'SES' predicted by 'ReadAchiev'
# (random slope)

fml <- ReadDis + SES ~ ReadAchiev + (1+ReadAchiev|ID)
imp <- jomoImpute(studentratings, formula=fml, n.burn=100, n.iter=10, m=5)

# * Example 2.2 (studentratings): 'ReadDis' predicted by 'ReadAchiev' and the
# cluster mean of 'ReadAchiev'

fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev,ID)) + (1|ID)
imp <- jomoImpute(studentratings, formula=fml, n.burn=100, n.iter=10, m=5)

# * Example 2.3 (studentratings): 'ReadDis' predicted by 'ReadAchiev', groupwise
# for 'FedState'

fml <- ReadDis ~ ReadAchiev + (1|ID)
imp <- jomoImpute(studentratings, formula=fml, group="FedState", n.burn=100, n.iter=10, m=5)

# * Example 2.4 (leadership): all variables (mixed continuous and categorical
# data with missing values at level 1 and level 2)

fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) , COHES ~ 1 )
imp <- jomoImpute(leadership, formula=fml, n.burn=100, n.iter=10, m=5)
