hcmm_impute: Generate multiply imputed datasets

Description

Imputations are generated using nonparametric Bayesian joint models (specifically the hierarchically coupled mixture model with local dependence described in Murray and Reiter (2015); see citation("MixedDataImpute") or http://arxiv.org/abs/1410.0438).

Usage

hcmm_impute(X, Y, kz, kx, ky, hyperpar = NULL, num.impute, num.burnin, num.skip, thin.trace = -1, status = 50)

Arguments

X

A data frame of categorical variables (as factors)

Y

A matrix or data frame of continuous variables

kz

Number of top-level clusters

kx

Number of X-model clusters

ky

Number of Y-model clusters

hyperpar

A list of hyperparameter values (see hcmm_hyperpar)

num.impute

Number of imputations

num.burnin

Number of MCMC burn-in iterations

num.skip

Number of MCMC iterations between saved imputations

thin.trace

If negative, only save the num.impute datasets. If positive, save summaries of the model state at every thin.trace iterations for diagnostic purposes.

status

Interval at which to print status messages
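
The argument descriptions above require X to hold factors and Y to hold continuous variables. A minimal sketch of preparing such inputs from a single mixed data frame follows; the data frame raw and its column names are hypothetical and are not part of the package.

```r
# Toy mixed data frame; all column names here are hypothetical.
raw <- data.frame(
  income = c(1200, NA, 560, 3100),
  age    = c(34, 51, NA, 28),
  sex    = c("F", "M", NA, "F"),
  owner  = c("yes", "no", "no", NA),
  stringsAsFactors = FALSE
)

# Split columns by type: numeric columns go to Y, everything else to X.
is_num <- vapply(raw, is.numeric, logical(1))

# Y: continuous variables (missing values stay as NA).
Y <- raw[, is_num, drop = FALSE]

# X: categorical variables, each converted to a factor.
X <- raw[, !is_num, drop = FALSE]
X[] <- lapply(X, factor)

# Sanity checks before passing X and Y to hcmm_impute().
stopifnot(all(vapply(X, is.factor, logical(1))))
stopifnot(all(vapply(Y, is.numeric, logical(1))))
stopifnot(nrow(X) == nrow(Y))
```

Splitting on is.numeric is only a convenience for this toy case; integer-coded categories would need to be converted to factors by hand.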

Value

A list with three elements:

imputations

A list of length num.impute. Each element is an imputed dataset.

trace

MCMC output (currently the component sizes for the three mixture indices)

model

An interface to the C++ object containing the current state
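
As a rough illustration of working with the returned list, the sketch below summarises the imputations element. It uses a mock object with the documented element names rather than real output from hcmm_impute; the helper summarise_imputations is hypothetical, and the trace and model placeholders say nothing about their real layout.

```r
# Mock object with the documented element names; a real one would come
# from hcmm_impute(). The contents are placeholders for illustration.
mock_dat <- data.frame(g = factor(c("a", "b", "a")), y = c(0.1, 1.2, -0.4))
imp <- list(
  imputations = list(mock_dat, mock_dat),
  trace = NULL,  # placeholder only
  model = NULL   # placeholder only
)

# Hypothetical helper: one row per imputed dataset, with its row count
# and the number of values that are still missing.
summarise_imputations <- function(imp) {
  data.frame(
    imputation = seq_along(imp$imputations),
    n_rows     = vapply(imp$imputations, nrow, integer(1)),
    n_missing  = vapply(imp$imputations, function(d) sum(is.na(d)), integer(1))
  )
}

summarise_imputations(imp)
```

A nonzero n_missing on real output would suggest that something went wrong upstream and is worth checking before fitting models.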

Examples

## Not run: 
# library(MixedDataImpute)
# library(mice) # For the functions implementing combining rules
# 
# data(sipp08)
# 
# set.seed(1)
# n = 1000
# s = sample(1:nrow(sipp08), n)
# 
# Y = sipp08[s,1:2]
# Y[,1] = log(Y[,1]+1)
# X = sipp08[s,-c(1:2,9)] # Also removes occ code, which has ~23 levels
# 
# # MCAR with probability 0.2, for illustration purposes (not matching the paper)
# 
# Y[runif(n)<0.2,1] = NA
# Y[runif(n)<0.2,2] = NA
# for(j in 1:ncol(X)) X[runif(n)<0.2,j] = NA
# 
# kz = 15
# ky = 60
# kx = 90
# 
# num.impute = 5
# num.burnin = 10000
# num.skip = 1000
# thin.trace = 10
# 
# imp = hcmm_impute(X, Y, kz=kz, kx=kx, ky=ky,
#                   num.impute=num.impute, num.burnin=num.burnin,
#                   num.skip=num.skip, thin.trace=thin.trace)
# 
# # Example of getting MI estimates for a regression, using the
# # pooling functions in mice
# form = total_earnings~age+I(age^2) + sex*I(own_kid!=0)
# 
# fits = lapply(imp$imputations, function(dat) lm(form, data=dat))
# pooled_ests = pool(as.mira(fits))
# summary(pooled_ests)
# 
# # original, complete data estimates for comparison
# # (same log(x + 1) transform as applied to Y before imputation)
# comdat = sipp08[s,]
# comdat[,1] = log(comdat[,1]+1)
# summary(lm(form, data=comdat))
# 
# # true population values for comparison
# pop = sipp08
# pop[,1] = log(pop[,1]+1)
# summary(lm(form, data=pop))
# 
# ## End(Not run)
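
The example above pools the per-imputation fits with pool from mice. For readers who want to see what that step computes, here is a minimal base R sketch of Rubin's combining rules for point estimates and standard errors. The toy data frames are simulated stand-ins for imp$imputations, not real imputations, and pool_rubin is a hypothetical helper that omits the degrees-of-freedom calculation mice performs.

```r
set.seed(1)
m <- 5

# Toy stand-ins for imp$imputations: small complete data frames.
toy_imputations <- lapply(seq_len(m), function(i) {
  x <- rnorm(50)
  data.frame(x = x, y = 1 + 2 * x + rnorm(50))
})

# Fit the same model to every completed dataset.
fits <- lapply(toy_imputations, function(dat) lm(y ~ x, data = dat))

# Hypothetical helper implementing Rubin's rules for lm fits.
pool_rubin <- function(fits) {
  m    <- length(fits)
  est  <- sapply(fits, coef)                       # p x m point estimates
  wv   <- sapply(fits, function(f) diag(vcov(f)))  # p x m within-imputation variances
  qbar <- rowMeans(est)                            # pooled estimate
  ubar <- rowMeans(wv)                             # average within variance
  b    <- apply(est, 1, var)                       # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b                   # total variance
  data.frame(term = names(qbar), estimate = qbar,
             std.error = sqrt(tvar), row.names = NULL)
}

pool_rubin(fits)
```

The pooled standard error is never smaller than the average within-imputation one, since the between-imputation term only adds variance.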
