MixedDataImpute (version 0.1)

hcmm_impute: Generate multiply imputed datasets

Description

Imputations are generated using nonparametric Bayesian joint models (specifically the hierarchically coupled mixture model with local dependence described in Murray and Reiter (2015); see citation("MixedDataImpute") or http://arxiv.org/abs/1410.0438).

Usage

hcmm_impute(X, Y, kz, kx, ky, hyperpar = NULL, num.impute, num.burnin, num.skip, thin.trace = -1, status = 50)

Arguments

X
A data frame of categorical variables (as factors)
Y
A matrix or data frame of continuous variables
kz
Number of top-level clusters
kx
Number of X-model clusters
ky
Number of Y-model clusters
hyperpar
A list of hyperparameter values (see hcmm_hyperpar)
num.impute
Number of imputations
num.burnin
Number of MCMC burn-in iterations
num.skip
Number of MCMC iterations between saved imputations
thin.trace
If negative, only the num.impute datasets are saved. If positive, summaries of the model state are saved every thin.trace iterations for diagnostic purposes.
status
Interval at which to print status messages
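
As a minimal sketch of preparing the two data arguments, assume a single mixed-type data frame (the `dat` object and its column names below are made up for illustration). The non-numeric columns are converted to factors for X and the numeric columns are collected for Y. Only base R is used; nothing here calls the package itself.

```r
# Hypothetical mixed-type data; column names are illustrative only
dat <- data.frame(
  income = c(1200, NA, 560, 3100),
  age    = c(34, 51, NA, 27),
  sex    = c("F", "M", NA, "F"),
  region = c("N", "S", "S", NA),
  stringsAsFactors = FALSE
)

# Identify continuous (numeric) columns; everything else is treated as categorical
is_num <- vapply(dat, is.numeric, logical(1))

# X: data frame of factors; missing values stay NA
X <- as.data.frame(lapply(dat[, !is_num, drop = FALSE], factor))

# Y: data frame of continuous variables
Y <- dat[, is_num, drop = FALSE]

# Sanity checks before passing X and Y on to hcmm_impute
stopifnot(all(vapply(X, is.factor, logical(1))),
          all(vapply(Y, is.numeric, logical(1))),
          nrow(X) == nrow(Y))
```

Keeping missing entries as NA in both objects matches the Examples section, where values are set to NA before imputation.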

Value

A list with three elements:

imputations
A list of length num.impute. Each element is an imputed dataset.
trace
MCMC output (currently the component sizes for the three mixture indices)
model
An interface to the C++ object containing the current state
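
To illustrate how the imputations element is typically consumed, the sketch below fits the same model to each completed dataset and combines the results with Rubin's rules in base R. The list `imps` is a simulated stand-in for the returned imputations (an assumption for illustration, not package output), and `pool_rubin` is a hypothetical helper, not a function from this package.

```r
set.seed(1)

# Stand-in for the returned imputations: a list of completed data frames (simulated)
imps <- lapply(1:5, function(m) {
  x <- rnorm(100)
  data.frame(x = x, y = 1 + 2 * x + rnorm(100))
})

# Hypothetical helper: combine point estimates and their variances with Rubin's rules
pool_rubin <- function(est, se2) {
  m     <- length(est)
  qbar  <- mean(est)               # pooled point estimate
  ubar  <- mean(se2)               # within-imputation variance
  b     <- stats::var(est)         # between-imputation variance
  total <- ubar + (1 + 1 / m) * b  # total variance
  c(estimate = qbar, se = sqrt(total))
}

# Fit the same regression to every completed dataset
fits <- lapply(imps, function(dat) lm(y ~ x, data = dat))

# Extract the slope estimate and its sampling variance from each fit
est <- vapply(fits, function(f) coef(f)[["x"]], numeric(1))
se2 <- vapply(fits, function(f) vcov(f)["x", "x"], numeric(1))

pooled <- pool_rubin(est, se2)
print(pooled)
```

In practice the pooling functions in mice, as used in the Examples section, also supply degrees of freedom and tests; the helper above only shows the core combining arithmetic.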

Examples

## Not run: 
# library(MixedDataImpute)
# library(mice) # For the functions implementing combining rules
# 
# data(sipp08)
# 
# set.seed(1)
# n = 1000
# s = sample(1:nrow(sipp08), n)
# 
# Y = sipp08[s,1:2]
# Y[,1] = log(Y[,1]+10)
# X = sipp08[s,-c(1:2,9)] # Also removes occ code, which has ~23 levels
# 
# # MCAR with probability 0.2, for illustration purposes (not matching the paper)
# 
# Y[runif(n)<0.2,1] = NA
# Y[runif(n)<0.2,2] = NA
# for(j in 1:ncol(X)) X[runif(n)<0.2,j] = NA
# 
# kz = 15
# ky = 60
# kx = 90
# 
# num.impute = 5
# num.burnin = 10000
# num.skip = 1000
# thin.trace = 10
# 
# imp = hcmm_impute(X, Y, kz=kz, kx=kx, ky=ky,
#                   num.impute=num.impute, num.burnin=num.burnin,
#                   num.skip=num.skip, thin.trace=thin.trace)
# 
# # Example of getting MI estimates for a regression, using the
# # pooling functions in mice
# form = total_earnings~age+I(age^2) + sex*I(own_kid!=0)
# 
# fits = lapply(imp$imputations, function(dat) lm(form, data=dat))
# pooled_ests = pool(as.mira(fits))
# summary(pooled_ests)
# 
# # original, complete data estimates for comparison
# comdat = sipp08[s,]
# comdat[,1] = log(comdat[,1]+10)
# summary(lm(form, data=comdat))
# 
# # true population values for comparison
# pop = sipp08
# pop[,1] = log(pop[,1]+10)
# summary(lm(form, data=pop))
# 
# ## End(Not run)