mi: Multiple Iterative Regression Imputation

Description

Produces multiple imputations for a data matrix by applying the elementary imputation functions iteratively to the variables with missing values: each such variable is randomly imputed in turn, and the procedure loops until approximate convergence.

Usage

## S3 method for class 'data.frame':
mi( object, info, n.imp = 3, n.iter = 30, 
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap", 
    preprocess = TRUE, continue.on.convergence = FALSE,
    seed = NA, check.coef.convergence = FALSE, 
    add.priors = prior.control(), post.run = TRUE)
    
## S3 method for class 'mi':
mi( object, info, n.imp = 3, n.iter = 30, 
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap", 
    preprocess = TRUE, continue.on.convergence = FALSE,
    seed = NA, check.coef.convergence = FALSE)

Arguments

object

A data frame containing the incomplete data, with missing values coded as NA, or an object of class mi from a previous run (to continue iterating from where it stopped).

info

An mi.info object describing the variables in the data, as returned by mi.info.

n.imp

Number of multiple imputations (m). The default is 3.

n.iter

Maximum number of iterations to run while waiting for convergence. The default is 30.

R.hat

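Threshold on the R-hat statistic used for the convergence check: the imputations are considered converged when R-hat falls below this value. The default is 1.1.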
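The R.hat threshold refers to the Gelman-Rubin potential scale reduction factor, which compares the variation between parallel chains (presumably the n.imp imputations here) with the variation within each chain. Below is a base-R sketch of the basic, non-split form of that statistic for one scalar summary. The helper rhat is hypothetical, and mi's own computation (reported through bugs.mi) may differ in detail, for example in which summaries it monitors.

```r
# Basic Gelman-Rubin R-hat for one scalar summary (e.g. the mean of the
# imputed values of one variable), monitored over iterations.
# chains: numeric matrix, one row per iteration, one column per chain.
rhat <- function(chains) {
  n <- nrow(chains)
  chain_means <- colMeans(chains)
  B <- n * var(chain_means)            # between-chain variance
  W <- mean(apply(chains, 2, var))     # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
# Three chains drawn from the same distribution: R-hat should be close to 1
mixed <- matrix(rnorm(300), ncol = 3)
# Three chains stuck around different values: R-hat far above 1.1
stuck <- cbind(rnorm(100), rnorm(100, mean = 5), rnorm(100, mean = 10))
rhat(mixed)
rhat(stuck)
```

Chains that have not mixed inflate the between-chain term B, which is why a value well above the 1.1 default signals that more iterations are needed.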

max.minutes

Maximum running time, in minutes, after which iteration stops. The default is 20.

seed

Random seed. The default is NA.

rand.imp.method

Method used for the initial random imputation; see random.imp. The default is "bootstrap".
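The Details section describes the crude starting step as "randomly selecting from the observed outcomes of that variable". A minimal base-R sketch of that idea follows, assuming it amounts to sampling observed values with replacement; the helper random_impute is hypothetical, and whether mi's "bootstrap" option works exactly this way is an assumption (see random.imp).

```r
# Hypothetical helper: fill each NA by sampling, with replacement,
# from the observed values of the same variable.
random_impute <- function(x) {
  miss <- is.na(x)
  obs <- x[!miss]
  if (any(miss) && length(obs) > 0) {
    # Index with sample.int() so a single observed number is not
    # misread by sample() as the range 1:obs
    x[miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  }
  x
}

set.seed(42)
x <- c(1.5, NA, 3.2, NA, 4.8)
filled <- random_impute(x)
filled
```

Applied column by column, for example with dat[] <- lapply(dat, random_impute), this gives a complete starting data set from which the iterative regressions can begin.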

preprocess

Logical; if TRUE, preprocess the data according to the variable types (see mi.preprocess). The default is TRUE.

continue.on.convergence

If TRUE, mi keeps iterating after convergence until either n.iter iterations have run or max.minutes minutes have passed. The default is FALSE.
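One way n.iter, max.minutes, R.hat and continue.on.convergence could combine into a stopping rule is sketched below. This is an assumption reconstructed from the argument descriptions, not mi's actual code; should_stop is a hypothetical helper, and "converged" is taken to mean that the largest monitored R-hat is below the R.hat threshold.

```r
# Hypothetical stopping rule combining the n.iter, max.minutes, R.hat and
# continue.on.convergence arguments (an assumption, not mi's actual code).
should_stop <- function(iter, elapsed_min, max_rhat,
                        n.iter = 30, max.minutes = 20, R.hat = 1.1,
                        continue.on.convergence = FALSE) {
  # Hard limits always end the run
  if (iter >= n.iter || elapsed_min >= max.minutes) return(TRUE)
  # Otherwise stop once every monitored R-hat is below the threshold,
  # unless the caller asked to keep going after convergence
  converged <- max_rhat < R.hat
  converged && !continue.on.convergence
}

should_stop(iter = 5, elapsed_min = 1, max_rhat = 1.05)  # TRUE: converged
should_stop(iter = 5, elapsed_min = 1, max_rhat = 1.05,
            continue.on.convergence = TRUE)              # FALSE: keep iterating
should_stop(iter = 5, elapsed_min = 1, max_rhat = 1.8)   # FALSE: not converged
should_stop(iter = 30, elapsed_min = 1, max_rhat = 1.8)  # TRUE: n.iter reached
```

Inside a real loop, elapsed_min could be computed as as.numeric(difftime(Sys.time(), start, units = "mins")), where start was recorded with Sys.time() before the first iteration.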

check.coef.convergence

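Logical; if TRUE, also check whether the coefficients of the imputation models have converged (reported in the coef.conv element of the result). The default is FALSE.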

add.priors

A list of parameters controlling how priors are added to the mi process; see the documentation for prior.control for details. Set add.p

post.run

Logical. The default, TRUE, runs 20 more iterations after the mi is finished if and only if some priors have been added to the mi process, to mitigate the influence of the priors on the whole process.

Value

An object of class mi (multiple imputation), which is itself a list of 10 elements:
call: The imputation model.
data: The original data frame.
m: The number of imputations.
mi.info: Information matrix of the mi.
imp: A list of length m of imputations.
converged: Logical value indicating whether the mi has converged.
coef.conv: Logical value indicating whether the coefficients of the mi models have converged; NULL if check.coef.convergence = FALSE.
bugs: BUGS array of the mean and sd at each iteration.
preprocess: Logical value indicating whether preprocess = TRUE was used in the mi process.
mi.info.preprocessed: Information matrix actually used in the mi if preprocess = TRUE.
Each imp[[m]] is itself a list of k variable lists, each containing 3 objects:
imp[[m]][[k]]@model: The models used for imputing the missing values.
imp[[m]][[k]]@expected: A list of vectors of length n - n.mis (the number of observed values), giving the fitted values of the models.
imp[[m]][[k]]@random: A list of vectors of length n.mis (the number of NAs), giving the random predicted values used to impute the missing data.
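The lengths above fit together as follows: a vector of n.mis random draws fills exactly the NA positions of a variable with n entries. The base-R sketch below uses made-up numbers and plain vectors rather than mi objects, and it assumes the draws are stored in the same order as the NAs appear in the data.

```r
# Hypothetical values: y has n = 6 entries, n.mis = 3 of them missing.
y <- c(2.1, NA, 3.4, NA, NA, 5.0)
# One imputation's draws for the missing entries (length n.mis), assumed
# to be in the same order as the NAs in y
random <- c(2.8, 4.1, 3.9)
stopifnot(length(random) == sum(is.na(y)))

completed_y <- y
completed_y[is.na(y)] <- random
completed_y  # → 2.1 2.8 3.4 4.1 3.9 5.0
```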

Details

Generates multiple imputations for incomplete data using iterative regression imputation. Suppose the variables with missing values form a matrix Y with columns Y(1), ..., Y(K), and the fully observed predictors are X. The procedure first imputes all the missing Y values using some crude approach (for example, imputing each variable by randomly selecting from its observed values). It then imputes Y(1) given Y(2), ..., Y(K) and X; imputes Y(2) given Y(1), Y(3), ..., Y(K) and X (using the newly imputed values of Y(1)); and so forth, randomly imputing each variable and looping through until approximate convergence.
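The loop just described can be sketched in base R for numeric variables. This is a simplified illustration, not mi's implementation: iterative_impute is a hypothetical helper that uses lm() for every conditional model (mi chooses models according to variable type), runs a single chain with a fixed number of sweeps and no convergence check, and draws only residual noise rather than also drawing the regression coefficients, so it understates imputation uncertainty.

```r
# Hypothetical sketch of iterative regression imputation for numeric columns.
# df: data frame of numeric columns with NAs; n_iter: number of full sweeps.
iterative_impute <- function(df, n_iter = 10) {
  miss <- lapply(df, is.na)  # remember where the NAs were
  # Step 1: crude start, sampling each variable's observed values
  for (v in names(df)) {
    m <- miss[[v]]
    if (any(m)) {
      obs <- df[[v]][!m]
      df[[v]][m] <- obs[sample.int(length(obs), sum(m), replace = TRUE)]
    }
  }
  # Only incomplete columns (the Y's) are imputed; complete ones act as X
  targets <- names(df)[vapply(miss, any, logical(1))]
  # Step 2: cycle through the incomplete variables, each given all others
  for (it in seq_len(n_iter)) {
    for (v in targets) {
      m <- miss[[v]]
      f <- reformulate(setdiff(names(df), v), response = v)
      fit <- lm(f, data = df[!m, , drop = FALSE])  # fit on observed rows
      pred <- predict(fit, newdata = df[m, , drop = FALSE])
      # Random imputation: prediction plus normal noise with residual sd
      df[[v]][m] <- pred + rnorm(sum(m), mean = 0, sd = sigma(fit))
    }
  }
  df
}

set.seed(100)
n <- 200
x <- rnorm(n)
z <- x + rnorm(n, sd = 0.5)
y <- 1 + 2 * x - z + rnorm(n)
toy <- data.frame(y, x, z)
toy$y[sample.int(n, 40)] <- NA
toy$z[sample.int(n, 40)] <- NA
completed <- iterative_impute(toy, n_iter = 5)
```

Running the function several times from different random starts would give several chains, whose summaries could then be compared with an R-hat statistic, as mi does across its n.imp imputations.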

References

Kobi Abayomi, Andrew Gelman and Marc Levy. (2008). Diagnostics for multivariate imputations. Applied Statistics 57, Part 3: 273-291.

Andrew Gelman and Maria Grazia Pittau. A flexible program for missing-data imputation and model checking. Technical report. Columbia University, New York.

Andrew Gelman and Jennifer Hill. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Examples


# simulate fake data
set.seed(100)
n <- 100
u1 <- rbinom(n, 1, .5)
v1 <- log(rnorm(n, 5, 1))
x1 <- u1*exp(v1)
u2 <- rbinom(n, 1, .5)
v2 <- log(rnorm(n, 5, 1))
x2 <- u2*exp(v2)
x3 <- rbinom(n, 1, prob=0.45)
x4 <- ordered(rep(seq(1, 5),100)[sample(1:n, n)])
x5 <- rep(letters[1:10],10)[sample(1:n, n)]
x6 <- trunc(runif(n, 1, 10))
x7 <- rnorm(n)
x8 <- factor(rep(seq(1,10),10)[sample(1:n, n)])
x9 <- runif(n, 0.1, .99)
x10 <- rpois(n, 10)
y <- x1 + x2 + x7 + x9 + rnorm(n)
fakedata <- cbind.data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)

# randomly create missing values
dat <- mi:::.create.missing(fakedata, pct.mis=30)

# get information matrix of the data
inf <- mi.info(dat)

# update the variable type of a specific variable to mi.info
inf <- update(inf, "type", list(x10="count"))

# run the imputation
## short run, for testing only
IMP <- mi(dat, info=inf, n.iter=6, post.run=FALSE)
# no prior
# IMP <- mi(dat, info=inf, n.iter=6, add.priors=FALSE)

# pick up where you left off
# IMP <- mi(IMP)       ## NOT RUN

## this is the suggested (default) way of running mi, NOT RUN
# IMP <- mi(dat, info=inf)

# convergence checking
converged(IMP)  ## expect FALSE here because n.iter is small
bugs.mi(IMP)    ## BUGS object for inspecting the R-hat statistics

# visually check the imputation
plot(IMP)
