mvn_imputation: Multivariate Normal Imputation

Description

Impute missing values from the posterior predictive distribution, assuming a multivariate normal distribution. For binary, ordinal, and mixed (a combination of discrete and continuous) data, the values are first imputed for the latent data and then converted to the original scale.
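
As a rough illustration of converting latent values back to the original scale, the sketch below maps continuous latent draws onto ordinal and binary categories. The latent draws, the thresholds, and the object names are hypothetical; this shows the general idea only and is not necessarily how mvn_imputation performs the conversion internally.

```r
# Hypothetical latent draws for one item (not produced by BGGM)
set.seed(1)
z <- rnorm(10)

# Hypothetical thresholds separating categories 1|2, 2|3, and 3|4
thresholds <- c(-1, 0, 1)

# findInterval() counts how many thresholds each latent value has reached,
# so adding 1 yields ordinal categories 1 through 4
y_ordinal <- findInterval(z, thresholds) + 1

# Binary case (assumed here): a single threshold at zero
y_binary <- as.integer(z > 0)

print(cbind(latent = round(z, 2), ordinal = y_ordinal, binary = y_binary))
```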

Usage

mvn_imputation(
  Y,
  type = "continuous",
  iter = 1000,
  progress = TRUE,
  save_all = FALSE
)

Arguments

Y

Matrix (or data frame) of dimensions n (observations) by p (variables).

type

Character string. The type of data in Y. The options are continuous, binary, ordinal, or mixed. Note that mixed can be used for data with only ordinal variables. See the note for further details.

iter

Number of iterations (posterior samples; defaults to 1000).

progress

Logical. Should a progress bar be included (defaults to TRUE)?

save_all

Logical. Should each imputed dataset be stored (defaults to FALSE, which saves only the imputed missing values)?

Value

An object of class mvn_imputation:

Y The last imputed dataset.
ppd_missing A matrix of dimensions iter by the number of missing values.
ppd_mean A vector including the means of the posterior predictive distribution for the missing values.
Y_all A 3D array with iter matrices of dimensions n by p (NULL when save_all = FALSE).
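
The returned draws can be summarized per missing value. The sketch below uses a simulated matrix as a hypothetical stand-in for ppd_missing (iter rows, one column per missing value), so it runs without fitting a model. It assumes that ppd_mean corresponds to the column means of ppd_missing, which is suggested by the descriptions above but not confirmed here.

```r
# Hypothetical stand-in for fit$ppd_missing: 250 draws for 3 missing values
set.seed(1)
iter <- 250
ppd_missing <- cbind(
  rnorm(iter, mean = -1, sd = 0.5),
  rnorm(iter, mean =  0, sd = 0.5),
  rnorm(iter, mean =  2, sd = 0.5)
)

# Posterior predictive mean for each missing value (one per column)
ppd_mean <- colMeans(ppd_missing)

# 95% posterior predictive interval for each missing value
ppd_interval <- t(apply(ppd_missing, 2, quantile, probs = c(0.025, 0.975)))

# One row per missing value: mean, lower bound, upper bound
summary_table <- cbind(mean = ppd_mean, ppd_interval)
print(round(summary_table, 2))
```

With a fitted object, the same summaries would be computed from its ppd_missing element instead of the simulated matrix.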

Details

Missing values are imputed with the approach described in Hoff (2009). The basic idea is to impute the missing values with the respective posterior predictive distribution, given the observed data, as the model is being estimated. Note that the default is TRUE, but this is ignored when there are no missing values. If set to FALSE, and there are missing values, list-wise deletion is performed with na.omit.
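
The core step of this kind of imputation can be sketched as a draw from a conditional multivariate normal distribution: given current values of the mean vector and covariance matrix, the missing entries of a row are sampled conditional on its observed entries. The helper impute_row() below is hypothetical and not part of BGGM. It holds mu and Sigma fixed, whereas a full sampler would update them at every iteration, so treat it as an illustration of the idea rather than the package's implementation.

```r
# Hypothetical helper (not part of BGGM): draw the missing entries of one row
# from their conditional normal distribution, given the observed entries and
# the current mean vector (mu) and covariance matrix (Sigma).
impute_row <- function(y, mu, Sigma) {
  miss <- is.na(y)
  if (!any(miss)) return(y)

  if (all(miss)) {
    # Nothing observed: draw from the marginal distribution
    y[] <- mu + t(chol(Sigma)) %*% rnorm(length(mu))
    return(y)
  }

  obs <- !miss

  # Regression of missing on observed: Sigma_mo %*% solve(Sigma_oo)
  B <- Sigma[miss, obs, drop = FALSE] %*% solve(Sigma[obs, obs, drop = FALSE])

  # Conditional mean and covariance of the missing entries
  cond_mean <- mu[miss] + B %*% (y[obs] - mu[obs])
  cond_cov  <- Sigma[miss, miss, drop = FALSE] - B %*% Sigma[obs, miss, drop = FALSE]

  # Sample via the Cholesky factor of the conditional covariance
  y[miss] <- cond_mean + t(chol(cond_cov)) %*% rnorm(sum(miss))
  y
}

# Usage: first variable observed, the other two missing
set.seed(1)
mu <- c(0, 0, 0)
Sigma <- matrix(c(1.0, 0.8, 0.5,
                  0.8, 1.0, 0.3,
                  0.5, 0.3, 1.0), nrow = 3)
y <- c(2, NA, NA)

draws <- t(replicate(2000, impute_row(y, mu, Sigma)))

# Theoretical conditional means of the missing entries are 1.6 and 1.0,
# so the averages over draws should land near those values
print(colMeans(draws))
```

Averaging such draws across iterations is what yields a posterior predictive mean for each missing value.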

References

Hoff, P. D. (2009). A First Course in Bayesian Statistical Methods. Springer.

Examples

# NOT RUN {
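# load BGGM (provides mvn_imputation and the ptsd_cor1 correlation matrix)
library(BGGM)

# number of observations
n <- 5000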

# number of missing values
n_missing <- 1000

# number of variables
p <- 16

# simulate multivariate normal data from the ptsd_cor1 correlation matrix
Y <- MASS::mvrnorm(n, rep(0, p), ptsd_cor1)

# keep the complete data for checking the imputations
Ymain <- Y

# all possible (row, column) indices
indices <- which(matrix(0, n, p) == 0,
                 arr.ind = TRUE)

# random sample of n_missing cells to set to missing
na_indices <- indices[sample(seq_len(nrow(indices)),
                             size = n_missing,
                             replace = FALSE),]

# fill with NA
Y[na_indices] <- NA

# indicator matrix: 1 = missing, 0 = observed
Y_miss <- ifelse(is.na(Y), 1, 0)

# true values of the missing cells, in column-major order (to check)
true <- unlist(sapply(1:p, function(x)
        Ymain[which(Y_miss[,x] == 1),x] ))

# impute
fit_missing <- mvn_imputation(Y, progress = FALSE, iter = 250)

print(fit_missing, n_rows = 20)


# plot the true values against the posterior predictive means
plot(x = true,
     y = fit_missing$ppd_mean,
     main = "BGGM: Imputation",
     xlab = "Actual",
     ylab = "Posterior Mean")
# }
