impNorm: Imputation and prediction for incomplete multivariate normal data

Description

Simulates or predicts missing values from their predictive distribution given the observed data under a normal model with fixed parameters.

Usage


impNorm(obj, ...)

# S3 method for default
impNorm(obj, x = NULL, intercept = TRUE, param,
   seeds = NULL, method = "random", ...)

# S3 method for formula
impNorm(formula, data, param,
   seeds = NULL, method = "random", ...)

# S3 method for norm
impNorm(obj, param = obj$param, seeds = NULL, 
   method = "random", ...)

Arguments

obj

an object used to select a method. It may be y, a numeric matrix, vector or data frame of responses to be modeled as normal. Missing values (NAs) are allowed. If y is a data frame, any factors or ordered factors will be replaced by their internal codes, and a warning will be given. Alternatively, this first argument may be an object of class "norm" resulting from a call to emNorm or mcmcNorm; see DETAILS.

x

a numeric matrix, vector or data frame of covariates to be used as predictors for y. Missing values (NA's) are not allowed. If x is a matrix, it must have the same number of rows as y. If x is a data frame, any factors or ordered factors are replaced by their internal codes, and a warning is given. If NULL, it defaults to x = rep(1,nrow(y)), an intercept-only model.

intercept

if TRUE, then a column of 1's is appended to x. Ignored if x = NULL.

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model which is provided in lieu of y and x. The details of model specification are given under DETAILS.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which impNorm is called.

param

assumed values for the model parameters. This must be a list with two named components, beta and sigma, which are numeric matrices with correct dimensions. In most circumstances, the parameter values will be obtained from a run of emNorm or mcmcNorm; see DETAILS.

seeds

two integers to initialize the random number generator; see DETAILS.

method

if "random", the missing values in each row of y will be simulated from their joint predictive distribution given x and the observed values in y. If "predict", missing values will be replaced by regression predictions given the observed values. See DETAILS.

...

values to be passed to the methods.

Value

a data matrix resembling the original data y, but with NA's replaced by simulated values or predictions.
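The difference between simulated values and predictions can be illustrated with a short base-R sketch for a single data row. This is not the norm2 implementation: impute_row is a hypothetical helper, and mu stands for the row's mean under the assumed parameters (the fitted value implied by x and beta). Under a normal model, the missing entries given the observed ones are again normal; "predict" fills in the conditional mean, while "random" draws from the conditional distribution.

```r
# Hypothetical helper, not part of norm2: fill the NAs in one data row
# under a multivariate normal with known mean vector and covariance matrix.
impute_row <- function(y, mu, sigma, method = c("random", "predict")) {
  method <- match.arg(method)
  mis <- is.na(y)
  if (!any(mis)) return(y)
  obs <- !mis
  if (any(obs)) {
    # Coefficients for regressing the missing variables on the observed ones
    coef <- sigma[mis, obs, drop = FALSE] %*% solve(sigma[obs, obs, drop = FALSE])
    cond_mean <- mu[mis] + drop(coef %*% (y[obs] - mu[obs]))
    cond_cov <- sigma[mis, mis, drop = FALSE] - coef %*% sigma[obs, mis, drop = FALSE]
  } else {
    # Nothing observed in this row: fall back to the marginal distribution
    cond_mean <- mu[mis]
    cond_cov <- sigma[mis, mis, drop = FALSE]
  }
  if (method == "predict") {
    y[mis] <- cond_mean
  } else {
    # Draw from N(cond_mean, cond_cov) using the Cholesky factor
    z <- rnorm(sum(mis))
    y[mis] <- cond_mean + drop(t(chol(cond_cov)) %*% z)
  }
  y
}

mu <- c(0, 1)
sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
y <- c(2, NA)

pred <- impute_row(y, mu, sigma, method = "predict")
set.seed(1)
draw <- impute_row(y, mu, sigma, method = "random")
```

In this toy case the prediction for the second variable is 1 + 0.5 * (2 - 0) = 2, and the random draw scatters around that value with conditional variance 2 - 0.5^2 = 1.75.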

Details

This function is used primarily in conjunction with mcmcNorm to draw multiple imputations by the multiple-chain method. In those instances, the simplest way to call impNorm is to provide an object of class "norm" as its first argument, where that object is the result of a call to mcmcNorm. The parameter values stored in that object will then be passed to impNorm automatically.

Alternatively, one may call impNorm by providing as the first argument y, a vector or matrix of data to be modeled as normal, and an optional vector or matrix of predictors x. Missing values NA are allowed in y but not in x.
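As an illustration of this calling style, the sketch below passes the data as the first argument and supplies param explicitly. It reuses the marijuana data and the emNorm call from the Examples section; taking the parameters from emResult$param assumes that the object returned by emNorm stores them under that name, as the usage line for the norm method suggests.

```r
library(norm2)
data(marijuana)

# Parameter estimates from EM, as in the Examples section
emResult <- emNorm(marijuana, prior = "ridge", prior.df = 0.5)

# Default method: data first, parameters supplied explicitly.
# x is left as NULL, which gives an intercept-only model.
imp.random  <- impNorm(marijuana, param = emResult$param,
   seeds = c(123, 456))
imp.predict <- impNorm(marijuana, param = emResult$param,
   method = "predict")
```

Both results should have the shape of the original data, with the NAs filled in by random draws in the first case and by regression predictions in the second.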

A third way to call impNorm is to provide formula, a formula for a (typically multivariate) linear regression model in the manner expected by lm. A formula is given as y ~ model, where y is either a single numeric variable or a matrix of numeric variables bound together with the function cbind. The right-hand side of the formula (everything to the right of ~) is a linear predictor, a series of terms separated by operators +, : or * to specify main effects and interactions. Factors are allowed on the right-hand side and will enter the model as contrasts among the levels. The intercept term 1 is included by default; to remove the intercept, use -1.
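To see how a right-hand side of this kind expands into predictor columns, one can inspect it with base R's model.matrix, which is what lm relies on. The sketch below uses a hypothetical data frame with made-up variable names; whether impNorm builds its predictor matrix in exactly this way internally is an assumption.

```r
# Hypothetical data frame; variable names are illustrative only
dat <- data.frame(
  dose  = c(1, 2, 3, 4),
  group = factor(c("a", "b", "a", "b"))
)

# Main effects plus interaction; the intercept is included by default,
# and the factor enters as a contrast among its levels
X <- model.matrix(~ dose * group, data = dat)
colnames(X)

# The same terms with the intercept removed via -1
X0 <- model.matrix(~ dose * group - 1, data = dat)
colnames(X0)
```

Comparing the two sets of column names shows the effect of -1: the "(Intercept)" column disappears from the second matrix.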

The norm2 functions use their own internal random number generator, which is seeded by two integers, for example seeds=c(123,456); this allows results to be reproduced later. If seeds=NULL, the function seeds itself with two random integers drawn from R. Results can therefore also be made reproducible by calling set.seed beforehand and leaving seeds=NULL.
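The two routes to reproducibility described above might be checked as follows. This is a sketch that reuses the marijuana data and the emNorm call from the Examples section; the object names and seed values are arbitrary.

```r
library(norm2)
data(marijuana)
emResult <- emNorm(marijuana, prior = "ridge", prior.df = 0.5)

# Route 1: pass the same two integers explicitly
imp.a <- impNorm(emResult, seeds = c(123, 456))
imp.b <- impNorm(emResult, seeds = c(123, 456))
identical(imp.a, imp.b)

# Route 2: leave seeds = NULL and fix R's own generator beforehand
set.seed(2024)
imp.c <- impNorm(emResult)
set.seed(2024)
imp.d <- impNorm(emResult)
identical(imp.c, imp.d)
```

Explicit seeds make a single call reproducible regardless of R's generator state, while set.seed is convenient when a whole script with several norm2 calls should be repeatable.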

References

Schafer, J.L. (1997) Analysis of Incomplete Multivariate Data. London: Chapman & Hall/CRC Press.

For more information about this function and other functions in the norm2 package, see User's Guide for norm2 in the library subdirectory doc.

Examples

# NOT RUN {
## run EM for marijuana data with ridge prior
data(marijuana)
emResult <- emNorm(marijuana, prior="ridge", prior.df=0.5)

## generate 25 multiple imputations by running 25 chains
## of 100 iterations each, starting each chain at the 
## posterior mode
set.seed(456)
imp.list <- vector("list", 25)
for (m in seq_len(25)) {
   mcmcResult <- mcmcNorm(emResult, iter=100)
   imp.list[[m]] <- impNorm(mcmcResult)
}

# }
