mice(data, m = 5,
method = vector("character", length = ncol(data)),
predictorMatrix = (1 - diag(1, ncol(data))),
visitSequence = (1:ncol(data))[apply(is.na(data), 2, any)],
form = vector("character", length = ncol(data)),
post = vector("character", length = ncol(data)),
defaultMethod = c("pmm", "logreg", "polyreg", "polr"),
maxit = 5, diagnostics = TRUE, printFlag = TRUE,
seed = NA, imputationMethod = NULL,
defaultImputationMethod = NULL, data.init = NULL, ...)

data: A data frame or matrix containing the incomplete data.
Missing values are coded as NA.

m: Number of multiple imputations. The default is m=5.

method: A string, or a vector of strings of length
ncol(data), specifying the elementary imputation method to
be used for each column in data. If specified as a single
string, the same method will be used for all columns.

predictorMatrix: A square matrix of size ncol(data)
containing 0/1 data specifying the set of predictors to be
used for each target column. Rows correspond to target
variables (i.e. variables to be imputed), in the sequence
as they appear in data.

post: A vector of strings of length ncol(data), specifying
expressions. Each string is parsed and executed within the
sampler() function to postprocess imputed values. The
default is to do nothing, indicated by a vector of empty
strings.

form: A vector of strings of length ncol(data), specifying
formulae. Each string is parsed and executed within the
sampler() function to create terms for the predictor. The
default is to do nothing, indicated by a vector of empty
strings.

diagnostics: If TRUE, diagnostic information will be
appended to the value of the function. If FALSE, only the
imputed data are saved. The default is TRUE.

printFlag: If TRUE, mice will print history on console.
Use print=FALSE for silent computation.

seed: An integer passed to set.seed() for offsetting the
random number generator. Default is to leave the random
number generator alone.

imputationMethod: Same as the method argument. Included for
backwards compatibility.

defaultImputationMethod: Same as the defaultMethod argument.
Included for backwards compatibility.

data.init: A data frame of the same size and type as data,
without missing data, used to initialize imputations before
the start of the iterative process. The default NULL
implies that starting imputations are created by a simple
random draw.

Value: an object of class mids (multiply imputed data set).

A separate univariate imputation model can be specified for each column. The default imputation method depends on the measurement level of the target column. In addition to these, several other methods are provided. Users can also write their own imputation functions and call these from within the algorithm.
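As a minimal sketch of the per-column specification, the default expressions for predictorMatrix and visitSequence from the usage line can be evaluated on a small data frame in base R. The data frame toy and its column names are made up for illustration, and nothing here calls mice() itself.

```r
# Hypothetical toy data: three columns, two of them incomplete.
toy <- data.frame(
  age = c(1, 2, 3, 2),
  bmi = c(22.1, NA, 27.4, NA),
  chl = c(187, 199, NA, 204)
)

# Default predictor matrix from the usage line: every column predicts
# every other column, and the diagonal is zero.
pred <- 1 - diag(1, ncol(toy))
dimnames(pred) <- list(names(toy), names(toy))

# Rows are targets, so this stops chl from being used to impute bmi.
pred["bmi", "chl"] <- 0

# Default visit sequence from the usage line: indices of the columns
# that contain at least one missing value.
visit <- (1:ncol(toy))[apply(is.na(toy), 2, any)]

print(pred)
print(visit)
```

Naming the rows and columns is optional, but it makes edits to single cells easier to read than numeric indices.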
The data may contain categorical variables that are used
in regressions on other variables. The algorithm
creates dummy variables for the categories of these
variables, and imputes these from the corresponding
categorical variable. The extended model containing the
dummy variables is called the padded model. Its structure
is stored in the list component pad.
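To make the idea of padding concrete, the sketch below shows in base R how a categorical predictor expands into dummy columns. It uses model.matrix() with a made-up factor edu. It illustrates dummy coding in general and is not the padding code that mice() runs internally.

```r
# Hypothetical categorical predictor with three levels.
edu <- factor(c("low", "mid", "high", "mid"),
              levels = c("low", "mid", "high"))

# Treatment contrasts give one 0/1 column per non-reference level;
# "low" is the reference category and gets no column of its own.
dummies <- model.matrix(~ edu)[, -1, drop = FALSE]

print(dummies)
```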
Built-in elementary imputation methods include pmm, norm,
logreg, polyreg, polr and sample.
The corresponding functions are coded in the
mice library under names
mice.impute.method, where method is a
string with the name of the elementary imputation
method, for example norm. The method argument
specifies the methods to be used. For the j'th
column, mice() calls the first occurrence of
paste('mice.impute.',method[j],sep='') in the
search path. The mechanism allows users to write
a customized imputation function,
mice.impute.myfunc. To call it for all columns,
specify method='myfunc'. To call it only for,
say, column 2, specify
method=c('norm','myfunc','logreg',...)
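
The lookup described above can be sketched in base R. The function mice.impute.myfunc below is a hypothetical example that fills each missing value with a random draw from the observed values. The argument names y (target column), ry (TRUE where y is observed) and x (predictors) are assumed to follow the convention of the built-in mice.impute.* functions, since this page does not show their signature.

```r
# Hypothetical custom method: impute by sampling from the observed values.
# Returns one imputed value for every missing entry of y.
mice.impute.myfunc <- function(y, ry, x = NULL, ...) {
  observed <- y[ry]
  n_missing <- sum(!ry)
  observed[sample.int(length(observed), n_missing, replace = TRUE)]
}

# Mimic the name lookup for column j of a method vector.
method <- c("norm", "myfunc", "logreg")
j <- 2
fname <- paste("mice.impute.", method[j], sep = "")
impute <- get(fname, mode = "function")  # first match on the search path

set.seed(1)
y <- c(5, NA, 7, NA, 9)
imputed <- impute(y, ry = !is.na(y), x = NULL)
print(imputed)
```

Because the lookup goes by name on the search path, defining the function in the workspace should be enough for method='myfunc' to find it.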

Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67.
Van Buuren, S. (2012). Flexible Imputation of Missing Data. Boca Raton, FL: Chapman & Hall/CRC Press.
Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., Rubin, D.B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049-1064.
Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219-242.
Van Buuren, S., Boshuizen, H.C., Knook, D.L. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681-694.
Brand, J.P.L. (1999). Development, implementation and evaluation of multiple imputation strategies for the statistical analysis of incomplete data sets. Dissertation. Rotterdam: Erasmus University.

mids, with.mids, set.seed, complete

# do default multiple imputation on a numeric matrix
imp <- mice(nhanes)
imp
# list the actual imputations for BMI
imp$imp$bmi
# first completed data matrix
complete(imp)
# imputation on mixed data with a different method per column
mice(nhanes2, meth=c('sample','pmm','logreg','norm'))