mice(data, m = 5,
method = vector("character",length=ncol(data)),
predictorMatrix = (1 - diag(1, ncol(data))),
visitSequence = (1:ncol(data))[apply(is.na(data),2,any)],
post = vector("character", length = ncol(data)),
defaultMethod = c("pmm","logreg","polyreg"),
maxit = 5,
diagnostics = TRUE,
printFlag = TRUE,
seed = NA,
imputationMethod = NULL,
defaultImputationMethod = NULL
)
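The default expressions for predictorMatrix and visitSequence in the signature above can be evaluated on their own in base R. The following is a minimal sketch on a made-up data frame (the name toy and its values are assumptions for illustration, not part of the package):

```r
# Toy data: column 1 is complete, columns 2 and 3 contain missing values.
toy <- data.frame(age = c(1, 2, 3, 2),
                  bmi = c(22.1, NA, 27.4, NA),
                  chl = c(187, 199, NA, 204))

# Default predictorMatrix: every other column is a predictor for each
# target, and the diagonal is zero.
pred <- 1 - diag(1, ncol(toy))
dimnames(pred) <- list(names(toy), names(toy))

# Default visitSequence: indices of the columns with at least one NA.
visit <- (1:ncol(toy))[apply(is.na(toy), 2, any)]

print(pred)
print(visit)
```

Here the complete column age is left out of the visit sequence, while it still serves as a predictor for the other two columns.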
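data: A data frame or a matrix containing the incomplete data. Missing
values are coded as NA.
m: Number of multiple imputations. The default is m=5.
method: Either a single string or a vector of strings of length
ncol(data), specifying the elementary imputation method to be used
for each column in data. If specified as a single string, the same
method will be used for all columns.
predictorMatrix: A square matrix of size ncol(data) containing 0/1 data
specifying the set of predictors to be used for each target column. Rows
correspond to target variables (i.e. variables to be imputed), in the
sequence as they appear in data. A value of 1 means that the column
variable is used as a predictor for the target variable in that row.
post: A vector of strings of length ncol(data), specifying expressions.
Each string is parsed and executed within the sampler() function to
postprocess imputed values. The default is to do nothing, indicated by a
vector of empty strings.
diagnostics: If TRUE, diagnostic information will be appended to the
value of the function. If FALSE, only the imputed data are saved. The
default is TRUE.
printFlag: If TRUE, mice will print history on console. Use print=FALSE
for silent computation.
seed: An integer passed to set.seed() for offsetting the random number
generator. Default is to leave the random number generator alone.
imputationMethod: Same as the method argument. Included for backwards
compatibility.
defaultImputationMethod: Same as the defaultMethod argument. Included
for backwards compatibility.
Returns an object of class mids (multiply imputed data set) with
components:
nmis: A vector of length ncol(data)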
containing the number of missing observations per column.
A list of ncol(data) components with the generated multiple imputations.
Each part of the list is a nmis[j] by m matrix of imputed values for
variable data[,j]. The component equals NULL for columns without
missing data.
method: A vector of strings of length ncol(data) specifying the
elementary imputation method per column.
predictorMatrix: A square matrix of size ncol(data) containing 0/1 data
specifying the predictor set.
post: A vector of strings of length ncol(data) with commands for
post-processing.
... as.integer().
chainMean: ... Note that observed data are not present in this mean.
... chainMean, containing the variances of the imputed values.
pad: ... pad$data (data padded with columns for factors),
pad$predictorMatrix (predictor matrix for the padded data), pad$method
(imputation methods applied to the padded data), the vector
pad$visitSequence (the visit sequence applied to the padded data),
pad$post (post-processing commands for padded data) and categories
(a matrix containing descriptive information about the padding
operation).
... pad.
Built-in elementary imputation methods include pmm, logreg, polyreg,
norm and sample.
The corresponding functions are coded in the mice library under
names of the form mice.impute.method, where method is a string with the
name of the elementary imputation method, for example norm. The method
argument specifies the methods to be used. For the j'th column, mice()
calls the first occurrence of paste("mice.impute.",method[j],sep="")
in the search path.
This mechanism allows users to write a customized imputation function,
for example mice.impute.myfunc. To call it for all columns, specify
method="myfunc". To call it only for, say, column 2, specify
method=c("norm","myfunc","logreg",...).
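The lookup described above can be imitated in base R without loading the package. This is a minimal sketch: the body of mice.impute.myfunc is a hypothetical method, and the argument list (y, ry, x) is an assumption about the convention used by the built-in functions that should be checked against the installed version.

```r
# Hypothetical custom method: fill each missing entry with a random draw
# from the observed values of the same column.
# Assumed arguments: y = target column, ry = TRUE where y is observed,
# x = predictor matrix (unused here).
mice.impute.myfunc <- function(y, ry, x, ...) {
  observed <- y[ry]
  observed[sample.int(length(observed), sum(!ry), replace = TRUE)]
}

# Imitate the lookup for column j: build the function name from
# method[j] and take the first match on the search path.
method <- c("norm", "myfunc", "logreg")
j <- 2
f <- get(paste("mice.impute.", method[j], sep = ""), mode = "function")

y  <- c(5, NA, 7, NA, 9)
ry <- !is.na(y)
set.seed(1)
imputed <- f(y, ry, x = NULL)
print(imputed)
```

The function returns one value per missing entry, so the result has length sum(!ry).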
Passive imputation:
mice() supports a special built-in method, called passive imputation.
This method can be used to ensure that a data transform always depends
on the most recently generated imputations. In some cases, an imputation
model may need transformed data in addition to the original data (e.g.
log, quadratic, recodes, interaction, sum scores, and so on). Passive
imputation maintains consistency among different transformations of the
same data.
Passive imputation is invoked if ~ is specified as the first character
of the string that specifies the elementary method. mice() interprets
the entire string, including the ~ character, as the formula argument in
a call to model.frame(formula, data[!r[,j],]). This provides a simple
mechanism for specifying deterministic dependencies among the columns.
For example, suppose that the missing entries in variables data$height
and data$weight are imputed. The body mass index (BMI) can be calculated
within mice by specifying the string "~I(weight/height^2)" as the
elementary imputation method for the target column data$bmi.
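The model.frame() call quoted above can be tried in base R to see what a single passive step computes. This is a sketch on made-up data: the data frame dat and its values are assumptions, and height and weight are taken to be already complete (standing in for columns that have just been imputed).

```r
# Toy data: bmi is missing in rows 1 and 3.
dat <- data.frame(height = c(1.70, 1.80, 1.65, 1.90),
                  weight = c(65, 80, 58, 95),
                  bmi    = c(NA, 24.7, NA, 26.3))

r <- !is.na(dat)   # response indicator: TRUE where a value is observed
j <- 3             # index of the target column bmi

# Evaluate the passive-imputation formula on the rows where bmi is missing.
form <- as.formula("~I(weight/height^2)")
mf <- model.frame(form, dat[!r[, j], ])

# Fill only the missing entries of the target column.
dat[!r[, j], j] <- as.numeric(mf[, 1])
print(dat)
```

Only the two missing bmi entries are overwritten; the observed values in rows 2 and 4 are left as they were.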
Note that the ~ mechanism works only on those entries which have
missing values in the target column. You should make sure that the
combined observed and imputed parts of the target column make sense. An
easy way to create consistency is by coding all entries in the target as
NA, but for large data sets, this could be inefficient.
Note that you may also need to adapt the default predictorMatrix to
avoid linear dependencies among the predictors that could cause errors
like "Error in solve.default()" or "Error: system is exactly singular".
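One way to adapt the default predictorMatrix for the BMI case is to stop the derived column from predicting the columns it is computed from. This is a sketch in base R; the variable names are taken from the BMI example, and whether this particular change is sufficient for a given data set is an assumption.

```r
vars <- c("height", "weight", "bmi")

# Start from the default: all other columns are predictors.
pred <- 1 - diag(1, length(vars))
dimnames(pred) <- list(vars, vars)

# Rows are targets, columns are predictors. bmi is a deterministic
# function of height and weight, so drop it as a predictor for both.
pred[c("height", "weight"), "bmi"] <- 0
print(pred)
```

The resulting matrix can then be supplied through the predictorMatrix argument.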
Though not strictly needed, it is often useful to specify visitSequence
such that the column that is imputed by the ~ mechanism is visited each
time after one of its predictors was visited. In that way, deterministic
relations between columns will always be synchronized.
See also: complete, mids, with.mids, set.seed
# do default multiple imputation on a numeric matrix
imp <- mice(nhanes)
imp
# list the actual imputations for BMI
imp$imputations$bmi
# first completed data matrix
complete(imp)
# imputation on mixed data with a different method per column
mice(nhanes2, meth=c("sample","pmm","logreg","norm"))