This is a wrapper function for mice, using multiple cores to execute mice in parallel. As a result, the imputation procedure can be sped up, which is especially useful when imputation is computationally demanding, for example with large datasets or many imputations.
parlmice(
data,
m = 5,
seed = NA,
cluster.seed = NA,
n.core = NULL,
n.imp.core = NULL,
cl.type = "PSOCK",
...
)
Returns a mids object, as defined by mids-class.
data: A data frame or matrix containing the incomplete data, similar to the first argument of mice.
m: The number of desired imputed datasets. By default, m = 5, as in mice.
seed: A scalar to be used as the seed value for the mice algorithm within each parallel stream. Note that the imputations will then be identical across all streams; hence, this argument should be used if and only if n.core = 1 and the same output as under mice is desired.
cluster.seed: A scalar to be used as the seed value for the cluster. It is recommended to set the seed here and not outside this function, as otherwise the parallel processes will run with separate, random seeds.
n.core: A scalar indicating the number of cores that should be used.
n.imp.core: A scalar indicating the number of imputations per core.
cl.type: The cluster type. The default value is "PSOCK". POSIX machines (Linux, Mac) generally benefit from much faster cluster computation if cl.type is set to "FORK".
...: Named arguments that are passed down to function mice or makeCluster.
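The example at the end of this page requests 150 imputations as n.core = 3 cores with n.imp.core = 50 imputations each, so the total number of imputed datasets is n.core * n.imp.core. The sketch below illustrates that arithmetic with a hypothetical helper, choose_split(), which picks the largest core count (up to a given maximum) that divides m exactly. It is our own illustration, not the rule parlmice applies when n.core or n.imp.core is left unspecified.

```r
# Hypothetical helper (not part of mice): choose n.core and n.imp.core
# so that n.core * n.imp.core equals the requested total m.
choose_split <- function(m, max.core) {
  stopifnot(m >= 1, max.core >= 1)
  cores <- rev(seq_len(max.core))      # try max.core, max.core - 1, ..., 1
  n.core <- cores[m %% cores == 0][1]  # largest core count that divides m
  list(n.core = n.core, n.imp.core = m %/% n.core)
}

split150 <- choose_split(m = 150, max.core = 3)
str(split150)  # n.core = 3, n.imp.core = 50

split100 <- choose_split(m = 100, max.core = 3)
str(split100)  # n.core = 2, n.imp.core = 50
```

The values could then be supplied as parlmice(data, n.core = split150$n.core, n.imp.core = split150$n.imp.core).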
Gerko Vink, Rianne Schouten
This function relies on package parallel, which is a base package for R versions 2.14.0 and later. We have chosen to use the parallel function parLapply to allow the use of parlmice on Mac, Linux and Windows systems. For the same reason, we use the Parallel Socket Cluster (PSOCK) type by default.
On systems other than Windows, it can be hugely beneficial to change the cluster type to FORK, as it generally results in improved memory handling. When memory issues arise on a Windows system, we advise storing the multiply imputed datasets, cleaning the memory with rm and gc, and making another run using the same settings.
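A minimal sketch of both recommendations, using only base R and the parallel package: the cluster type is chosen from .Platform$OS.type (FORK clusters are not available on Windows), and a result is saved with saveRDS() before the workspace is cleaned with rm() and gc(). The toy workload, the file name and the use of saveRDS() are our own choices for illustration; in practice the stored object would be the mids object returned by parlmice.

```r
library(parallel)

# FORK clusters exist only on POSIX systems (Linux, macOS);
# fall back to PSOCK on Windows.
cl.type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"

cl <- makeCluster(2, type = cl.type)
squares <- unlist(parLapply(cl, 1:4, function(i) i^2))  # toy workload
stopCluster(cl)
print(squares)  # 1 4 9 16

# Store the result, then free memory before starting another run.
out.file <- file.path(tempdir(), "run1.rds")  # hypothetical file name
saveRDS(squares, out.file)
rm(squares)
invisible(gc())

squares <- readRDS(out.file)  # reload the stored result when needed
```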
This wrapper function combines the output of parLapply with the function ibind in mice. A mids object is returned, which can be used for further analyses.
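The split-then-combine idea can be sketched directly with parallel and mice: each worker runs mice() on the same data with its own share of the imputations, and the resulting mids objects are stacked with ibind(). This is a simplified illustration, not the actual parlmice source; the two workers, m = 3 per worker and the seed 2017 are arbitrary choices, and the block only runs when mice is installed.

```r
library(parallel)

# Runs only if mice is installed.
if (requireNamespace("mice", quietly = TRUE)) {
  dat <- mice::nhanes                    # example data shipped with mice
  cl <- makeCluster(2, type = "PSOCK")   # 2 workers (arbitrary choice)
  clusterSetRNGStream(cl, iseed = 2017)  # reproducible worker streams

  # Each worker imputes the same data with its own m = 3 imputations.
  imps <- parLapply(cl, 1:2, function(i, data) {
    mice::mice(data, m = 3, printFlag = FALSE)
  }, data = dat)
  stopCluster(cl)

  # Stack the per-worker mids objects into a single mids object.
  imp <- Reduce(mice::ibind, imps)
  print(imp$m)  # 6
}
```

With more workers, Reduce() applies ibind() pairwise from left to right, so the combined m is the sum of the per-worker m values.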
Note that if a seed value is desired, the seed should be passed to this function with argument cluster.seed (or with seed when n.core = 1). Seed values set outside the wrapper function (in an R script or passed to mice) will not lead to reproducible results. See the manual of parallel for an explanation of this matter.
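In brief, each worker in a PSOCK cluster is a separate R process with its own random number stream, so set.seed() in the master session does not control the workers. The sketch below, using only base R and parallel, shows that seeding the worker streams with parallel::clusterSetRNGStream() makes results reproducible. We assume that this corresponds to what cluster.seed does inside parlmice; the worker function draw_one() is hypothetical.

```r
library(parallel)

# Hypothetical worker task: draw a single uniform random number.
draw_one <- function(i) runif(1)

cl <- makeCluster(2, type = "PSOCK")

# Setting the seed in the master session does not seed the workers:
# a and b come from the workers' own streams, which set.seed() leaves untouched.
set.seed(123)
a <- unlist(parLapply(cl, 1:2, draw_one))
set.seed(123)
b <- unlist(parLapply(cl, 1:2, draw_one))

# Seeding the worker streams directly gives reproducible draws,
# with a separate L'Ecuyer-CMRG stream on each worker.
clusterSetRNGStream(cl, iseed = 123)
r1 <- unlist(parLapply(cl, 1:2, draw_one))
clusterSetRNGStream(cl, iseed = 123)
r2 <- unlist(parLapply(cl, 1:2, draw_one))

stopCluster(cl)

identical(r1, r2)  # TRUE
```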
Schouten, R. and Vink, G. (2017). parlmice: faster, paraleller, micer. https://www.gerkovink.com/parlMICE/Vignette_parlMICE.html
Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.
See also: parallel, parLapply, makeCluster, mice, mids-class
# 150 imputations in dataset nhanes, performed by 3 cores
if (FALSE) {
imp1 <- parlmice(data = nhanes, n.core = 3, n.imp.core = 50)
# Making use of arguments in mice.
imp2 <- parlmice(data = nhanes, method = "norm.nob", m = 100)
imp2$method
fit <- with(imp2, lm(bmi ~ hyp))
pool(fit)
}