## S3 method for class 'default':
amelia(x, m = 5, p2s = 1,frontend = FALSE, idvars = NULL,
ts = NULL, cs = NULL, polytime = NULL, splinetime = NULL, intercs = FALSE,
lags = NULL, leads = NULL, startvals = 0, tolerance = 0.0001,
logs = NULL, sqrts = NULL, lgstc = NULL, noms = NULL, ords = NULL,
incheck = TRUE, collect = FALSE, arglist = NULL, empri = NULL,
priors = NULL, autopri = 0.05, emburn = c(0,0), bounds = NULL,
max.resample = 100, overimp = NULL,
parallel = c("no", "multicore", "snow"),
ncpus = getOption("amelia.ncpus", 1L), cl = NULL, ...)

## S3 method for class 'amelia':
amelia(x, m = 5, p2s = 1, frontend = FALSE, ...)
## S3 method for class 'molist':
amelia(x, ...)
splinetime: an integer controlling smoothing splines of time; values of k
greater than 3 create a spline with an additional k-3 knotpoints.

intercs: a logical indicating whether the time effects of polytime
should vary across the cross-section.

incheck: a logical indicating whether the inputs to the function should be
checked before running amelia. This should only be set to FALSE
if you are extremely confident that your settings are non-problematic.

collect: a logical controlling garbage collection during imputation. Only
set this to TRUE if you are experiencing memory issues, as it can
significantly slow down the imputation process.

emburn: a numeric vector of length 2, where emburn[1] is the minimum EM
chain length and emburn[2] is the maximum EM chain length. These are
ignored if they are less than 1.

bounds: a three-column matrix of logical bounds on the imputations. Each
row should be of the form c(column.number, lower.bound, upper.bound).
See Details below.

max.resample: the number of times Amelia should redraw the imputed values
when trying to meet the logical constraints of bounds. After this
value, imputed values are set to the bounds.

overimp: a two-column matrix describing which cells are to be overimputed.
Each row should be a c(row, column) pair. Each of these cells will be
treated as missing and replaced with draws from the imputation model.

parallel: the type of parallel operation to be used (if any). If missing,
the default is taken from the option "amelia.parallel" (and if that is
not set, "no").

ncpus: an integer giving the number of processes to be used in parallel
operation.

cl: an optional cluster for use when parallel = "snow". If not supplied, a
cluster on the local machine is created for the duration of the
amelia call.

The returned object of class "amelia" includes, among other components:

imputations: a list of length m with an imputed dataset in each entry. The
class (matrix or data.frame) of these entries will match x.

theta, mu, covMatrices: the converged parameters for each of the m EM
chains. Note that these objects refer to the data as seen by the EM
algorithm and are thus centered, scaled, stacked, transformed and
rearranged. See the manual for details on how to access this
information.

This function takes an incomplete dataset (a matrix or data frame) and
returns m imputed datasets with no missing values. The algorithm first
bootstraps a sample dataset with the same dimensions as the original data,
estimates the sufficient statistics (with priors, if specified) by EM, and
then imputes the missing values of the sample. It repeats this process m
times to produce the m complete datasets, where the observed values are
the same and the unobserved values are drawn from their posterior
distributions. The function will start a "fresh" run of the algorithm if
x is an incomplete matrix or data frame. In this method, all of the
options will be user-defined or set to their defaults. If x is the output
of a previous Amelia run (that is, an object of class "amelia"), then
Amelia will run with the options used in that previous run. This is a
convenient way to run more imputations of the same model.
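The bootstrap-then-EM loop described above can be sketched in base R. This is a toy illustration of the resampling step only, not Amelia's implementation: each of the m chains sees a different with-replacement resample of the rows.

```r
# Toy sketch (not Amelia's internals): each of the m chains starts from a
# bootstrap resample of the rows, drawn with replacement, so each chain
# estimates its parameters from a slightly different dataset.
set.seed(42)
dat <- data.frame(a = c(1, 2, NA, 4), b = c(NA, 1, 3, 2))
m <- 5
boots <- lapply(seq_len(m), function(i) {
  dat[sample(nrow(dat), replace = TRUE), , drop = FALSE]
})
length(boots)    # m resampled datasets
dim(boots[[1]])  # same dimensions as dat
```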
You can provide Amelia with informational priors about the missing
observations in your data. To specify priors, pass a four or five
column matrix to the priors
argument with each row specifying a
different prior, as follows:
one.prior <- c(row, column, mean, standard deviation)
or,
one.prior <- c(row, column, minimum, maximum, confidence)
.
The first and second columns of the priors matrix give the row and column number of the cell whose prior is being set. The remaining columns hold either the mean and standard deviation of the prior, or a minimum, maximum and confidence level for the prior. You must specify your priors all as distributions or all as confidence ranges. Note that ranges are converted to distributions, so setting a confidence of 1 will generate an error.
Setting a prior for the missing values of an entire variable is done
in the same manner as above, but by entering a 0 for the row
instead of the row number. If priors are set for both the entire
variable and an individual observation, the individual prior takes
precedence.
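For instance, a distribution-form priors matrix might look like the following. The row and column indices and the prior values here are hypothetical, and the amelia() call is shown commented out because it requires the Amelia package and a matching dataset:

```r
# Hypothetical priors: two cell-level priors plus one variable-wide prior.
# Each row: c(row, column, mean, standard deviation); a row entry of 0
# targets every missing value in that column.
pr <- matrix(c(12, 3, 540, 25,    # cell (12, 3): prior mean 540, sd 25
               45, 3, 510, 25,    # cell (45, 3): prior mean 510, sd 25
                0, 4,   0,  1),   # all missing cells in column 4
             ncol = 4, byrow = TRUE)
# a.out <- amelia(x, m = 5, priors = pr)  # requires the Amelia package
```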
In addition to priors, Amelia allows for logical bounds on
variables. The bounds
argument should be a matrix with 3
columns, with each row referring to a logical bound on a variable. The
first column should be the column number of the variable to be
bounded, the second column should be the lower bound for that
variable, and the third column should be the upper bound for that
variable. As Amelia enacts these bounds by resampling, particularly
poor bounds will end up resampling forever. Amelia will stop
resampling after max.resample
attempts and simply set the
imputation to the relevant bound.
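A minimal sketch, assuming a variable in column 5 that must lie in [0, 1] (the column number is hypothetical, and the amelia() call is commented out because it needs the Amelia package):

```r
# One logical bound per row: c(column.number, lower.bound, upper.bound).
bds <- matrix(c(5, 0, 1), ncol = 3)
# a.out <- amelia(x, m = 5, bounds = bds, max.resample = 100)
```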
If each imputation is taking a long time to converge, you can increase
the empirical prior, empri
. This value has the effect of smoothing
out the likelihood surface so that the EM algorithm can more easily find
the maximum. It should be kept as low as possible and only used if needed.
Amelia assumes the data is distributed multivariate normal. There are a
number of variables that can break this assumption. Usually, though, a
transformation can make any variable roughly continuous and unbounded.
We have included a number of commonly needed transformations for data.
Note that the data will not be transformed in the output datasets and the
transformation is simply useful for climbing the likelihood.
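As an illustration of why such transformations help, a right-skewed variable is far closer to normal on the log scale. This base-R sketch uses simulated data and an ad hoc skewness helper (not part of Amelia):

```r
# A lognormal variable strongly violates the multivariate normal assumption;
# its log is exactly normal.
set.seed(1)
x <- rlnorm(1000)                                   # right-skewed draws
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3 # ad hoc skewness
skew(x)        # large positive skewness
skew(log(x))   # near zero: roughly symmetric
```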
Amelia can run its imputations in parallel using the methods of the
parallel package. The parallel argument names the
parallel backend that Amelia should use. Users on Windows systems must
use the "snow"
option and users on Unix-like systems should use
"multicore"
. The multicore
backend sets itself up
automatically, but the snow
backend requires more setup. You
can pass a predefined cluster from the
parallel::makePSOCKcluster
function to the cl
argument. Without this cluster, Amelia will attempt to create a
reasonable default cluster and stop it once computation is
complete. When using the parallel backend, users can set the number of
CPUs to use with the ncpus
argument. The defaults for these two
arguments can be set with the options "amelia.parallel" and
"amelia.ncpus".
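For example, these defaults can be set once per session with base R's options(); the values below are illustrative:

```r
# Make parallel operation the session-wide default for amelia().
options(amelia.parallel = "snow", amelia.ncpus = 2L)
getOption("amelia.parallel")   # "snow"
getOption("amelia.ncpus")      # 2
```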
Please refer to the Amelia manual for more information on the function
or the options.
missmap, compare.density, overimpute and disperse. For time series
plots, tscsPlot. Also: plot.amelia, write.amelia, and ameliabind.

data(africa)
a.out <- amelia(x = africa, cs = "country", ts = "year", logs = "gdp_pc")
summary(a.out)
plot(a.out)