The bootstrap datasets are computed by resampling independent
transitions between pairs of states (for non-hidden models without
censoring), or independent individual series (for hidden models or
models with censoring). Therefore, this approach does not work if,
for example, the data for an HMM consist of a series of observations
from just one individual, and it is inaccurate for small numbers of
independent transitions or individuals.

Confidence intervals or standard errors for the corresponding
statistic can be calculated by summarising the returned list of B
replicated outputs. This is currently implemented for most of the
output functions: qmatrix.msm, ematrix.msm, qratio.msm, pmatrix.msm,
pmatrix.piecewise.msm, totlos.msm and prevalence.msm. For other
outputs, users will have to write their own code to summarise the
output of boot.msm.
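For example, assuming each replicate is a numeric matrix of the same dimensions (as when stat computes a transition probability matrix), the B replicates could be summarised elementwise into bootstrap standard errors and percentile confidence limits. This is a minimal base-R sketch, not part of msm: summarise_boot is a hypothetical helper, and reps is a simulated stand-in for the list returned by boot.msm so that the code runs without a fitted model.

```r
# Hypothetical helper (not part of msm): summarise a list of B replicated
# numeric matrices, all with the same dimensions, into elementwise
# bootstrap standard errors and percentile confidence limits.
summarise_boot <- function(reps, cl = 0.95) {
  # Stack the B matrices into a 3-d array of dimension nrow x ncol x B
  arr <- simplify2array(reps)
  alpha <- (1 - cl) / 2
  list(
    se    = apply(arr, c(1, 2), sd),
    lower = apply(arr, c(1, 2), quantile, probs = alpha, names = FALSE),
    upper = apply(arr, c(1, 2), quantile, probs = 1 - alpha, names = FALSE)
  )
}

# Simulated stand-in for boot.msm output (hypothetical data): B = 200
# 2 x 2 transition probability matrices whose rows sum to 1
set.seed(1)
reps <- lapply(1:200, function(i) {
  p <- runif(1, 0.6, 0.8)
  matrix(c(p, 1 - p, 0, 1), nrow = 2, byrow = TRUE)
})
s <- summarise_boot(reps)
s$lower
s$upper
```

Percentile limits are used here for simplicity; other bootstrap interval types could be substituted.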
Most of msm's output functions present confidence intervals based on
asymptotic standard errors calculated from the Hessian. These are
expected to underestimate the true standard errors (Cramér-Rao lower
bound). Some of these functions use a further approximation, the delta
method (see deltamethod), to obtain standard errors of transformed
parameters. Bootstrapping should give a more accurate estimate of the
uncertainty.
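To illustrate the delta method in its simplest case, the sketch below computes the approximate standard error of a transition intensity estimated on the log scale, using the first-order approximation SE(exp(theta)) ≈ exp(theta) × SE(theta). The numbers are made up for illustration; msm's deltamethod function handles more general transformations of several parameters.

```r
# Made-up illustrative values: a log-intensity estimate and its asymptotic
# standard error (as would be obtained from the inverse Hessian)
log_q    <- log(0.25)
se_log_q <- 0.2

# Delta method for g(theta) = exp(theta): since g'(theta) = exp(theta),
# SE(g(theta_hat)) is approximately |g'(theta_hat)| * SE(theta_hat)
q_hat    <- exp(log_q)
se_q_hat <- exp(log_q) * se_log_q

# Symmetric normal-based 95% interval on the natural scale
ci_delta <- q_hat + c(-1, 1) * qnorm(0.975) * se_q_hat
ci_delta
```

Such symmetric intervals can be a poor approximation for skewed quantities, which is one reason why bootstrapping may be more accurate.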
An alternative method, less accurate but faster than bootstrapping and
more accurate than the delta method, is to draw a sample from the
asymptotic multivariate normal distribution implied by the maximum
likelihood estimates and their covariance matrix, and then summarise
the transformed estimates. See pmatrix.msm.
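The simulation approach can be sketched in base R as follows, assuming made-up estimates of two log-intensities and their covariance matrix. Multivariate normal draws are generated from the Cholesky factor of the covariance matrix; the ratio of the two intensities is used as an example of a transformed quantity. This only illustrates the idea and is not msm's internal implementation.

```r
set.seed(42)
# Made-up maximum likelihood estimates of two log-intensities and their
# asymptotic covariance matrix (e.g. the inverse of the Hessian)
theta_hat <- c(log(0.25), log(0.10))
sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), nrow = 2)

# Draw M samples from N(theta_hat, sigma): chol() returns upper-triangular R
# with sigma = t(R) %*% R, so if each row z of Z is N(0, I), then z %*% R
# has covariance t(R) %*% R = sigma
M <- 10000
R <- chol(sigma)
Z <- matrix(rnorm(M * 2), nrow = M)
draws <- sweep(Z %*% R, 2, theta_hat, "+")   # each row is one draw

# Transform each draw to the quantity of interest (here the ratio of the
# two intensities) and summarise the transformed sample
ratio  <- exp(draws[, 1]) / exp(draws[, 2])
ci_sim <- quantile(ratio, c(0.025, 0.975))
se_sim <- sd(ratio)
ci_sim
```

Unlike the delta method, the resulting interval follows the skewness of the transformed quantity, since no linearisation is involved.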
All objects used in the original call to msm that produced x, such as
the qmatrix, should be in the working environment, or else boot.msm
will produce an "object not found" error. This enables boot.msm to
refit the original model to the replicate datasets. However, there is
currently a limitation: in the original call to msm, the "formula"
argument should be specified directly, for example,

    msm(state ~ time, data = ...)

and not, for example,

    form = data$state ~ data$time
    msm(formula = form, data = ...)

otherwise boot.msm will be unable to draw the replicate datasets.
boot.msm will also fail with an obscure error if the original call to
msm used a user-defined object whose name is the same as a built-in R
object, or an object in any other loaded package. For example, this
happens if you have called a Q matrix q, since q() is the built-in
function for quitting R.
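One way to spot such clashes before calling msm is to check which of your object names are also found elsewhere on the search path. The sketch below uses a hypothetical base-R helper, masked_names, which is not part of msm. Note that utils::find only searches the search path, so it will not catch clashes with packages that are loaded but not attached.

```r
# Hypothetical helper (not part of msm): return those object names that are
# also defined somewhere on the search path other than the global
# environment, e.g. in base R or another attached package
masked_names <- function(obj_names) {
  clashes <- vapply(obj_names, function(nm) {
    where <- utils::find(nm)        # e.g. c(".GlobalEnv", "package:base")
    any(where != ".GlobalEnv")
  }, logical(1))
  obj_names[clashes]
}

q <- rbind(c(0, 0.1), c(0, 0))      # a Q matrix unwisely named q
Q.init <- q                         # same matrix under a non-clashing name
masked_names(c("q", "Q.init"))      # flags "q", which clashes with base::q
```

Renaming the matrix (for example to Q.init, as above) avoids the clash.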
If stat is NULL, then B different msm model objects will be stored in
memory. This is inadvisable: msm objects tend to be large, since they
contain the original data used for the msm fit, so storing B of them
wastes memory.
To specify more than one statistic, write a function that returns a
list of the results of different function calls, for example,

    stat = function(x) list(pmatrix.msm(x, t = 1), pmatrix.msm(x, t = 2))
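With such a stat function, the output of boot.msm is a list of B replicates, each of which is itself a list with one element per statistic. The base-R sketch below shows one way to extract a single statistic from every replicate and summarise it. It is illustrative only: reps is a hypothetical simulated stand-in for real boot.msm output, so no model fit is needed, and get_stat is a hypothetical helper, not an msm function.

```r
# Simulated stand-in for boot.msm output with a stat function returning
# list(P(t = 1), P(t = 2)): B = 100 replicates, each a list of two 2 x 2
# transition probability matrices (hypothetical data)
set.seed(2)
reps <- lapply(1:100, function(i) {
  p  <- runif(1, 0.6, 0.8)          # probability of staying in state 1 over one time unit
  P1 <- matrix(c(p, 1 - p, 0, 1), nrow = 2, byrow = TRUE)
  list(P1, P1 %*% P1)               # P(2) = P(1)^2 for a time-homogeneous chain
})

# Hypothetical helper: pull the k-th statistic out of every replicate
get_stat <- function(reps, k) lapply(reps, `[[`, k)

# Summarise one entry of the second statistic, P(t = 2)[1, 1]
p2_11 <- vapply(get_stat(reps, 2), function(m) m[1, 1], numeric(1))
quantile(p2_11, c(0.025, 0.975))
```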