msm ( formula, subject=NULL, data = list(), qmatrix, gen.inits = FALSE,
ematrix=NULL, hmodel=NULL, obstype=NULL, obstrue=NULL,
covariates = NULL, covinits = NULL, constraint = NULL,
misccovariates = NULL, misccovinits = NULL, miscconstraint = NULL,
hcovariates = NULL, hcovinits = NULL, hconstraint = NULL, hranges=NULL,
qconstraint=NULL, econstraint=NULL, initprobs = NULL,
est.initprobs=FALSE, initcovariates = NULL, initcovinits = NULL,
death = FALSE, exacttimes = FALSE, censor=NULL,
censor.states=NULL, pci=NULL, cl = 0.95, fixedpars = NULL, center=TRUE,
opt.method="optim", hessian=NULL, use.deriv=TRUE,
use.expm=TRUE, analyticp=TRUE, na.action=na.omit, ... )
state ~ time
Observed states should be in the set 1, ..., n
, where
n
is the number o
formula
. If missing, then all observations
are assumed to be on the same subject. These must be sorted so that
all observations on the same subject are adjacent.formula
, subject
, covariates
,
misccovariates
, hcovariates
, obstype
and
obstrue
.qmatrix
TRUE
, then initial values for the
transition intensities are generated automatically using the method
in crudeinits.msm
. The non-zero
entries of the supplied qmatrix
hmm-dists
help
page. Each element of the list corresponds to data
along with the state, time, subject IDs and covariates. Its
elements should be either 1, 2 or 3, meaning as followsTRUE
or FALSE
) or
numerics (1 or 0) specifying which observations (TRUE
, 1) are
observations of the underlying state without error, and which
(FALSE
, 0) are realisations of a covariates = ~ age + sex + treatment
then these covariates are assumed t
constrai
covariates
,
via multinomial logistic regression. Only used if the model is
specified using ematrix
, rather than hmodel
covinits
.
Only used if the model is specified using ematrix
.constraint
.
Only usehmodel
, defining any covariates
governing the hidden Markov outcome models. The covariates operate
on a suitably link-transformed linear scale, for example, log scale
for a Poisson outcome modehcovariates
. Each element is a vector with
initial values for the effect of each covariate on that state. For example, the
above hco
For example consider the three-state hidd
qconstraint = c(1,2,3,3)
constrains the third and fourth intensities to be equal, in a model with four allowed instantaneous tra
ematrix
, rather
than hmodel
.TRUE
, then the underlying state
occupancy probabilities at the first observation will be estimated,
starting from a vector of initial values supplied in the initprobs
argumentinitcovariates
. A named list with each element corresponding
to a covariate, as in covinits
. Each element is a vector with
(1 - number of states) elements, containing the initcensor=999
indicates that all observations of 999
icensor
is a single number (the default) this
can be a vector, or a list with one element. If censor
is a
vector with more than one element pci = c(5, 10)
specifies that the intensity c
exacttimes
is set to TRUE
, then
the observation times are assumed to represent theTRUE
(the default, unless
fixedpars=TRUE
) then covariates are centered at their means
during the maximum likelihood estimation. This usually improves
stability of the numerical optimisation.optim
is the default. "bobyqa" requires the package
TRUE
then standard errors and confidence
intervals are obtained from a numerical estimate of the Hessian (the
observed information matrix). This is the default when maximum
likelihood estimation is performed. If all parameterTRUE
then analytic first derivatives are
used in the optimisation of the likelihood, when an appropriate
quasi-Newton optimisation method, such as BFGS, is being used. Note
that the default for
TRUE
then any matrix exponentiation needed to
calculate the likelihood is done using the FALSE
forna.omit
to drop it and carry on, or na.fail
to stop with an error.
Missing data includes all NAs in the states, times, subject
or
obstrue
, all NAs at the msm
function, it is recommended to use extractor
functions such as qmatrix.msm
,
pmatrix.msm
, sojourn.msm
, msm.form.qoutput
. These provide
estimates and confidence intervals for quantities such as transition
probabilities for given covariate values. For advanced use, it may be necessary to directly use information
stored in the object returned by msm
. This is
documented in the help page msm.object
.
Printing a msm
object by typing the object's name at the
command line implicitly invokes print.msm
. This
formats and prints the important information in the model fit,
and also returns that information in an R object.
This includes estimates and confidence intervals for the transition intensities and (log) hazard
ratios for the corresponding covariates. When there is a hidden Markov
model, the chief information in the
hmodel
component is also formatted and printed. This includes
estimates and confidence intervals for each
parameter.
pmatrix.msm(...,t=1)
is equivalent to the
matrix that governs the discrete-time model. However, these can be
fitted more efficiently using multinomial logistic regression, for
example, using multinom
from the R package
For simple continuous-time multi-state Markov models, the likelihood is calculated in terms of the transition intensity matrix $Q$. When the data consist of observations of the Markov process at arbitrary times, the exact transition times are not known. Then the likelihood is calculated using the transition probability matrix $P(t) = \exp(tQ)$, where $\exp$ is the matrix exponential. If state $i$ is observed at time $t$ and state $j$ is observed at time $u$, then the contribution to the likelihood from this pair of observations is the $i,j$ element of $P(u - t)$. See, for example, Kalbfleisch and Lawless (1985), Kay (1986), or Gentleman et al. (1994).
For hidden Markov models, the likelihood for an individual with $k$ observations is calculated directly by summing over the unknown state at each time, producing a product of $k$ matrices. The calculation is a generalisation of the method described by Satten and Longini (1996), and also by Jackson and Sharples (2002), and Jackson et al. (2003).
There must be enough information in the data on each state to estimate each transition rate, otherwise the likelihood will be flat and the maximum will not be found. It may be appropriate to reduce the number of states in the model, the number of allowed transitions, or the number of covariate effects, to ensure convergence. Hidden Markov models, and situations where the value of the process is only known at a series of snapshots, are particularly susceptible to non-identifiability, especially when combined with a complex transition matrix. Choosing an appropriate set of initial values for the optimisation can also be important. For flat likelihoods, 'informative' initial values will often be required. See the PDF manual for other tips.
Kalbfleisch, J., Lawless, J.F., The analysis of panel data under a Markov assumption Journal of the Americal Statistical Association (1985) 80(392): 863--871.
Kay, R. A Markov model for analysing cancer markers and disease states in survival studies. Biometrics (1986) 42: 855--865.
Gentleman, R.C., Lawless, J.F., Lindsey, J.C. and Yan, P. Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease. Statistics in Medicine (1994) 13(3): 805--821.
Satten, G.A. and Longini, I.M. Markov chains with measurement error: estimating the 'true' course of a marker of the progression of human immunodeficiency virus disease (with discussion) Applied Statistics 45(3): 275-309 (1996)
Jackson, C.H. and Sharples, L.D. Hidden Markov models for the onset and progression of bronchiolitis obliterans syndrome in lung transplant recipients Statistics in Medicine, 21(1): 113--128 (2002).
Jackson, C.H., Sharples, L.D., Thompson, S.G. and Duffy, S.W. and Couto, E. Multi-state Markov models for disease progression with classification error. The Statistician, 52(2): 193--209 (2003)
Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with S, second edition. Springer.
simmulti.msm
, plot.msm
,
summary.msm
, qmatrix.msm
,
pmatrix.msm
, sojourn.msm
.### Heart transplant data
### For further details and background to this example, see
### Jackson (2011) or the PDF manual in the doc directory.
print(cav[1:10,])
twoway4.q <- rbind(c(-0.5, 0.25, 0, 0.25), c(0.166, -0.498, 0.166, 0.166),
c(0, 0.25, -0.5, 0.25), c(0, 0, 0, 0))
statetable.msm(state, PTNUM, data=cav)
crudeinits.msm(state ~ years, PTNUM, data=cav, qmatrix=twoway4.q)
cav.msm <- msm( state ~ years, subject=PTNUM, data = cav,
qmatrix = twoway4.q, death = 4,
control = list ( trace = 2, REPORT = 1 ) )
cav.msm
qmatrix.msm(cav.msm)
pmatrix.msm(cav.msm, t=10)
sojourn.msm(cav.msm)
Run the code above in your browser using DataLab