prevalence.msm: Tables of observed and expected prevalences

Description

This provides a rough indication of the goodness of fit of a multi-state model, by estimating the observed numbers of individuals occupying each state at a series of times, and comparing these with forecasts from the fitted model.

Usage

prevalence.msm(x, times, timezero=NULL, initstates, covariates="mean",
               misccovariates="mean")

Arguments

x

A fitted multi-state model produced by msm.

times

Series of times at which to compute the observed and expected prevalences of states.

timezero

Initial time of the Markov process. Expected values are forecast from this time. Defaults to the minimum of the observation times given in the data.

initstates

Optional vector of the same length as the number of states. Gives the numbers of individuals occupying each state at the initial time. The default is those observed in the data.

covariates

Covariate values for which to forecast expected state occupancy. See qmatrix.msm. Defaults to the mean values of the covariates in the data set.

misccovariates

(Misclassification models only) Values of covariates on the misclassification probability matrix for which to forecast expected state occupancy. Defaults to the mean values of the covariates in the data set.

Value

A list with components:

Observed

Table of observed numbers of individuals in each state at each time.

Observed percentages

Corresponding percentage of the individuals at risk at each time.

Expected

Table of corresponding expected numbers.

Expected percentages

Corresponding percentage of the individuals at risk at each time.
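
As a rough illustration of how the returned list might be used, the sketch below fits a four-state model to the cav heart transplant data shipped with msm and extracts two of the components. The initial intensity matrix and the object names (twoway4.q, cav.msm, prev) are choices made for this example, not requirements of the function.

```r
library(msm)

# Initial values for the transition intensities (illustrative choice);
# state 4 (death) is absorbing.
twoway4.q <- rbind(c(-0.5,    0.25,   0,     0.25),
                   c( 0.166, -0.498,  0.166, 0.166),
                   c( 0,      0.25,  -0.5,   0.25),
                   c( 0,      0,      0,     0))

# Fit the model to the cav data, treating death times as exact.
cav.msm <- msm(state ~ years, subject = PTNUM, data = cav,
               qmatrix = twoway4.q, deathexact = 4)

# Observed and expected prevalences every two years up to 20 years.
prev <- prevalence.msm(cav.msm, times = seq(0, 20, 2))

# Components are accessed by name; names containing spaces need [[ ]].
prev$Observed
prev[["Expected percentages"]]
```

Large gaps between the observed and expected percentages at a given time suggest that the model fits poorly there.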

concept

Goodness of fit

Details

To compute "observed" prevalences at a time $t$, individuals are assumed to be in the same state as at their last observation time preceding $t$. This calculation can be slow when the data contain a large number of individuals.
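
The carry-forward rule described above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the toy data frame obs and the helper names state_at and observed_prevalence are hypothetical, an observation made exactly at $t$ is counted, and the sketch makes no adjustment for individuals lost to follow-up.

```r
# Toy panel data (hypothetical): one row per observation of a subject.
obs <- data.frame(
  subject = c(1, 1, 1, 2, 2, 3, 3, 3),
  time    = c(0, 2, 5, 0, 3, 0, 1, 4),
  state   = c(1, 2, 3, 1, 1, 1, 2, 2)
)

# State of one subject at time tt: the state recorded at the last
# observation made at or before tt (NA if not yet observed).
state_at <- function(d, tt) {
  seen <- d[d$time <= tt, , drop = FALSE]
  if (nrow(seen) == 0) return(NA_real_)
  seen$state[which.max(seen$time)]
}

# Table of counts: one row per requested time, one column per state.
observed_prevalence <- function(obs, times, nstates) {
  by_subject <- split(obs, obs$subject)
  tab <- t(sapply(times, function(tt) {
    s <- vapply(by_subject, state_at, numeric(1), tt = tt)
    tabulate(s[!is.na(s)], nbins = nstates)
  }))
  dimnames(tab) <- list(times, paste("State", seq_len(nstates)))
  tab
}

observed <- observed_prevalence(obs, times = c(0, 2, 4), nstates = 3)
observed
```

The loop over every subject at every requested time is what makes this kind of calculation slow for large data sets.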

The fitted transition probability matrix is used to forecast expected prevalences from the state occupancy at the initial time. To produce the expected number in state $j$ at time $t$ after the start, the number of individuals under observation at time $t$ (including those who have died, but not those lost to follow-up) is multiplied by the probability of transition between the initial state and state $j$ in a time interval $t$.
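
A minimal base R sketch of this forecast is given below, assuming a hypothetical three-state intensity matrix Q, a hypothetical initial occupancy, and a made-up number at risk. The helper pmatrix computes $P(t) = \exp(tQ)$ by eigendecomposition, which assumes Q is diagonalisable. It stands in for the fitted transition probability matrix and is not how msm computes it.

```r
# Hypothetical intensity matrix for a 3-state progressive model;
# state 3 is absorbing. Each row sums to zero.
Q <- rbind(c(-0.3,  0.2, 0.1),
           c( 0.0, -0.4, 0.4),
           c( 0.0,  0.0, 0.0))

# Transition probability matrix P(t) = exp(tQ), via eigendecomposition
# (valid only when Q is diagonalisable, as it is here).
pmatrix <- function(Q, tt) {
  e <- eigen(Q)
  P <- e$vectors %*% diag(exp(e$values * tt)) %*% solve(e$vectors)
  Re(P)
}

initstates <- c(80, 20, 0)               # numbers in each state at time zero
init_prop  <- initstates / sum(initstates)
n_at_risk  <- 90                         # hypothetical number under observation at tt

tt <- 2
P  <- pmatrix(Q, tt)

# Expected number in state j: number at risk times the probability of
# occupying state j at tt, averaged over the initial state distribution.
expected <- n_at_risk * as.vector(init_prop %*% P)
expected
```

The expected numbers sum to the number at risk, because each row of $P(t)$ sums to one.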

For misclassification models (fitted using an ematrix), this aims to assess the fit of the full model for the observed states, that is, the combined Markov progression model for the true states and the misclassification model. Thus, expected prevalences of true states are estimated from the assumed proportion occupying each state at the initial time using the fitted transition probability matrix. The vector of expected prevalences of true states is then multiplied by the fitted misclassification probability matrix to obtain the expected prevalences of observed states.
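
The final multiplication can be illustrated with a short base R sketch. The numbers are invented for the example: true_expected stands for the expected prevalences of true states, and E is a hypothetical misclassification matrix whose (r, s) entry is the probability of observing state s when the true state is r.

```r
# Hypothetical expected numbers in each true state at some time t.
true_expected <- c(50, 30, 20)

# Hypothetical misclassification matrix: rows are true states, columns
# are observed states, and each row sums to one. State 3 is assumed to
# be observed without error.
E <- rbind(c(0.9, 0.1, 0.0),
           c(0.1, 0.8, 0.1),
           c(0.0, 0.0, 1.0))

# Expected numbers in each observed state: row vector times matrix.
observed_expected <- as.vector(true_expected %*% E)
observed_expected
```

Because each row of E sums to one, the total number of individuals is unchanged by the multiplication; only their allocation across states changes.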

For general hidden Markov models, the observed state is taken to be the predicted underlying state from the Viterbi algorithm (viterbi.msm). The goodness of fit of these states to the underlying Markov model is tested. This approach only makes sense for processes where all individuals start at a common time. For an example of this approach, see Gentleman et al. (1994).

References

Gentleman, R.C., Lawless, J.F., Lindsey, J.C. and Yan, P. Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease. Statistics in Medicine (1994) 13(3): 805-821.