pearson.msm: Pearson-type goodness-of-fit test

Description

Pearson-type goodness-of-fit test for multi-state models fitted to panel-observed data.

Usage

pearson.msm(x, transitions=NULL, timegroups=3, intervalgroups=3,
            covgroups=3, groups=NULL, boot=FALSE, B=500,
            next.obstime=NULL, N=100, indep.cens=TRUE,
            maxtimes=NULL, pval=TRUE)

Arguments

x

A fitted multi-state model, as returned by msm.

transitions

This should be an integer vector indicating which interval transitions should be grouped together in the contingency table. Its length should be the number of allowed interval transitions, excluding transitions from absorbing states to ab

timegroups

Number of groups based on quantiles of the time since the start of the process.

intervalgroups

Number of groups based on quantiles of the time interval between observations, within time groups.

covgroups

Number of groups based on quantiles of $\sum_r q_{irr}$, where $q_{irr}$ are the diagonal entries of the transition intensity matrix for the ith transition. These are a function of the covariate effects and the covariate values at the

groups

A vector of arbitrary groups in which to categorise each transition. This can be an integer vector or a factor. This can be used to diagnose specific areas of poor fit. For example, the contingency table might be grouped by arbitrary combi

boot

Estimate an "exact" p-value using a parametric bootstrap.

All objects used in the original call to msm which produced x, such as the qmatrix, should be in the working environm

B

Number of bootstrap replicates.

next.obstime

This is a vector of length x$data$n (the number of observations used in the model fit) giving the time to the next scheduled observation following each time point. This is only used when times to death are known exactly.

N

Number of imputations for the estimation of the distribution of the next scheduled observation time, when there are exact death times.

indep.cens

If TRUE, then times to censoring are included in the estimation of the distribution to the next scheduled observation time. If FALSE, times to censoring are assumed to be systematically different from other observati

maxtimes

A vector of length x$data$n, or a common scalar, giving an upper bound for the next scheduled observation time. Used in the multiple imputation when times to death are known exactly. If a value greater than maxtimes

pval

Calculate a p-value using the improved approximation of Titman (2009). This is optional since it is not needed during bootstrapping, and it is computationally non-trivial. Only available currently for non-hidden Markov models for panel d

Value

A list whose first two elements are contingency tables of observed transitions $O$ and expected transitions $E$, respectively, for each combination of groups. The third element is a table of the deviances $(O - E)^2 / E$ multiplied by the sign of $O - E$. If the expected number of transitions is zero then the deviance is zero. Entries in the third table will be larger in magnitude for groups for which the model fits poorly.

"test"

The fourth element of the list, a data frame with one row containing the Pearson-type goodness-of-fit test statistic stat. The test statistic is the sum of the deviances. For panel-observed data without exact death times, misclassification or censored observations, p is the p-value for the test statistic calculated using the improved approximation of Titman (2009).

For these models, for comparison with older versions of the package, test also presents p.lower and p.upper, which are theoretical lower and upper limits for the p-value of the test statistic, based on $\chi^2$ distributions with df.lower and df.upper degrees of freedom, respectively. df.upper is the number of independent cells in the contingency table, and df.lower is df.upper minus the number of estimated parameters in the model.

"intervalq"

Contains the definition of the grouping of the intervals between observations (not printed by default). These groups are defined by quantiles within the groups corresponding to the time since the start of the process.

"sim"

If there are exact death times, this contains simulations of the contingency tables and test statistics for each imputation of the next scheduled sampling time. These are averaged to produce the presented tables and test statistic. This element is not printed by default.

With exact death times, the null variance of the test statistic (formed by taking the mean of the simulated test statistics) is less than twice the mean (Titman, 2008), and the null distribution is not $\chi^2$. In this case, p.upper is an upper limit for the true asymptotic p-value, but p.lower is not a lower limit, and is not presented.

"boot"

If the bootstrap has been used, this element contains the bootstrap replicates of the test statistic (not printed by default).

"lambda"

If the Titman (2009) p-value has been calculated, this contains the weights defining the null distribution of the test statistic as a weighted sum of $\chi^2_1$ random variables (not printed by default).
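As a rough illustration of how the signed deviances and the p.lower and p.upper limits relate to the observed and expected tables, the following base R sketch works through a small made-up table. It does not call pearson.msm. The counts, the degrees of freedom and the names signed.dev and npars are hypothetical, and the sketch assumes that the statistic sums the unsigned contributions $(O - E)^2 / E$, as in a standard Pearson statistic.

```r
# Hypothetical observed (O) and expected (E) transition counts:
# rows are groups, columns are destination states. All values are made up.
O <- matrix(c(30, 12, 8,
              25, 20, 5), nrow = 2, byrow = TRUE)
E <- matrix(c(27.5, 14.0, 8.5,
              28.0, 16.5, 5.5), nrow = 2, byrow = TRUE)

# Unsigned contribution (O - E)^2 / E, taken to be zero where E is zero.
dev <- ifelse(E == 0, 0, (O - E)^2 / E)

# Signed version, in the form of the third element of the returned list.
signed.dev <- dev * sign(O - E)

# Assumption: the test statistic is the sum of the unsigned contributions.
stat <- sum(dev)

# Assumed degrees of freedom for this toy table (not computed by msm here).
df.upper <- 4                 # hypothetical number of independent cells
npars <- 2                    # hypothetical number of estimated parameters
df.lower <- df.upper - npars

# Upper-tail chi-squared probabilities give the two limits for the p-value.
p.upper <- pchisq(stat, df = df.upper, lower.tail = FALSE)
p.lower <- pchisq(stat, df = df.lower, lower.tail = FALSE)

print(round(signed.dev, 3))
print(c(stat = stat, p.lower = p.lower, p.upper = p.upper))
```

Cells with large positive signed deviances have more observed transitions than the model expects, and large negative values indicate fewer.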

Details

This method (Aguirre-Hernandez and Farewell, 2002) is intended for data which represent observations of the process at arbitrary times ("snapshots", or "panel-observed" data). For data which represent the exact transition times of the process, prevalence.msm can be used to assess fit, though without a formal test.

When times of death are known exactly, states are misclassified, or an individual's final observation is a censored state, the modification by Titman and Sharples (2008) is used. The only form of censoring supported is a state at the end of an individual's series which represents an unknown transient state (i.e. the individual is only known to be alive at this time). Other types of censoring are omitted from the data before performing the test.

See the references for further details of the methods. The method used for censored states is a modification of the method in the appendix to Titman and Sharples (2008), described at http://www.mrc-bsu.cam.ac.uk/personal/chris/papers/robustcensoring.pdf (Titman, 2007).

Groupings of the time since initiation, the time interval and the impact of covariates are based on equally-spaced quantiles. The number of groups should be chosen so that there are not many cells with small expected numbers of transitions, since the deviance statistic will be unstable for sparse contingency tables. Ideally, the expected number of transitions in each cell of the table should be no less than about 5. Conversely, the power of the test is reduced if there are too few groups. Therefore, some sensitivity analysis of the test results to the grouping is advisable.
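One possible way to carry out such a sensitivity analysis is to repeat the test over a small set of group counts and compare the resulting test data frames. The sketch below refits the psoriatic arthritis model from the Examples section so that it runs on its own. The choice of 2 and 3 groups, and the names ngroups and results, are illustrative assumptions only.

```r
library(msm)

# Same model as in the Examples section, refitted so this block runs alone.
psor.q <- rbind(c(0, 0.1, 0, 0), c(0, 0, 0.1, 0), c(0, 0, 0, 0.1), c(0, 0, 0, 0))
psor.msm <- msm(state ~ months, subject = ptnum, data = psor,
                qmatrix = psor.q, covariates = ~ollwsdrt + hieffusn,
                constraint = list(hieffusn = c(1, 1, 1), ollwsdrt = c(1, 1, 2)))

# Hypothetical set of group counts to compare; 2 and 3 are illustrative only.
ngroups <- c(2, 3)

# Run the test once per grouping and keep only the "test" data frame.
results <- lapply(ngroups, function(g) {
  pearson.msm(psor.msm, timegroups = g, intervalgroups = g, covgroups = g)$test
})
names(results) <- paste0("groups=", ngroups)
print(results)
```

If the conclusions change substantially between groupings, the expected counts in the finer tables are worth inspecting for sparse cells.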

Saved model objects fitted with previous versions of R (versions less than 1.2) will need to be refitted under the current R for use with pearson.msm.

References

Aguirre-Hernandez, R. and Farewell, V. (2002) A Pearson-type goodness-of-fit test for stationary and time-continuous Markov regression models. Statistics in Medicine 21:1899-1911.

Titman, A. and Sharples, L. (2008) A general goodness-of-fit test for Markov and hidden Markov models. Statistics in Medicine 27(12):2177-2195.

Titman, A. (2009) Computation of the asymptotic null distribution of goodness-of-fit tests for multi-state models. Lifetime Data Analysis 15(4):519-533.

Titman, A. (2008) Model diagnostics in multi-state models of biological systems. PhD thesis, University of Cambridge.

Examples

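library(msm)

# Initial transition intensity matrix for a four-state progressive model
psor.q <- rbind(c(0,0.1,0,0),c(0,0,0.1,0),c(0,0,0,0.1),c(0,0,0,0))
# Fit the model to the psor data, with constraints on the covariate effects
psor.msm <- msm(state ~ months, subject=ptnum, data=psor,
                qmatrix = psor.q, covariates = ~ollwsdrt+hieffusn,
                constraint = list(hieffusn=c(1,1,1),ollwsdrt=c(1,1,2)))
# Goodness-of-fit test with two groups each for time, interval and covariates
pearson.msm(psor.msm, timegroups=2, intervalgroups=2, covgroups=2)
# More 1-2, 1-3 and 1-4 observations than expected in shorter time
# intervals - the model fits poorly.
# A random effects model might accommodate such fast progressors.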
