dxPlot: Diagnostic plots for coxph model

Description

Diagnostic plots for coxph model

Usage

dxPlot(x, ...)

  ## S3 method for class 'coxph':
dxPlot(x, ..., ties = "breslow",
    defCont = 2, noQuantiles = 5, noPerPage = 2,
    height = NULL, width = NULL)

Arguments

An object of class coxph

...

Additional arguments (not implemented)

ties

Method of handling ties when refitting model. Must be one of breslow or efron

defCont

Definition of continuous variable. Variables with more than n unique values will be split into quantiles to facilitate graphs. (This does not apply to factor variables)

noQuantiles

No. of quantiles into which to split continuous variables

noPerPage

Number of plots per page

width

Width of screen (display device) in pixels. Set to NULL for default plot size

height

Height of screen (display device) in pixels. Set to NULL for default plot size

Value

Graphs as described above. Plotted with base graphics.

Details

The Cox-Snell residuals are used to assess the fit of a proportional hazards model. The residuals are generated from the (non time-dependent) covariates $Z$, a matrix with one row per observation (total $n$) and additional indicators of time $T$ and status $\delta$. estimated coefficients $b$, where $b$ is a vector of length $p$ (no. of predictors) : $$r_j = \hat{H}_0(T_j) \exp ( \sum_{k=1}^p Z_{jk}b_k ), \quad j=1,...,n$$ Here $\hat{H}_0$ is Breslows estimator of the `baseline' hazard (i.e. all coefficients are zero). If the coefficients are close to their true values, then $r_j$ should follow a unit-exponential distribution, i.e. $H_0(t)=t$. To verify this, we calculate the Nelson-Aalen estimator of the cumulative hazard rate of the $r_j$s. A plot of this estimator against $r_j$ should be a straight line through the origin with a slope of $1$. The martingale residual is used to help determine the best functional form of a covariate in a coxph model. The Cox model assumes that the hazard function satisfies: $$\lambda_{i}(t) = \lambda_0(t) \exp(X_i\beta)$$ That is, for a continuous variable, an unit increase in the variable produces the same change in risk across the value of the variable. (E.g. an increase of age of 5 years leads to the same change in hazard no matter what the increase is from or to). To verify this is the case, a null model is fitted (i.e no coefficients, similar to intercept-only model in linear regression). Martingale residuals are calcuated for this. Plots of these residuals against the values of each of the predictors in the model are shown. If the correct model for covariate $j$ is based on a smooth function $f()$, i.e. $\exp(f(X_j)\beta_j)$ then the following should hold: $$E(M_i|X_{ij}=x_j) \approx cf(x_j)$$ Where $M$ is the martingale residual and $c$ depends on the amount of censoring and is roughly independent of $x_j$. A lowess smoothed line is added to the plot. This should be approximately linear if the assumption of proportional hazards is met. If the plot shows a threshold, a discretised version of the covariate may be preferable. The assumption of proportional hazards can be checked in a number of ways. These methods work by stratifying a covariate $G$ into $k$ disjoint strata. A stratified coxph model is fitted to these strata and one is selected as a reference. The cumulative hazard $\hat{H}_g(t)$ is plotted for each stratum $g$. These should be a constant multiple of the reference stratum $\hat{H}_1(t)$ over time. Another way to compare these is to plot the differences in log cumulative hazard, that is: $$\log \hat{H}_g(t) - \log \hat{H}_1(t), \quad g = 2 , ..., k$$ Each curve should be horizontal and constant over time. Curves above zero indicate increased hazard in the stratum $g$ vs the reference at that time. Finally Andersen plots show $$\log \hat{H}_1(t) vs \log \hat{H}_g(t), \quad g = 2,...,k$$ If proportional hazards are present, these should be straight lines through the origin. If the curve is convex this shows that $\hat{H}_g(t) \div \hat{H}_1(t)$ is an increasing function of $t$. Thus if convex, the hazard rate in $g$ is increasing vs the reference.

References

Examples are from Klein J, Moeschberger M 2003 Survival Analysis, 2nd edition. New York: Springer. Example 11.1 - 11.7, pg 355-66 Last example is from: Therneau T, Grambsch P 2000. Modeling Survival Data, 1st edition. New York: Springer. Section 5.1.2, pg 91. Andersen PK, Borgan O, Gill R, Keiding N 1982. Linear Nonparametric Tests for Comparison of Counting Processes, with Applications to Censored Survival Data, Correspondent Paper. International Statistical Review 50(3):219--44. http://www.jstor.org/stable/1402489{JSTOR}

Examples

Run this code

data(bmt, package="KMsurv")
bmt <- within(bmt, {
z1 <- z1 -28
z2 <- z2- 28
z3 <- z1*z2
z4 <- as.double( group== 2 )
z5 <- as.double( group== 3 )
z6 <- z8
z7 <- (z7 / 30) - 9
z8 <- z10
})
c1 <- coxph(Surv(t2,d3) ~ z1+z2+z3+z4+z5+z6+z7+z8,
            method="breslow",
            data=bmt)
dxPlot(c1)
data(alloauto, package="KMsurv")
c1 <- coxph(Surv(time,delta) ~ factor(type),
            method='breslow',
            data=alloauto)
dxPlot(c1)
c1 <- coxph(formula = Surv(time, status == 2) ~ age + log(bili), data=pbc)
dxPlot(c1)

Run the code above in your browser using DataLab