coxed.npsf: Predict expected durations using the GAM method

Description

This function is called by coxed and is not intended to be used by itself.

Usage

coxed.npsf(cox.model, newdata = NULL, coef = NULL, b.ind = NULL)

Arguments

cox.model

The output from a Cox proportional hazards model estimated with the coxph function in the survival package or with the cph function in the rms package

newdata

An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used

coef

A vector of new coefficients to replace the coefficients attribute of the cox.model. Used primarily for bootstrapping, to recalculate durations using new coefficients derived from a bootstrapped sample. If NULL, the original coefficients are employed

b.ind

A vector of observation numbers to pass to the estimation sample to construct the a bootstrapped sample with replacement

Value

Returns a list containing the following components:

`exp.dur`	A vector of predicted mean durations for the estimation sample if `newdata` is omitted, or else for the specified new data.
`baseline.functions`	The estimated cumulative baseline hazard function and survivor function.

Details

The non-parametric step function (NPSF) approach to calculating expected durations from the Cox proportional hazards model, described in Kropko and Harden (2018), uses the method proposed by Cox and Oakes (1984, 107-109) for estimating the cumulative baseline hazard function. This method is nonparametric and results in a step-function representation of the cumulative baseline hazard.

Cox and Oakes (1984, 108) show that the cumulative baseline hazard function can be estimated after fitting a Cox model by $$\hat{H}_0(t) = \sum_{\tau_j < t}\frac{d_j}{\sum_{l\in\Re(\tau_j)}\hat{\psi}(l)},$$ where $\tau_j$ represents time points earlier than $t$, $d_j$ is a count of the total number of failures at $\tau_j$, $\Re(\tau_j)$ is the remaining risk set at $\tau_j$, and $\hat{\psi}(l)$ represents the ELP from the Cox model for observations still in the risk set at $\tau_j$. This equation is used calculate the cumulative baseline hazard at all time points in the range of observed durations. This estimate is a stepwise function because time points with no failures do not contribute to the cumulative hazard, so the function is flat until the next time point with observed failures.

We extend this method to obtain expected durations by first calculating the baseline survivor function from the cumulative hazard function, using $$\hat{S}_0(t) = \exp[-\hat{H}_0(t)].$$ Each observation's survivor function is related to the baseline survivor function by $$\hat{S}_i(t) = \hat{S}_0(t)^{\hat{\psi}(i)},$$ where $\hat{\psi}(i)$ is the exponentiated linear predictor (ELP) for observation $i$. These survivor functions can be used directly to calculate expected durations for each observation. The expected value of a non-negative random variable can be calculated by $$E(X) = \int_0^{\infty} \bigg(1 - F(t)\bigg)dt,$$ where $F(.)$ is the cumulative distribution function for $X$. In the case of a duration variable $t_i$, the expected duration is $$E(t_i) = \int_0^T S_i(t)\,dt,$$ where $T$ is the largest possible duration and $S(t)$ is the individual's survivor function. We approximate this integral with a right Riemann-sum by calculating the survivor functions at every discrete time point from the minimum to the maximum observed durations, and multiplying these values by the length of the interval between time points with observed failures: $$E(t_i) \approx \sum_{t_j \in [0,T]} (t_j - t_{j-1})S_i(t_j).$$

References

Kropko, J. and Harden, J. J. (2018). Beyond the Hazard Ratio: Generating Expected Durations from the Cox Proportional Hazards Model. British Journal of Political Science https://doi.org/10.1017/S000712341700045X

Cox, D. R., and Oakes, D. (1984). Analysis of Survival Data. Monographs on Statistics & Applied Probability

Examples

Run this code

# NOT RUN {
mv.surv <- Surv(martinvanberg$formdur, event = rep(1, nrow(martinvanberg)))
mv.cox <- coxph(mv.surv ~ postel + prevdef + cont + ident + rgovm + pgovno + tpgovno +
     minority, method = "breslow", data = martinvanberg)

ed <- coxed.npsf(mv.cox)
ed$baseline.functions
ed$exp.dur

#Running coxed.npsf() on a bootstrap sample and with new coefficients
bsample <- sample(1:nrow(martinvanberg), nrow(martinvanberg), replace=TRUE)
newcoefs <- rnorm(8)
ed2 <- coxed.npsf(mv.cox, b.ind=bsample, coef=newcoefs)
# }

Run the code above in your browser using DataLab