coxphw: Weighted Estimation in Cox Regression

Description

Weighted Cox regression as proposed by Schemper et al. (2009), doi:10.1002/sim.3623, provides unbiased estimates of average hazard ratios even when hazards are non-proportional. Time-dependent effects can be conveniently estimated by including interactions of covariates with arbitrary functions of time, with or without making use of the weighting option.

Usage

coxphw(formula, data, template = c("AHR", "ARE", "PH"), subset, na.action,
       robust = TRUE, jack = FALSE, betafix = NULL, alpha = 0.05,
       trunc.weights = 1, control, caseweights, x = TRUE, y = TRUE,
       verbose = FALSE, sorted = FALSE, id = NULL, …)

Arguments

formula

a formula object with the response on the left of the ~ operator and the model terms on the right. The response must be a survival object as returned by Surv. formula may include offset terms or functions of time (see Examples).

data

a data frame in which to interpret the variables named in formula.

template

choose among three pre-defined templates: "AHR" requests estimation of average hazard ratios (Schemper et al., 2009), "ARE" requests estimation of average regression effects (Xu and O'Quigley, 2000) and "PH" requests Cox proportional hazards regression. Recommended and default template is "AHR".

subset

expression indicating which subset of the rows of data should be used in the fit. All observations are included by default.

na.action

missing-data filtering. Defaults to options$na.action. Applied after subsetting data.

robust

if set to TRUE, the robust covariance estimate (Lin-Wei) is used; otherwise the Lin-Sasieni covariance estimate is applied. Default is TRUE.

jack

if set to TRUE, the variance is based on a complete jackknife. Each individual (as identified by id) is left out in turn. The resulting matrix of DFBETA residuals D is then used to compute the variance matrix: V = D'D. Default is FALSE.
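
As a rough illustration of the last step only, the sketch below forms V = D'D from a small made-up matrix D (rows are subjects, columns are coefficients). The numbers are invented for the example and are not output of coxphw.

```r
# Hypothetical DFBETA residuals: one row per subject, one column per coefficient.
D <- matrix(c( 0.02, -0.01,
              -0.03,  0.02,
               0.01,  0.00,
               0.00, -0.01),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("beta1", "beta2")))

# Jackknife variance matrix V = D'D.
V <- crossprod(D)

# Standard errors are the square roots of the diagonal of V.
se <- sqrt(diag(V))
```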

betafix

can be used to restrict the estimation of one or more regression coefficients to pre-defined values. A vector with one element for each model term as given in formula is expected (with an identical order as in formula). If estimation of a model term is requested, then the corresponding element in betafix has to be set to NA, otherwise it should be set to the fixed parameter value. The default value is betafix = NULL, yielding unrestricted estimation of all regression coefficients.
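
A minimal sketch of how such a vector might be assembled, using base R only. The covariate age and the fixed value 0.02 are invented for illustration; the names on the vector are a convenience for the reader, not something the description above requires.

```r
# Hypothetical model with two terms; 'age' is an invented covariate.
form <- Surv(time, status) ~ radiation + age

# One element per model term, in formula order; NA means "estimate this term".
term_names <- attr(terms(form), "term.labels")
betafix <- setNames(rep(NA_real_, length(term_names)), term_names)

# Fix the coefficient of 'age' at 0.02 (an arbitrary value); 'radiation' stays NA.
betafix["age"] <- 0.02
betafix
```

The resulting vector, for instance as unname(betafix), would then be supplied through the betafix argument together with the same formula.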

alpha

the significance level $\alpha$; confidence intervals are computed at level 1-$\alpha$. Default is 0.05.

trunc.weights

specifies a quantile at which the (combined normalized) weights are to be truncated. It can be used to increase the precision of the estimates, particularly if template = "AHR" or "ARE" is used. Default is 1 (no truncation). Recommended value is 0.95 for mild truncation.
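
The idea of truncating at a quantile can be sketched conceptually as follows. This is not the package's internal code: the simulated weights, their normalization to mean 1, and the use of pmin() are assumptions made only to show what capping at the 0.95 quantile does.

```r
set.seed(1)

# Hypothetical weights, one per uncensored failure time,
# normalized here to mean 1 (an assumption for this sketch).
w <- rexp(50)
w <- w / mean(w)

# Cap the weights at their 0.95 quantile, mimicking trunc.weights = 0.95.
cap <- quantile(w, probs = 0.95, names = FALSE)
w_trunc <- pmin(w, cap)

# Only the largest weights change; the rest are left as they were.
range(w)
range(w_trunc)
```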

control

Object of class coxphw.control specifying iteration limit and other control options. Default is coxphw.control(...).

caseweights

vector of case weights, equivalent to weights in coxph. If caseweights is a vector of integers, then the estimated coefficients are equivalent to estimating the model from data with the individual cases replicated as many times as indicated by caseweights. These weights should not be confused with the weights in weighted Cox regression which account for the non-proportional hazards.
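
The replication property can be checked with coxph from the survival package, whose weights argument is described above as the equivalent of caseweights. The sketch below uses a small invented data set and Breslow's tie handling; it illustrates the property for coxph and does not call coxphw itself.

```r
library(survival)

# Small invented data set with integer case weights 'w'.
d <- data.frame(
  time   = c(5, 8, 12, 15, 20, 22, 30, 35),
  status = c(1, 1, 0, 1, 1, 0, 1, 1),
  x      = c(0, 1, 0, 1, 1, 0, 0, 1),
  w      = c(1, 2, 1, 3, 1, 2, 1, 1)
)

# Fit with case weights.
fit_w <- coxph(Surv(time, status) ~ x, data = d, weights = w, ties = "breslow")

# Fit on data in which each row is replicated 'w' times.
d_rep <- d[rep(seq_len(nrow(d)), times = d$w), ]
fit_rep <- coxph(Surv(time, status) ~ x, data = d_rep, ties = "breslow")

# The two coefficient estimates agree up to numerical tolerance.
all.equal(unname(coef(fit_w)), unname(coef(fit_rep)), tolerance = 1e-6)
```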

x

requests copying explanatory variables into the output object. Default is TRUE.

y

requests copying survival information into the output object. Default is TRUE.

verbose

requests echoing of intermediate results. Default is FALSE.

sorted

if set to TRUE, the data set will not be sorted prior to passing it to FORTRAN. This may speed up computations. Default is FALSE.

id

a vector of subject identification integer numbers starting from 1. These IDs are used for computing the robust covariance matrix. If id = NULL (the default), the program assumes that each line of the data set refers to a distinct subject.

…

additional arguments.

Value

A list with the following components:

coefficients

the parameter estimates.

var

the estimated covariance matrix.

df

the degrees of freedom.

ci.lower

the lower confidence limits of exp(beta).

ci.upper

the upper confidence limits of exp(beta).

prob

the p-values.

linear.predictors

the linear predictors.

n

the number of observations.

dfbeta.resid

matrix of DFBETA residuals.

iter

the number of iterations needed to converge.

method.ties

the ties handling method.

PTcoefs

matrix with scale and shift used for pretransformation of fp()-terms.

cov.j

the covariance matrix computed by the jackknife method (only computed if jack = TRUE).

cov.lw

the covariance matrix computed by the Lin-Wei method (robust covariance).

cov.ls

the covariance matrix computed by the Lin-Sasieni method.

cov.method

the method used to compute the (displayed) covariance matrix and the standard errors. This method is either "jack" if jack = TRUE, or "Lin-Wei" if jack = FALSE.

w.matrix

a matrix with four columns and one row per uncensored failure time. The first column contains the failure times; the remaining columns (labeled w.raw, w.obskm, and w) contain the raw weights, the weights according to the inverse of the Kaplan-Meier estimates with reverse status indicator, and the normalized product of both.

caseweights

if x = TRUE the case weights.

Wald

Wald-test statistics.

means

the means of the covariates.

offset.values

offset values.

dataline

the first dataline of the input data set (required for plotfp).

x

if x = TRUE the explanatory variables.

y

the response.

alpha

the significance level = 1 - confidence level.

template

the requested template.

formula

the model formula.

betafix

the betafix vector.

call

the function call.

Details

If Cox's proportional hazards regression is used in the presence of non-proportional hazards, i.e., with underlying time-dependent hazard ratios of prognostic factors, the average relative risk for such a factor is under- or overestimated and testing power for the corresponding regression parameter is reduced. In such a situation weighted estimation provides a parsimonious alternative to more elaborate modelling of time-dependent effects. Weighted estimation in Cox regression extends the tests by Breslow and Prentice to a multi-covariate situation as does the Cox model to Mantel's logrank test. Weighted Cox regression can also be seen as a robust alternative to the standard Cox estimator, reducing the influence of outlying survival times on parameter estimates.

Three pre-defined templates can be requested: 1) "AHR", i.e., estimation of average hazard ratios (Schemper et al., 2009) using Prentice weights with censoring correction and robust variance estimation; 2) "ARE", i.e., estimation of average regression effects (Xu and O'Quigley, 2000) using censoring correction and robust variance estimation; or 3) "PH", i.e., Cox proportional hazards regression using robust variance estimation.

Breslow's tie-handling method is used by the program; other methods to handle ties are currently not available.

A fit of coxphw with template = "PH" will yield the same estimates as a fit of coxph using Breslow's tie-handling method and robust variance estimation (using cluster).

If robust = FALSE, the program estimates the covariance matrix using the Lin (1991) and Sasieni (1993) sandwich estimate $A^{-1}BA^{-1}$ with $-A$ and $-B$ denoting the sum of contributions to the second derivative of the log likelihood, weighted by $w(t_j)$ and $w(t_j)^2$, respectively. This estimate is independent from the scaling of the weights and reduces to the inverse of the information matrix in case of no weighting. However, it is theoretically valid only in case of proportional hazards. Therefore, since application of weighted Cox regression usually implies a violated proportional hazards assumption, the robust Lin-Wei covariance estimate is used by default (robust = TRUE).
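
The two properties of the sandwich form stated above can be checked numerically. The sketch below uses small invented matrices in place of A and B; it is not the package's computation, only a demonstration that $A^{-1}BA^{-1}$ is unaffected by rescaling the weights and reduces to $A^{-1}$ when B equals A.

```r
# Hypothetical symmetric positive definite matrices standing in for A and B.
A <- matrix(c(4, 1, 1, 3), nrow = 2)
B <- matrix(c(3, 0.5, 0.5, 2), nrow = 2)

# Sandwich estimate A^{-1} B A^{-1}.
sandwich <- function(A, B) solve(A) %*% B %*% solve(A)

V <- sandwich(A, B)

# Rescaling all weights by a constant k multiplies A by k and B by k^2,
# so the sandwich estimate is unchanged.
k <- 10
V_scaled <- sandwich(k * A, k^2 * B)
all.equal(V, V_scaled)

# Without weighting (all weights equal to 1), B equals A and the
# estimate reduces to the inverse of A.
all.equal(sandwich(A, A), solve(A))
```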

If some regression coefficients are held constant using betafix, no standard errors are given for these coefficients as they are not estimated in the model. The global Wald test only relates to those variables for which regression coefficients were estimated.

An offset term can be included in the formula of coxphw. This specifies a variable that is included in the model with its regression coefficient fixed at 1 rather than estimated.

References

Dunkler D, Ploner M, Schemper M and Heinze G (2018). Weighted Cox Regression Using the R Package coxphw. J STAT SOFTW 84, 1-26. doi:10.18637/jss.v084.i02

Lin D and Wei L (1989). The Robust Inference for the Cox Proportional Hazards Model. J AM STAT ASSOC 84, 1074-1078.

Lin D (1991). Goodness-of-Fit Analysis for the Cox Regression Model Based on a Class of Parameter Estimators. J AM STAT ASSOC 86, 725-728.

Royston P and Altman D (1994). Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling. J R STAT SOC C-APPL 43, 429-467.

Royston P and Sauerbrei W (2008). Multivariable Model-Building. A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. Wiley, Chichester, UK.

Sasieni P (1993). Maximum Weighted Partial Likelihood Estimators for the Cox Model. J AM STAT ASSOC 88, 144-152.

Schemper M (1992). Cox Analysis of Survival Data with Non-Proportional Hazard Functions. J R STAT SOC D 41, 455-465.

Schemper M, Wakounig S and Heinze G (2009). The Estimation of Average Hazard Ratios by Weighted Cox Regression. STAT MED 28, 2473-2489. doi:10.1002/sim.3623

Xu R and O'Quigley J (2000). Estimating Average Regression Effect Under Non-Proportional Hazards. Biostatistics 1, 423-439.

Examples

# NOT RUN {
library("coxphw")   # provides coxphw() and the gastric data set
data("gastric")

# weighted estimation of average hazard ratio
fit1 <- coxphw(Surv(time, status) ~ radiation, data = gastric, template = "AHR")
summary(fit1)
fit1$cov.lw     # robust covariance
fit1$cov.ls     # Lin-Sasieni covariance


# unweighted estimation, include interaction with years
# ('radiation' must be included in formula!)
gastric$years <- gastric$time / 365.25
fit2 <- coxphw(Surv(years, status) ~ radiation + years : radiation, data = gastric,
               template = "PH")
summary(fit2)


# unweighted estimation with a function of time
data("gastric")
gastric$yrs <- gastric$time / 365.25

fun <- function(t) { (t > 1) * 1 }
fit3 <- coxphw(Surv(yrs, status) ~ radiation + fun(yrs):radiation, data = gastric,
               template = "PH")

# for more examples see vignette or predict.coxphw
# }
