standardize_coxph: Regression standardization in Cox proportional hazards models

Description

standardize_coxph performs regression standardization in Cox proportional hazards models at specified values of the exposure over the sample covariate distribution. Let $T$, $X$, and $Z$ be the survival outcome, the exposure, and a vector of covariates, respectively. standardize_coxph fits a Cox proportional hazards model and the Breslow estimator of the baseline hazard in order to estimate the standardized survival function $\theta(t,x)=E\{S(t|X=x,Z)\}$ when measure = "survival" or the standardized restricted mean survival up to time t $\theta(t, x) = E\{\int_0^t S(u|X = x, Z) du\}$ when measure = "rmean", where $t$ is a specific value of $T$, $x$ is a specific value of $X$, and the expectation is over the marginal distribution of $Z$.

Usage

standardize_coxph(
  formula,
  data,
  values,
  times,
  measure = c("survival", "rmean"),
  clusterid,
  ci_level = 0.95,
  ci_type = "plain",
  contrasts = NULL,
  family = "gaussian",
  reference = NULL,
  transforms = NULL
)

Value

An object of class std_surv. Obtain numeric results by using tidy.std_surv. This is a list with the following components:

res_contrast

An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:

call: The function call
input: A list with components used in the estimation
measure: Either "survival" or "rmean"
est: Estimated counterfactual means and standard errors for each exposure level
vcov: Estimated covariance matrix of counterfactual means for each time
est_table: Data.frame of the estimates of the contrast with inference
times: The vector of times used in the calculation
transform: The transform argument used for this contrast
contrast: The requested contrast type
reference: The reference level of the exposure
ci_type: Confidence interval type
ci_level: Confidence interval level

res

A named list with the elements:

call: The function call

input

A list with components used in the estimation

measure

Either "survival" or "rmean"

est

Estimated counterfactual means and standard errors for each exposure level

vcov

Estimated covariance matrix of counterfactual means for each time

Arguments

formula: The formula which is used to fit the model for the outcome.
data: The data.
values: A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated.
times: A vector containing the specific values of $T$ at which to estimate the standardized survival function.
measure: Either "survival" to estimate the survival function at times or "rmean" for the restricted mean survival up to the largest of times.
clusterid: An optional string containing the name of a cluster identification variable when data are clustered.
ci_level: Coverage probability of confidence intervals.
ci_type: A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals.
contrasts: A vector of contrasts in the following format: If set to "difference" or "ratio", then $\psi(x)-\psi(x_0)$ or $\psi(x) / \psi(x_0)$ are constructed, where $x_0$ is a reference level specified by the reference argument. Has to be NULL if no references are specified.
family: The family argument which is used to fit the glm model for the outcome.
reference: A vector of reference levels in the following format: If contrasts is not NULL, the desired reference level(s). This must be a vector or list the same length as contrasts, and if not named, it is assumed that the order is as specified in contrasts.
transforms: A vector of transforms in the following format: If set to "log", "logit", or "odds", the standardized mean $\theta(x)$ is transformed into $\psi(x)=\log\{\theta(x)\}$, $\psi(x)=\log[\theta(x)/\{1-\theta(x)\}]$, or $\psi(x)=\theta(x)/\{1-\theta(x)\}$, respectively. If the vector is NULL, then $\psi(x)=\theta(x)$.

Author

Arvid Sjölander, Adam Brand, Michael Sachs

Details

standardize_coxph fits the Cox proportional hazards model $$\lambda(t|X,Z)=\lambda_0(t)\exp\{h(X,Z;\beta)\}.$$ Breslow's estimator of the cumulative baseline hazard $\Lambda_0(t)=\int_0^t\lambda_0(u)du$ is used together with the partial likelihood estimate of $\beta$ to obtain estimates of the survival function $S(t|X=x,Z)$ if measure = "survival": $$\hat{S}(t|X=x,Z)=\exp[-\hat{\Lambda}_0(t)\exp\{h(X=x,Z;\hat{\beta})\}].$$ For each $t$ in the t argument and for each $x$ in the x argument, these estimates are averaged across all subjects (i.e. all observed values of $Z$) to produce estimates $$\hat{\theta}(t,x)=\sum_{i=1}^n \hat{S}(t|X=x,Z_i)/n,$$ where $Z_i$ is the value of $Z$ for subject $i$, $i=1,...,n$. The variance for $\hat{\theta}(t,x)$ is obtained by the sandwich formula.

If measure = "rmean", then $\Lambda_0(t)=\int_0^t\lambda_0(u)du$ is used together with the partial likelihood estimate of $\beta$ to obtain estimates of the restricted mean survival up to time t: $\int_0^t S(u|X=x,Z) du$ for each element of times. The estimation and inference is done using the method described in Chen and Tsiatis 2001. Currently, we can only estimate the difference in RMST for a single binary exposure. Two separate Cox models are fit for each level of the exposure, which is expected to be coded as 0/1.

References

Chang I.M., Gelman G., Pagano M. (1982). Corrected group prognostic curves and summary statistics. Journal of Chronic Diseases 35, 669-674.

Gail M.H. and Byar D.P. (1986). Variance calculations for direct adjusted survival curves, with applications to testing for no treatment effect. Biometrical Journal 28(5), 587-599.

Makuch R.W. (1982). Adjusted survival curve estimation using covariates. Journal of Chronic Diseases 35, 437-443.

Sjölander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.

Sjölander A. (2018). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.

Chen, P. Y., Tsiatis, A. A. (2001). Causal inference on the difference of the restricted mean lifetime between two groups. Biometrics, 57(4), 1030-1038.

Examples

Run this code



require(survival)
set.seed(7)
n <- 300
Z <- rnorm(n)
Zbin <- rbinom(n, 1, .3)
X <- rnorm(n, mean = Z)
T <- rexp(n, rate = exp(X + Z + X * Z)) # survival time
C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time
fact <- factor(sample(letters[1:3], n, replace = TRUE))
U <- pmin(T, C) # time at risk
D <- as.numeric(T < C) # event indicator
dd <- data.frame(Z, Zbin, X, U, D, fact)
fit.std.surv <- standardize_coxph(
  formula = Surv(U, D) ~ X + Z + X * Z,
  data = dd,
  values = list(X = seq(-1, 1, 0.5)),
  times = 1:5
)
print(fit.std.surv)
plot(fit.std.surv)
tidy(fit.std.surv)

fit.std <- standardize_coxph(
  formula = Surv(U, D) ~ X + Zbin + X * Zbin + fact,
  data = dd,
  values = list(Zbin = 0:1),
  times = 1.5,
  measure = "rmean",
  contrast = "difference",
  reference = 0
)
print(fit.std)
tidy(fit.std)

Run the code above in your browser using DataLab