cox_cure: Cox Cure Rate Model

Description

For right-censored data, the function cox_cure() trains a Cox cure rate model (Kuk and Chen, 1992; Sy and Taylor, 2000) via an expectation maximization (EM) algorithm; For right-censored data with missing/uncertain event/censoring indicators, the function fits the Cox cure rate model proposed by Wang et al. (2023).

Usage

cox_cure(
  surv_formula,
  cure_formula,
  time,
  event,
  data,
  subset,
  contrasts = NULL,
  bootstrap = 0L,
  surv_mstep = cox_cure.mstep(),
  cure_mstep = cox_cure.mstep(),
  control = cox_cure.control(),
  ...
)
cox_cure.fit(
  surv_x,
  cure_x,
  time,
  event,
  cure_intercept = TRUE,
  bootstrap = 0L,
  surv_mstep = cox_cure.mstep(),
  cure_mstep = cox_cure.mstep(),
  control = cox_cure.control(),
  ...
)
cox_cure.control(
  tail_completion = c("zero", "exp", "tau-zero"),
  tail_tau = Inf,
  maxit = 100,
  epsilon = 1e-04,
  pmin = 1e-05,
  save_call = TRUE,
  verbose = 0,
  ...
)
cox_cure.mstep(
  start = NULL,
  offset = NULL,
  maxit = 10,
  epsilon = 1e-04,
  standardize = TRUE,
  ...
)

Value

A cox_cure object that contains the fitted ordinary Cox cure rate model if none of the event indicators is NA. For right-censored data with uncertain/missing event indicators, a

cox_cure_uncer object is returned.

Arguments

surv_formula: A formula object starting with ~ for the model formula in survival model part. For Cox model, no intercept term is included even if an intercept is specified or implied in the model formula. A model formula with an intercept term only is not allowed.
cure_formula: A formula object starting with ~ for the model formula in incidence model part. For logistic model, an intercept term is included by default and can be excluded by adding + 0 or - 1 to the model formula.
time: A numeric vector for the observed survival times.
event: A numeric vector for the event indicators, where NA's are allowed and represent uncertain event indicators.
data: An optional data frame, list, or environment that contains the model covariates and response variables (time and event) If they are not found in data, the variables are taken from the environment of the specified formula, usually the environment from which this function is called.
subset: An optional logical vector specifying a subset of observations to be used in the fitting process.
contrasts: An optional list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors. See contrasts.arg of model.matrix.default for details.
bootstrap: An integer representing the number of bootstrap samples for estimating standard errors of the coefficient estimates. The bootstrap procedure will not run if bootstrap = 0 by default. If bootstrap > 0, the specified number of bootstrap samples will be used for estimating the standard errors.
surv_mstep, cure_mstep: A named list passed to cox_cure.mstep() specifying the control parameters for the corresponding M-steps.
control: A cox_cure.control object that contains the control parameters.
...: Other arguments passed to the control functions for backward compatibility.
surv_x: A numeric matrix for the design matrix of the survival model component.
cure_x: A numeric matrix for the design matrix of the cure rate model component. The design matrix should exclude an intercept term unless we want to fit a model only including the intercept term. In that case, we need further set cure_intercept = FALSE to not standardize the intercept term.
cure_intercept: A logical value specifying whether to add an intercept term to the cure rate model component. If TRUE by default, an intercept term is included.
tail_completion: A character string specifying the tail completion method for conditional survival function. The available methods are "zero" for zero-tail completion after the largest event times (Sy and Taylor, 2000), "exp" for exponential-tail completion (Peng, 2003), and "tau-zero" for zero-tail completion after a specified tail_tau. The default method is the zero-tail completion proposed by Sy and Taylor (2000).
tail_tau: A numeric number specifying the time of zero-tail completion. It will be used only if tail_completion = "tau-zero". A reasonable choice must be a time point between the largest event time and the largest survival time.
maxit: A positive integer specifying the maximum iteration number. The default value is 1000.
epsilon: A positive number specifying the tolerance that determines the convergence of the coefficient estimates. The tolerance is compared with the relative change between estimates from two consecutive iterations, which is measured by the ratio of the L1-norm of their difference to the sum of their L1-norms plus one.
pmin: A positive number specifying the minimum value of probabilities for numerical stability. The default value is 1e-5.
save_call: A logical value indicating if the function call should be saved. For large datasets, saving the function call would increase the size of the returned object dramatically. We may want to set save_call = FALSE if the original function call is not needed.
verbose: A nonnegative integer for verbose outputs, which is mainly useful for debugging.
start: A numeric vector representing the initial values for the underlying model estimation procedure. If standardize is TRUE, the specified initial values will be scaled internally to match the standardized data. The default initial values depend on the specific models and based on the observed data. If inappropriate initial values (in terms of length) are specified, the default values will be used.
offset: A numeric vector specifying the offset term. The length of the specified offset term should be equal to the sample size.
standardize: A logical value specifying if each covariate should be standardized to have mean zero and standard deviation one internally for numerically stability and fair regularization. The default value is TRUE. The coefficient estimates will always be returned in original scales.

References

Kuk, A. Y. C., & Chen, C. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79(3), 531--541.

Peng, Y. (2003). Estimating baseline distribution in proportional hazards cure models. Computational Statistics & Data Analysis, 42(1-2), 187--201.

Sy, J. P., & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56(1), 227--236.

Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2023). Survival Modeling of Suicide Risk with Rare and Uncertain Diagnoses. Statistics in Biosciences, 17(1), 1--27.

Examples

Run this code

library(intsurv)

### 1. Cox cure rate model
## simulate right-censored data with a cure fraction
set.seed(123)
n_obs <- 2e2
p <- 5
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
cure_beta <- rep(0.5, p)
b0 <- - 1
expit <- binomial()$linkinv
ncure_prob <- expit(as.numeric(b0 + x_mat %*% cure_beta))
is_cure <- 1 - rbinom(n_obs, size = 1, prob = ncure_prob)
surv_beta <- rep(0.5, p)
risk_score <- as.numeric(x_mat %*% surv_beta)
event_time <- rexp(n_obs, exp(as.numeric(x_mat %*% surv_beta)))
censor_time <- 10
event <- ifelse(event_time < censor_time & ! is_cure, 1, 0)
obs_time <- ifelse(event > 0, event_time, censor_time)

## model-fitting for the given design matrices using cox_cure.fit()
fit1 <- cox_cure.fit(x_mat, x_mat, obs_time, event, bootstrap = 30)
summary(fit1)

## coefficient estimates from both model parts
coef(fit1)

## or a particular part
coef(fit1, "surv")
coef(fit1, "cure")


## create a toy example dataset
toy_dat <- data.frame(time = obs_time, status = event)
toy_dat$group <- cut(abs(x_mat[, 1L]), breaks = c(0, 0.5, 1, 2, Inf),
                     labels = LETTERS[1:4])
toy_dat <- cbind(toy_dat, as.data.frame(x_mat[, - 1L, drop = FALSE]))

## model-fitting for the given model formula using cox_cure()
fit2 <- cox_cure(
    ~ x3 + x4 + group,
    ~ group + x3 + offset(x2),
    time = time,
    event = status,
    data = toy_dat,
    subset = group != "D",
    bootstrap = 30
)
summary(fit2)

## get BIC's
BIC(fit1)
BIC(fit2)
BIC(fit1, fit2)

### 2. Cox cure rate model for uncertain event status
set.seed(123)
n_obs <- 200
p <- 5
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))

## simulate sample data
sim_dat <- simData4cure(nSubject = 200, max_censor = 10, lambda_censor = 0.1,
                        survMat = x_mat, cureMat = x_mat, b0 = 1)
table(sim_dat$case)
table(sim_dat$obs_event, useNA = "ifany")

## use formula
fit3 <- cox_cure(
    ~ x1 + x2 + x3,
    ~ z1 + z2 + z3,
    time = obs_time,
    event = obs_event,
    data = sim_dat
)
summary(fit3)

## use design matrix
fit4 <- cox_cure.fit(x_mat, x_mat,
                     time = sim_dat$obs_time,
                     event = sim_dat$obs_event)
summary(fit4)

## get BIC's
BIC(fit3, fit4)

Run the code above in your browser using DataLab