cox_cure_net: Regularized Cox Cure Rate Model with Elastic-Net Penalty

Description

For right-censored data, the function cox_cure_net() fits a regularized Cox cure rate model with an elastic-net penalty, following Masud et al. (2018) and Zou and Hastie (2005). For right-censored data with missing or uncertain event/censoring indicators, it fits the Cox cure rate model proposed by Wang et al. (2023).

Usage

cox_cure_net(
  surv_formula,
  cure_formula,
  time,
  event,
  data,
  subset,
  contrasts = NULL,
  cv_nfolds = 0L,
  surv_net = cox_cure_net.penalty(),
  cure_net = cox_cure_net.penalty(),
  surv_mstep = cox_cure.mstep(),
  cure_mstep = cox_cure.mstep(),
  control = cox_cure.control(),
  ...
)
cox_cure_net.fit(
  surv_x,
  cure_x,
  time,
  event,
  cure_intercept = TRUE,
  cv_nfolds = 0L,
  surv_net = cox_cure_net.penalty(),
  cure_net = cox_cure_net.penalty(),
  surv_mstep = cox_cure.mstep(),
  cure_mstep = cox_cure.mstep(),
  control = cox_cure.control(),
  ...
)
cox_cure_net.penalty(
  nlambda = 10,
  lambda_min_ratio = 0.001,
  alpha = 1,
  lambda = NULL,
  penalty_factor = NULL,
  varying_active = TRUE,
  ...
)

Value

A cox_cure or cox_cure_net object that contains the fitted ordinary or regularized Cox cure rate model if none of the event indicators is NA. For right-censored data with uncertain/missing event indicators, a cox_cure_uncer or cox_cure_net_uncer object is returned.

Arguments

surv_formula: A formula object starting with ~ for the model formula of the survival model part. For the Cox model, no intercept term is included even if an intercept is specified or implied in the model formula. A model formula with only an intercept term is not allowed.
cure_formula: A formula object starting with ~ for the model formula of the incidence model part. For the logistic model, an intercept term is included by default and can be excluded by adding + 0 or - 1 to the model formula.
time: A numeric vector for the observed survival times.
event: A numeric vector for the event indicators, where NA's are allowed and represent uncertain event indicators.
data: An optional data frame, list, or environment that contains the model covariates and response variables (time and event). If they are not found in data, the variables are taken from the environment of the specified formula, usually the environment from which this function is called.
subset: An optional logical vector specifying a subset of observations to be used in the fitting process.
contrasts: An optional list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors. See contrasts.arg of model.matrix.default for details.
cv_nfolds: A nonnegative integer representing the number of folds in cross-validation.
surv_net, cure_net: Optional lists or cox_cure_net.penalty objects specifying the elastic-net penalties for the survival model part and the cure rate model part, respectively.
surv_mstep, cure_mstep: Named lists passed to cox_cure.mstep() specifying the control parameters for the corresponding M-steps.
control: A cox_cure.control object that contains the control parameters.
...: Other arguments passed to the control functions for backward compatibility.
surv_x: A numeric matrix for the design matrix of the survival model component.
cure_x: A numeric matrix for the design matrix of the cure rate model component. The design matrix should exclude an intercept term unless the model includes only the intercept term. In that case, additionally set cure_intercept = FALSE so that the intercept term is not standardized.
cure_intercept: A logical value specifying whether to add an intercept term to the cure rate model component. If TRUE (the default), an intercept term is included.
nlambda: A positive integer representing the number of lambda parameters.
lambda_min_ratio: A positive number specifying the ratio of the smallest lambda in the solution path to a lambda large enough to produce all-zero estimates under the lasso penalty.
alpha: A positive number between 0 and 1 representing the mixing parameter in the elastic net penalty.
lambda: A numeric vector that consists of nonnegative values representing the sequence of the lambda parameters.
penalty_factor: A numeric vector that consists of nonnegative penalty factors (or adaptive weights) for the L1-norm of the coefficient estimates.
varying_active: A logical value. If TRUE (by default), the underlying coordinate-descent algorithm will be iterated over varying active sets, which can usually improve the computational efficiency when the number of predictors is large. Otherwise, an ordinary coordinate-descent will be performed.
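
To make the roles of nlambda, lambda_min_ratio, alpha, lambda, and penalty_factor concrete, here is a minimal base-R sketch. It assumes the common glmnet-style parameterization of the elastic-net penalty and a log-spaced lambda sequence; the package's exact scaling and spacing may differ. The helpers enet_penalty() and make_lambda_path() are hypothetical names used only for illustration and are not part of intsurv.

```r
# Hypothetical helpers for illustration only; they are not part of intsurv.

# Elastic-net penalty value for a coefficient vector `beta`, assuming the form
#   lambda * (alpha * sum(w * |beta|) + (1 - alpha) / 2 * sum(beta^2)),
# where `w` plays the role of `penalty_factor` on the L1-norm.
enet_penalty <- function(beta, lambda, alpha = 1,
                         penalty_factor = rep(1, length(beta))) {
    stopifnot(lambda >= 0, alpha >= 0, alpha <= 1,
              length(penalty_factor) == length(beta),
              all(penalty_factor >= 0))
    l1 <- sum(penalty_factor * abs(beta))
    l2 <- sum(beta^2)
    lambda * (alpha * l1 + (1 - alpha) / 2 * l2)
}

# Decreasing, log-spaced lambda sequence from `lambda_max` down to
# `lambda_max * lambda_min_ratio` (the log spacing is an assumption).
make_lambda_path <- function(lambda_max, nlambda = 10,
                             lambda_min_ratio = 0.001) {
    stopifnot(lambda_max > 0, nlambda >= 1, lambda_min_ratio > 0)
    exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
            length.out = nlambda))
}

lambda_path <- make_lambda_path(lambda_max = 2)
pen_lasso <- enet_penalty(c(0.5, -1, 0), lambda = 0.1, alpha = 1)
pen_enet <- enet_penalty(c(0.5, -1, 0), lambda = 0.1, alpha = 0.8)
```

Under these assumptions, alpha = 1 reduces the penalty to the (weighted) lasso, and a penalty factor of zero leaves the corresponding coefficient unpenalized in the L1 term.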

Details

The model is estimated by the expectation-maximization (EM) algorithm. Variable selection through elastic-net regularization is based on cyclic coordinate descent and the majorization-minimization (MM) algorithm.
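
For orientation, the mixture cure model of Kuk and Chen (1992) and Sy and Taylor (2000) listed in the references can be sketched as follows. The notation is assumed here for illustration and is not taken from the package: x and z denote the covariates of the survival and incidence parts, pi(z) the probability of being susceptible (the package may parameterize the incidence part with the opposite sign convention), S_0 the baseline survival function, and delta_i the event indicator.

```latex
% Population survival function: a mixture of cured and susceptible subjects
S(t \mid x, z) = 1 - \pi(z) + \pi(z)\, S_u(t \mid x),
\qquad
\pi(z) = \frac{\exp(\gamma_0 + z^\top \gamma)}{1 + \exp(\gamma_0 + z^\top \gamma)},
\qquad
S_u(t \mid x) = S_0(t)^{\exp(x^\top \beta)}.

% E-step: conditional probability that subject i is susceptible
w_i = \delta_i + (1 - \delta_i)\,
  \frac{\pi(z_i)\, S_u(t_i \mid x_i)}
       {1 - \pi(z_i) + \pi(z_i)\, S_u(t_i \mid x_i)}.
```

Given the weights w_i, the M-step separates into a weighted logistic regression for gamma and a weighted Cox-type problem for beta. In the regularized model, an elastic-net penalty would be added to each of these two objectives, controlled by cure_net and surv_net, respectively.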

References

Kuk, A. Y. C., & Chen, C. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79(3), 531--541.

Peng, Y. (2003). Estimating baseline distribution in proportional hazards cure models. Computational Statistics & Data Analysis, 42(1-2), 187--201.

Sy, J. P., & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56(1), 227--236.

Masud, A., Tu, W., & Yu, Z. (2018). Variable selection for mixture and promotion time cure rate models. Statistical Methods in Medical Research, 27(7), 2185--2199.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301--320.

Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2023). Survival Modeling of Suicide Risk with Rare and Uncertain Diagnoses. Statistics in Biosciences, 17(1), 1--27.

Examples

library(intsurv)

### 1. Regularized Cox cure rate model with elastic-net penalty
## simulate toy right-censored data with a cure fraction
set.seed(123)
n_obs <- 100
p <- 10
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
surv_beta <- c(rep(0, p - 5), rep(1, 5))
cure_beta <- c(rep(1, 2), rep(0, p - 2))
dat <- simData4cure(nSubject = n_obs, lambda_censor = 0.01,
                    max_censor = 10, survMat = x_mat,
                    survCoef = surv_beta, cureCoef = cure_beta,
                    b0 = 0.5, p1 = 1, p2 = 1, p3 = 1)

## model-fitting for the given design matrices
fit1 <- cox_cure_net.fit(x_mat, x_mat, dat$obs_time, dat$obs_event,
                         surv_net = list(nlambda = 10, alpha = 1),
                         cure_net = list(nlambda = 10, alpha = 0.8))

## model-fitting for the given model formula
fm <- paste(paste0("x", seq_len(p)), collapse = " + ")
surv_fm <- as.formula(sprintf("~ %s", fm))
cure_fm <- surv_fm
fit2 <- cox_cure_net(surv_fm, cure_fm, data = dat,
                     time = obs_time, event = obs_event)

## summary of BIC's
BIC(fit1)
BIC(fit2)
BIC(fit1)[which.min(BIC(fit1)[, "BIC"]), ]
BIC(fit2)[which.min(BIC(fit2)[, "BIC"]), ]

## list of coefficient estimates based on BIC
coef(fit1)
coef(fit2)


### 2. Regularized Cox cure rate model for uncertain event status
## simulate toy data with uncertain event indicators
set.seed(123)
n_obs <- 100
p <- 5
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
surv_beta <- c(rep(0, p - 3), rep(1, 3))
cure_beta <- c(rep(1, 2), rep(0, p - 2))
dat <- simData4cure(nSubject = n_obs, lambda_censor = 0.01,
                    max_censor = 10, survMat = x_mat,
                    survCoef = surv_beta, cureCoef = cure_beta,
                    b0 = 0.5, p1 = 0.95, p2 = 0.95, p3 = 0.95)

## model-fitting from given design matrices
fit1 <- cox_cure_net.fit(
    x_mat, x_mat,
    dat$obs_time, dat$obs_event,
    surv_net = list(nlambda = 5, alpha = 0.5)
)

## model-fitting from given model formula
fm <- paste(paste0("x", seq_len(p)), collapse = " + ")
surv_fm <- as.formula(sprintf("~ %s", fm))
cure_fm <- surv_fm
fit2 <- cox_cure_net(
    surv_fm,
    cure_fm,
    data = dat,
    time = obs_time,
    event = obs_event,
    surv_net = list(nlambda = 5, alpha = 0.9),
    cure_net = list(nlambda = 5, alpha = 0.9)
)

## summary of BIC's
BIC(fit1)
BIC(fit2)

## list of coefficient estimates based on BIC
coef(fit1)
coef(fit2)