cdROC: Cumulative/dynamic ROC curve estimate

Description

This function estimates a time-dependent ROC curve following the cumulative/dynamic approach and returns a 'cdroc' object, which can be printed or plotted. To handle right censoring, different estimators can be considered: those proposed by Martinez-Camblor et al. (2016), based on the Cox proportional hazards regression model (semiparametric) or on the Kaplan-Meier estimator (non-parametric), and the one in Li et al. (2016), based on the kernel-weighted Kaplan-Meier method. See References below.

Usage

cdROC(stime, status, marker, predict.time, ...)
# S3 method for default
cdROC(stime, status, marker, predict.time, method=c('Cox', 'KM', 'wKM'),
      kernel=c('normal', 'Epanechnikov', 'other'), h=1,
      kernel.fun = function(x,xi,h){u <- (x-xi)/h; 1/(2*h)*(abs(u) <= 1)},
      ci=FALSE, boot.n=100, conf.level=0.95, seed, ...)

Arguments

stime

vector of observed times.

status

vector of status (takes the value 0 if the subject is censored and 1 otherwise).

marker

vector of (bio)marker values.

predict.time

considered time point (scalar).

method

procedure used to estimate the probability. One of "Cox" (method based on Cox regression), "KM" (method based on Kaplan-Meier estimator) or "wKM" (method based on kernel-weighted Kaplan-Meier estimator).

kernel

procedure used to calculate the kernel function. One of "normal", "Epanechnikov" or "other". Only considered if method='wKM'.

h

bandwidth used to calculate the kernel function. Only considered if method='wKM'.

kernel.fun

if method='wKM' and kernel='other', function used to calculate the kernel function. It has three input parameters: x=vector, xi=value around which the kernel weight should be computed, h=bandwidth. Default: Uniform kernel.

ci

if TRUE, a confidence interval for the area under the curve is computed.

boot.n

number of bootstrap replicates considered to build the confidence interval. Default: 100.

conf.level

the confidence level of the interval, a number in (0,1). Default: 0.95, resulting in a 95% confidence interval.

seed

seed considered to generate bootstrap replicates (for reproducibility).

…

additional arguments for cdROC. Ignored.
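As a hedged illustration of the kernel.fun contract described above, the sketch below defines a triangular kernel with the three arguments in the required order. The name tri.kernel and the choice of kernel are assumptions made for illustration; they are not part of the function.

```r
# Hypothetical user-defined kernel following the (x, xi, h) argument order.
# Triangular kernel: (1/h) * (1 - |u|) on |u| <= 1, with u = (x - xi) / h.
tri.kernel <- function(x, xi, h) {
  u <- (x - xi) / h
  (1 / h) * (1 - abs(u)) * (abs(u) <= 1)
}

x <- c(0, 0.5, 1, 2, 3.5)
w <- tri.kernel(x, xi = 1, h = 2)
w  # one non-negative weight per element of x, zero outside [xi - h, xi + h]
```

Such a function would be supplied through kernel.fun together with method='wKM' and kernel='other'.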

Value

A list of class 'cdroc' with the following content:

TPR

vector of sensitivities (true positive rates).

TNR

vector of specificities (true negative rates).

cutPoints

vector of thresholds considered for the (bio)marker. It coincides with the marker vector plus the two extra values \(\min(\text{marker}) - 1\) and \(\max(\text{marker}) + 1\).

auc

area under the curve estimate by trapezoidal rule.

ci

if TRUE, a confidence interval for the area under the curve has been computed.

boot.n

number of bootstrap replicates considered to build the confidence interval. Default: 100.

conf.level

the confidence level of the interval, a number in (0,1). Default: 0.95, resulting in a 95% confidence interval.

seed

seed considered to generate bootstrap replicates (for reproducibility).

meanAuc

bootstrap area under the curve estimate (mean along bootstrap replicates).

ciAuc

bootstrap confidence interval for the area under the curve.

aucs

vector of bootstrap area under the curve estimates.

stime

vector of observed times.

status

vector of status (takes the value 0 if the subject is censored and 1 otherwise).

marker

vector of (bio)marker values.

predict.time

considered time point (scalar).

method

procedure used in order to estimate the probability.

kernel

procedure used to calculate the kernel function. Only considered if method='wKM'.

h

bandwidth used to calculate the kernel function. Only considered if method='wKM'.
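The auc component is described as a trapezoidal-rule estimate. A minimal sketch of that rule in base R follows, assuming sens and spec are hypothetical vectors of sensitivities and specificities evaluated on a common increasing grid of thresholds. It is not the package's internal code.

```r
# Hypothetical ROC coordinates on an increasing grid of thresholds:
# sensitivity decreases from 1 to 0, specificity increases from 0 to 1.
sens <- c(1, 0.9, 0.7, 0.4, 0)
spec <- c(0, 0.3, 0.6, 0.9, 1)

# Trapezoidal rule over the false positive rate (1 - specificity).
trapz.auc <- function(sens, spec) {
  fpr <- 1 - spec
  ord <- order(fpr, sens)            # sort points from (0,0) to (1,1)
  fpr <- fpr[ord]
  sens <- sens[ord]
  sum(diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2)
}

auc <- trapz.auc(sens, spec)
auc
```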

Details

Assuming that larger values of the marker are associated with higher probabilities of occurrence of the event, the cumulative sensitivity and the dynamic specificity are defined by:

\(Se^C(x,t) = P(\text{marker} > x \mid \text{stime} \le t)\) and \(Sp^D(x,t) = P(\text{marker} \le x \mid \text{stime} > t)\).

The resulting ROC curve is known as the cumulative/dynamic ROC curve, \(R_t^{C/D}\), where \(t = \) predict.time.
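To make the two definitions concrete, the following sketch computes their empirical versions in the simplest setting, assuming no censoring so that every event time is observed. The function name cd.point and the simulated data are illustrative assumptions only.

```r
# Empirical cumulative sensitivity and dynamic specificity at threshold x
# and time t, assuming every event time is observed (no censoring).
cd.point <- function(stime, marker, x, t) {
  cases    <- stime <= t   # event occurred by time t
  controls <- stime >  t   # still event-free at time t
  c(Se = mean(marker[cases] > x),       # P(marker >  x | stime <= t)
    Sp = mean(marker[controls] <= x))   # P(marker <= x | stime >  t)
}

set.seed(1)
stime  <- rexp(200, rate = 0.3)
marker <- -log(stime) + rnorm(200)      # larger marker, earlier event
pt <- cd.point(stime, marker, x = 0, t = 3)
pt
```

Evaluating cd.point over a grid of thresholds x for a fixed t would trace out the pairs (1 - Sp, Se) that form the curve. With censored data the plain proportions no longer apply, which is what the estimators described next address.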

Observations censored before \(t\) are the main obstacle to estimating the time-dependent ROC curve, because it is unknown whether the event occurred by \(t\). In order to estimate the probability of surviving beyond \(t\) for the \(i\)-th subject, \(\hat{P}_i\), three different methods are considered:

A semiparametric one, using a proportional hazard Cox regression model:

The hazard function is modelled as \(\lambda(t \mid X) = \lambda_0(t) \cdot \exp(\beta \cdot X)\), where \(X\) denotes the marker and \(\lambda_0\) is the baseline hazard.

The probability is estimated by \(\hat{P}_i = \frac{\hat{S}(t | X = x_i)}{\hat{S}(z_i | X = x_i)}\) where \(z_i\) stands for the observed time of the \(i\)-th subject and \(\hat{S}\) is the survival function estimated from the Cox regression model.

A non-parametric one, using the Kaplan-Meier estimator directly:

The probability is estimated by \(\hat{P}_i = \frac{\hat{S}(t)}{\hat{S}(z_i)}\) where \(z_i\) stands for the observed time of the \(i\)-th subject and \(\hat{S}\) is the Kaplan-Meier estimate of the survival function computed from those subjects satisfying \(X \le x_i\).

A non-parametric one, using the kernel-weighted Kaplan-Meier estimator:

The survival function is estimated by \(\hat{S}(t | X = x_i) = \prod_{s \leq t} \left[ 1- \frac{\sum_{j=1}^n K_h(x_j,x_i) I(z_j = s) \, \text{status}_j}{\sum_{j=1}^n K_h(x_j,x_i) I(z_j \geq s)} \right]\) where \(z_j\) stands for the observed time of the \(j\)-th subject, \(I\) is the indicator function and \(\text{status}_j\) takes the value 0 if the \(j\)-th subject is censored and 1 otherwise. The numerator sums the kernel weights of the events observed at time \(s\) and the denominator those of the subjects still at risk at \(s\).

Two different methods can be considered in order to define the kernel function, \(K_h(x_j,x_i)\):
- kernel='normal':
  \(K_h(x_j,x_i) = \frac{1}{h \sqrt{2 \pi}} exp\{ - \frac{(x_j - x_i)^2}{2 h^2} \}\)
- kernel='Epanechnikov':
  \(K_h(x_j,x_i) = \frac{3}{4h} \left[ 1 - \left( \frac{x_j - x_i}{h} \right)^2 \right] I(|x_j - x_i| \le h)\)
where \(h\) is the bandwidth considered for kernel weights.

If the user decides to use another kernel function (kernel='other'), it must be supplied through the kernel.fun input parameter, a function taking three arguments in this order: x, a vector; xi, the value around which the kernel weight should be computed; and h, the bandwidth.

The probability is estimated by \(\hat{P}_i = \frac{\hat{S}(t | X = x_i)}{\hat{S}(z_i | X = x_i)}\) where \(z_i\) stands for the observed time of the \(i\)-th subject and \(\hat{S}\) is the survival function estimated by the kernel-weighted Kaplan-Meier method considered above.
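The kernel-weighted estimator and the resulting \(\hat{P}_i\) can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation: the names normal.kernel and wkm.surv and the toy data are assumptions, and the denominator sums the kernel weights over the subjects still at risk (\(z_j \ge s\)), as in the usual kernel-weighted Kaplan-Meier estimator.

```r
# Normal kernel weights K_h(x_j, x_i), matching the kernel='normal' formula.
normal.kernel <- function(x, xi, h) dnorm((x - xi) / h) / h

# Kernel-weighted Kaplan-Meier estimate of S(t | X = xi).
wkm.surv <- function(t, xi, stime, status, marker, h = 1) {
  w <- normal.kernel(marker, xi, h)
  s.grid <- sort(unique(stime[status == 1 & stime <= t]))  # event times <= t
  factors <- vapply(s.grid, function(s) {
    events  <- sum(w * (stime == s) * status)  # weighted events at s
    at.risk <- sum(w * (stime >= s))           # weighted subjects at risk at s
    1 - events / at.risk
  }, numeric(1))
  prod(factors)  # empty product is 1 when no event occurs by t
}

# Toy data: subject 2 is censored at z = 2, before the time of interest t = 3.
stime  <- c(1, 2, 2.5, 3, 4, 5)
status <- c(1, 0, 1, 1, 0, 1)
marker <- c(5, 4, 3.5, 2, 1.5, 0)

# P_i = S(t | x_i) / S(z_i | x_i) for the censored subject i = 2.
i <- 2
P.i <- wkm.surv(3, marker[i], stime, status, marker) /
       wkm.surv(stime[i], marker[i], stime, status, marker)
P.i
```

With a very large bandwidth the weights become nearly equal, so wkm.surv approaches the ordinary Kaplan-Meier estimate, which gives a simple sanity check for the sketch.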

References

Martinez-Camblor P., F-Bayon G., Perez-Fernandez S., 2016, Cumulative/dynamic ROC curve estimation, Journal of Statistical Computation and Simulation, 86(17), 3582-3594.

Li L., Greene T., Hu B., 2016, A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data, Statistical Methods in Medical Research, DOI: 10.1177/0962280216680239.

Examples


# NOT RUN {
# Basic example. Data
set.seed(123)
stime <- rchisq(50,3)
status <- sample(c(rep(1,40), rep(0,10)))
marker <- max(stime) - stime + rnorm(50,0,2)

# Cumulative/dynamic ROC curve estimate at time 2.8 (Cox method is used) with 0.95 confidence
# interval for the area under the curve
cdROC(stime, status, marker, 2.8, ci=TRUE)

# Cumulative/dynamic ROC curve estimate at time 3.1 (Kaplan-Meier method is used)
cdROC(stime, status, marker, 3.1, method="KM")

# Cumulative/dynamic ROC curve estimate at time 3 (kernel-weighted Kaplan-Meier method with
# Gaussian kernel and bandwidth 1 is used)
cdROC(stime, status, marker, 3, method="wKM")

# Cumulative/dynamic ROC curve estimate at time 3 (kernel-weighted Kaplan-Meier method with
# biweight kernel and bandwidth equal to 2 is used)
cdROC(stime, status, marker, 3, method="wKM", kernel="other", h=2,
      kernel.fun = function(x,xi,h){u <- (x-xi)/h; 15/(16*h)*(1-u^2)^2*(abs(u)<=1)})
# }
