ROCsurface: Receiver operating characteristics surface for a continuous diagnostic test

Description

ROCs.tcf is used to obtain bias-corrected estimates of the true class fractions (TCFs) for evaluating the accuracy of a continuous diagnostic test for a given cut point $(c_1, c_2)$, with $c_1 < c_2$.

ROCs provides bias-corrected estimates of the ROC surface of a continuous diagnostic test, built from the estimated TCFs.

Usage

ROCs.tcf(method = "full", T, Dvec, V = NULL, rhoEst = NULL, piEst = NULL, cps)
ROCs(
  method = "full",
  T,
  Dvec,
  V,
  A = NULL,
  rhoEst = NULL,
  piEst = NULL,
  ncp = 100,
  plot = TRUE,
  ellipsoid = FALSE,
  cpst = NULL,
  level = 0.95,
  sur.col = c("gray40", "green"),
  BOOT = FALSE,
  nR = 250,
  parallel = FALSE,
  ncpus = ifelse(parallel, detectCores()/2, NULL),
  ...
)

Arguments

method

an estimation method to be used for estimating the true class fractions in the presence of verification bias. See 'Details'.

T

a numeric vector containing the diagnostic test values. NA values are not allowed.

Dvec

an n * 3 binary matrix with three columns, corresponding to the three classes of the disease status. In row i, a 1 in column j indicates that the i-th subject belongs to class j, with j = 1, 2, 3. A row of NA values indicates a non-verified subject.

V

a binary vector containing the verification status (1 verified, 0 not verified).

rhoEst

a result of a call to rhoMLogit or rhoKNN to fit the disease model.

piEst

a result of a call to psglm to fit the verification model.

cps

a cut point $(c_1, c_2)$, with $c_1 < c_2$, which is used to estimate TCFs. If m estimates of TCFs are required, cps must be a matrix with m rows and 2 columns.

A

a vector/matrix of dimension n * q containing the values of the covariate(s). If the method is "knn" and ellipsoid = TRUE, A is needed to compute the asymptotic covariance of TCFs at a fixed cut point. The default NULL is suitable for the remaining methods.

ncp

the dimension of cut point grid. It is used to determine the cut points (see 'Details'). Default 100.

plot

if TRUE (the default), a 3D plot of the ROC surface is produced.

ellipsoid

a logical value. If TRUE, adds an ellipsoidal confidence region for TCFs at a specified cut point to the current plot of the ROC surface.

cpst

a specified cut point, which is used to construct the ellipsoidal confidence region. If m ellipsoidal confidence regions are required, cpst must be a matrix with m rows and 2 columns. Default NULL.

level

a confidence level to be used for constructing the ellipsoidal confidence region; default 0.95.

sur.col

colors to be used for plotting the ROC surface and the ellipsoid, in that order.

BOOT

a logical value. Default FALSE. If set to TRUE, bootstrap resampling is employed to estimate the asymptotic variance-covariance matrix of TCFs at the cut point cpst. See asyCovTCF for more details.

nR

the number of bootstrap replicates, used for the FULL estimator or when BOOT = TRUE. Usually this will be a single positive integer. Default 250.

parallel

a logical value. If TRUE, parallel computing is used for the bootstrap resampling process.

ncpus

number of processes to be used in parallel computing. Default is half of the available cores.

...

optional arguments to be passed to plot3d, surface3d.

Value

ROCs returns a list, with the following components:

vals

the estimates of TCFs at all cut points.

cpoint

the cut points used to construct the ROC surface.

ncp

dimension of the cut point grid.

cpst

the cut points used to construct the ellipsoidal confidence regions.

tcf

the estimates of TCFs at the cut points cpst.

message

an integer code or vector. 1 indicates the ellipsoidal confidence region is available.

ROCs.tcf returns a vector of TCF estimates at a cut point when cps is a vector with two elements, or a list of TCF estimates at m cut points when cps is an m*2 matrix. In addition, the attributes theta, beta, cp and name are attached. Here, theta is a probability vector with 3 elements, corresponding to the disease prevalence rates of the three classes. beta is also a probability vector, with 4 components, which are used to compute TCFs; see To Duc et al. (2016a, 2016b) for more details. cp is the specified cut point that is used to estimate TCFs. name indicates the method used to estimate TCFs. These attributes are required to compute the asymptotic variance-covariance matrix of TCFs at the given cut point.

Details

In a three-class diagnostic problem, the quantities used to evaluate the accuracy of a diagnostic test are the true class fractions (TCFs). For a given pair of cut points $(c_1, c_2)$ such that $c_1 < c_2$, subjects are classified into class 1 ($D_1$) if $T < c_1$; class 2 ($D_2$) if $c_1 \le T < c_2$; class 3 ($D_3$) otherwise, that is, if $T \ge c_2$. The true class fractions of the test $T$ at $(c_1, c_2)$ are defined as $$TCF_1(c_1) = P(T < c_1| D_1 = 1) = 1 - P(T \ge c_1| D_1 = 1),$$ $$TCF_2(c_1, c_2) = P(c_1 \le T < c_2| D_2 = 1) = P(T \ge c_1| D_2 = 1) - P(T \ge c_2| D_2 = 1),$$ $$TCF_3(c_2) = P(T \ge c_2| D_3 = 1).$$
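As an illustration of these definitions, the following is a minimal base R sketch that computes empirical TCFs at one cut point from fully verified data. The helper emp_tcf and the toy data are hypothetical and are not part of the package; the package functions additionally handle verification bias.

```r
# Empirical TCFs at a cut point (c1, c2) from fully verified data.
# tt holds the test values; Dvec is an n x 3 indicator matrix of disease classes.
emp_tcf <- function(tt, Dvec, c1, c2) {
  stopifnot(c1 < c2, nrow(Dvec) == length(tt))
  d1 <- Dvec[, 1] == 1
  d2 <- Dvec[, 2] == 1
  d3 <- Dvec[, 3] == 1
  c(TCF1 = mean(tt[d1] < c1),                    # P(T < c1 | D1 = 1)
    TCF2 = mean(tt[d2] >= c1 & tt[d2] < c2),     # P(c1 <= T < c2 | D2 = 1)
    TCF3 = mean(tt[d3] >= c2))                   # P(T >= c2 | D3 = 1)
}

# Toy data (hypothetical): six subjects, two per class.
tt <- c(-1.0, 0.2, 0.5, 1.5, 2.0, 3.0)
cls <- c(1, 1, 2, 2, 3, 3)
Dvec <- cbind(cls == 1, cls == 2, cls == 3) * 1
res <- emp_tcf(tt, Dvec, c1 = 0, c2 = 1.8)
print(res)
```

In this toy sample one of the two class 1 subjects falls below $c_1 = 0$, while both class 2 and both class 3 subjects are classified correctly.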

The receiver operating characteristic (ROC) surface is the plot of $TCF_1$, $TCF_2$ and $TCF_3$ obtained by varying the cut point $(c_1, c_2)$ over the domain of the diagnostic test. The cut points $(c_1, c_2)$ are produced by designing a cut point grid of dimension ncp. In this grid, the points satisfying $c_1 < c_2$ are selected as the cut points. The number of cut points obtained is $ncp(ncp - 1)/2$; for example, the default ncp = 100 gives 4950.
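The count $ncp(ncp - 1)/2$ can be checked with a short base R sketch. It assumes an equally spaced grid over the range of the test values, which may differ from the grid the package actually builds; the test values below are hypothetical.

```r
# Build an ncp x ncp grid of candidate cut points and keep the pairs with c1 < c2.
ncp <- 100
tt <- c(-2.1, -0.3, 0.8, 1.9, 4.2)               # hypothetical test values
grid <- seq(min(tt), max(tt), length.out = ncp)  # assumed equally spaced grid
pairs <- expand.grid(c1 = grid, c2 = grid)
cpoint <- pairs[pairs$c1 < pairs$c2, ]
print(nrow(cpoint))
```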

These functions implement the bias-corrected estimators in To Duc et al. (2016a, 2016b) for estimating the TCFs of a three-class continuous diagnostic test in the presence of verification bias. The estimators work under the missing at random (MAR) assumption. Five methods are provided, namely:

Full imputation (FI): uses the fitted values of the disease model to replace the true disease status (both missing and non-missing values).
Mean score imputation (MSI): replaces only the missing values by the fitted values of the disease model.
Inverse probability weighted (IPW): weights each observation in the verification sample by the inverse of the sampling fraction (i.e. the probability that the subject was selected for verification).
Semiparametric efficient (SPE): replaces the true disease status by the double robust estimates.
K nearest-neighbor (KNN): uses K nearest-neighbor imputation to obtain the missing values of the true disease status.

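The argument method must be one of "full", "fi", "msi", "ipw", "spe" and "knn". The value "full" selects the full data estimator, which applies when the disease status of every subject is verified; the other values select the bias-corrected methods listed above.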
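The descriptions of FI, MSI, IPW and SPE above can be sketched in base R as different ways of replacing the class indicators before computing TCFs. This is a simplified illustration under stated assumptions, not the package's implementation: the helpers d_est and tcf_from, the fitted matrix rho and the fitted vector pi_hat are all hypothetical stand-ins for the output of the disease and verification models.

```r
# Estimated class indicators for each method (sketch).
# D: n x 3 indicator matrix (NA rows for unverified subjects); V: 0/1 vector;
# rho: n x 3 fitted disease probabilities; pi_hat: fitted verification probabilities.
d_est <- function(method, D, V, rho = NULL, pi_hat = NULL) {
  D0 <- D
  D0[is.na(D0)] <- 0                     # unverified rows are masked by V = 0
  switch(method,
    fi  = rho,                                          # replace every status
    msi = V * D0 + (1 - V) * rho,                       # replace missing only
    ipw = V * D0 / pi_hat,                              # weight verified subjects
    spe = V * D0 / pi_hat - (V - pi_hat) * rho / pi_hat,  # doubly robust form
    stop("unknown method"))
}

# TCFs at (c1, c2) computed from (estimated) class indicators.
tcf_from <- function(tt, Dhat, c1, c2) {
  s <- colSums(Dhat)
  c(TCF1 = 1 - sum((tt >= c1) * Dhat[, 1]) / s[[1]],
    TCF2 = (sum((tt >= c1) * Dhat[, 2]) - sum((tt >= c2) * Dhat[, 2])) / s[[2]],
    TCF3 = sum((tt >= c2) * Dhat[, 3]) / s[[3]])
}

# Sanity check on toy data: with every subject verified and pi_hat = 1,
# MSI, IPW and SPE all reduce to the full data estimate.
tt <- c(-1.0, 0.2, 0.5, 1.5, 2.0, 3.0)
cls <- c(1, 1, 2, 2, 3, 3)
D <- cbind(cls == 1, cls == 2, cls == 3) * 1
V <- rep(1, 6)
rho <- matrix(1 / 3, nrow = 6, ncol = 3)
pi_hat <- rep(1, 6)
out <- sapply(c("msi", "ipw", "spe"), function(m)
  tcf_from(tt, d_est(m, D, V, rho, pi_hat), c1 = 0, c2 = 1.8))
print(out)
```

The three methods differ only when some subjects are unverified; the sanity check confirms that the sketch collapses to the full data case otherwise.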

The ellipsoidal confidence region of TCFs at a given cut point can be constructed by using a normal approximation and plotted in the ROC surface space. The default confidence level is 0.95.
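One common way to build such a region from a normal approximation is as the set of points whose squared Mahalanobis distance from the estimated TCFs does not exceed a chi-square quantile with 3 degrees of freedom. The sketch below assumes this construction; the helper in_region, the TCF estimates and the covariance matrix are hypothetical, and in practice the covariance would be the estimated variance-covariance matrix of the TCF estimators.

```r
# Check whether a point x lies inside the normal-approximation ellipsoid
# centred at the estimated TCFs with covariance Sigma (assumed construction).
in_region <- function(x, tcf, Sigma, level = 0.95) {
  d2 <- mahalanobis(x, center = tcf, cov = Sigma)  # squared Mahalanobis distance
  d2 <= qchisq(level, df = 3)
}

tcf <- c(0.7, 0.5, 0.8)                  # hypothetical TCF estimates
Sigma <- diag(c(0.01, 0.02, 0.01))       # hypothetical covariance of the estimates
inside_centre <- in_region(tcf, tcf, Sigma)
inside_origin <- in_region(c(0, 0, 0), tcf, Sigma)
print(c(inside_centre, inside_origin))
```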

Note that, before using the functions ROCs and ROCs.tcf, the use of preDATA might be needed to check the monotone ordering of the disease classes and to create the matrix format for the disease status.
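For orientation, the matrix format expected for Dvec can be produced by hand as in the base R sketch below. This is not how preDATA is implemented, and it skips the monotone ordering check that preDATA performs; the disease status vector is hypothetical.

```r
# Turn a three-level disease status with missing values into the n x 3
# indicator matrix format; an NA status yields a row of NA values.
D <- factor(c(1, 2, NA, 3, 1), levels = 1:3)     # hypothetical disease status
Dvec <- sapply(levels(D), function(l) as.integer(D == l))
V <- as.integer(!is.na(D))                       # verification status
print(Dvec)
print(V)
```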

References

To Duc, K., Chiogna, M. and Adimari, G. (2016a) Bias-corrected methods for estimating the receiver operating characteristic surface of continuous diagnostic tests. Electronic Journal of Statistics, 10, 3063-3113.

To Duc, K., Chiogna, M. and Adimari, G. (2018) Nonparametric estimation of ROC surfaces in presence of verification bias. REVSTAT Statistical Journal. Accepted.

Examples

# NOT RUN {
data(EOC)
head(EOC)

# }
# NOT RUN {
# FULL data estimator
Dfull <- preDATA(EOC$D.full, EOC$CA125)
Dvec.full <- Dfull$Dvec
ROCs("full", T = EOC$CA125, Dvec = Dvec.full, ncp = 30, ellipsoid = TRUE,
     cpst = c(-0.56, 2.31))
# }
# NOT RUN {
# Preparing the missing disease status
Dna <- preDATA(EOC$D, EOC$CA125)
Dvec.na <- Dna$Dvec
Dfact.na <- Dna$D

# FI estimator
rho.out <- rhoMLogit(Dfact.na ~ CA125 + CA153 + Age, data = EOC, test = TRUE)
ROCs("fi", T = EOC$CA125, Dvec = Dvec.na, V = EOC$V, rhoEst = rho.out, ncp = 30)

# }
# NOT RUN {
# Plot ROC surface and add ellipsoid confidence region
ROCs("fi", T = EOC$CA125, Dvec = Dvec.na, V = EOC$V, rhoEst = rho.out, ncp = 30,
     ellipsoid = TRUE, cpst = c(-0.56, 2.31))

# MSI estimator
ROCs("msi", T = EOC$CA125, Dvec = Dvec.na, V = EOC$V, rhoEst = rho.out, ncp = 30,
     ellipsoid = TRUE, cpst = c(-0.56, 2.31))

# IPW estimator
pi.out <- psglm(V ~ CA125 + CA153 + Age, data = EOC, test = TRUE)
ROCs("ipw", T = EOC$CA125, Dvec = Dvec.na, V = EOC$V, piEst = pi.out, ncp = 30,
     ellipsoid = TRUE, cpst = c(-0.56, 2.31))

# SPE estimator
ROCs("spe", T = EOC$CA125, Dvec = Dvec.na, V = EOC$V, rhoEst = rho.out, ncp = 30,
     piEst = pi.out, ellipsoid = TRUE, cpst = c(-0.56, 2.31))

# 1NN estimator
XX <- cbind(EOC$CA125, EOC$CA153, EOC$Age)
K.opt <- CVknn(X = XX, Dvec = Dvec.na, V = EOC$V, type = "mahala", plot = TRUE)
rho.1nn <- rhoKNN(X = XX, Dvec = Dvec.na, V = EOC$V, K = K.opt, type = "mahala")
ROCs("knn", T = EOC$CA125, Dvec = Dvec.na, V = EOC$V, rhoEst = rho.1nn, ncp = 30,
     ellipsoid = TRUE, cpst = c(-0.56, 2.31))

## Compute TCFs at three cut points
cutps <- rbind(c(0,0.5), c(0,1), c(0.5,1))
# ROCs.tcf has no ncp argument, so only the cut points are supplied
ROCs.tcf("spe", T = EOC$CA125, Dvec = Dvec.na, V = EOC$V, rhoEst = rho.out,
         piEst = pi.out, cps = cutps)
# }