AROC.bsp: Semiparametric Bayesian inference of the covariate-adjusted ROC curve (AROC).

Description

Estimates the covariate-adjusted ROC curve (AROC) using the semiparametric Bayesian normal linear regression model discussed in Inacio de Carvalho and Rodriguez-Alvarez (2018).

Usage

AROC.bsp(formula.healthy, group, tag.healthy, data, scale = TRUE, 
  p = seq(0, 1, l = 101), paauc = paauccontrol(),
  compute.lpml = FALSE, compute.WAIC = FALSE, 
  m0, S0, nu, Psi, a = 2, b = 0.5, nsim = 5000, nburn = 1500)

Arguments

formula.healthy

A formula object specifying the Bayesian normal linear regression model for the estimation of the conditional distribution function for the diagnostic test outcome in the healthy population (see Details).

group

A character string with the name of the variable that distinguishes healthy from diseased individuals.

tag.healthy

The value codifying the healthy individuals in the variable group.

data

Data frame representing the data and containing all needed variables.

scale

A logical value. If TRUE, the test outcomes are scaled, i.e., divided by the standard deviation. The default is TRUE.

p

Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve.

compute.lpml

A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed.

paauc

A list of control values to replace the default values returned by the function paauccontrol. This argument is used to indicate whether the partial area under the covariate-adjusted ROC curve (pAAUC) should be computed and at which FPF.

compute.WAIC

A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed.

m0

A numeric vector. Hyperparameter; mean vector of the (multivariate) normal distribution for the mean of the regression coefficients. If missing, it is set to a vector of zeros of length p+1 (see Details).

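S0

A numeric matrix. Hyperparameter; covariance matrix of the (multivariate) normal distribution for the mean of the regression coefficients. If missing, it is set to a diagonal matrix of dimension (p+1)x(p+1) with 100 on the diagonal (see Details).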

nu

A numeric value. Hyperparameter; degrees of freedom of the Wishart distribution for the precision matrix of the regression coefficients. If missing, it is set to p + 3 (see Details).

Psi

A numeric matrix. Hyperparameter; scale matrix of the Wishart distribution for the precision matrix of the regression coefficients. If missing, it is set to an identity matrix of dimension (p+1)x(p+1) (see Details).

a

A numeric value. Hyperparameter; shape parameter of the gamma distribution for the precision (inverse variance). The default is 2 (scaled data) (see Details).

b

A numeric value. Hyperparameter; rate parameter of the gamma distribution for the precision (inverse variance). The default is 0.5 (scaled data) (see Details).
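The documented defaults for the hyperparameters can be written out explicitly. The following is a minimal sketch in base R, assuming a hypothetical toy data frame `healthy` and taking p + 1 to be the number of columns of the design matrix. It mirrors the defaults described in the argument list and is not the package's internal code; the names `healthy`, `X`, `k` and `p_cov` are illustrative only.

```r
# Toy healthy-group data (hypothetical; for illustration only)
set.seed(1)
healthy <- data.frame(y = rnorm(20), age = runif(20, 40, 80))

# Design matrix with intercept: p + 1 columns
X <- model.matrix(y ~ age, data = healthy)
k <- ncol(X)        # k = p + 1
p_cov <- k - 1      # p, the number of non-intercept columns

# Defaults as described in the argument list
m0  <- rep(0, k)    # zeros of length p + 1
S0  <- diag(100, k) # (p+1)x(p+1) diagonal matrix with 100 on the diagonal
nu  <- p_cov + 3    # Wishart degrees of freedom: p + 3
Psi <- diag(1, k)   # identity matrix of dimension (p+1)x(p+1)
a <- 2              # gamma shape for the precision
b <- 0.5            # gamma rate for the precision
```

Objects built this way could be passed as the `m0`, `S0`, `nu` and `Psi` arguments when values other than the defaults are wanted.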

nsim

A numeric value. Total number of Gibbs sampler iterates (including the burn-in). The default is 5000.

nburn

A numeric value. Number of burn-in iterations. The default is 1500.

Value

As a result, the function provides a list with the following components:

call

The matched call.

p

Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve has been estimated.

ROC

Estimated covariate-adjusted ROC curve (AROC) (posterior mean), and 95% pointwise posterior credible band.

AUC

Estimated area under the covariate-adjusted ROC curve (AAUC) (posterior mean), and 95% posterior credible interval.

pAUC

If required in the call to the function, estimated partial area under the covariate-adjusted ROC curve (pAAUC) (posterior mean), and 95% posterior credible interval.

lpml

If required, list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO).

WAIC

If required, widely applicable information criterion (WAIC).

fit

Results of the fitting process. It is a list with the following components: (1) mm: information needed to construct the model matrix associated with the Bayesian normal linear regression model. (2) beta: matrix of dimension N x (p+1) with the sampled regression coefficients. Here, N is the number of Gibbs sampler iterates after burn-in, and p+1 the number of columns of the design matrix (see also Details). (3) sd: vector of length N with the sampled variances (see also Details).
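As an illustration of how posterior draws such as those stored in `beta` can be summarised, the sketch below computes posterior means and 95% credible intervals in base R. It uses a simulated matrix as a stand-in for the sampled coefficients (an assumption, so that the code runs without fitting a model); `beta_draws` and its column names are hypothetical, and N = 3500 assumes the default `nsim = 5000` and `nburn = 1500` with every post-burn-in iterate retained.

```r
set.seed(42)
N <- 5000 - 1500  # iterates after burn-in under the default settings

# Simulated stand-in for the sampled coefficients: N x (p + 1), here p + 1 = 2
beta_draws <- cbind(rnorm(N, mean = 1, sd = 0.1),
                    rnorm(N, mean = 0.5, sd = 0.05))
colnames(beta_draws) <- c("(Intercept)", "age")

# Posterior mean and 95% equal-tailed credible interval per coefficient
post_mean <- colMeans(beta_draws)
post_ci   <- apply(beta_draws, 2, quantile, probs = c(0.025, 0.975))

round(rbind(mean = post_mean, post_ci), 3)
```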

data_model

List with the data used in the fit: observed diagnostic test outcome and B-spline design matrices, separately for the healthy and diseased groups.

Details

Estimates the covariate-adjusted ROC curve (AROC) defined as

$$AROC\left(t\right) = Pr\{1 - F_{\bar{D}}(Y_D | \mathbf{X}_{D}) \leq t\},$$

where $F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}})$ denotes the distribution function of $Y_{\bar{D}}$ conditional on the vector of covariates $\mathbf{X}_{\bar{D}}$. In particular, the method implemented in this function combines a Bayesian normal linear regression model to estimate $F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}})$ and the Bayesian bootstrap (Rubin, 1981) to estimate the outside probability. More precisely, letting $\{(\mathbf{x}_{\bar{D}i},y_{\bar{D}i})\}_{i=1}^{n_{\bar{D}}}$ be a random sample from the nondiseased population,

$$F_{\bar{D}}(y_{\bar{D}i}|\mathbf{X}_{\bar{D}}=\mathbf{x}_{\bar{D}i}) = \Phi(y_{\bar{D}i}\mid \mathbf{x}_{\bar{D}i}^{*T}\mathbf{\beta}^{*},\sigma^2),$$

where $\mathbf{x}_{\bar{D}i}^{*T} = (1, \mathbf{x}_{\bar{D}i}^{T})$, $\mathbf{\beta}^{*}\sim N_{p+1} (\mathbf{m},\mathbf{S})$ and $\sigma^{-2}\sim\Gamma(a,b)$. It is assumed that $\mathbf{m} \sim N_{p+1}(\mathbf{m}_0,\mathbf{S}_0)$ and $\mathbf{S}^{-1}\sim W(\nu,(\nu\Psi)^{-1})$, where $p+1$ denotes the number of columns of the design matrix $\mathbf{X}_{\bar{D}}^{*}$. Here $W(\nu,(\nu\Psi)^{-1})$ denotes a Wishart distribution with $\nu$ degrees of freedom and expectation $\Psi^{-1}$. For a detailed description, we refer to Inacio de Carvalho and Rodriguez-Alvarez (2018).
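The two ingredients above (a normal linear model for the healthy population and the Bayesian bootstrap for the outside probability) can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: the data are simulated, a least-squares fit stands in for a single posterior draw of $(\mathbf{\beta}^{*}, \sigma^2)$ instead of Gibbs sampler output, and the Dirichlet(1, ..., 1) weights are generated by normalising independent Exp(1) variables. As a consequence, the band below reflects only the bootstrap variability, not the posterior uncertainty in the regression parameters. All object names are hypothetical.

```r
set.seed(123)

# Simulated data (hypothetical): one covariate, healthy and diseased groups
n0 <- 200; n1 <- 150
x0 <- runif(n0); y0 <- 1 + 2 * x0 + rnorm(n0)
x1 <- runif(n1); y1 <- 2.5 + 2 * x1 + rnorm(n1)

# Stand-in for one draw of (beta*, sigma): least-squares fit in the healthy group
fit0  <- lm(y0 ~ x0)
beta  <- unname(coef(fit0))
sigma <- summary(fit0)$sigma

# Placement values U_j = 1 - F_Dbar(y_Dj | x_Dj) for the diseased individuals
mu1 <- beta[1] + beta[2] * x1
U   <- 1 - pnorm(y1, mean = mu1, sd = sigma)

# Bayesian bootstrap: AROC(t) = sum_j w_j * I(U_j <= t), w ~ Dirichlet(1, ..., 1)
t_grid <- seq(0, 1, length.out = 101)
B <- 500
aroc_draws <- replicate(B, {
  w <- rexp(n1)
  w <- w / sum(w)
  vapply(t_grid, function(t) sum(w * (U <= t)), numeric(1))
})  # matrix with length(t_grid) rows and B columns

# Pointwise mean and 95% band over the bootstrap replicates
aroc_mean <- rowMeans(aroc_draws)
aroc_band <- apply(aroc_draws, 1, quantile, probs = c(0.025, 0.975))
```

In the method described above, the placement values would be recomputed for each retained Gibbs sampler draw, so that the resulting credible band also accounts for uncertainty in $\mathbf{\beta}^{*}$ and $\sigma^2$.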

References

Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2018). Bayesian nonparametric inference for the covariate-adjusted ROC curve. arXiv preprint arXiv:1806.00473.

Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130-134.

Examples

# NOT RUN {
library(AROC)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]

# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
# }
# NOT RUN {
m1 <- AROC.bsp(formula.healthy = l_marker1 ~ age,
               group = "status", tag.healthy = 0, data = newpsa, scale = TRUE,
               p = seq(0, 1, length.out = 101),
               compute.lpml = TRUE, compute.WAIC = TRUE,
               a = 2, b = 0.5, nsim = 5000, nburn = 1500)

summary(m1)

plot(m1)
# }
