Estimates the covariate-adjusted ROC curve (AROC) using the semiparametric Bayesian normal linear regression model discussed in Inacio de Carvalho and Rodriguez-Alvarez (2018).
AROC.bsp(formula.healthy, group, tag.healthy, data, scale = TRUE,
p = seq(0, 1, l = 101), paauc = paauccontrol(),
compute.lpml = FALSE, compute.WAIC = FALSE,
m0, S0, nu, Psi, a = 2, b = 0.5, nsim = 5000, nburn = 1500)
A formula object specifying the Bayesian normal linear regression model for the estimation of the conditional distribution function for the diagnostic test outcome in the healthy population (see Details).
A character string with the name of the variable that distinguishes healthy from diseased individuals.
The value codifying the healthy individuals in the variable group.
Data frame representing the data and containing all needed variables.
A logical value. If TRUE, the test outcomes are scaled, i.e., divided by their standard deviation. The default is TRUE.
Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve.
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed.
A list of control values to replace the default values returned by the function paauccontrol. This argument is used to indicate whether the partial area under the covariate-adjusted ROC curve (pAAUC) should be computed and at which FPF.
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed.
A numeric vector. Hyperparameter; mean vector of the (multivariate) normal distribution for the mean of the regression coefficients. If missing, it is set to a vector of zeros of length p+1 (see Details).
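A numeric matrix. Hyperparameter; covariance matrix of the (multivariate) normal distribution for the mean of the regression coefficients. If missing, it is set to a diagonal matrix of dimension (p+1) x (p+1) with 100 in the diagonal (see Details).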
A numeric value. Hyperparameter; degrees of freedom of the Wishart distribution for the precision matrix of the regression coefficients. If missing, it is set to p+3 (see Details).
A numeric matrix. Hyperparameter; scale matrix of the Wishart distribution for the precision matrix of the regression coefficients. If missing, it is set to an identity matrix of dimension (p+1) x (p+1) (see Details).
A numeric value. Hyperparameter; shape parameter of the gamma distribution for the precision (inverse variance). The default is 2 (scaled data) (see Details).
A numeric value. Hyperparameter; rate parameter of the gamma distribution for the precision (inverse variance). The default is 0.5 (scaled data) (see Details).
A numeric value. Total number of Gibbs sampler iterates (including the burn-in). The default is 5000.
A numeric value. Number of burn-in iterations. The default is 1500.
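The defaults described above for missing hyperparameters can be written out explicitly. The following is a minimal sketch in base R, assuming a hypothetical data frame `d` and formula `y ~ x1 + x2`; it does not call `AROC.bsp`, and the way the package builds these objects internally may differ.

```r
# Hypothetical toy data; not part of the AROC package
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))

# Design matrix including the intercept: it has p + 1 columns
X <- model.matrix(y ~ x1 + x2, data = d)
k <- ncol(X)                 # k = p + 1

# Documented defaults used when the corresponding arguments are missing
m0  <- rep(0, k)             # vector of zeros of length p + 1
S0  <- diag(100, k)          # diagonal matrix with 100 in the diagonal
nu  <- k + 2                 # p + 3 degrees of freedom
Psi <- diag(1, k)            # identity matrix of dimension (p + 1) x (p + 1)

# Iterates kept after burn-in with the default nsim and nburn
# (assumes no thinning, which the text above does not mention)
nsim <- 5000
nburn <- 1500
N <- nsim - nburn
```

Passing these objects as `m0`, `S0`, `nu` and `Psi` should then be equivalent to leaving the arguments missing, provided the design matrix has the same number of columns.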
As a result, the function provides a list with the following components:
The matched call.
Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve has been estimated.
Estimated covariate-adjusted ROC curve (AROC) (posterior mean), and 95% pointwise posterior credible band.
Estimated area under the covariate-adjusted ROC curve (AAUC) (posterior mean), and 95% posterior credible interval.
If required in the call to the function, estimated partial area under the covariate-adjusted ROC curve (pAAUC) (posterior mean), and 95% posterior credible interval.
If required, list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO).
If required, widely applicable information criterion (WAIC).
Results of the fitting process. It is a list with the following components: (1) mm: information needed to construct the model matrix associated with the Bayesian normal linear regression model. (2) beta: matrix of dimension N x (p+1) with the sampled regression coefficients. Here, N is the number of Gibbs sampler iterates after burn-in, and p+1 the number of columns of the design matrix (see also Details). (3) sd: vector of length N with the sampled variances (see also Details).
List with the data used in the fit: observed diagnostic test outcome and B-spline design matrices, separately for the healthy and diseased groups.
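The posterior means and 95% pointwise credible bands listed above are summaries of per-iterate quantities. As a rough illustration, the sketch below builds such a summary from a matrix of simulated curves. The draws, the object `aroc_draws` and the column names `est`, `ql` and `qh` are hypothetical and chosen only for this example; they are not taken from the package.

```r
set.seed(123)
p <- seq(0, 1, length.out = 101)   # FPF grid
N <- 200                           # hypothetical number of retained draws

# Hypothetical posterior draws of the AROC, one column per draw.
# Each draw is p^theta with a random theta, purely for illustration.
theta <- runif(N, 0.2, 0.6)
aroc_draws <- sapply(theta, function(th) p^th)   # 101 x N matrix

# Posterior mean and 95% pointwise credible band at each FPF
aroc_summary <- cbind(
  est = rowMeans(aroc_draws),
  ql  = apply(aroc_draws, 1, quantile, probs = 0.025),
  qh  = apply(aroc_draws, 1, quantile, probs = 0.975)
)

head(aroc_summary)
```

The band is pointwise because the quantiles are taken separately at each FPF, so it is not a simultaneous band for the whole curve.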
Estimates the covariate-adjusted ROC curve (AROC) defined as
$$AROC\left(t\right) = Pr\{1 - F_{\bar{D}}(Y_D | \mathbf{X}_{D}) \leq t\},$$
where \(F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}})\) denotes the distribution function of \(Y_{\bar{D}}\) conditional on the vector of covariates \(\mathbf{X}_{\bar{D}}\). In particular, the method implemented in this function combines a Bayesian normal linear regression model to estimate \(F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}})\) and the Bayesian bootstrap (Rubin, 1981) to estimate the outside probability. More precisely, letting \(\{(\mathbf{x}_{\bar{D}i},y_{\bar{D}i})\}_{i=1}^{n_{\bar{D}}}\) be a random sample from the nondiseased population, the model assumes
$$F_{\bar{D}}(y_{\bar{D}i}|\mathbf{X}_{\bar{D}}=\mathbf{x}_{\bar{D}i}) = \Phi(y_{\bar{D}i}\mid \mathbf{x}_{\bar{D}i}^{*T}\mathbf{\beta}^{*},\sigma^2),$$
where \(\mathbf{x}_{\bar{D}i}^{*T} = (1, \mathbf{x}_{\bar{D}i}^{T})\), \(\mathbf{\beta}^{*}\sim N_{p+1} (\mathbf{m},\mathbf{S})\) and \(\sigma^{-2}\sim\Gamma(a,b)\). It is assumed that \(\mathbf{m} \sim N_{p+1}(\mathbf{m}_0,\mathbf{S}_0)\) and \(\mathbf{S}^{-1}\sim W(\nu,(\nu\Psi)^{-1})\), where \(p+1\) denotes the number of columns of the design matrix \(\mathbf{X}_{\bar{D}}^{*}\). Here \(W(\nu,(\nu\Psi)^{-1})\) denotes a Wishart distribution with \(\nu\) degrees of freedom and expectation \(\Psi^{-1}\). For a detailed description, we refer to Inacio de Carvalho and Rodriguez-Alvarez (2018).
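The two ingredients described above (a normal model for the healthy population and the Bayesian bootstrap for the outside probability) can be illustrated for a single iterate. This is a minimal base R sketch under stated assumptions: the data are simulated, and ordinary least-squares estimates stand in for one posterior draw of \(\mathbf{\beta}^{*}\) and \(\sigma\). It is not the package's implementation, which uses draws from the Gibbs sampler.

```r
set.seed(42)

# Hypothetical toy data: one covariate, healthy (h) and diseased (d) groups
n_h <- 100
n_d <- 80
x_h <- runif(n_h); y_h <- 1 + 0.5 * x_h + rnorm(n_h)
x_d <- runif(n_d); y_d <- 2 + 0.5 * x_d + rnorm(n_d)

# Stand-in for ONE posterior draw of (beta*, sigma): least-squares estimates
fit_h <- lm(y_h ~ x_h)
beta  <- coef(fit_h)
sigma <- summary(fit_h)$sigma

# Placement values: 1 - F_Dbar(y_D | x_D) under the normal linear model
mu_d <- beta[1] + beta[2] * x_d
u <- 1 - pnorm(y_d, mean = mu_d, sd = sigma)

# Bayesian bootstrap weights: Dirichlet(1, ..., 1) via normalised exponentials
w <- rexp(n_d)
w <- w / sum(w)

# AROC(t) = Pr(U <= t), estimated as a weighted empirical distribution
p <- seq(0, 1, length.out = 101)
aroc <- sapply(p, function(t) sum(w * (u <= t)))

# Area under the AROC: the integral of Pr(U <= t) over [0, 1] equals 1 - E(U)
aauc <- 1 - sum(w * u)
```

Repeating these steps for every retained iterate, each with its own coefficients, standard deviation and weights, would give a collection of curves from which a posterior mean and credible band can be computed.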
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2018). Bayesian nonparametric inference for the covariate-adjusted ROC curve. arXiv preprint arXiv:1806.00473.
Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130-134.
AROC.bnp, AROC.bsp, AROC.sp, AROC.kernel, pooledROC.BB or pooledROC.emp.
# NOT RUN {
library(AROC)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m1 <- AROC.bsp(formula.healthy = l_marker1 ~ age,
group = "status", tag.healthy = 0, data = newpsa, scale = TRUE,
p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE,
a = 2, b = 0.5, nsim = 5000, nburn = 1500)
summary(m1)
plot(m1)
# }