Estimates the covariate-adjusted ROC curve (AROC) using the nonparametric Bayesian approach proposed by Inacio de Carvalho and Rodriguez-Alvarez (2018).
AROC.bnp(formula.healthy, group, tag.healthy, data, scale = TRUE,
p = seq(0, 1, l = 101), paauc = paauccontrol(),
compute.lpml = FALSE, compute.WAIC = FALSE,
m0, S0, nu, Psi, alpha = 1, a = 2, b = 0.5, L = 10, nsim = 10000, nburn = 2000)
A formula object specifying the B-splines dependent Dirichlet process mixture model for the estimation of the conditional distribution function for the diagnostic test outcome in the healthy population (see Note).
A character string with the name of the variable that distinguishes healthy from diseased individuals.
The value codifying the healthy individuals in the variable group.
Data frame representing the data and containing all needed variables.
A logical value. If TRUE, the test outcomes are scaled, i.e., divided by their standard deviation. The default is TRUE.
Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve.
A list of control values to replace the default values returned by the function paauccontrol. This argument is used to indicate whether the partial area under the covariate-adjusted ROC curve (pAAUC) should be computed and at which FPF.
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed.
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed.
A numeric vector. Hyperparameter; mean vector of the (multivariate) normal prior distribution for the mean of the normal component of the centering distribution. If missing, it is set to a vector of zeros of length Q (see Details).
A numeric matrix. Hyperparameter; covariance matrix of the (multivariate) normal prior distribution for the mean of the normal component of the centering distribution. If missing, it is set to a diagonal matrix of dimension Q x Q with 100 on the diagonal (see Details).
A numeric value. Hyperparameter; degrees of freedom of the Wishart prior distribution for the precision matrix of the normal component of the centering distribution. If missing, it is set to Q + 2 (see Details).
A numeric matrix. Hyperparameter; scale matrix of the Wishart distribution for the precision matrix of the normal component of the centering distribution. If missing, it is set to an identity matrix of dimension Q x Q (see Details).
A numeric value. Precision parameter of the Dirichlet Process. The default is 1 (see Details).
A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precision (inverse variance). The default is 2 (scaled data) (see Details).
A numeric value. Hyperparameter; rate parameter of the gamma prior distribution for the precision (inverse variance). The default is 0.5 (scaled data) (see Details).
A numeric value. Maximum number of mixture components for the B-splines dependent Dirichlet process mixture model. The default is 10 (see Details).
A numeric value. Total number of Gibbs sampler iterates (including the burn-in). The default is 10000.
A numeric value. Number of burn-in iterations. The default is 2000.
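The defaults described above for the missing hyperparameters can be written out explicitly. The following is a minimal base R sketch, not the package's internal code: the helper name `default_hyperparameters` and the value `Q = 4` are hypothetical and used only for illustration.

```r
# Hypothetical helper that mirrors the documented defaults for a design
# vector of dimension Q (not part of the AROC package).
default_hyperparameters <- function(Q) {
  list(
    m0  = rep(0, Q),     # prior mean vector: zeros of length Q
    S0  = diag(100, Q),  # prior covariance: Q x Q diagonal matrix with 100 on the diagonal
    nu  = Q + 2,         # Wishart degrees of freedom
    Psi = diag(1, Q)     # Wishart scale matrix: Q x Q identity
  )
}

hp <- default_hyperparameters(Q = 4)  # Q = 4 is an arbitrary illustrative value
str(hp)
```

Passing such values through the m0, S0, nu and Psi arguments should be equivalent to leaving them missing, assuming Q matches the dimension of the design vector built from formula.healthy.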
As a result, the function provides a list with the following components:
The matched call.
Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve has been estimated.
Estimated covariate-adjusted ROC curve (AROC) (posterior mean), and 95% pointwise posterior credible band.
Estimated area under the covariate-adjusted ROC curve (AAUC) (posterior mean), and 95% posterior credible interval.
If required, estimated partial area under the covariate-adjusted ROC curve (pAAUC) (posterior mean), and 95% posterior credible interval.
If required, list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO).
If required, widely applicable information criterion (WAIC).
Results of the fitting process. It is a list with the following components: (1) mm: information needed to construct the model matrix associated with the B-splines dependent Dirichlet process mixture model. (2) beta: array of dimension N x L x Q with the sampled regression coefficients. Here, N is the number of Gibbs sampler iterates after burn-in, L is the maximum number of mixture components, and Q is the dimension of vector \(\mathbf{Z}_{\bar{D}}\) (see also Details). (3) sd: matrix of dimension N x L with the sampled variances. Here, N is the number of Gibbs sampler iterates after burn-in, and L is the maximum number of mixture components (see also Details). (4) probs: matrix of dimension N x L with the sampled components' weights. Here, N is the number of Gibbs sampler iterates after burn-in and L is the maximum number of mixture components (see also Details).
List with the data used in the fit: observed diagnostic test outcome and B-spline design matrices, separately for the healthy and diseased groups.
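As an illustration of how posterior samples with the dimensions described above can be summarised, the sketch below builds mock arrays (random numbers, not output of AROC.bnp) and computes the posterior mean and a 95% credible interval for the mixture mean \(\sum_l \omega_l \mathbf{z}^{T}\mathbf{\beta}_l\) at a hypothetical design vector `z`. This quantity is chosen because it does not depend on how the mixture components are labelled. All object names and sizes are assumptions made for the example.

```r
set.seed(1)

# Mock dimensions: N = nsim - nburn iterates kept after burn-in
nsim <- 100; nburn <- 20
N <- nsim - nburn
L <- 10   # maximum number of mixture components
Q <- 3    # dimension of the design vector

# Mock posterior samples with the documented shapes (not real output)
beta  <- array(rnorm(N * L * Q), dim = c(N, L, Q))
probs <- matrix(rgamma(N * L, shape = 1), nrow = N, ncol = L)
probs <- probs / rowSums(probs)          # each row of weights sums to one

z <- c(1, 0.5, -0.2)                     # hypothetical design vector of length Q

# mu[i, l] = z' beta[i, l, ]: component means at z for each iterate
mu <- apply(beta, c(1, 2), function(b) sum(z * b))

# Mixture mean at z for each iterate, then posterior summaries
cond_mean <- rowSums(probs * mu)
c(mean = mean(cond_mean), quantile(cond_mean, c(0.025, 0.975)))
```

With a real fit, the same operations would presumably apply to the beta and probs components of the fitted object, with `z` taken from the corresponding design matrix.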
Estimates the covariate-adjusted ROC curve (AROC) defined as
$$AROC\left(t\right) = Pr\{1 - F_{\bar{D}}(Y_D | \mathbf{X}_{D}) \leq t\},$$
where \(F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}})\) denotes the conditional distribution function for \(Y_{\bar{D}}\) conditional on the vector of covariates \(\mathbf{X}_{\bar{D}}\). In particular, the method implemented in this function combines a B-splines dependent Dirichlet process mixture model to estimate \(F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}})\) and the Bayesian bootstrap (Rubin, 1981) to estimate the outer probability. More precisely, letting \(\{(\mathbf{x}_{\bar{D}i},y_{\bar{D}i})\}_{i=1}^{n_{\bar{D}}}\) be a random sample from the nondiseased population, the conditional distribution function is modelled as
$$F_{\bar{D}}(y_{\bar{D}i}|\mathbf{X}_{\bar{D}}=\mathbf{x}_{\bar{D}i}) = \sum_{l=1}^{L}\omega_l\Phi(y_{\bar{D}i}\mid\mu_{l}(\mathbf{x}_{\bar{D}i}),\sigma_l^2),$$
where \(\mu_{l}(\mathbf{x}_{\bar{D}i}) = \mathbf{z}_{\bar{D}i}^{T}\mathbf{\beta}_l\) and \(L\) is pre-specified (maximum number of mixture components). The \(\omega_l\)'s result from a truncated version of the stick-breaking construction (\(\omega_1=v_1\); \(\omega_l=v_l\prod_{r<l}(1-v_r)\), \(l=2,\ldots,L\); \(v_1,\ldots,v_{L-1}\sim\) Beta \((1,\alpha)\); \(v_L=1\)), \(\mathbf{\beta}_l\sim N_{Q}(\mathbf{m},\mathbf{S})\), and \(\sigma_l^{-2}\sim\Gamma(a,b)\). It is assumed that \(\mathbf{m} \sim N_{Q}(\mathbf{m}_0,\mathbf{S}_0)\) and \(\mathbf{S}^{-1}\sim W(\nu,(\nu\Psi)^{-1})\). Here \(W(\nu,(\nu\Psi)^{-1})\) denotes a Wishart distribution with \(\nu\) degrees of freedom and expectation \(\Psi^{-1}\), and \(Q\) denotes the dimension of vector \(\mathbf{z}_{\bar{D}i}\). For a detailed description, we refer to Inacio de Carvalho and Rodriguez-Alvarez (2018).
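The construction above can be sketched in base R for a single hypothetical posterior draw. This is an illustration under stated assumptions, not the package's implementation: the function names are invented for the example, the parameters are simulated from their priors rather than sampled by the Gibbs sampler, and a design vector \((1, x)\) stands in for the B-spline basis.

```r
set.seed(123)

# Truncated stick-breaking: v_1, ..., v_{L-1} ~ Beta(1, alpha), v_L = 1
stick_breaking_weights <- function(L, alpha) {
  v <- c(rbeta(L - 1, 1, alpha), 1)
  v * cumprod(c(1, 1 - v[-L]))
}

# Mixture-of-normals CDF at y for one covariate value:
# sum_l w_l * Phi(y | mu_l, sigma_l^2)
mixture_cdf <- function(y, w, mu, sigma) {
  sum(w * pnorm(y, mean = mu, sd = sigma))
}

# One Bayesian bootstrap draw of Pr{1 - F(Y_D | X_D) <= t} on the grid p
aroc_bb_draw <- function(u, p) {
  g <- rexp(length(u))
  q <- g / sum(g)                       # Dirichlet(1, ..., 1) weights
  vapply(p, function(t) sum(q[u <= t]), numeric(1))
}

# Toy "posterior draw" (all values hypothetical, drawn from the priors)
L <- 10; alpha <- 1; a <- 2; b <- 0.5; Q <- 2
w     <- stick_breaking_weights(L, alpha)
beta  <- matrix(rnorm(L * Q), nrow = L)             # row l holds beta_l
sigma <- 1 / sqrt(rgamma(L, shape = a, rate = b))   # sigma_l^{-2} ~ Gamma(a, b)

# Toy diseased sample; z = (1, x) is a linear stand-in for the B-spline basis
n_d <- 50
x_d <- runif(n_d)
y_d <- rnorm(n_d, mean = 1 + x_d)

# Placement values 1 - F(y_Dj | x_Dj) of the diseased outcomes
u <- vapply(seq_len(n_d), function(j) {
  mu_j <- as.vector(beta %*% c(1, x_d[j]))
  1 - mixture_cdf(y_d[j], w, mu_j, sigma)
}, numeric(1))

p <- seq(0, 1, length.out = 101)
aroc_draw <- aroc_bb_draw(u, p)
```

Repeating the last steps over many posterior draws and averaging the resulting curves pointwise would give a posterior mean of the kind reported by the function, with pointwise quantiles giving the credible band.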
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2018). Bayesian nonparametric inference for the covariate-adjusted ROC curve. arXiv preprint arXiv:1806.00473.
Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130-134.
AROC.bnp, AROC.bsp, AROC.sp, AROC.kernel, pooledROC.BB or pooledROC.emp.
# NOT RUN {
library(AROC)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
# }
# NOT RUN {
m0 <- AROC.bnp(formula.healthy = l_marker1 ~ f(age, K = 0),
               group = "status", tag.healthy = 0, data = newpsa, scale = TRUE,
               p = seq(0, 1, l = 101), compute.lpml = TRUE, compute.WAIC = TRUE,
               a = 2, b = 0.5, L = 10, nsim = 5000, nburn = 1000)
summary(m0)
plot(m0)
# }