Learn R Programming

BayesSIM (version 1.0.0)

gpSpike: Bayesian Single-Index Regression with Gaussian Process Link and Spike-and-Slab Prior

Description

Fits a single-index model \(Y_i \sim \mathcal{N}(f(X_i'\theta), \sigma^2), i = 1,\cdots,n\), where index vector \(\theta\) has a spike and slab prior and the link \(f(\cdot)\) is represented by Gaussian process and the

Usage

gpSpike(
  formula,
  data,
  prior = NULL,
  init = NULL,
  monitors = NULL,
  niter = 10000,
  nburnin = 1000,
  thin = 1,
  nchain = 1,
  setSeed = FALSE
)

gpSpike_setup( formula, data, prior = NULL, init = NULL, monitors = NULL, niter = 10000, nburnin = 1000, thin = 1, nchain = 1, setSeed = FALSE )

Value

A list typically containing:

coefficients

Mean estimates of index vector. Return list of model_setup does not include it.

ses

Mean standard error of index vector. Return list of model_setup does not include it.

residuals

Residuals with mean estimates of fitted values. Return list of model_setup does not include it.

fitted.values

Mean estimates of fitted values. Return list of model_setup does not include it.

linear.predictors

Mean estimates of single-index values. Return list of model_setup does not include it.

gof

Goodness-of-fit. Return list of model_setup function does not include it.

samples

Posterior draws of variables for computing fitted values of the model, including \(\theta\), \(\sigma^2\). Return list of model_setup does not include it.

input

List of data used in modeling, formula, input values for prior and initial values, and computation time without compiling.

defModel

Nimble model object.

defSampler

Nimble sampler object.

modelName

Name of the model.

Arguments

formula

an object of class formula. Interaction term is not allowed for single-index model.

data

an data frame.

prior

Optional named list of prior settings. Further descriptions are in Details section.

init

Optional named list of initial values. If the values are not assigned, they are randomly sampled from prior or designated value. Further descriptions are in Details section.

monitors

Optional character vector of additional monitor nodes. To check the names of the nodes, fit the model_setup function and then inspect the variable names stored in the model object using getVarMonitor.

niter

Integer. Total MCMC iterations (default 10000).

nburnin

Integer. Burn-in iterations (default 1000).

thin

Integer. Thinning for monitors (default 1).

nchain

Integer. Number of MCMC chains (default 1).

setSeed

Logical or numeric argument. Further details are provided in runMCMC setSeed argument.

Details

Model The single–index model is specified as \(Y_i = f(X_i' \theta) + \epsilon_i\), where \(\theta\) is a p-dimensional index vector subject to a spike-and-slab prior for variable selection. The link function \(f(\cdot)\) is modeled using a Gaussian process prior with zero mean and squared exponential covariance kernel \(K(x_1, x_2) = \exp\{-\rho {(x_1 - x_2)^{T}\theta}^2\}\), where \(\rho\) determines the smoothness of \(f\). The covariance kernel is re-parameterized to \(\exp\{-{(x_1 - x_2)^{T}\theta^{*}}^2\}\) where \(\rho = ||\theta^{*}||\) and \(\theta = ||\theta||^{-1}\theta^{*}\). Therefore, \(\theta^{*}\) is sampled in MCMC.

Priors

  • The variable selection indicator \(\nu\) has Beta–Bernoulli hierarchy prior. Beta hyper-prior on the Bernoulli–inclusion probability \(w\), inducing \(p(\nu) \propto \mathrm{Beta}(r_1 + n_\nu, r_2 + p - n_\nu)\) where \(n_\nu = \Sigma_{i=1}^{p}I(\nu_i = 1)\). \(r_1, r_2\) are shape and rate parameter of beta distribution.

  • Slab coefficients \(\theta\) have normal distribution with zero mean and \(\sigma_\theta^2\) variance.

  • GP precision \(\lambda^{-1}\) follows gamma distribution with shape parameter \(a_\lambda\), and rate parameter \(b_\lambda\).

  • Error precision \((\sigma^2)^{-1}\) follows gamma distribution with shape parameter \(a_\sigma\), and rate parameter \(b_\sigma\).

Sampling A random walk Metropolis algorithm is used to sample \(\lambda^{-1}\) and a Metropolis-Hastings algorithm is used for the main parameters \((\theta^{*}, \nu)\). The variance \(\sigma^2\) is directly sampled from posterior distribution. \(f\) is not directly sampled by MCMC.

Prior hyper-parameters These are the prior hyper-parameters set in the function. You can define new values for each parameter in prior_param.

  1. Index vector: index_nu_r1, index_nu_r2 gives the shape and rate parameter of beta-binomial prior, respectively. For slab prior, normal distribution with zero mean is assigned for selected variables \(\theta\). index_sigma_theta is for variance of \(\theta\), and it is assigned 0.25 by default.

  2. Link function: Inverse gamma prior is assigned for hyper-parameters \(\lambda^{-1}\) link_inv_lambda_shape is shape parameter and link_inv_lambda_rate is rate parameter of inverse gamma distribution. (default link_inv_lambda_shape = 1, link_inv_lambda_rate = 0.1)

  3. Error variance (sigma2): An Inverse gamma prior is assigned to \(\sigma^2\) where sigma2_shape is shape parameter and sigma2_rate is rate parameter of inverse gamma distribution. (default sigma2_shape = 0.001, sigma2_rate = 100)

Initial values These are the initial values set in the function. You can define new values for each initial value in init_param

  1. Index vector:

    • index_pi: Initial selecting variable probability. (default: 0.5)

    • index_nu: Initial vector of inclusion indicators . By default, each index_nu is randomly drawn by \(Bernoulli(1/2)\)

    • index: Initial vector of index. By default, each element of index vector, which is chosen by indicator, is proposed by normal distribution.

  2. Link function: Initial scalar of lambda (link_inv_lambda) for covariance kernel of Gaussian process.

  3. Error variance (sigma2): Initial scalar error variance. (default: 0.01)

References

McGee, G., Wilson, A., Webster, T. F., & Coull, B. A. (2023). Bayesian multiple index models for environmental mixtures. Biometrics, 79(1), 462-474.

Examples

Run this code
# \donttest{
set.seed(123)
n <- 200; d <- 4
theta <- c(2, 1, 1, 1); theta <- theta / sqrt(sum(theta^2))
f <- function(u) u^2 * exp(u)
sigma <- 0.5
X <- matrix(runif(n * d, -1, 1), nrow = n)
index_vals <- as.vector(X %*% theta)
y <- f(index_vals) + rnorm(n, 0, sigma)
simdata <- data.frame(x = X, y = y)
colnames(simdata) <- c(paste0("X", 1:4), "y")

# One tool version
fit1 <- gpSpike(y ~ ., data = simdata,
               niter = 5000, nburnin = 1000)

# Split version
models <- gpSpike_setup(y ~ ., data = simdata)
Ccompile <- compileModelAndMCMC(models)
nimSampler <- get_sampler(Ccompile)
initList <- getInit(models)
mcmc.out <- runMCMC(nimSampler, niter = 5000, nburnin = 1000, thin = 1,
                    nchains = 1, setSeed = TRUE, inits = initList,
                    summary = TRUE, samplesAsCodaMCMC = TRUE)
fit2 <- as_bsim(models, mcmc.out)
summary(fit2)
# }

Run the code above in your browser using DataLab