kmbayes: Fit Bayesian kernel machine regression

Description

Fits the Bayesian kernel machine regression (BKMR) model using Markov chain Monte Carlo (MCMC) methods.

Usage

kmbayes(
  y,
  Z,
  X = NULL,
  iter = 1000,
  family = "gaussian",
  id = NULL,
  verbose = TRUE,
  Znew = NULL,
  starting.values = NULL,
  control.params = NULL,
  varsel = FALSE,
  groups = NULL,
  knots = NULL,
  ztest = NULL,
  rmethod = "varying",
  est.h = FALSE
)

Arguments

y

a vector of outcome data of length n.

Z

an n-by-M matrix of predictor variables to be included in the h function. Each row represents an observation and each column represents a predictor.

X

an n-by-K matrix of covariate data where each row represents an observation and each column represents a covariate. Should not contain an intercept column.

iter

number of iterations to run the sampler.

family

a description of the error distribution and link function to be used in the model. Currently implemented for gaussian and binomial families.

id

optional vector (of length n) of grouping factors for fitting a model with a random intercept. If NULL, no random intercept will be included.
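As a minimal sketch of the random-intercept option, the code below simulates repeated measurements on a handful of subjects and passes the subject labels through id. The data, the object names (n_subj, n_obs, fit_ri), and the short chain length are illustrative assumptions, not recommendations; the model call is guarded so the data-building part runs even when bkmr is not installed.

```r
# Simulated repeated-measures data (all values are made up for illustration)
set.seed(1)
n_subj <- 10                                   # hypothetical number of subjects
n_obs  <- 4                                    # hypothetical observations per subject
n      <- n_subj * n_obs
id     <- rep(seq_len(n_subj), each = n_obs)   # one grouping label per row

Z <- matrix(rnorm(n * 3), nrow = n, ncol = 3,
            dimnames = list(NULL, paste0("z", 1:3)))
X <- matrix(rnorm(n), ncol = 1)
b <- rnorm(n_subj, sd = 0.5)                   # subject-level intercepts
y <- Z[, 1]^2 + 0.5 * X[, 1] + b[id] + rnorm(n)

if (requireNamespace("bkmr", quietly = TRUE)) {
  # id must have one entry per observation; a short chain is used only for speed
  fit_ri <- bkmr::kmbayes(y = y, Z = Z, X = X, iter = 50,
                          id = id, verbose = FALSE)
}
```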

verbose

TRUE or FALSE: flag indicating whether to print intermediate diagnostic information during the model fitting.

Znew

optional matrix of new predictor values at which to predict h, where each row represents a new observation. Supplying Znew slows down the model fitting; the same predictions can instead be obtained as a post-processing step using SamplePred.
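The post-processing route might look like the sketch below: the model is fit without Znew, and SamplePred is then called on the fitted object. The simulated data and object names are assumptions for illustration. Note that, as far as the SamplePred documentation describes it, the draws are of the mean outcome at Znew and Xnew rather than of h alone, so Xnew is set to zero here.

```r
# Simulated data (illustrative only)
set.seed(2)
n <- 50
Z <- matrix(rnorm(n * 4), nrow = n, ncol = 4,
            dimnames = list(NULL, paste0("z", 1:4)))
X <- matrix(rnorm(n), ncol = 1)
y <- Z[, 1] + 0.5 * Z[, 2]^2 + X[, 1] + rnorm(n)

# New exposure profile: every predictor at its sample median (a 1-by-M matrix)
Znew <- matrix(apply(Z, 2, median), nrow = 1)

if (requireNamespace("bkmr", quietly = TRUE)) {
  # Fit without Znew, then predict afterwards
  fit <- bkmr::kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE)
  pred <- bkmr::SamplePred(fit, Znew = Znew, Xnew = cbind(0))
  print(dim(pred))   # posterior draws by new observations
}
```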

starting.values

list of starting values for each parameter. If not specified, default values will be chosen.

control.params

list of parameters specifying the prior distributions and tuning parameters for the MCMC algorithm. If not specified, default values will be chosen.

varsel

TRUE or FALSE: indicator for whether to conduct variable selection on the Z variables in h.

groups

optional vector (of length M) of group indicators for fitting hierarchical variable selection if varsel=TRUE. If varsel=TRUE and no groups are specified, component-wise variable selection will be performed.
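A small sketch of hierarchical variable selection follows. The grouping is an assumption chosen for illustration: the first and third predictors are simulated to be strongly correlated and are therefore placed in the same group, while the other two predictors each form their own group.

```r
# Simulated data in which z1 and z3 are highly correlated (illustrative only)
set.seed(3)
n  <- 50
z1 <- rnorm(n)
z3 <- z1 + rnorm(n, sd = 0.3)
Z  <- cbind(z1 = z1, z2 = rnorm(n), z3 = z3, z4 = rnorm(n))
X  <- matrix(rnorm(n), ncol = 1)
y  <- z1 + Z[, "z4"]^2 + X[, 1] + rnorm(n)

# One group label per column of Z: columns 1 and 3 share group 1
groups <- c(1, 2, 1, 3)

if (requireNamespace("bkmr", quietly = TRUE)) {
  fit_hier <- bkmr::kmbayes(y = y, Z = Z, X = X, iter = 100,
                            varsel = TRUE, groups = groups, verbose = FALSE)
}
```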

knots

optional matrix of knot locations for implementing the Gaussian predictive process of Banerjee et al. (2008). Currently only implemented for models without a random intercept.
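The sketch below shows one way a knots matrix could be built: each row is a knot location and each column corresponds to a column of Z. Using k-means cluster centers is an assumption made here to keep the example in base R; it is a stand-in for whatever knot-selection scheme (for example a space-filling design) suits the data, and the number of knots is arbitrary.

```r
# Simulated data (illustrative only)
set.seed(4)
n <- 200
Z <- matrix(rnorm(n * 3), nrow = n, ncol = 3,
            dimnames = list(NULL, paste0("z", 1:3)))
X <- matrix(rnorm(n), ncol = 1)
y <- sin(Z[, 1]) + 0.5 * Z[, 2] + X[, 1] + rnorm(n)

# 20 knot locations in the predictor space: a 20-by-M matrix
knots <- stats::kmeans(Z, centers = 20)$centers

if (requireNamespace("bkmr", quietly = TRUE)) {
  # No id argument: knots are only implemented without a random intercept
  fit_knots <- bkmr::kmbayes(y = y, Z = Z, X = X, iter = 50,
                             knots = knots, verbose = FALSE)
}
```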

ztest

optional vector indicating on which variables in Z to conduct variable selection (the remaining variables will be forced into the model).

rmethod

for those predictors being forced into the h function, the method for sampling the r[m] values. Takes the value 'varying' to allow a separate r[m] for each predictor, 'equal' to force the same r[m] for each predictor, or 'fixed' to fix the r[m] at their starting values.

est.h

TRUE or FALSE: indicator for whether to sample from the posterior distribution of the subject-specific effects h_i within the main sampler. This will slow down the model fitting.

Value

an object of class "bkmrfit" (containing the posterior samples from the model fit), which has the associated methods:

print (i.e., print.bkmrfit)
summary (i.e., summary.bkmrfit)

References

Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, Coull BA (2015). Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics, 16(3), 493-508.

Banerjee S, Gelfand AE, Finley AO, Sang H (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 825-848.

Examples

library(bkmr)

## First generate a dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make the example run quickly
## A much larger number of iterations should typically be used for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)
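Once a model has been fit with variable selection, the returned object can be inspected with the methods listed under Value. The sketch below repeats the fit so that it is self-contained, then calls summary and the package's ExtractPIPs function to list posterior inclusion probabilities. With only 100 iterations the numbers are not meaningful; the point is the shape of the workflow.

```r
if (requireNamespace("bkmr", quietly = TRUE)) {
  library(bkmr)

  set.seed(111)
  dat <- SimData(n = 50, M = 4)
  fitkm <- kmbayes(y = dat$y, Z = dat$Z, X = dat$X,
                   iter = 100, verbose = FALSE, varsel = TRUE)

  print(class(fitkm))          # the fitted object has class "bkmrfit"
  summary(fitkm)               # dispatches to summary.bkmrfit

  # Posterior inclusion probabilities for the columns of Z
  pips <- ExtractPIPs(fitkm)
  print(pips)
}
```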