Learn R Programming

roben (version 0.1.1)

roben: fit a robust Bayesian variable selection

Description

fit a robust Bayesian variable selection model for G×E interactions.

Usage

roben(
  X,
  Y,
  E,
  clin = NULL,
  iterations = 10000,
  burn.in = NULL,
  robust = TRUE,
  sparse = TRUE,
  structure = c("sparsegroup", "group", "individual"),
  hyper = NULL,
  debugging = FALSE
)

Arguments

X

the matrix of predictors (genetic factors) without intercept. Each row should be an observation vector. X will be centered and a column of 1 will be added to the X matrix as the intercept.

Y

the response variable. The current version of roben only supports continuous response.

E

a matrix of environmental factors. E will be centered. The interaction terms between X (G factors) and E will be automatically created and included in the model.

clin

a matrix of clinical variables. Clinical variables are not subject to penalty.

iterations

the number of MCMC iterations.

burn.in

the number of iterations for burn-in.

robust

logical flag. If TRUE, robust methods will be used.

sparse

logical flag. If TRUE, spike-and-slab priors will be used to shrink coefficients of irrelevant covariates to zero exactly.

structure

three choices are available. "sparsegroup" for sparse-group selection, which is a bi-level selection on both group-level and individual-level. "group" for selection on group-level only. "individual" for selection on individual-level only.

hyper

a named list of hyperparameters.

debugging

logical flag. If TRUE, progress will be output to the console and extra information will be returned.

Details

Consider the data model described in "data": $$Y_{i} = \alpha_{0} + \sum_{t=1}^{q}\alpha_{t}Clin_{it} + \sum_{m=1}^{k}\theta_{m}E_{im} + \sum_{j=1}^{p} \big(U_{ij}^\top\beta_{j}\big) +\epsilon_{i},$$ where the main and interaction effects of the \(j\)th genetic variant is corresponding to the coefficient vector \(\beta_{j}=(\beta_{j1}, \beta_{j2},\ldots,\beta_{jL})^\top\).

When structure="sparsegroup" (default setting), selection will be conducted on both individual and group levels (bi-level selection):

  • Group-level selection: by determining whether \(||\beta_{j}||_{2}=0\), we can know if the \(j\)th genetic variant has any effect at all.

  • Individual-level selection: investigate whether the \(j\)th genetic variant has main effect, G\(\times\)E interaction or both, by determining which components in \(\beta_{j}\) has non-zero values.

If structure="group", only group-level selection will be conducted on \(||\beta_{j}||_{2}\). If structure="individual", only individual-level selection will be conducted on each \(\beta_{jl}\), (\(l=1,\ldots,L\)).

When sparse=TRUE (default), spike--and--slab priors are imposed on individual and/or group levels to identify important main and interaction effects. Otherwise, Laplacian shrinkage will be used.

When robust=TRUE (default), the distribution of \(\epsilon_{i}\) is defined as a Laplace distribution with density \( f(\epsilon_{i}|\nu) = \frac{\nu}{2}\exp\left\{-\nu |\epsilon_{i}|\right\} \), (\(i=1,\dots,n\)), which leads to a Bayesian formulation of LAD regression. If robust=FALSE, \(\epsilon_{i}\) follows a normal distribution.

Both \(X\) and \(E\) will be centered before the generation of interaction terms, in order to prevent the multicollinearity between main effects and interaction terms.

Users can modify the hyper-parameters by providing a named list of hyper-parameters via the argument `hyper'. The list can have the following named components

  • a0, b0: shape parameters of the Beta priors (\(\pi^{a_{0}-1}(1-\pi)^{b_{0}-1}\)) on \(\pi_{0}\).

  • a1, b1: shape parameters of the Beta priors (\(\pi^{a_{1}-1}(1-\pi)^{b_{1}-1}\)) on \(\pi_{1}\).

  • c1, c2: the shape parameter and the rate parameter of the Gamma prior on \(\nu\).

  • d1, d2: the shape parameter and the rate parameter of the Gamma priors on \(\eta\).

Please check the references for more details about the prior distributions.

See Also

GxESelection

Examples

Run this code
data(GxE_small)

## default method
iter=5000
fit=roben(X, Y, E, clin, iterations = iter)
fit$coefficient

## Ture values of parameters of main G effects and interactions
coeff$GE

## Compute TP and FP
sel=GxESelection(fit)
pos=which(sel$indicator != 0)
tp=length(intersect(which(coeff$GE != 0), pos))
fp=length(pos)-tp
list(tp=tp, fp=fp)

# \donttest{
## alternative: robust group selection
fit=roben(X, Y, E, clin, iterations=iter, structure="g")
fit$coefficient

## alternative: non-robust sparse group selection
fit=roben(X, Y, E, clin, iterations=iter, robust=FALSE)
fit$coefficient
# }

Run the code above in your browser using DataLab