BayesSUR: main function of the package

Description

Main function of the package. Fits a range of models introduced in the package vignette BayesSUR.pdf. Returns an object of S3 class BayesSUR. There are three options for the prior on the residual covariance matrix (i.e., independent inverse-Gamma, inverse-Wishart and hyper-inverse Wishart) and three options for the prior on the latent indicator variable (i.e., independent Bernoulli, hotspot and Markov random field). So there are nine models in total. See details for their combinations.

Usage

BayesSUR(
  Y,
  X,
  X_0 = NULL,
  data = NULL,
  outFilePath = "",
  nIter = 10000,
  burnin = 5000,
  nChains = 2,
  covariancePrior = "HIW",
  gammaPrior = "",
  betaPrior = "independent",
  gammaSampler = "bandit",
  gammaInit = "MLE",
  mrfG = NULL,
  standardize = TRUE,
  standardize.response = TRUE,
  maxThreads = 2,
  output_gamma = TRUE,
  output_beta = TRUE,
  output_G = TRUE,
  output_sigmaRho = TRUE,
  output_pi = TRUE,
  output_tail = TRUE,
  output_model_size = TRUE,
  output_model_visit = FALSE,
  output_CPO = FALSE,
  output_Y = TRUE,
  output_X = TRUE,
  hyperpar = list(),
  tmpFolder = "tmp/"
)

Arguments

Y, X, X_0

vectors of indexes (with respect to the data matrix) for the outcomes, the covariates to select and the fixed covariates respectively if data is either a path to a file or a matrix; if the 'data' argument is not provided, these needs to be matrices containing the data instead.

data

a data frame if using formula. If not using formula, it is either a matrix/dataframe or the path to (a plain text) data file with variables on the columns and observations on the rows

outFilePath

path to where the output files are to be written. The default path is the currect working directory.

nIter

number of iterations for the MCMC procedure.

burnin

number of iterations (or fraction of iterations) to discard at the start of the chain. Default is 0.

nChains

number of parallel chains to run.

covariancePrior

string indicating the prior for the covariance $C$; it has to be either "HIW" for the hyper-inverse-Wishar (which will result in a sparse covariance matrix), "IW" for the inverse-Wishart prior ( dense covariance ) or "IG" for independent inverse-Gamma on all the diagonal elements and 0 otherwise. See the details for the model specification

gammaPrior

string indicating the gamma prior to use, either "hotspot" for the Hotspot prior of Bottolo (2011), "MRF" for the Markov Random Field prior or "hierarchical" for a simpler hierarchical prior. See the details for the model specification

betaPrior

string indicating the beta prior to use, either "independent" for the independent spike-and-slab prior or "reGroup" for the random effects for X_0 and independent spike-and-slab priors for other predictors

gammaSampler

string indicating the type of sampler for gamma, either "bandit" for the Thompson sampling inspired samper or "MC3" for the usual $MC^3$ sampler

gammaInit

gamma initialisation to either all-zeros ("0"), all ones ("1"), randomly ("R") or (default) MLE-informed ("MLE").

mrfG

either a matrix or a path to the file containing the G matrix for the MRF prior on gamma (if necessary)

standardize

logical flag for X variable standardization. Default is standardize=TRUE. The coefficients are returned on the standardized scale.

standardize.response

Standardization for the response variables. Default is standardize.response=TRUE.

maxThreads

maximum threads used for parallelization. Default is 2.

output_gamma

allow ( TRUE ) or suppress ( FALSE ) the output for gamma. See the return value below for more information.

output_beta

allow ( TRUE ) or suppress ( FALSE ) the output for beta. See the return value below for more information.

output_G

allow ( TRUE ) or suppress ( FALSE ) the output for G. See the return value below for more information.

output_sigmaRho

allow ( TRUE ) or suppress ( FALSE ) the output for sigmaRho. See the return value below for more information.

output_pi

allow ( TRUE ) or suppress ( FALSE ) the output for pi. See the return value below for more information.

output_tail

allow ( TRUE ) or suppress ( FALSE ) the output for tail (hotspot tail probability). See the return value below for more information.

output_model_size

allow ( TRUE ) or suppress ( FALSE ) the output for model_size. See the return value below for more information.

output_model_visit

allow ( TRUE ) or suppress ( FALSE ) the output for all visited models over the MCMC iterations. Default is FALSE. See the return value below for more information.

output_CPO

allow ( TRUE ) or suppress ( FALSE ) the output for *; possible outputs are gamma, G, beta, sigmaRho, pi, tail (hotspot tail probability), model_size, CPO. See the return value below for more information.

output_Y

allow ( TRUE ) or suppress ( FALSE ) the output for responses dataset Y.

output_X

allow ( TRUE ) or suppress ( FALSE ) the output for predictors dataset X.

hyperpar

a list of named hypeparameters to use instead of the default values. Valid names are mrf_d, mrf_e, a_sigma, b_sigma, a_tau, b_tau, nu, a_eta, b_eta, a_o, b_o, a_pi, b_pi, a_w and b_w. Their default values are a_w=2, b_w=5, a_omega=1, b_omega=1, a_o=2, b_o=p-2, a_pi=2, b_pi=1, nu=s+2, a_tau=0.1, b_tau=10, a_eta=0.1, b_eta=1, a_sigma=1, b_sigma=1, mrf_d=-3 and mrf_e=0.03. See the vignette for more information.

tmpFolder

the path to a temporary folder where intermediate data files are stored (will be erased at the end of the chain) default to local tmpFolder

Value

An object of class "BayesSUR":

status - the running status
input - a list of all input parameters by the user
output - a list of the all output filenames:
- "*_logP_out.txt" - contains each row for the $1000t$-th iteration's log-likelihoods of parameters, i.e., Tau, Eta, JunctionTree, SigmaRho, O, Pi, Gamma, W, Beta and data conditional log-likelihood depending on the models.
- "*_gamma_out.txt" - posterior mean of the latent indicator matrix.
- "*_pi_out.txt" - posterior mean of the predictor effects (prospensity) by decomposing the probability of the latent indicator.
- "*_hotspot_tail_p_out.txt" - posterior mean of the hotspot tail probability. Only available for the hotspot prior on the gamma.
- "*_beta_out.txt" - posterior mean of the coefficients matrix.
- "*_G_out.txt" - posterior mean of the response graph. Only available for the HIW prior on the covariance.
- "*_sigmaRho_out.txt" - posterior mean of the transformed parameters. Not available for the IG prior on the covariance.
- "*_model_size_out.txt" - contains each row for the$1000t$-th iteration's model sizes of the multiple response variables.
- "*_model_visit_g_out.txt" - contains each row for the nonzero indices of the vectorized estimated graph matrix for each iteration.
- "*_model_visit_gamma_out.txt" - contains each row for the nonzero indices of the vectorized estimated gamma matrix for each iteration.
- "*_CPO_out.txt" - the (scaled) conditional predictive ordinates (CPO).
- "*_CPOsumy_out.txt" - the (scaled) conditional predictive ordinates (CPO) with joint posterior predictive of the response variables.
- "*_WAIC_out.txt" - the widely applicable information criterion (WAIC).
- "*_Y.txt" - responses dataset.
- "*_X.txt" - predictors dataset.
- "*_X0.txt" - fixed predictors dataset.
call - the matched call.

Details

The arguments covariancePrior and gammaPrior specify the model HRR, dSUR or SSUR with different gamma prior. Let $\gamma_{jk}$ be latent indicator variable of each coefficient and $C$ be covariance matrix of response variables. The nine models specified through the arguments covariancePrior and gammaPrior are as follows.

	$\gamma_{jk}$~Bernoulli	$\gamma_{jk}$~hotspot	$\gamma_{jk}$~MRF
$C$~indep	HRR-B	HRR-H	HRR-M
$C$~IW	dSUR-B	dSUR-H	dSUR-M

References

Banterle M, Bottolo L, Richardson S, Ala-Korpela M, Jarvelin MR, Lewin A (2018). Sparse variable and covariance selection for high-dimensional seemingly unrelated Bayesian regression. bioRxiv: 467019.

Banterle M#, Zhao Z#, Bottolo L, Richardson S, Lewin A\*, Zucknick M\* (2019). BayesSUR: An R package for high-dimensional multivariate Bayesian variable and covariance selection in linear regression. URL: https://github.com/mbant/BayesSUR/tree/master/BayesSUR/vignettes/vignettes.pdf

Examples

Run this code

# NOT RUN {
data("example_eQTL", package = "BayesSUR")
hyperpar <- list( a_w = 2 , b_w = 5 )
set.seed(9173)
fit <- BayesSUR(Y = example_eQTL[["blockList"]][[1]], 
                X = example_eQTL[["blockList"]][[2]],
                data = example_eQTL[["data"]], outFilePath = tempdir(),
                nIter = 100, burnin = 50, nChains = 2, gammaPrior = "hotspot",
                hyperpar = hyperpar, tmpFolder = "tmp/", output_CPO=TRUE)

## check output
# show the summary information
summary(fit)

# show the estimated beta, gamma and graph of responeses Gy
# }
# NOT RUN {
estimators <- getEstimator(fit, estimator=c("beta","gamma","Gy"))
plot(estimators)

#Set up temporary work directory for saving a pdf figure
td <- tempdir()
oldwd <- getwd()
setwd(td)

# Produce authentic math formulas in the graph
plot(estimators, fig.tex = TRUE)
system(paste(getOption("pdfviewer"), "ParamEstimator.pdf"))
setwd(oldwd)
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab

	\(\gamma_{jk}\)~Bernoulli	\(\gamma_{jk}\)~hotspot	\(\gamma_{jk}\)~MRF
\(C\)~indep	HRR-B	HRR-H	HRR-M
\(C\)~IW	dSUR-B	dSUR-H	dSUR-M