gibbs_abms: Bayesian variable selection models via a spike-and-slab methodology.

Description

A Bayesian model selection methodology based on the spike-and-slab strategy and an augmentation technique for Linear, Logistic, Negative Binomial, Quantile, and Skew-Normal regression. The model takes a response vector \(y\) of size \(n\) and \(p\) predictors, estimates the coefficients, and assesses which predictors are relevant for explaining the response distribution. Other parameters specific to the selected family are also estimated. Summary results can be obtained with the summary_gibbs() R function.
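For orientation, a generic spike-and-slab prior can be written as below. This is the textbook form and not necessarily the exact hierarchy used by gibbs_abms; the indicator \(\gamma_j\) and the point mass at zero \(\delta_0\) are notation introduced here for illustration, and \(\tau^2\) is assumed to correspond to the tau2 argument.

```latex
\[
\beta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,\delta_0 \;+\; \gamma_j\,\mathcal{N}(0, \tau^2),
\qquad \gamma_j \in \{0, 1\}, \quad j = 1, \dots, p .
\]
```

Under this reading, predictor \(j\) is considered relevant when \(\gamma_j = 1\), and a model corresponds to one configuration of \((\gamma_1, \dots, \gamma_p)\).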

Usage

gibbs_abms(
  y,
  Covariates,
  family = "LiR",
  first_excluded = 0,
  nchain = 10000,
  burnin = 2000,
  tau2 = 1000,
  rho = 1,
  ni = rep(1, length(y)),
  alpha = 0.5,
  a0 = 1,
  b0 = 1,
  d = 2,
  b2 = 1/2,
  count.iteration = TRUE
)

Value

An abms object with the following components:

family: This character object prints the name of the fitted hierarchical regression model. It needs to be extracted from the list 'Default'.
prednames: A character object that prints the predictor names, taken from the column names of the Covariates argument. It needs to be extracted from the list 'Default'.
Seconds: How many seconds the method took. It needs to be extracted from the list 'Default'.
tau2: The tau2 that was used as argument.
y: The y response vector that was used as argument.
Covariates: The Covariates data frame or matrix that was used as argument.
beta_chain: The sampled coefficients at each Gibbs sampler iteration. A (nchain x \(p\)) matrix.
sigma2_chain: For Linear, Quantile, and Skew-Normal regression only. The sampled variance parameter (\(\sigma^2\)) at each Gibbs sampler iteration. A (nchain x 1) matrix.
r_chain: For Negative Binomial regression only. The sampled number-of-failures parameter (\(r\)) at each Gibbs sampler iteration. A (nchain x 1) matrix.
lambda_chain: For Skew-Normal regression only. The sampled asymmetry parameter (\(\lambda\)) at each Gibbs sampler iteration. A (nchain x 1) matrix.
model_chain: The model selected at each Gibbs sampler iteration. A (nchain x \(p\)) matrix.
Z_chain: For internal use.
t_chain: For internal use.
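The chains listed above can be post-processed with base R. The sketch below is a minimal illustration, not the package's own summary method (summary_gibbs() is the documented route). It uses a hypothetical stand-in list, fit_mock, instead of a real fitted object, and it assumes that model_chain stores 0/1 inclusion indicators with one column per coefficient and that components are accessed with `$`.

```r
# Hypothetical stand-in for a fitted object; a real one would come from gibbs_abms().
set.seed(1)
nchain <- 500
p <- 3
fit_mock <- list(
  # Assumption: 0/1 inclusion indicators, one column per coefficient.
  model_chain = cbind(1, rbinom(nchain, 1, 0.1), rbinom(nchain, 1, 0.9)),
  beta_chain  = matrix(rnorm(nchain * p), nrow = nchain, ncol = p)
)

# Share of iterations in which each coefficient was included.
incl_prob <- colMeans(fit_mock$model_chain)

# Posterior mean of each coefficient across iterations.
post_mean <- colMeans(fit_mock$beta_chain)

# Most frequently visited model, encoded as a string of 0/1 indicators.
model_labels <- apply(fit_mock$model_chain, 1, paste, collapse = "")
top_model <- names(which.max(table(model_labels)))
```

Whether the returned chains still contain the burn-in iterations is not stated here, so check the dimensions of the chains before discarding rows by hand.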

Arguments

y: A vector of size \(n\) with the observed responses. It can also be an (\(n \times 1\)) matrix.
Covariates: A data.frame object with the predictors (without the intercept) to be tested for relevance to the response variable. It can also be an (\(n \times p\)) matrix.
family: A character object that specifies the hierarchical regression model to be used. If family="LiR", a Linear regression model is fitted (Gaussian errors). If family="LoR", a Logistic regression model is fitted (Binomial distribution). If family="NBR", a Negative Binomial regression model is fitted (mean \(r(1-p)/p\)). If family="QR", a Quantile regression model is fitted (Asymmetric Laplace errors). If family="SNR", a Skew-Normal regression model is fitted (Skew-Normal errors). The default is family="LiR".
first_excluded: A non-negative integer that indicates how many of the leading columns will not be tested. For example, if first_excluded=2, the first two columns of Covariates will not be tested. The intercept is always excluded from the selection process.
nchain: The chain size of the Gibbs sampler. It must be a non-negative integer. The default value is 10,000.
burnin: The burn-in period of the Gibbs sampler. It must be a non-negative integer smaller than nchain. The default value is 2,000.
tau2: The prior variance of each coefficient. It must be a positive real number. Fixed at 1000 by default.
rho: The parameter of the Womack prior. It must be a positive real number. Fixed at 1 by default.
ni: For Logistic regression only. A vector of size \(n\) whose i-th entry is the size for the i-th individual (the size parameter of the Binomial distribution); each entry must be a positive integer. It can also be an (\(n \times 1\)) matrix. By default, all sizes are fixed at 1.
alpha: For Quantile regression only. The quantile at which Quantile regression is performed. alpha must lie in the interval \((0,1)\). By default, alpha=0.5, that is, median regression.
a0: This argument depends on the family chosen. For family="LiR", it is the shape hyper-parameter of the Gamma prior on the variance parameter (\(\sigma^2\)) of the Gaussian distribution. For family="NBR", it is the shape hyper-parameter of the Gamma prior on the parameter \(r\) of the Negative Binomial distribution (the number of successes until the experiment is stopped). For family="QR", it is the shape hyper-parameter of the Gamma prior on the variance parameter (\(\sigma^2\)) of the Asymmetric Laplace distribution. Note that this argument is not used for family="LoR" or family="SNR". For all hierarchical regression models, it must be a positive real number, and it is fixed at 1 by default.
b0: This argument depends on the family chosen. For family="LiR", it is the scale hyper-parameter of the Gamma prior on the variance parameter (\(\sigma^2\)) of the Gaussian distribution. For family="NBR", it is the scale hyper-parameter of the Gamma prior on the parameter \(r\) of the Negative Binomial distribution (the number of successes until the experiment is stopped). For family="QR", it is the scale hyper-parameter of the Gamma prior on the variance parameter (\(\sigma^2\)) of the Asymmetric Laplace distribution. Note that this argument is not used for family="LoR" or family="SNR". For all hierarchical regression models, it must be a positive real number, and it is fixed at 1 by default.
d: For Skew-Normal regression only. The location hyper-parameter of the Student's t prior on the parameter \(\lambda\) (the asymmetry parameter of the Skew-Normal distribution). Fixed at 2 by default, which is the recommended value.
b2: For Skew-Normal regression only. The scale hyper-parameter of the Student's t prior on the parameter \(\lambda\) (the asymmetry parameter of the Skew-Normal distribution). Fixed at 1/2 by default, which is the recommended value.
count.iteration: A logical argument. If TRUE, a counter for the Gibbs sampler iterations is displayed. Fixed at TRUE by default.
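To show how the family-specific arguments fit together, the sketch below simulates binomial data and passes per-observation sizes through ni for family="LoR". The simulated coefficients, the size of 5, and the assumption that y holds the number of successes out of ni trials are illustrative choices, not taken from the package documentation. The fit is only attempted when gibbs_abms is available in the session.

```r
# Simulate binomial responses with a common size of 5 per observation.
set.seed(2024)
n <- 150
X <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("X1", "X2", "X3")))
ni <- rep(5L, n)                  # size parameter of the Binomial distribution
eta <- 0.5 + 1.5 * X[, "X1"]      # only X1 is truly relevant in this simulation
y <- rbinom(n, size = ni, prob = plogis(eta))  # assumed: successes out of ni trials

# Fit only if the function is available (i.e., the package is loaded).
if (exists("gibbs_abms")) {
  fit_lor <- gibbs_abms(y, X, family = "LoR",
                        nchain = 1000, burnin = 200,
                        ni = ni, count.iteration = FALSE)
}
```

The same pattern applies to the other families: alpha is only relevant for family="QR", a0 and b0 for "LiR", "NBR", and "QR", and d and b2 for "SNR".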

References

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12(2): 171-178.

Bayes, C. and Branco, M. (2007). Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Brazilian Journal of Probability and Statistics, 21: 141-163.

Kotz, S., Kozubowski, T. and Podgorski, K. (2001). The Laplace Distribution and Generalizations, first edn, Birkhauser Basel.

Polson, N., Scott, J. and Windle, J. (2013). Bayesian Inference for Logistic Models Using Polya Gamma Latent Variables. Journal of the American Statistical Association, 108: 1339-1349.

Zhou, W. and Carin, L. (2013). Negative Binomial Process Count and Mixture Modeling. arXiv:1405.0506v1.

Examples


##################################################
##          Gibbs for Linear Regression         ##
##################################################

## Simulating data
set.seed(31415)
N <- 200
r_beta <- as.matrix(c(1, 0, 2, 0))   # true coefficients (intercept first)
r_p <- length(r_beta)
r_sigma2 <- 1.5                      # true error variance
X <- matrix(c(rep(1, N), rnorm((r_p - 1) * N)), ncol = r_p)
Xbeta <- X %*% r_beta
y <- rnorm(N, mean = Xbeta, sd = sqrt(r_sigma2))
Covariates <- X[, 2:(length(r_beta))]   # drop the intercept column
colnames(Covariates) <- c("X1", "X2", "X3")

## Fitting the model
fit <- gibbs_abms(y, Covariates, family = "LiR", first_excluded = 0,
                  nchain = 1000, burnin = 20, a0 = 1, b0 = 1)

summary_gibbs(fit, BF = FALSE)   # Summary results

## For more examples, see "Model Ilustrations.R" file in
## https://github.com/SirCornflake/BMS
