probiths: Horseshoe shrinkage prior in Bayesian Probit regression

Description

This function employs the algorithm provided by Makalic and Schmidt (2016) for binary probit model to fit Bayesian probit regression. The observations are updated according to the data augmentation of approach of Albert and Chib (1993).

The model is: \(z_i\) is response either 1 or 0, \(\log \Pr(z_i = 1) = \Phi(X\beta), \Phi \sim N(0,\sigma^2)\).

Usage

probiths(
  z,
  X,
  method.tau = c("fixed", "truncatedCauchy", "halfCauchy"),
  tau = 1,
  burn = 1000,
  nmc = 5000,
  thin = 1,
  alpha = 0.05,
  Xtest = NULL
)

Arguments

Response, a \(n*1\) vector of 1 or 0.

Matrix of covariates, dimension \(n*p\).

method.tau

Method for handling \(\tau\). Select "truncatedCauchy" for full Bayes with the Cauchy prior truncated to [1/p, 1], "halfCauchy" for full Bayes with the half-Cauchy prior, or "fixed" to use a fixed value (an empirical Bayes estimate, for example).

tau

Use this argument to pass the (estimated) value of \(\tau\) in case "fixed" is selected for method.tau. Not necessary when method.tau is equal to"halfCauchy" or "truncatedCauchy". The default (tau = 1) is not suitable for most purposes and should be replaced.

burn

Number of burn-in MCMC samples. Default is 1000.

nmc

Number of posterior draws to be saved. Default is 5000.

thin

Thinning parameter of the chain. Default is 1 (no thinning).

alpha

Level for the credible intervals. For example, alpha = 0.05 results in 95% credible intervals.

Xtest

test design matrix.

Value

ProbHat

Predictive probability

BetaHat

Posterior mean of Beta, a \(p\) by 1 vector

LeftCI

The left bounds of the credible intervals

RightCI

The right bounds of the credible intervals

BetaMedian

Posterior median of Beta, a \(p\) by 1 vector

LambdaHat

Posterior samples of \(\lambda\), a \(p*1\) vector

TauHat

Posterior mean of global scale parameter tau, a positive scalar

BetaSamples

Posterior samples of \(\beta\)

TauSamples

Posterior samples of \(\tau\)

LikelihoodSamples

Posterior samples of likelihood

DIC

Devainace Information Criterion of the fitted model

WAIC

Widely Applicable Information Criterion

References

Stephanie van der Pas, James Scott, Antik Chakraborty and Anirban Bhattacharya (2016). horseshoe: Implementation of the Horseshoe Prior. R package version 0.1.0. https://CRAN.R-project.org/package=horseshoe

Enes Makalic and Daniel Schmidt (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package arXiv:1611.06649

Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American statistical Association, 88(422), 669-679.

Examples

Run this code

# NOT RUN {
burnin <- 100
nmc    <- 200
thin   <- 1
y.sd   <- 1  # statndard deviation of the response

p <- 200  # number of predictors
ntrain <- 250  # training size
ntest  <- 100   # test size
n <- ntest + ntrain  # sample size
q <- 10   # number of true predictos

beta.t <- c(sample(x = c(1, -1), size = q, replace = TRUE), rep(0, p - q))  
x <- mvtnorm::rmvnorm(n, mean = rep(0, p))    
zmean <- x %*% beta.t

y <- rnorm(n, mean = zmean, sd = y.sd)
z <- ifelse(y > 0, 1, 0)
X <- scale(as.matrix(x))  # standarization
z <- as.numeric(as.matrix(c(z)))

# Training set
ztrain <- z[1:ntrain]
Xtrain  <- X[1:ntrain, ]

# Test set
ztest <- z[(ntrain + 1):n]
Xtest <- X[(ntrain + 1):n, ]
 
posterior.fit <- probiths(z = ztrain, X = Xtrain, method.tau = "halfCauchy",
                          burn = burnin, nmc = nmc, thin = 1,
                          Xtest = Xtest)

posterior.fit$BetaHat

# Posterior processing to recover the significant predictors
cluster     <- kmeans(abs(posterior.fit$BetaHat), centers = 2)$cluster  # return cluster indices
cluster1    <- which(cluster == 1)
cluster2    <- which(cluster == 2)
min.cluster <- ifelse(length(cluster1) < length(cluster2), 1, 2)
which(cluster == min.cluster)  # this matches with the true variables


# }

Run the code above in your browser using DataLab