
BayesLogit (version 0.1-0)

logit: Default Bayesian Logistic Regression

Description

Run a Bayesian logistic regression.

Usage

logit(y, X, n=rep(1,length(y)),
      y.prior=0.5, x.prior=colMeans(as.matrix(X)), n.prior=1.0,
      samp=1000, burn=500)

Arguments

y
An N dimensional vector; $y_i$ is the average response at $x_i$.
X
An N x P dimensional design matrix; $x_i$ is the ith row.
n
An N dimensional vector; $n_i$ is the number of observations at each $x_i$.
y.prior
Average response at x.prior.
x.prior
Prior predictor variable.
n.prior
Number of observations at x.prior.
samp
The number of MCMC iterations saved.
burn
The number of MCMC iterations discarded.

Value

  • logit returns a list with the following components.
  • beta: A samp x P array; the posterior sample of the regression coefficients.
  • w: A samp x N' array; the posterior sample of the latent variables. WARNING: N' may be less than N if data is combined.
  • y: The response vector--different than the input if data is combined.
  • X: The design matrix--different than the input if data is combined.
  • n: The number of observations at each unique row of X--different than the input if data is combined.

Details

Logistic regression is a classification mechanism. Given the binary data $\{y_i\}$ and the P-dimensional predictor variables $\{x_i\}$, one wants to forecast whether a future data point $y^*$ observed at the predictor $x^*$ will be zero or one. Logistic regression stipulates that the statistical model for observing a success $= 1$ or failure $= 0$ is governed by

$$P(y^* = 1 \mid x^*, \beta) = (1 + \exp(-x^* \beta))^{-1}.$$
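The success probability above is the inverse-logit (logistic) link. A minimal sketch in base R; `inv.logit` is an illustrative helper name, not part of the package, and is equivalent to base R's `plogis`:

```r
## Inverse-logit link from the formula above.
inv.logit <- function(psi) 1 / (1 + exp(-psi))

psi <- c(-2, 0, 2)     # linear predictors x* %*% beta
p <- inv.logit(psi)    # success probabilities; p[2] is 0.5
```

At $x^* \beta = 0$ the model is indifferent between the two classes, giving probability one half.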

Instead of representing data as a collection of binary outcomes, one may record the average response $y_i$ at each unique $x_i$ given a total number of $n_i$ observations at $x_i$. We follow this method of encoding data.
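The aggregated encoding described above can be sketched in base R; the variable names here are illustrative, not part of the package API:

```r
## Combine raw binary outcomes into the average response y and the
## count n at each unique predictor value.
x     <- c(0, 0, 1, 1, 1)      # predictor with two unique values
y.bin <- c(0, 1, 1, 1, 0)      # raw binary outcomes

y <- as.numeric(tapply(y.bin, x, mean))    # average response at each unique x
n <- as.numeric(tapply(y.bin, x, length))  # observations at each unique x
## y is c(0.5, 2/3); n is c(2, 3)
```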

Polson and Scott suggest placing a Jeffreys beta prior Be(1/2, 1/2) on

$$m(\beta) := P(y_0 = 1 | x_0, \beta) = (1 + \exp(-x_0 \beta))^{-1},$$

which generates a Z-distribution prior for $\beta$,

$$p(\beta) \propto \exp(x_0 \beta / 2) / (1 + \exp(x_0 \beta)).$$

One may interpret this as "prior" data for which the average response at $x_0$ is $1/2$, based upon a "single" observation. The default value is $x_0 = \mathrm{colMeans}(X)$, the average of the observed predictors $\{x_i\}$.

References

Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference for logistic models using Polya-Gamma latent variables. http://arxiv.org/abs/1205.0310

Nicholas G. Polson and James G. Scott. Default Bayesian analysis for multi-way tables: a data-augmentation approach. http://arxiv.org/pdf/1109.4180

See Also

rpg, logit.EM, mlogit

Examples

## Load the package; the spambase data ships with BayesLogit.
library(BayesLogit)

## From UCI Machine Learning Repository.
data(spambase);

## A subset of the data.
sbase = spambase[seq(1,nrow(spambase),10),];

X = model.matrix(is.spam ~ word.freq.free + word.freq.1999, data=sbase);
y = sbase$is.spam;

## Run logistic regression.
output = logit(y, X, samp=1000, burn=100);
