
BayesLogit (version 0.1-0)

logit: Default Bayesian Logistic Regression

Description

Run a Bayesian logistic regression.

Usage

logit(y, X, n=rep(1,length(y)),
      y.prior=0.5, x.prior=colMeans(as.matrix(X)), n.prior=1.0,
      samp=1000, burn=500)

Arguments

y
An N dimensional vector; $y_i$ is the average response at $x_i$.
X
An N x P dimensional design matrix; $x_i$ is the ith row.
n
An N dimensional vector; $n_i$ is the number of observations at each $x_i$.
y.prior
Average response at x.prior.
x.prior
Prior predictor variable.
n.prior
Number of observations at x.prior.
samp
The number of MCMC iterations saved.
burn
The number of MCMC iterations discarded.

Value

  • logit returns a list with the following components.
  • beta: A samp x P array; the posterior sample of the regression coefficients.
  • w: A samp x N' array; the posterior sample of the latent variables. WARNING: N' may be less than N if data is combined.
  • y: The response vector--different than the input if data is combined.
  • X: The design matrix--different than the input if data is combined.
  • n: The number of observations at each unique row of X--different than the input if data is combined.

Details

Logistic regression is a classification mechanism. Given the binary data $\{y_i\}$ and the P-dimensional predictor variables $\{x_i\}$, one wants to forecast whether a future data point $y^*$ observed at the predictor $x^*$ will be zero or one. Logistic regression stipulates that the statistical model for observing a success $= 1$ or failure $= 0$ is governed by

$$P(y^* = 1 \mid x^*, \beta) = (1 + \exp(-x^* \beta))^{-1}.$$
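The success probability above is the inverse-logit (logistic) link. A minimal sketch in base R; `inv.logit` is an illustrative helper name, not part of the package, and is equivalent to base R's `plogis`:

```r
## Inverse-logit link from the formula above.
inv.logit <- function(psi) 1 / (1 + exp(-psi))

psi <- c(-2, 0, 2)     # linear predictors x* %*% beta
p <- inv.logit(psi)    # success probabilities; p[2] is 0.5
```

At $x^* \beta = 0$ the model is indifferent between the two classes, giving probability one half.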

Instead of representing data as a collection of binary outcomes, one may record the average response $y_i$ at each unique $x_i$ given a total number of $n_i$ observations at $x_i$. We follow this method of encoding data.
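The aggregated encoding described above can be sketched in base R; the variable names here are illustrative, not part of the package API:

```r
## Combine raw binary outcomes into the average response y and the
## count n at each unique predictor value.
x     <- c(0, 0, 1, 1, 1)      # predictor with two unique values
y.bin <- c(0, 1, 1, 1, 0)      # raw binary outcomes

y <- as.numeric(tapply(y.bin, x, mean))    # average response at each unique x
n <- as.numeric(tapply(y.bin, x, length))  # observations at each unique x
## y is c(0.5, 2/3); n is c(2, 3)
```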

Polson and Scott suggest placing a Jeffreys beta prior Be(1/2, 1/2) on

$$m(\beta) := P(y_0 = 1 | x_0, \beta) = (1 + \exp(-x_0 \beta))^{-1},$$

which generates a Z-distribution prior for $\beta$,

$$p(\beta) \propto \exp(x_0 \beta / 2) / (1 + \exp(x_0 \beta)).$$

One may interpret this as "prior" data for which the average response at $x_0$ is $1/2$, based upon a "single" observation. The default value is $x_0 = \mathrm{colMeans}(X)$, the average of the observed predictors $\{x_i\}$.

References

Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference for logistic models using Polya-Gamma latent variables. http://arxiv.org/abs/1205.0310

Nicholas G. Polson and James G. Scott. Default Bayesian analysis for multi-way tables: a data-augmentation approach. http://arxiv.org/pdf/1109.4180

See Also

rpg, logit.EM, mlogit

Examples

## Load the package; the spambase data ships with BayesLogit.
library(BayesLogit)

## From UCI Machine Learning Repository.
data(spambase);

## A subset of the data.
sbase = spambase[seq(1,nrow(spambase),10),];

X = model.matrix(is.spam ~ word.freq.free + word.freq.1999, data=sbase);
y = sbase$is.spam;

## Run logistic regression.
output = logit(y, X, samp=1000, burn=100);
