bnpglm(formula,family,data,offset,sampler="slice",WorkingDir,
ncomp,sweeps,burn,seed,V,Vdf,Mu.nu,Sigma.nu,Mu.mu,Sigma.mu,
Alpha.xi,Beta.xi,Alpha.alpha,Beta.alpha,Turnc.alpha,
Xpred,offsetPred,...)y ~ x.sampler="slice" which
implements a slice sampler (Walker, 2007; Papaspiliopoulos, 2008) and
sampler="truncated" which proceeds
sampler="slice" is chosen, ncomp needs to be specified as it is used in the initialization processsweeps-burn
samples will be written.family="poisson", this argument is parameter$\alpha_{\xi}$of the prior of the Poisson rate:$\xi \sim$Gamma($\alpha_{\xi},\beta_{\xi}$).famfamily="poisson", this argument is parameter$\beta_{\xi}$of the prior of the Poisson rate:$\xi \sim$Gamma($\alpha_{\xi},\beta_{\xi}$).familXpred. offsetPred is a vector of
length equal to the rows of Xpred. If family=poisson, its entries are the associated
bnpglm returns the following:Xpred is specified, the function returns the conditional expectation of the response given each new covariate $x$.Xpred is specified, the function returns the conditional median of the response given each new covariate $x$.Xpred is specified, the function returns the conditional first quantile of the response given each new covariate $x$.Xpred is specified, the function returns the conditional third quantile of the response given each new covariate $x$.Xpred is specified, the function returns the conditional mode of the response given each new covariate $x$.bnpglm creates files where the posterior samples are written. These files are (with all file names
preceded by `BNSP.'):(sweeps-burn)*ncomp lines and $p^2$ columns. In more detail, each sweep creates ncomp lines representing samples $T_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of
$T_h^{(sw)}$ are written in the columns of the file: the entries in the first $p$ columns of the file are those in the first column (or row) of $T_h^{(sw)}$, while the entries in the last $p$ columns of the file are those in the last column (or row) of $T_h^{(sw)}$.(sweeps-burn)*ncomp lines and $p^2$ columns. In more detail, each sweep creates ncomp lines representing samples $\Sigma_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of
$\Sigma_h^{(sw)}$ are written in the columns of the file: the entries in the first $p$ columns of the file are those in the first column (or row) of $\Sigma_h^{(sw)}$, while the entries in the last $p$ columns of the file are those in the last column (or row) of $\Sigma_h^{(sw)}$.(sweeps-burn)*ncomp lines and $p^2$ columns. In more detail, each sweep creates ncomp lines representing samples $(\Sigma_h^{-1})^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of $(\Sigma_h^{-1})^{(sw)}$ are written in the columns of the file: the entries in the first $p$ columns of the file are those in the first column (or row) of $(\Sigma_h^{-1})^{(sw)}$, while the entries in the last $p$ columns of the file are those in the last column (or row) of $(\Sigma_h^{-1})^{(sw)}$.(sweeps-burn)*ncomp lines and $p$ columns. In more detail, each sweep creates ncomp lines representing samples $\nu_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of $\nu_h^{(sw)}$ are written in the columns of the file.(sweeps-burn)*ncomp lines and $p$ columns. In more detail, each sweep creates ncomp lines representing samples $\mu_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of $\mu_h^{(sw)}$ are written in the columns of the file.(sweeps-burn)*ncomp lines and one or two columns, depending on the number of parameters in the selected $F(.;\lambda)$. Sweeps write in the file ncomp lines representing samples $\xi_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep.(sweeps-burn) lines and one column, each line including one posterior sample.(sweeps-burn) lines, that represent the posterior samples, and $n$ columns, that represent the sampling units. Entries in this file range from 0 to $ncomp-1$.(sweeps-burn) lines and ncomp columns, where the lines represent posterior samples while the columns represent the components or clusters. The entries represent the number of sampling units allocated to the components.(sweeps-burn) lines with the number of components updated at each iteration of the sampler.Xpred.
The file has (sweeps-burn)*npred lines, where npred is the number of rows in Xpred. That is,
at each iteration of the sampler, one line for each `new' covariate vector $x$ is written. The columns of the file represent
the possible values of $Y$, staring from zero and continuing to a max number.bnpglm returns samples from the posterior distributions of the parameters of the model:
$$f(y_i,x_i) = \sum_{h=0}^{\infty} \pi_h f(y_i,x_i|\theta_h), \hspace{80pt} (1)$$
where $y_i$ is a univariate discrete response,
$x_i$ is a $p$-dimensional vector of continuous covariates, and $\pi_h, h \geq 1,$ are obtained according to
Sethuraman's (1994) stick-breaking construction:
$\pi_1 = v_1$, and for $l \geq 2, \pi_l = v_l \prod_{j=1}^{l-1} (1-v_j)$, where $v_k$ are iid samples
$v_k \sim$Beta $(1,\alpha), k \geq 1.$
The discrete responses $y_i$ are represented as discretized versions of continuous latent variables $y_i^*$.
Observed discrete and continuous latent variables are connected by:
$$y_{i} = q \iff c_{i,q-1} < y^*_{i} < c_{i,q},$$
where the cut-points are obtained as: $c_{i,-1} = -\infty$,
while for $q \geq 0$, $c_{i,q} = c_{q}(\lambda_{i}) = \Phi^{-1}{F(q;\lambda_i)}.$ Here $\Phi(.)$
is the cumulative distribution function (cdf) of a standard normal variable
and $F()$ denotes an appropriate cdf. Further, latent variables are assumed to
independently follow a $N(0,1)$ distribution, where the mean and variance are restricted to be zero and one as
they are non-identifiable by the data. Choices for $F()$ are described next.
For counts, currently two options are supported. First, $F(.;\lambda_i)$ can be specified as the
cdf of a Poisson$(H_i \xi_h)$ variable. Here $\lambda_i=(\xi_h,H_i)^T, \xi_h$ denotes the Poisson rate
associated with cluster $h$, and $H_i$ the offset term associated with sampling unit $i$.
Second, $F(.;\lambda_i)$ can be specified as the negative binomial cdf, where $\lambda=(\xi_{1h},\xi_{2h},H_i)^T$.
For Binomial data, currently two options are supported. First, $F(.;\lambda_i)$ may be taken to be the cdf of a
Binomial$(H_i,\xi_h)$ variable, with $\xi_h$ denoting the success probability of cluster $h$ and $H_i$
the number of trials associated with sampling unit $i$.
Second, $F(.;\lambda_i)$ may be specified to be the beta-binomial cdf, where $\lambda=(\xi_{1h},\xi_{2h},H_i)^T$.
Joint vectors $(y_i^{*},x_{i})$ are modeled utilizing Gaussian distributions. Then, with $\theta_h$ denoting model parameters
associated with the $h$th cluster, the joint density $f(y_{i},x_{i}|\theta_h)$ takes the form
$$f(y_{i},x_{i}|\theta_h) = \int_{c_{i,y_i-1}}^{c_{i,y_i}} N_{p+1}(y_{i}^{*},x_{i}|\mu_{h},C_h) dy_{i}^{*},$$
where $\mu_h$ and $C_h$ denote the mean vector and covariance matrix, respectively.
The joint distribution of the latent variable $y_i^{*}$ and the covariates $x_{i}$ is
$$(y_{i}^{*},x_{i}^T)^T|\theta_h \sim N_{p+1}\left(
\begin{array}{ll}
\left(
\begin{array}{l}
0 \
\mu_h \
\end{array}
\right),
&
C_h=\left[
\begin{array}{ll}
1 & \nu_h^T \
\nu_h & \Sigma_h \
\end{array}
\right]
\end{array}\right),$$
where $\nu_h$ denotes the vector of covariances cov$(y_{i}^{*},x_{i}|\theta_h)$.
Sampling from the posterior of constrained covariance matrix $C_h$
is done using methods similar to those of McCulloch et al. (2000).
Specifically, the conditional $x_{i}|y_{i}^{*} \sim
N_{p}(\mu_h+y_{i}^{*}\nu_h, B_h = \Sigma_h - \nu_h \nu_h^T)$ simplifies matters as there
are no constraints on matrix $B_h$ (other than positive definiteness).
Given priors for $B_h$ and $\nu_h$, it is easy to sample from their posteriors, and thus obtain samples from the posterior of $\Sigma_h=B_h+\nu_h \nu_h^T$.
Specification of the prior distributions:
Vand
Vdf.Mu.nuandSigma.nuallow the user to change the default values.Mu.muandSigma.muallow the user to change the default values.family="poisson", a priori we take$\xi_{h} \sim$Gamma$(\alpha_{\xi},\beta_{\xi}), h \geq 1$.
The default values are$\alpha_{\xi}=1.0,\beta_{\xi}=0.1$, that define a Gamma distribution
with mean$\alpha_{\xi}/\beta_{\xi}=10$and variance$\alpha_{\xi}/\beta_{\xi}^2=100.$Defaults can be altered using argumentsAlpha.xiandBeta.xi.
For binomial data, withfamily="binomial", a priori we take$\xi_{h} \sim$Beta$(\alpha_{\xi},\beta_{\xi})$,$h \geq 1$.
The default values are$\alpha_{\xi}=1.0,\beta_{\xi}=1.0$, that define a uniform distribution.
Defaults can be altered using argumentsAlpha.xiandBeta.xi.
For both count data withfamily="negative binomial"and binomial data withfamily="beta binomial",
a priori we take$\xi_{jh} \sim$Gamma$(\alpha_{j\xi},\beta_{j\xi})$,$j=1,2, h \geq 1$.
The default values are$\alpha_{j\xi}=1.0,\beta_{j\xi}=0.1$.
Default values for${\alpha_{j\xi}: j=1,2}$can be altered using argumentAlpha.xi,
and default values for${\beta_{j\xi}: j=1,2}$can be altered using argumentBeta.xi.Alpha.alpha,Beta.alphaandTurnc.alpha.# Bayesian nonparametric GLM with Binomial response Y and one predictor X
data(simD)
pred<-seq(with(simD,min(X))+0.1,with(simD,max(X))-0.1,length.out=30)
npred<-length(pred)
# fit1 and fit2 define the same model but with different numbers of
# components and posterior samples
fit1 <- bnpglm(cbind(Y,(E-Y))~X, family="binomial", data=simD, ncomp=30, sweeps=150,
burn=100, Xpred=pred, offsetPred=rep(30,npred))
fit2 <- bnpglm(cbind(Y,(E-Y))~X, family="binomial", data=simD, ncomp=50, sweeps=5000,
burn=1000, Xpred=pred, offsetPred=rep(30,npred))
plot(with(simD,X),with(simD,Y)/with(simD,E))
lines(pred,fit2$medianReg,col=3,lwd=2)Run the code above in your browser using DataLab