bnpglm(formula,family,data,offset,sampler="slice",StorageDir,
ncomp,sweeps,burn,thin=1,seed,prec,V,Vdf,Mu.nu,Sigma.nu,
Mu.mu,Sigma.mu,Alpha.xi,Beta.xi,Alpha.alpha,Beta.alpha,Turnc.alpha,
Xpred,offsetPred,...)y ~ x.sampler="slice" which
implements a slice sampler (Walker, 2007; Papaspiliopoulos, 2008) and
sampler="truncated" which proceeds
sampler="slice" is chosen, ncomp needs to be specified as it is used in the initialization processburn)
and those discarded by the thinning process (see argument thin).family="poisson", this argument is parameter$\alpha_{\xi}$of the prior of the Poisson rate:$\xi \sim$Gamma($\alpha_{\xi},\beta_{\xi}$).famfamily="poisson", this argument is parameter$\beta_{\xi}$of the prior of the Poisson rate:$\xi \sim$Gamma($\alpha_{\xi},\beta_{\xi}$).familXpred. offsetPred is a vector of
length equal to the rows of Xpred. If family is one of poisson or
<bnpglm returns the following:Xpred is specified, the function returns the conditional expectation of the response given each new covariate $x$.Xpred is specified, the function returns the conditional median of the response given each new covariate $x$.Xpred is specified, the function returns the conditional first quantile of the response given each new covariate $x$.Xpred is specified, the function returns the conditional third quantile of the response given each new covariate $x$.Xpred is specified, the function returns the conditional mode of the response given each new covariate $x$.bnpglm creates files where the posterior samples are written. These files are (with all file names
preceded by `BNSP.'):((sweeps-burn)/thin)*ncomp lines and $p^2$ columns. In more detail, each sweep creates ncomp lines representing samples $T_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of
$T_h^{(sw)}$ are written in the columns of the file: the entries in the first $p$ columns of the file are those in the first column (or row) of $T_h^{(sw)}$, while the entries in the last $p$ columns of the file are those in the last column (or row) of $T_h^{(sw)}$.((sweeps-burn)/thin)*ncomp lines and $p^2$ columns. In more detail, each sweep creates ncomp lines representing samples $\Sigma_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of
$\Sigma_h^{(sw)}$ are written in the columns of the file: the entries in the first $p$ columns of the file are those in the first column (or row) of $\Sigma_h^{(sw)}$, while the entries in the last $p$ columns of the file are those in the last column (or row) of $\Sigma_h^{(sw)}$.((sweeps-burn)/thin)*ncomp lines and $p^2$ columns. In more detail, each sweep creates ncomp lines representing samples $(\Sigma_h^{-1})^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of $(\Sigma_h^{-1})^{(sw)}$ are written in the columns of the file: the entries in the first $p$ columns of the file are those in the first column (or row) of $(\Sigma_h^{-1})^{(sw)}$, while the entries in the last $p$ columns of the file are those in the last column (or row) of $(\Sigma_h^{-1})^{(sw)}$.((sweeps-burn)/thin)*ncomp lines and $p$ columns. In more detail, each sweep creates ncomp lines representing samples $\nu_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of $\nu_h^{(sw)}$ are written in the columns of the file.((sweeps-burn)/thin)*ncomp lines and $p$ columns. In more detail, each sweep creates ncomp lines representing samples $\mu_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep. The elements of $\mu_h^{(sw)}$ are written in the columns of the file.((sweeps-burn)/thin)*ncomp lines and one or two columns, depending on the number of parameters in the selected $F(.;\lambda)$. Sweeps write in the file ncomp lines representing samples $\xi_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep.(sweeps-burn)/thin lines and one column, each line including one posterior sample.(sweeps-burn)/thin lines, that represent the posterior samples, and $n$ columns, that represent the sampling units. Entries in this file range from 0 to $ncomp-1$.(sweeps-burn)/thin lines and ncomp columns, where the lines represent posterior samples while the columns represent the components or clusters. The entries represent the number of sampling units allocated to the components.(sweeps-burn)/thin lines with the number of components updated at each iteration of the sampler.bnpglm returns samples from the posterior distributions of the parameters of the model:
$$f(y_i,x_i) = \sum_{h=0}^{\infty} \pi_h f(y_i,x_i|\theta_h), \hspace{80pt} (1)$$
where $y_i$ is a univariate discrete response,
$x_i$ is a $p$-dimensional vector of continuous covariates, and $\pi_h, h \geq 1,$ are obtained according to
Sethuraman's (1994) stick-breaking construction:
$\pi_1 = v_1$, and for $l \geq 2, \pi_l = v_l \prod_{j=1}^{l-1} (1-v_j)$, where $v_k$ are iid samples
$v_k \sim$Beta $(1,\alpha), k \geq 1.$
The discrete responses $y_i$ are represented as discretized versions of continuous latent variables $y_i^*$.
Observed discrete and continuous latent variables are connected by:
$$y_{i} = q \iff c_{i,q-1} < y^*_{i} < c_{i,q},$$
where the cut-points are obtained as: $c_{i,-1} = -\infty$,
while for $q \geq 0$, $c_{i,q} = c_{q}(\lambda_{i}) = \Phi^{-1}{F(q;\lambda_i)}.$ Here $\Phi(.)$
is the cumulative distribution function (cdf) of a standard normal variable
and $F()$ denotes an appropriate cdf. Further, latent variables are assumed to
independently follow a $N(0,1)$ distribution, where the mean and variance are restricted to be zero and one as
they are non-identifiable by the data. Choices for $F()$ are described next.
For counts, currently three options are supported. First, $F(.;\lambda_i)$ can be specified as the
cdf of a Poisson$(H_i \xi_h)$ variable. Here $\lambda_i=(\xi_h,H_i)^T, \xi_h$ denotes the Poisson rate
associated with cluster $h$, and $H_i$ the offset term associated with sampling unit $i$.
Second, $F(.;\lambda_i)$ can be specified as the negative binomial cdf, where $\lambda=(\xi_{1h},\xi_{2h},H_i)^T$.
This option allows for overdispersion within each cluster. With this added flexibility the model usually requires
fewer effective clusters. Third, $F(.;\lambda_i)$ can be specified as the Generalized Poisson cdf, where, again,
$\lambda=(\xi_{1h},\xi_{2h},H_i)^T$. This option allows for both over- and under-dispersion within each
cluster. See Consul and Famoye (1992) for more details.
For Binomial data, currently two options are supported. First, $F(.;\lambda_i)$ may be taken to be the cdf of a
Binomial$(H_i,\xi_h)$ variable, with $\xi_h$ denoting the success probability of cluster $h$ and $H_i$
the number of trials associated with sampling unit $i$.
Second, $F(.;\lambda_i)$ may be specified to be the beta-binomial cdf, where $\lambda=(\xi_{1h},\xi_{2h},H_i)^T$.
Joint vectors $(y_i^{*},x_{i})$ are modeled utilizing Gaussian distributions. Then, with $\theta_h$ denoting model parameters
associated with the $h$th cluster, the joint density $f(y_{i},x_{i}|\theta_h)$ takes the form
$$f(y_{i},x_{i}|\theta_h) = \int_{c_{i,y_i-1}}^{c_{i,y_i}} N_{p+1}(y_{i}^{*},x_{i}|\mu_{h},C_h) dy_{i}^{*},$$
where $\mu_h$ and $C_h$ denote the mean vector and covariance matrix, respectively.
The joint distribution of the latent variable $y_i^{*}$ and the covariates $x_{i}$ is
$$(y_{i}^{*},x_{i}^T)^T|\theta_h \sim N_{p+1}\left(
\begin{array}{ll}
\left(
\begin{array}{l}
0 \
\mu_h \
\end{array}
\right),
&
C_h=\left[
\begin{array}{ll}
1 & \nu_h^T \
\nu_h & \Sigma_h \
\end{array}
\right]
\end{array}\right),$$
where $\nu_h$ denotes the vector of covariances cov$(y_{i}^{*},x_{i}|\theta_h)$.
Sampling from the posterior of constrained covariance matrix $C_h$
is done using methods similar to those of McCulloch et al. (2000).
Specifically, the conditional $x_{i}|y_{i}^{*} \sim
N_{p}(\mu_h+y_{i}^{*}\nu_h, B_h = \Sigma_h - \nu_h \nu_h^T)$ simplifies matters as there
are no constraints on matrix $B_h$ (other than positive definiteness).
Given priors for $B_h$ and $\nu_h$, it is easy to sample from their posteriors, and thus obtain samples from the posterior of $\Sigma_h=B_h+\nu_h \nu_h^T$.
Specification of the prior distributions:
Vand
Vdf.Mu.nuandSigma.nuallow the user to change the default values.Mu.muandSigma.muallow the user to change the default values.family="poisson", a priori we take$\xi_{h} \sim$Gamma$(\alpha_{\xi},\beta_{\xi}), h \geq 1$.
The default values are$\alpha_{\xi}=1.0,\beta_{\xi}=0.1$, that define a Gamma distribution
with mean$\alpha_{\xi}/\beta_{\xi}=10$and variance$\alpha_{\xi}/\beta_{\xi}^2=100.$Defaults can be altered using argumentsAlpha.xiandBeta.xi.
For binomial data, withfamily="binomial", a priori we take$\xi_{h} \sim$Beta$(\alpha_{\xi},\beta_{\xi})$,$h \geq 1$.
The default values are$\alpha_{\xi}=1.0,\beta_{\xi}=1.0$, that define a uniform distribution.
Defaults can be altered using argumentsAlpha.xiandBeta.xi.
For both count data withfamily="negative binomial"and binomial data withfamily="beta binomial",
a priori we take$\xi_{jh} \sim$Gamma$(\alpha_{j\xi},\beta_{j\xi})$,$j=1,2, h \geq 1$.
The default values are$\alpha_{j\xi}=1.0,\beta_{j\xi}=0.1$.
Default values for${\alpha_{j\xi}: j=1,2}$can be altered using argumentAlpha.xi,
and default values for${\beta_{j\xi}: j=1,2}$can be altered using argumentBeta.xi.
For count data withfamily="generalized poisson",
a priori we take$\xi_{1h} \sim$Gamma$(\alpha_{1\xi},\beta_{1\xi})$,
and$\xi_{2h} \sim$Normal$(\alpha_{2\xi},\beta_{2\xi})$.
The default values are$\alpha_{j\xi}=1.0,\beta_{j\xi}=0.1$.
Default values for${\alpha_{j\xi}: j=1,2}$can be altered using argumentAlpha.xi,
and default values for${\beta_{j\xi}: j=1,2}$can be altered using argumentBeta.xi.Alpha.alpha,Beta.alphaandTurnc.alpha.# Bayesian nonparametric GLM with Binomial response Y and one predictor X
data(simD)
pred<-seq(with(simD,min(X))+0.1,with(simD,max(X))-0.1,length.out=30)
npred<-length(pred)
# fit1 and fit2 define the same model but with different numbers of
# components and posterior samples. They both use a slice sampler
# and parameter prec=200 achieves optimal acceptance rate, about 22%.
fit1 <- bnpglm(cbind(Y,(E-Y))~X, family="binomial", data=simD, ncomp=30, sweeps=150,
burn=100, sampler="slice", prec=c(200), Xpred=pred, offsetPred=rep(30,npred))
fit2 <- bnpglm(cbind(Y,(E-Y))~X, family="binomial", data=simD, ncomp=50, sweeps=5000,
burn=1000, sampler="slice", prec=c(200), Xpred=pred, offsetPred=rep(30,npred))
plot(with(simD,X),with(simD,Y)/with(simD,E))
lines(pred,fit2$medianReg,col=3,lwd=2)Run the code above in your browser using DataLab