Usage

bnpglm(formula, family, data, offset, sampler = "slice", StorageDir,
       ncomp, sweeps, burn, thin = 1, seed, prec, V, Vdf, Mu.nu, Sigma.nu,
       Mu.mu, Sigma.mu, Alpha.xi, Beta.xi, Alpha.alpha, Beta.alpha, Turnc.alpha,
       Xpred, offsetPred, ...)

Arguments

formula: a formula of the form y ~ x, specifying the response and the covariates.

sampler: the posterior sampling scheme. The options are sampler="slice", which implements a slice sampler (Walker, 2007; Papaspiliopoulos, 2008), and sampler="truncated", which proceeds by truncating the infinite mixture at ncomp components.

ncomp: the number of mixture components. If sampler="slice" is chosen, ncomp needs to be specified as it is used in the initialization process.

sweeps: the total number of posterior samples, including those discarded during the burn-in period (see argument burn) and those discarded by the thinning process (see argument thin).

Alpha.xi: when family="poisson", this argument is parameter $\alpha_{\xi}$ of the prior of the Poisson rate: $\xi \sim$ Gamma$(\alpha_{\xi},\beta_{\xi})$.

Beta.xi: when family="poisson", this argument is parameter $\beta_{\xi}$ of the prior of the Poisson rate: $\xi \sim$ Gamma$(\alpha_{\xi},\beta_{\xi})$.

offsetPred: a vector of length equal to the number of rows of Xpred. If family is one of the count kernels, offsetPred holds the offset terms associated with the prediction points in Xpred; for the binomial kernels, it holds the numbers of trials.
Value

bnpglm returns the following. If Xpred is specified, the function returns, for each new covariate value $x$ in Xpred, the posterior means of:

- the expectation of the response given $x$;
- the conditional mode of the response given $x$;
- the conditional 5%, 10%, 15%, 20%, 25%, 50%, 75%, 80%, 85%, 90% and 95% quantiles of the response given $x$;
- the conditional density of the response given $x$, presented in a matrix the rows of which correspond to the different $x$s.

In addition, if Xpred is specified, the function returns the posterior variance of the conditional density of the response given each new covariate $x$, again presented in a matrix the rows of which correspond to the different $x$s.

Further, bnpglm creates files where the posterior samples are written. These files are (with all file names preceded by `BNSP.'):

- a file with (sweeps-burn)/thin lines and one column, each line including one posterior sample;
- an allocation file with (sweeps-burn)/thin lines, that represent the posterior samples, and $n$ columns, that represent the sampling units; entries in this file range from 0 to $ncomp-1$;
- a file with (sweeps-burn)/thin posterior samples of the conditional means, the columns of which represent the various covariate values $x$ for which the means are obtained;
- a file with ((sweeps-burn)/thin)*ncomp lines and $p$ columns: each sweep creates ncomp lines representing samples $\mu_h^{(sw)}, h=1,\dots,ncomp$, where superscript $sw$ represents a particular sweep, and the elements of $\mu_h^{(sw)}$ are written in the columns of the file;
- a file with (sweeps-burn)/thin lines and ncomp columns, where the lines represent posterior samples while the columns represent the components or clusters; the entries represent the number of sampling units allocated to the components;
- a file with ((sweeps-burn)/thin)*ncomp lines and $p$ columns: each sweep creates ncomp lines representing samples $\nu_h^{(sw)}, h=1,\dots,ncomp$, with the elements of $\nu_h^{(sw)}$ written in the columns of the file;
- eleven files, one for each of the conditional quantiles listed above, each with (sweeps-burn)/thin posterior samples; in each file the columns represent the various covariate values $x$ for which the quantiles are obtained;
- a file with ((sweeps-burn)/thin)*ncomp lines and $p^2$ columns: each sweep creates ncomp lines representing samples $\Sigma_h^{(sw)}, h=1,\dots,ncomp$; the elements of $\Sigma_h^{(sw)}$ are written in the columns of the file, with the entries in the first $p$ columns of the file being those in the first column (or row) of $\Sigma_h^{(sw)}$, and the entries in the last $p$ columns being those in the last column (or row) of $\Sigma_h^{(sw)}$;
- an analogous file with ((sweeps-burn)/thin)*ncomp lines and $p^2$ columns holding samples $(\Sigma_h^{-1})^{(sw)}, h=1,\dots,ncomp$, laid out in the same way;
- an analogous file with ((sweeps-burn)/thin)*ncomp lines and $p^2$ columns holding samples $T_h^{(sw)}, h=1,\dots,ncomp$, laid out in the same way;
- a file with ((sweeps-burn)/thin)*ncomp lines and one or two columns, depending on the number of parameters in the selected $F(.;\lambda)$: each sweep writes in the file ncomp lines representing samples $\xi_h^{(sw)}, h=1,\dots,ncomp$;
- a file with (sweeps-burn)/thin lines recording the number of components updated at each iteration of the sampler.

Details

bnpglm returns samples from the posterior distributions of the parameters of the model:
$$f(y_i,x_i) = \sum_{h=1}^{\infty} \pi_h f(y_i,x_i|\theta_h), \hspace{80pt} (1)$$
where $y_i$ is a univariate discrete response,
$x_i$ is a $p$-dimensional vector of continuous covariates, and $\pi_h, h \geq 1,$ are obtained according to
Sethuraman's (1994) stick-breaking construction:
$\pi_1 = v_1$, and for $l \geq 2, \pi_l = v_l \prod_{j=1}^{l-1} (1-v_j)$, where $v_k$ are iid samples
$v_k \sim$Beta $(1,\alpha), k \geq 1.$
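The stick-breaking weights above are easy to simulate directly. A minimal sketch (the truncation level K and the value of alpha are illustrative choices, not package defaults):

```r
set.seed(1)
alpha <- 2                        # concentration parameter (illustrative)
K <- 50                           # truncation level (illustrative)
v <- rbeta(K, 1, alpha)           # v_k ~ Beta(1, alpha)
# pi_1 = v_1; for l >= 2, pi_l = v_l * prod_{j<l} (1 - v_j)
pi <- v * cumprod(c(1, 1 - v[-K]))
# the weights are positive and sum to 1 - prod(1 - v), i.e. nearly 1
sum(pi)
```

At a large truncation the leftover mass prod(1 - v) is negligible, which is why truncated samplers of this kind are practical.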
The discrete responses $y_i$ are represented as discretized versions of continuous latent variables $y_i^*$.
Observed discrete and continuous latent variables are connected by:
$$y_{i} = q \iff c_{i,q-1} < y^*_{i} < c_{i,q}, q=0,1,2,\dots,$$
where the cut-points are obtained as: $c_{i,-1} = -\infty$,
while for $q \geq 0$, $c_{i,q} = c_{q}(\lambda_{i}) = \Phi^{-1}\{F(q;\lambda_i)\}.$ Here $\Phi(.)$
is the cumulative distribution function (cdf) of a standard normal variable
and $F()$ denotes an appropriate cdf. Further, latent variables are assumed to
independently follow a $N(0,1)$ distribution, where the mean and variance are restricted to be zero and one as
they are non-identifiable by the data. Choices for $F()$ are described next.
For counts, currently six options are supported. First, $F(.;\lambda_i)$ can be specified as the
cdf of a Poisson$(H_i \xi_h)$ variable. Here $\lambda_i=(\xi_h,H_i)^T, \xi_h$ denotes the Poisson rate
associated with cluster $h$, and $H_i$ the offset term associated with sampling unit $i$.
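For the Poisson kernel, the cut-points $c_q = \Phi^{-1}\{F(q;\lambda_i)\}$ can be computed with standard R distribution functions. A sketch, with assumed values for the offset and the cluster rate (not package defaults):

```r
H  <- 2      # offset H_i (assumed value)
xi <- 1.5    # Poisson rate xi_h of the cluster (assumed value)
q  <- 0:10
# c_q = Phi^{-1}( F(q; lambda_i) ), with F the Poisson(H * xi) cdf
cutpoints <- qnorm(ppois(q, H * xi))
# the latent N(0,1) mass between consecutive cut-points recovers P(y = q)
probs <- pnorm(cutpoints) - pnorm(c(-Inf, cutpoints[-length(cutpoints)]))
all.equal(probs, dpois(q, H * xi))   # TRUE
```

This makes explicit how the discretized latent $N(0,1)$ variable reproduces the kernel's probability mass function.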
Second, $F(.;\lambda_i)$ can be specified as the negative binomial cdf, where $\lambda_i=
(\xi_{1h},\xi_{2h},H_i)^T$. This option allows for overdispersion within each cluster relative to the
Poisson distribution. Third, $F(.;\lambda_i)$ can be specified as the Generalized Poisson cdf, where, again,
$\lambda_i=(\xi_{1h},\xi_{2h},H_i)^T$. This option allows for both over- and under-dispersion within each
cluster. The other three options, which also allow for both
over- and under-dispersion relative to the Poisson distribution, are the Hyper-Poisson (HP), COM-Poisson,
and Complex Triparametric Pearson (CTP) kernels. The HP and COM-Poisson kernels have two parameters,
while the CTP kernel has three.
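Not package code, but a quick numerical check of the overdispersion point: a negative binomial with the same mean as a Poisson has strictly larger variance (the mean and dispersion values below are illustrative):

```r
mu   <- 5     # common mean (illustrative)
size <- 2     # negative binomial dispersion parameter (illustrative)
y <- 0:200    # support truncated far into the tail
p_nb <- dnbinom(y, size = size, mu = mu)
m_nb <- sum(y * p_nb)               # recovers mu
v_nb <- sum((y - m_nb)^2 * p_nb)    # equals mu + mu^2/size > mu
c(poisson_var = mu, nb_var = v_nb)
```

As size grows the negative binomial variance shrinks toward the Poisson variance mu, which is why the extra parameter controls within-cluster overdispersion.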
For binomial data, currently two options are supported. First, $F(.;\lambda_i)$ may be taken to be the cdf of a
Binomial$(H_i,\xi_h)$ variable, where $\xi_h$ denotes the success probability of cluster $h$ and $H_i$
the number of trials associated with sampling unit $i$.
Second, $F(.;\lambda_i)$ may be specified to be the beta-binomial cdf, where $\lambda_i=(\xi_{1h},\xi_{2h},H_i)^T$.
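The same cut-point construction applies to the binomial kernel; a sketch with assumed values of $H_i$ and $\xi_h$ (not package defaults):

```r
H  <- 10     # number of trials H_i (assumed value)
xi <- 0.3    # success probability xi_h of the cluster (assumed value)
# cut-points for q = 0, ..., H-1; the last interval is closed by c_H = Inf
cutpoints <- qnorm(pbinom(0:(H - 1), H, xi))
# N(0,1) mass between consecutive cut-points recovers the binomial pmf
probs <- diff(c(0, pnorm(cutpoints), 1))
all.equal(probs, dbinom(0:H, H, xi))   # TRUE
```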
Details on all kernels are provided in the tables below. The first table provides the probability mass functions
and the mean in the presence of an offset term (which may be taken to be one). The column `Sample' indicates
for which parameters the routine provides posterior samples. The second table provides information on the assumed priors
along with the default values of the parameters of the prior distributions and it also indicates the
function arguments that allow the user to alter these. Lastly, the third table provides some details
on the less frequently used kernels.
Arguments V and Vdf, Mu.nu and Sigma.nu, and Mu.mu and Sigma.mu allow the user to change the default values of the corresponding prior parameters.
For count data with family="poisson", a priori we take $\xi_{h} \sim$ Gamma$(\alpha_{\xi},\beta_{\xi}), h \geq 1$.
The default values are $\alpha_{\xi}=1.0, \beta_{\xi}=0.1$, which define a Gamma distribution
with mean $\alpha_{\xi}/\beta_{\xi}=10$ and variance $\alpha_{\xi}/\beta_{\xi}^2=100$. Defaults can be altered using arguments Alpha.xi and Beta.xi.
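The stated defaults are easy to verify; a small sketch confirming that Gamma$(\alpha_{\xi}=1, \beta_{\xi}=0.1)$ has mean 10 and variance 100 (the simulation size is arbitrary):

```r
a <- 1.0   # alpha_xi (default shape)
b <- 0.1   # beta_xi  (default rate)
c(mean = a / b, variance = a / b^2)   # 10 and 100
set.seed(1)
draws <- rgamma(1e5, shape = a, rate = b)
c(mean(draws), var(draws))            # close to 10 and 100
```

A prior with variance this large is diffuse relative to its mean, so it is only weakly informative about the cluster rates.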
For count data with family="negative binomial", a priori we take $\xi_{jh} \sim$ Gamma$(\alpha_{j\xi},\beta_{j\xi})$, $j=1,2, h \geq 1$.
The default values are $\alpha_{j\xi}=1.0, \beta_{j\xi}=0.1$, $j=1,2$.
Default values for $\{\alpha_{j\xi}: j=1,2\}$ can be altered using argument Alpha.xi,
and default values for $\{\beta_{j\xi}: j=1,2\}$ can be altered using argument Beta.xi.
For count data with family="generalized poisson",
a priori we take $\xi_{1h} \sim$ Gamma$(\alpha_{1\xi},\beta_{1\xi})$
and $\xi_{2h} \sim$ Normal$(\alpha_{2\xi},\beta_{2\xi})I[\xi_{2h} \in R_{\xi_{2}}]$.
The default values are $\alpha_{j\xi}=1.0, j=1,2$, and $\beta_{1\xi}=0.1, \beta_{2\xi}=1.0$.
Default values for $\{\alpha_{j\xi}: j=1,2\}$ can be altered using argument Alpha.xi,
and default values for $\{\beta_{j\xi}: j=1,2\}$ can be altered using argument Beta.xi.
For count data with family="hyper-poisson", a priori we take $\xi_{jh} \sim$ Gamma$(\alpha_{j\xi},\beta_{j\xi})$, $j=1,2, h \geq 1$.
The default values are $\alpha_{1\xi}=1.0, \alpha_{2\xi}=0.5$ and $\beta_{1\xi}=0.1, \beta_{2\xi}=0.5$.
Default values for $\{\alpha_{j\xi}: j=1,2\}$ can be altered using argument Alpha.xi,
and default values for $\{\beta_{j\xi}: j=1,2\}$ can be altered using argument Beta.xi.
For count data with family="ctpd",
a priori we take $\xi_{1h} \sim$ Gamma$(\alpha_{1\xi},\beta_{1\xi})$, $\xi_{2h} \sim$ Gamma$(\alpha_{2\xi},\beta_{2\xi})$ and $\xi_{3h} \sim$ Normal$(\alpha_{3\xi},\beta_{3\xi})I[\xi_{3h} \in R_{\xi_{3}}]$.
The default values are $\alpha_{1\xi}=1.0, \alpha_{2\xi}=1.0, \alpha_{3\xi}=0.0$ and $\beta_{1\xi}=0.1, \beta_{2\xi}=0.1, \beta_{3\xi}=100.0$.
Default values for $\{\alpha_{j\xi}: j=1,2,3\}$ can be altered using argument Alpha.xi,
and default values for $\{\beta_{j\xi}: j=1,2,3\}$ can be altered using argument Beta.xi.
For count data with family="com-poisson", a priori we take $\xi_{jh} \sim$ Gamma$(\alpha_{j\xi},\beta_{j\xi})$, $j=1,2, h \geq 1$.
The default values are $\alpha_{1\xi}=1.0, \alpha_{2\xi}=0.5$ and $\beta_{1\xi}=0.1, \beta_{2\xi}=0.5$.
Default values for $\{\alpha_{j\xi}: j=1,2\}$ can be altered using argument Alpha.xi,
and default values for $\{\beta_{j\xi}: j=1,2\}$ can be altered using argument Beta.xi.
For binomial data with family="binomial", a priori we take $\xi_{h} \sim$ Beta$(\alpha_{\xi},\beta_{\xi})$, $h \geq 1$.
The default values are $\alpha_{\xi}=1.0, \beta_{\xi}=1.0$, which define a uniform distribution.
Defaults can be altered using arguments Alpha.xi and Beta.xi.
For binomial data with family="beta binomial",
a priori we take $\xi_{jh} \sim$ Gamma$(\alpha_{j\xi},\beta_{j\xi})$, $j=1,2, h \geq 1$.
The default values are $\alpha_{j\xi}=1.0, \beta_{j\xi}=0.1$.
Default values for $\{\alpha_{j\xi}: j=1,2\}$ can be altered using argument Alpha.xi,
and default values for $\{\beta_{j\xi}: j=1,2\}$ can be altered using argument Beta.xi.
The parameters of the prior of the concentration parameter $\alpha$ of the stick-breaking construction can be altered using arguments Alpha.alpha, Beta.alpha and Turnc.alpha.

Examples

# Bayesian nonparametric GLM with Binomial response Y and one predictor X
data(simD)
pred<-seq(with(simD,min(X))+0.1,with(simD,max(X))-0.1,length.out=30)
npred<-length(pred)
# fit1 and fit2 define the same model but with different numbers of
# components and posterior samples. They both use a slice sampler
# and parameter prec=200 achieves optimal acceptance rate, about 22%.
fit1 <- bnpglm(cbind(Y,(E-Y))~X, family="binomial", data=simD, ncomp=30, sweeps=150,
               burn=100, sampler="slice", prec=c(200), Xpred=pred, offsetPred=rep(30,npred))
fit2 <- bnpglm(cbind(Y,(E-Y))~X, family="binomial", data=simD, ncomp=50, sweeps=5000,
burn=1000, sampler="slice", prec=c(200), Xpred=pred, offsetPred=rep(30,npred))
plot(with(simD,X),with(simD,Y)/with(simD,E))
lines(pred, fit2$medianReg, col=3, lwd=2)