pBayes(x, method="m.ind", const=NULL)
method = "Jeffreys": adds 0.5 to each cell before the relative frequencies are estimated. The final relative frequencies are $ep_i = (n_i + 0.5)/(n + c/2)$, where $n$ is the sum of all the counts and $c$ is the number of cells in the table. The final estimated cell frequencies are obtained as $en_i=ep_i*n$.
method = "minimax": adds $\sqrt{n}/c$ to each cell before the relative frequencies are estimated. The final relative frequencies are $ep_i = (n_i + \sqrt{n}/c)/(n + \sqrt{n})$.
method = "invcat": adds $1/c$ to each cell before the relative frequencies are estimated. The final relative frequencies are $ep_i = (n_i + 1/c)/(n + 1)$.
method = "user": adds a user-defined constant $a$ ($a>0$) to each cell before the relative frequencies are estimated, with $ep_i = (n_i + a)/(n + a*c)$. The constant $a$ should be passed via the argument const.
method = "m.ind": the prior guess for the unknown cell probabilities is obtained from the probabilities estimated under the mutual independence hypothesis. In this case a data-driven $K$ is used (see Details). This option is available for contingency tables that are at least two-way (length(dim(x))>=2).
method = "h.assoc": the prior guess for the unknown cell probabilities is obtained from the probabilities estimated under the homogeneous association hypothesis. In this case a data-driven $K$ is used (see Details). This option is available for contingency tables that are at least two-way (length(dim(x))>=2).
const: the constant $a$ added to each cell when method = "user". As a rule of thumb, the sum of the constant over all the cells in the table, $a*c$, should not exceed $0.20*n$.
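The four flattening-constant formulas above can be checked by hand on a small table. The following is a minimal base R sketch, not the package's implementation: the counts in `x`, the helper `flatten` and the value 0.25 used for the user-defined constant are all hypothetical and chosen only for illustration.

```r
# Toy 2 x 3 table of counts (hypothetical data, not from the package)
x  <- matrix(c(10, 0, 5, 3, 0, 2), nrow = 2)
n  <- sum(x)      # sum of all the counts
nc <- length(x)   # number of cells, c in the formulas

# Constant added to each cell under each method (0.25 is an arbitrary user choice)
a <- c(Jeffreys = 0.5, minimax = sqrt(n) / nc, invcat = 1 / nc, user = 0.25)

# ep_i = (n_i + a) / (n + a * c), then en_i = ep_i * n
flatten <- function(a) (x + a) / (n + a * nc) * n
est <- lapply(a, flatten)

est$Jeffreys          # cells with zero counts receive a small positive frequency
sapply(est, sum)      # each estimated table still sums to n
a * nc <= 0.20 * n    # rule of thumb: is the total added mass within 0.20 * n?
```

The last line shows why the rule of thumb matters mostly for small tables: the total mass added by a method grows with the number of cells, while the bound grows with the sample size.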
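A list object with three components.
The first, info, is a vector with the sample size ("n"), the number of cells ("no.cells") in x, the average cell frequency ("av.cfr"), the number of cells showing frequencies equal to zero ("no.0s"), the const input argument, the chosen/estimated $K$ ("K") and the relative size of $K$, i.e. $K/(n+K)$ ("rel.K").
The second is a table with the same dimensions as x with the considered prior values for the cell frequencies.
The third is a table with the same dimensions as x providing the pseudo-Bayes estimates for the cell frequencies in x.
The pseudo-Bayes estimator of the cell probabilities is a weighted average of the observed relative frequencies $\hat{p}_h$ and a prior guess $\gamma_h$:
$$\tilde{p}_h = \frac{n}{n+K} \hat{p}_h + \frac{K}{n+K} \gamma_h $$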
$K$ depends on the parameters of the Dirichlet prior distribution being considered (for further details see Chapter 12 in Bishop et al., 1974).
It is worth noting that with a constant prior guess $\gamma_h=1/c$ ($h=1,2,...,c$), setting $K=1$ corresponds to adding $1/c$ to each cell before estimation of the relative frequencies (method = "invcat"); with $K=c/2$ the constant 0.5 is added to each cell (method = "Jeffreys"); finally, with $K=\sqrt{n}$ the quantity $\sqrt{n}/c$ is added to each cell (method = "minimax"). All these cases correspond to adding a flattening constant; the higher the value of $K$, the more the estimates are shrunk towards $\gamma_h=1/c$ (flattening).
When method = "m.ind" the prior guess $\gamma_h$ is estimated under the hypothesis of mutual independence between the variables crossed in the initial contingency table x, which must be at least a two-way table. In this case the value of $K$ is estimated via a data-driven approach by considering
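The equivalence between the weighted-average form and the flattening constants can be verified numerically. This is a small base R sketch on hypothetical counts; the names `p.hat`, `prior` and `pseudo` are illustrative and are not part of the package.

```r
x  <- c(10, 0, 5, 3, 0, 2)   # hypothetical cell counts
n  <- sum(x)
nc <- length(x)              # number of cells, c in the text

p.hat <- x / n               # observed relative frequencies
prior <- rep(1 / nc, nc)     # constant prior guess, gamma_h = 1/c

# Weighted average of observed frequencies and prior guess for a given K
pseudo <- function(K) n / (n + K) * p.hat + K / (n + K) * prior

# Each comparison should report TRUE
all.equal(pseudo(1),       (x + 1 / nc) / (n + 1))              # "invcat"
all.equal(pseudo(nc / 2),  (x + 0.5) / (n + nc / 2))            # "Jeffreys"
all.equal(pseudo(sqrt(n)), (x + sqrt(n) / nc) / (n + sqrt(n)))  # "minimax"
```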
$$ \hat{K} = \frac{1 - \sum_{h} \hat{p}_h^2}{\sum_{h} \left( \hat{\gamma}_h - \hat{p}_h \right)^2 } $$
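For a two-way table, the data-driven $K$ can be computed directly from the formula above. The sketch below uses base R on hypothetical counts and assumes the prior guess is the outer product of the row and column proportions. It illustrates the formula and is not necessarily how the function computes it internally.

```r
# Toy 2 x 3 table of counts (hypothetical data)
x <- matrix(c(10, 0, 5, 3, 0, 2), nrow = 2)
n <- sum(x)
p.hat <- x / n

# Prior guess under independence: product of the marginal proportions
g.hat <- outer(rowSums(p.hat), colSums(p.hat))

# Data-driven K, then the pseudo-Bayes cell probabilities
K.hat   <- (1 - sum(p.hat^2)) / sum((g.hat - p.hat)^2)
p.tilde <- n / (n + K.hat) * p.hat + K.hat / (n + K.hat) * g.hat

c(K = K.hat, rel.K = K.hat / (n + K.hat))  # estimated K and its relative size
round(p.tilde * n, 2)                      # pseudo-Bayes cell frequencies
```

The denominator measures how far the observed frequencies are from the prior guess, so a table close to independence yields a large $\hat{K}$ and estimates pulled strongly towards the independence fit.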
In contrast, when method = "h.assoc" the prior guess $\gamma_h$ is estimated under the hypothesis of homogeneous association between the variables crossed in the initial contingency table x.
Please note that when the input table is estimated from sample data where a weight is assigned to each unit, the weights should be used in estimating the input table, but it is suggested to rescale them so that they sum to n, the sample size.
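As an illustration of a prior guess under homogeneous association, the sketch below fits the log-linear model with all two-way interactions to a three-way table using `loglin` from base R's stats package, and then applies the same data-driven formula for $K$. The built-in `HairEyeColor` table and the use of `loglin` are assumptions made for this example; the function's internal fitting procedure may differ.

```r
x <- HairEyeColor   # built-in three-way table (Hair x Eye x Sex)
n <- sum(x)

# Homogeneous association: all two-way margins fitted, no three-way term
fit <- loglin(x, margin = list(c(1, 2), c(1, 3), c(2, 3)),
              fit = TRUE, eps = 1e-6, iter = 100, print = FALSE)$fit

p.hat <- x / n
g.hat <- fit / sum(fit)   # prior guess for the cell probabilities

# Same data-driven K as for mutual independence, with the new prior guess
K.hat   <- (1 - sum(p.hat^2)) / sum((g.hat - p.hat)^2)
p.tilde <- n / (n + K.hat) * p.hat + K.hat / (n + K.hat) * g.hat

c(K = K.hat, rel.K = K.hat / (n + K.hat))
```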
Bishop, Y.M.M., Fienberg, S.E., Holland, P.W. (1974) Discrete Multivariate Analysis: Theory and Practice. The Massachusetts Institute of Technology.
library(StatMatch) # provides pBayes() and the samp.A data
data(samp.A, package="StatMatch")
tab <- xtabs(~ area5 + urb + c.age + sex + edu7, data = samp.A)
out.pb <- pBayes(x=tab, method="m.ind")
out.pb$info
out.pb <- pBayes(x=tab, method="h.assoc")
out.pb$info
out.pb <- pBayes(x=tab, method="Jeffreys")
out.pb$info
# usage of weights in estimating the input table
n <- nrow(samp.A)
r.w <- samp.A$ww / sum(samp.A$ww) * n # rescale weights to sum up to n
tab.w <- xtabs(r.w ~ area5 + urb + c.age + sex + edu7, data = samp.A)
out.pbw <- pBayes(x=tab.w, method="m.ind")
out.pbw$info