sna (version 0.3)

bbnam: Butts' (Hierarchical) Bayesian Network Accuracy Model

Description

Takes posterior draws from Butts' Bayesian network accuracy/estimation model for multiple participant/observers (conditional on observed data and priors), using a Gibbs sampler.

Usage

bbnam(dat, model="actor", ...)
bbnam.fixed(dat, nprior=matrix(rep(0.5,dim(dat)[2]^2),
    nrow=dim(dat)[2],ncol=dim(dat)[2]), em=0.25, ep=0.25, diag=FALSE,
    mode="digraph", draws=1500, outmode="draws", anames=paste("a",
    1:dim(dat)[2],sep=""), onames=paste("o",1:dim(dat)[1], sep=""))
bbnam.pooled(dat, nprior=matrix(rep(0.5,dim(dat)[2]*dim(dat)[3]),
    nrow=dim(dat)[2],ncol=dim(dat)[3]), emprior=c(1,1), 
    epprior=c(1,1), diag=FALSE, mode="digraph", reps=5, draws=1500, 
    burntime=500, quiet=TRUE, anames=paste("a",1:dim(dat)[2],sep=""),
    onames=paste("o",1:dim(dat)[1],sep=""), compute.sqrtrhat=TRUE)
bbnam.actor(dat, nprior=matrix(rep(0.5,dim(dat)[2]*dim(dat)[3]),
    nrow=dim(dat)[2],ncol=dim(dat)[3]), 
    emprior=cbind(rep(1,dim(dat)[1]),rep(1,dim(dat)[1])), 
    epprior=cbind(rep(1,dim(dat)[1]),rep(1,dim(dat)[1])), diag=FALSE,
    mode="digraph", reps=5, draws=1500, burntime=500, quiet=TRUE, 
    anames=paste("a",1:dim(dat)[2],sep=""), 
    onames=paste("o",1:dim(dat)[1],sep=""), compute.sqrtrhat=TRUE)

Arguments

dat
Data array to be analyzed. This array must be of dimension n x n x n, where n is |V(G)|; the first dimension indexes the observer, the second indexes the sender of the relation, and the third dimension indexes the recipient of the relation. (E.g., dat[i,j,k]==1 indicates that observer i reported a tie from j to k.)
model
String containing the error model to use; options are ``actor,'' ``pooled,'' and ``fixed''
nprior
Network prior matrix. This must be a matrix of dimension n x n, containing the arc/edge priors for the criterion network. (E.g., nprior[i,j] gives the prior probability of i sending the relation to j in the criterion graph.) If no network prior is provided, an uninformative prior (i.e., all edge probabilities equal to 0.5) will be assumed.
em
Probability of a false negative; this may be in the form of a single number, one number per observation slice, one number per (directed) dyad, or one number per dyadic observation (fixed model only)
ep
Probability of a false positive; this may be in the form of a single number, one number per observation slice, one number per (directed) dyad, or one number per dyadic observation (fixed model only)
emprior
Parameters for the (beta) false negative prior; these should be in the form of an (alpha,beta) pair for the pooled model, and of an n x 2 matrix of (alpha,beta) pairs for the actor model. If no emprior is given, an uninformative prior (1,1) will be assumed.
epprior
Parameters for the (beta) false positive prior; these should be in the form of an (alpha,beta) pair for the pooled model, and of an n x 2 matrix of (alpha,beta) pairs for the actor model. If no epprior is given, an uninformative prior (1,1) will be assumed.
diag
Boolean indicating whether loops (matrix diagonals) should be counted as data
mode
A string indicating whether the data in question forms a ``graph'' or a ``digraph''
reps
Number of replicate chains for the Gibbs sampler (pooled and actor models only)
draws
Integer indicating the total number of draws to take from the posterior distribution. Draws are taken evenly from each replication (thus, the number of draws from a given chain is draws/reps), and are randomly reordered to minimize the serial dependence associated with consecutive draws from the Markov chain.
burntime
Integer indicating the burn-in time for the Markov Chain. Each replication is iterated burntime times before taking draws (with these initial iterations being discarded); hence, one should realize that each increment to burntime increases execution time (pooled and actor models only).
quiet
Boolean indicating whether MCMC diagnostics should be displayed (pooled and actor models only)
outmode
``posterior'' indicates that the exact posterior probability matrix for the criterion graph should be returned; otherwise, draws from the joint posterior are returned (fixed model only)
anames
A vector of names for the actors (vertices) in the graph
onames
A vector of names for the observers (possibly the actors themselves) whose reports are contained in the CSS
compute.sqrtrhat
A boolean indicating whether or not Gelman et al.'s potential scale reduction measure (an MCMC convergence diagnostic) should be computed (pooled and actor models only)

Value

An object of class bbnam, containing the posterior draws. The components of the output are as follows (a brief post-processing sketch follows this list):

  • anames: A vector of actor names.
  • draws: An integer containing the number of draws.
  • em: A matrix containing the posterior draws for the probability of producing false negatives, by actor.
  • ep: A matrix containing the posterior draws for the probability of producing false positives, by actor.
  • nactors: An integer containing the number of actors.
  • net: An array containing the posterior draws for the criterion network.
  • reps: An integer indicating the number of replicate chains used by the Gibbs sampler.
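
As a sketch of how these components might be summarized after a fit, assuming (since the orientation is not stated above) that net is a draws x n x n array and that em/ep store one row per draw:

# Sketch only: b is assumed to be a fitted bbnam object (see Examples below),
# with b$net of dimension draws x n x n and b$em, b$ep of dimension draws x n
p.tie <- apply(b$net, c(2,3), mean)  # posterior probability of each tie
em.post <- colMeans(b$em)            # mean false negative rate, by actor
ep.post <- colMeans(b$ep)            # mean false positive rate, by actor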

Details

The bbnam models a set of network data as reflecting a series of (noisy) observations by a set of participant/observers regarding an uncertain criterion structure. Each observer is assumed to send false positives (i.e., reporting a tie when none exists in the criterion structure) with probability $e^+$, and false negatives (i.e., reporting that no tie exists when one does in fact exist in the criterion structure) with probability $e^-$. The criterion network itself is taken to be a Bernoulli (di)graph. Note that the present model includes three variants (illustrative calls for each are sketched after the list below):

  1. Fixed error probabilities: Each edge is associated with a known pair of false negative/false positive error probabilities (provided by the researcher). In this case, the posterior for the criterion graph takes the form of a matrix of Bernoulli parameters, with each edge being independent conditional on the parameter matrix.
  2. Pooled error probabilities: One pair of (uncertain) false negative/false positive error probabilities is assumed to hold for all observations. Here, we assume that the researcher's prior information regarding these parameters can be expressed as a pair of Beta distributions, with the additional assumption of independence in the prior distribution. Note that error rates and edge probabilities are not independent in the joint posterior, but the posterior marginals take the form of Beta mixtures and Bernoulli parameters, respectively.
  3. Per observer (``actor'') error probabilities: One pair of (uncertain) false negative/false positive error probabilities is assumed to hold for each observation slice. Again, we assume that prior knowledge can be expressed in terms of independent Beta distributions (along with the Bernoulli prior for the criterion graph) and the resulting posterior marginals are Beta mixtures and a Bernoulli graph. (Again, it should be noted that independence in the priors does not imply independence in the joint posterior!)
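
As a sketch, the three variants might be invoked as follows (the prior and error values here are arbitrary placeholders, and dat is an observation array as described under Arguments):

# Fixed error probabilities, supplied by the researcher
b.fixed <- bbnam(dat, model="fixed", em=0.25, ep=0.25)
# Pooled error probabilities: one Beta(alpha,beta) prior pair for all observers
b.pooled <- bbnam(dat, model="pooled", emprior=c(3,5), epprior=c(3,5),
    burntime=500, draws=1500)
# Per-observer ("actor") error probabilities: one Beta prior pair per observer
n <- dim(dat)[1]
b.actor <- bbnam(dat, model="actor", emprior=cbind(rep(3,n),rep(5,n)),
    epprior=cbind(rep(3,n),rep(5,n)), burntime=500, draws=1500)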

By default, the bbnam routine returns (approximately) independent draws from the joint posterior distribution, each draw yielding one realization of the criterion network and one collection of accuracy parameters (i.e., probabilities of false positives/negatives). This is accomplished via a Gibbs sampler in the case of the pooled/actor model, and by direct sampling for the fixed probability model. In the special case of the fixed probability model, it is also possible to obtain directly the posterior for the criterion graph (expressed as a matrix of Bernoulli parameters); this can be controlled by the outmode parameter.
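
For instance, under the fixed model one might obtain the exact posterior rather than simulated draws as follows (the em/ep values are arbitrary):

# Fixed-probability model: return the matrix of posterior Bernoulli
# parameters for the criterion graph instead of simulated draws
post <- bbnam.fixed(dat, em=0.25, ep=0.25, outmode="posterior")
# post[i,j] is then the posterior probability of an i->j edge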

As noted, the taking of posterior draws in the nontrivial case is accomplished via a Markov Chain Monte Carlo method, in particular the Gibbs sampler; the high dimensionality of the problem ($O(n^2+2n)$) tends to preclude more direct approaches. At present, chain burn-in is determined ex ante on a more or less arbitrary basis by specification of the burntime parameter. Eventually, a more systematic approach will be utilized. Note that insufficient burn-in will result in inaccurate posterior sampling, so it's not wise to skimp on burn time where otherwise possible. Similarly, it is wise to employ more than one Markov Chain (set by reps), since it is possible for trajectories to become ``trapped'' in metastable regions of the state space. Number of draws per chain being equal, more replications are usually better than fewer; consult Gelman et al. for details. A useful measure of chain convergence, Gelman and Rubin's potential scale reduction ($\sqrt{\hat{R}}$), can be computed using the compute.sqrtrhat parameter. The potential scale reduction measure is an ANOVA-like comparison of within-chain versus between-chain variance; it approaches 1 (from above) as the chain converges, and longer burn-in times are strongly recommended for chains with scale reductions in excess of 1.1 or thereabouts.
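
A rough sketch of this workflow follows; the name of the component holding the diagnostic is an assumption (it is not enumerated in the Value section above), so check names(b) on the fitted object:

# Run several replicate chains and inspect convergence (sketch only;
# the Gelman & Rubin diagnostic is assumed to be stored in b$sqrtrhat)
b <- bbnam(dat, model="pooled", reps=5, burntime=500, draws=1500,
    compute.sqrtrhat=TRUE)
b$sqrtrhat   # values much above 1.1 suggest increasing burntime
summary(b)   # print a summary of the posterior draws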

Finally, a cautionary note concerning prior distributions: it is important that the specified priors actually reflect the prior knowledge of the researcher; otherwise, the posterior will be inadequately informed. In particular, note that an uninformative prior on the accuracy probabilities implies that it is a priori equally probable that any given actor's observations will be informative or negatively informative (i.e., that i observing j sending a tie to k reduces p(j->k)). This is a highly unrealistic assumption, and it will tend to produce posteriors which are bimodal (one mode being related to the ``informative'' solution, the other to the ``negatively informative'' solution). A more plausible but still fairly diffuse prior would be Beta(3,5), which reduces the prior probability of an actor's being negatively informative to 0.16, and the prior probability of any given actor's being more than 50% likely to make a particular error (on average) to around 0.22. (This prior also puts substantial mass near the 0.5 point, which would seem consonant with the BKS studies.) Butts (1999) discusses a number of issues related to choice of priors for the bbnam, and users should consult this reference if matters are unclear before defaulting to the uninformative solution.
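
As a quick check of the figure quoted above, using only base R:

# Tail probability of a Beta(3,5) prior: Pr(error rate > 0.5)
1 - pbeta(0.5, 3, 5)   # approximately 0.227, i.e. the "around 0.22" cited above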

References

Butts, C.T. (1999). ``Informant (In)Accuracy and Network Estimation: A Bayesian Approach.'' CASOS Working Paper, Carnegie Mellon University.

Gelman, A.; Carlin, J.B.; Stern, H.S.; and Rubin, D.B. (1995). Bayesian Data Analysis. London: Chapman and Hall.

Gelman, A., and Rubin, D.B. (1992). ``Inference from Iterative Simulation Using Multiple Sequences.'' Statistical Science, 7, 457-511.

Krackhardt, D. (1987). ``Cognitive Social Structures.'' Social Networks, 9, 109-134.

See Also

npostpred, event2dichot, bbnam.bf

Examples

#Create some random data
g<-rgraph(5)
g.p<-0.8*g+0.2*(1-g)
dat<-rgraph(5,5,tprob=g.p)

#Define a network prior
pnet<-matrix(ncol=5,nrow=5)
pnet[,]<-0.5
#Define em and ep priors
pem<-matrix(nrow=5,ncol=2)
pem[,1]<-3
pem[,2]<-5
pep<-matrix(nrow=5,ncol=2)
pep[,1]<-3
pep[,2]<-5

#Draw from the posterior
b<-bbnam(dat,model="actor",nprior=pnet,emprior=pem,epprior=pep,
    burntime=100,draws=100)
#Print a summary of the posterior draws
summary(b)
