ergm: Exponential-Family Random Graph Models

Description

ergm is used to fit exponential-family random graph models (ERGMs), in which the probability of a given network, $y$, on a set of nodes is $h(y) \exp\{\eta(\theta) \cdot g(y)\}/c(\theta)$, where $h(y)$ is the reference measure (usually $h(y)=1$), $g(y)$ is a vector of network statistics for $y$, $\eta(\theta)$ is a natural parameter vector of the same length (with $\eta(\theta)=\theta$ for most terms), and $c(\theta)$ is the normalizing constant for the distribution. ergm can return a maximum pseudo-likelihood estimate, an approximate maximum likelihood estimate based on a Monte Carlo scheme, or an approximate contrastive divergence estimate based on a similar scheme. (For an overview of the package, see ergm-package.)

Usage

ergm (formula, response=NULL, reference=~Bernoulli, constraints=~., offset.coef=NULL, target.stats=NULL, eval.loglik=TRUE, estimate=c("MLE", "MPLE", "CD"), control=control.ergm(), verbose=FALSE, ...)

Arguments

formula

An R formula object, of the form y ~ , where y is a network object or a matrix that can be coerced to a network object. For the details on the possible , see ergm-terms and Morris, Handcock and Hunter (2008) for binary ERGM terms and Krivitsky (2012) for valued ERGM terms (terms for weighted edges). To create a network object in R, use the network() function, then add nodal attributes to it using the %v% operator if necessary. Enclosing a model term in offset() fixes its value to one specified in offset.coef.

response

Name of the edge attribute whose value is to be modeled. Defaults to NULL for simple presence or absence, modeled via binary ERGM terms. Passing anything but NULL uses valued ERGM terms.

reference

A one-sided formula specifying the reference measure ($h(y)$) to be used. (Defaults to ~Bernoulli.) See help for ERGM reference measures implemented in the ergm package.

constraints

A one-sided formula specifying one or more constraints on the support of the distribution of the networks being modeled, using syntax similar to the formula argument. Multiple constraints may be given, separated by “+” operators. Together with the model terms in the formula and the reference measure, the constraints define the distribution of networks being modeled.

It is also possible to specify a proposal function directly by passing a string with the function's name. In that case, arguments to the proposal should be specified through the prop.args argument to control.ergm.

The default is ~., for an unconstrained model.

See the ERGM constraints documentation for the constraints implemented in the ergm package. Other packages may add their own constraints. Note that not all possible combinations of constraints and reference measures are supported.

offset.coef

A vector of coefficients for the offset terms.

target.stats

vector of "observed network statistics," if these statistics are for some reason different than the actual statistics of the network on the left-hand side of formula. Equivalently, this vector is the mean-value parameter values for the model. If this is given, the algorithm finds the natural parameter values corresponding to these mean-value parameters. If NULL, the mean-value parameters used are the observed statistics of the network in the formula.

eval.loglik

Logical: For dyad-dependent models, if TRUE, use bridge sampling to evaluate the log-likelihoood associated with the fit. Has no effect for dyad-independent models. Since bridge sampling takes additional time, setting to FALSE may speed performance if likelihood values (and likelihood-based values like AIC and BIC) are not needed.

estimate

If "MPLE," then the maximum pseudolikelihood estimator is returned. If "MLE" (the default), then an approximate maximum likelihood estimator is returned. For certain models, the MPLE and MLE are equivalent, in which case this argument is ignored. (To force MCMC-based approximate likelihood calculation even when the MLE and MPLE are the same, see the force.main argument of control.ergm. If "CD" (EXPERIMENTAL), the Monte-Carlo contrastive divergence estimate is returned. )

control

A list of control parameters for algorithm tuning. Constructed using control.ergm.

verbose

logical; if this is TRUE, the program will print out additional information, including goodness of fit statistics.

...

Additional arguments, to be passed to lower-level functions.

Value

returns an object of class ergm that is a list consisting of the following elements:See the method print.ergm for details on how an ergm object is printed. Note that the method summary.ergm returns a summary of the relevant parts of the ergm object in concise summary format.

Notes on model specification

Although each of the statistics in a given model is a summary statistic for the entire network, it is rarely necessary to calculate statistics for an entire network in a proposed Metropolis-Hastings step. Thus, for example, if the triangle term is included in the model, a census of all triangles in the observed network is never taken; instead, only the change in the number of triangles is recorded for each edge toggle. In the implementation of ergm, the model is initialized in R, then all the model information is passed to a C program that generates the sample of network statistics using MCMC. This sample is then returned to R, which implements a simple Newton-Raphson algorithm to approximate the MLE. An alternative style of maximum likelihood estimation is to use a stochastic approximation algorithm. This can be chosen with the control.ergm(style="Robbins-Monro") option. The mechanism for proposing new networks for the MCMC sampling scheme, which is a Metropolis-Hastings algorithm, depends on two things: The constraints, which define the set of possible networks that could be proposed in a particular Markov chain step, and the weights placed on these possible steps by the proposal distribution. The former may be controlled using the constraints argument described above. The latter may be controlled using the prop.weights argument to the control.ergm function. The package is designed so that the user could conceivably add additional proposal types.

References

Admiraal R, Handcock MS (2007). networksis: Simulate bipartite graphs with fixed marginals through sequential importance sampling. Statnet Project, Seattle, WA. Version 1. statnet.org.

Bender-deMoll S, Morris M, Moody J (2008). Prototype Packages for Managing and Animating Longitudinal Network Data: dynamicnetwork and rSoNIA. Journal of Statistical Software, 24(7). http://www.jstatsoft.org/v24/i07/.

Butts CT (2007). sna: Tools for Social Network Analysis. R package version 2.3-2. https://cran.r-project.org/package=sna.

Butts CT (2008). network: A Package for Managing Relational Data in R. Journal of Statistical Software, 24(2). http://www.jstatsoft.org/v24/i02/.

Butts C (2015). network: The Statnet Project (http://www.statnet.org). R package version 1.12.0, https://cran.r-project.org/package=network.

Goodreau SM, Handcock MS, Hunter DR, Butts CT, Morris M (2008a). A statnet Tutorial. Journal of Statistical Software, 24(8). http://www.jstatsoft.org/v24/i08/.

Goodreau SM, Kitts J, Morris M (2008b). Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks. Demography, 45, in press.

Handcock, M. S. (2003) Assessing Degeneracy in Statistical Models of Social Networks, Working Paper \#39, Center for Statistics and the Social Sciences, University of Washington. www.csss.washington.edu/Papers/wp39.pdf

Handcock MS (2003b). degreenet: Models for Skewed Count Distributions Relevant to Networks. Statnet Project, Seattle, WA. Version 1.0, statnet.org.

Handcock MS and Gile KJ (2010). Modeling Social Networks from Sampled Data. Annals of Applied Statistics, 4(1), 5-25. \Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("#1")}10.1214/08-AOAS221http://doi.org/10.1214/08-AOAS221doi:\ifelse{latex}{\out{~}}{ }latex~ 10.1214/08-AOAS221

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003a). ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Statnet Project, Seattle, WA. Version 2, statnet.org.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003b). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project, Seattle, WA. Version 2, statnet.org.

Hunter, D. R. and Handcock, M. S. (2006) Inference in curved exponential family models for networks, Journal of Computational and Graphical Statistics.

Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M (2008b). ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Journal of Statistical Software, 24(3). http://www.jstatsoft.org/v24/i03/.

Krivitsky PN (2012). Exponential-Family Random Graph Models for Valued Networks. Electronic Journal of Statistics, 2012, 6, 1100-1128. \Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("#1")}10.1214/12-EJS696http://doi.org/10.1214/12-EJS696doi:\ifelse{latex}{\out{~}}{ }latex~ 10.1214/12-EJS696

Morris M, Handcock MS, Hunter DR (2008). Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 24(4). http://www.jstatsoft.org/v24/i04/. Snijders, T.A.B. (2002), Markov Chain Monte Carlo Estimation of Exponential Random Graph Models. Journal of Social Structure. Available from http://www.cmu.edu/joss/content/articles/volume3/Snijders.pdf.

Examples

Run this code


#
# load the Florentine marriage data matrix
#
data(flo)
#
# attach the sociomatrix for the Florentine marriage data
# This is not yet a network object.
#
flo
#
# Create a network object out of the adjacency matrix
#
flomarriage <- network(flo,directed=FALSE)
flomarriage
#
# print out the sociomatrix for the Florentine marriage data
#
flomarriage[,]
#
# create a vector indicating the wealth of each family (in thousands of lira) 
# and add it as a covariate to the network object
#
flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3)
flomarriage
#
# create a plot of the social network
#
plot(flomarriage)
#
# now make the vertex size proportional to their wealth
#
plot(flomarriage, vertex.cex=flomarriage %v% "wealth" / 20, main="Marriage Ties")
#
# Use 'data(package = "ergm")' to list the data sets in a
#
data(package="ergm")
#
# Load a network object of the Florentine data
#
data(florentine)
#
# Fit a model where the propensity to form ties between
# families depends on the absolute difference in wealth
#
gest <- ergm(flomarriage ~ edges + absdiff("wealth"))
summary(gest)
#
# add terms for the propensity to form 2-stars and triangles
# of families 
#
gest <- ergm(flomarriage ~ kstar(1:2) + absdiff("wealth") + triangle)
summary(gest)

# import synthetic network that looks like a molecule
data(molecule)
# Add a attribute to it to mimic the atomic type
molecule %v% "atomic type" <- c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3)
#
# create a plot of the social network
# colored by atomic type
#
plot(molecule, vertex.col="atomic type",vertex.cex=3)

# measure tendency to match within each atomic type
gest <- ergm(molecule ~ edges + kstar(2) + triangle + nodematch("atomic type"),
  control=control.ergm(MCMC.samplesize=10000))
summary(gest)

# compare it to differential homophily by atomic type
gest <- ergm(molecule ~ edges + kstar(2) + triangle
                              + nodematch("atomic type",diff=TRUE),
  control=control.ergm(MCMC.samplesize=10000))
summary(gest)

Run the code above in your browser using DataLab