ergm
is used to fit linear exponential random network models, in which
the probability of a given network, $y$, on a set of nodes is
$\exp(\theta{\cdot}g(y))/c(\theta)$, where
$g(y)$ is a vector of network statistics,
$\theta$ is a parameter vector of the same length and $c(\theta)$ is the
normalizing constant for the distribution.
ergm
can return either a maximum pseudo-likelihood
estimate or an approximate maximum likelihood estimator based on a Monte Carlo scheme.ergm(formula, theta0="MPLE",
MPLEonly=FALSE, MLestimate=!MPLEonly, seed=NULL,
burnin=10000, MCMCsamplesize=10000, interval=100, maxit=3,
constraints=~.,
meanstats = NULL,
control=control.ergm(),
eval.loglik=FALSE,
verbose=FALSE, ...)
theta0="MPLE"
).TRUE
if the maximum
pseudo-likelihood estimate is to be computed and returned.
Note that MPLEonly=TRUE
will render moot most other parameters in this list.
See ergmMPLE<
TRUE
if only the Monte Carlo maximum likelihood estimate
is to be computed and returned.formula
argument. Multiple constraints
may be given, separated by control.ergm
.NULL
to use whatever the state of the random number
generator is at the time of the call.FALSE
because bridge sampling takes additional time.TRUE
, the program will print out additional
information, including goodness of fit statistics.ergm
returns an object of class ergm
that is a list
consisting of the following elements:sample
is roughly a random sample from the distribution
of network statistics specified by the model with the parameter equal
to MCMCtheta
. If MPLEonly=TRUE
then
MCMCtheta
equals the MPLE.glm
object.degeneracy.value
. Mainly
for internal use.formula
entered into the ergm
function.ergm
callergm
call
(part of the control.ergm
function output).print.ergm
for details on how
an ergm
object is printed. Note that the
method summary.ergm
returns a summary of the
relevant parts of the ergm
object in concise summary
format.ergm
function allows the user to explore a large number
of potential models for their network data. The
terms currently supported by the program,
and a brief description of each is given in the documentation
ergm-terms
.
In the formula
for the model, the model terms are various function-like
calls, some of which require arguments, separated by +
signs.
For a more detailed understanding of the model terms, see and Morris, Handcock and Hunter (2008).In the implementation of ergm
, the model is initialized
in R, then all the model information is passed to a C program
that generates the sample of network statistics using MCMC.
This sample is then returned to R, which implements a
simple Newton-Raphson algorithm to approximate the MLE.
An alternative style of maximum likelihood estimation is to use a stochastic
approximation algorithm. This can be chosen with the
control.ergm(style="Robbins-Monro")
option.
The mechanism for proposing new networks for the MCMC sampling
scheme, which is a Metropolis-Hastings algorithm, depends on
two things: The constraints
, which define the set of possible
networks that could be proposed in a particular Markov chain step,
and the weights placed on these possible steps by the
proposal distribution. The former may be controlled using the
constraints
argument described above. The latter may
be controlled using the prop.weights
argument to the
control.ergm
function.
The package is designed so that the user could conceivably add additional proposal types.
constraints
term bd
.Let's consider the simple cases first. Suppose you want to condition on the total number of degrees regardless of attributes. That is, if you had a survey that asked respondents to name three alters and no more, then you might want to limit your maximal outdegree to three without regard to any of the alters' attributes. The argument is then:
constraints=~bd(maxout=3)
Similar calls are used to restrict the number of indegrees
(maxin
), the minimum number of outdegrees
(minout
), and the minimum number of indegrees
(minin
).
You can also set ego specific limits. For example:
constraints=bd(maxout=rep(c(3,4),c(36,35)))
limits the first 36 to 3 and the other 35 to 4 outdegrees.
Multiple restrictions can be combined. bd
is very flexible.
In general, the bd
term can contain up to five arguments:
bd(attribs=attribs, maxout=maxout, maxin=maxin, minout=minout, minin=minin)
Omitted arguments are unrestricted, and arguments of length 1
are replicated out to all nodes (as above). If an individual
entry in maxout
,..., minin
is NA
then
no restriction of that kind is applied to that actor.
In general, attribs
is a matrix of the attributes on
which we are conditioning. The dimensions of attribs
are n_nodes
rows by attrcount
columns, where
attrcount
is the number of distinct attribute values
on which we want to condition (i.e., a separate column is
required for attribs[n, i]
, therefore, is TRUE
if node n
has attribute value i, and FALSE
otherwise.
(Note that, since each column represents only a single value
of a single attribute, the values of this matrix are all
Boolean (TRUE
or FALSE
).) It is important to
note that attribs
is a matrix of nodal attributes,
not alter attributes.
So, for instance, if we wanted to construct an attribs
matrix
with two columns, one each for male and female attribute
values (we are conditioning on these values of the attribute
n_node
-long vector of 0s and 1s (men and women),
then our code would look as follows:
# male column: bit vector, TRUE for males attrsex1 <- (ads.sex == 0) # female column: bit vector, TRUE for females attrsex2 <- (ads.sex == 1) # now create attribs matrix attribs <- matrix(ncol=2,nrow=71, data=c(attrsex1,attrsex2))
maxout
is a matrix of alter attributes, with the same
dimensions as the attribs
matrix. maxout
is n_nodes
rows by attrcount
columns. The value of maxout[n,i]
,
therefore, is the maximum number of outdegrees permitted
from node n
to nodes with the attribute i
(where a NA
means there is no maximum).
For example: if we wanted to create a maxout
matrix to work
with our attribs
matrix above, with a maximum from every
node of five outedges to males and five outedges to females,
our code would look like this:
# every node has maximum of 5 outdegrees to male alters maxoutsex1 <- c(rep(5,71)) # every node has maximum of 5 outdegrees to female alters maxoutsex2 <- c(rep(5,71)) # now create maxout matrix maxout <- cbind(maxoutsex1,maxoutsex2)
The maxin
, minout
, and minin
matrices
are constructed exactly like the maxout
matrix,
except for the maximum allowed indegree, the minimum allowed
outdegree, and the minimum allowed indegree, respectively.
Note that in an undirected network, we only look at the outdegree
matrices; maxin
and minin
will both be ignored
in this case.
Bender-deMoll S, Morris M, Moody J (2008).
Prototype Packages for Managing and Animating Longitudinal
Network Data:
Butts CT (2006).
Butts CT (2007).
Butts CT (2008).
Butts CT, with help~from David~Hunter, Handcock MS (2007).
Goodreau SM, Handcock MS, Hunter DR, Butts CT, Morris M (2008a).
A
Goodreau SM, Kitts J, Morris M (2008b). Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks. Demography, 45, in press.
Handcock, M. S. (2003)
Assessing Degeneracy in Statistical Models of Social Networks,
Working Paper #39,
Center for Statistics and the Social Sciences,
University of Washington.
Handcock MS (2003b).
Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003a).
Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003b).
Hunter, D. R. and Handcock, M. S. (2006) Inference in curved exponential family models for networks, Journal of Computational and Graphical Statistics.
Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M (2008b).
Krivitsky PN, Handcock MS (2007).
Morris M, Handcock MS, Hunter DR (2008).
Specification of Exponential-Family Random Graph Models:
Terms and Computational Aspects.
Journal of Statistical Software, 24(4).
ergm-terms
, ergmMPLE
,
summary.ergm
, print.ergm
#
# load the Florentine marriage data matrix
#
data(flo)
#
# attach the sociomatrix for the Florentine marriage data
# This is not yet a network object.
#
flo
#
# Create a network object out of the adjacency matrix
#
flomarriage <- network(flo,directed=FALSE)
flomarriage
#
# print out the sociomatrix for the Florentine marriage data
#
flomarriage[,]
#
# create a vector indicating the wealth of each family (in thousands of lira)
# and add it as a covariate to the network object
#
flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3)
flomarriage
#
# create a plot of the social network
#
plot(flomarriage)
#
# now make the vertex size proportional to their wealth
#
plot(flomarriage, vertex.cex="wealth", main="Marriage Ties")
#
# Use 'data(package = "ergm")' to list the data sets in a
#
data(package="ergm")
#
# Load a network object of the Florentine data
#
data(florentine)
#
# Fit a model where the propensity to form ties between
# families depends on the absolute difference in wealth
#
gest <- ergm(flomarriage ~ edges + absdiff("wealth"))
summary(gest)
#
# add terms for the propensity to form 2-stars and triangles
# of families
#
gest <- ergm(flomarriage ~ kstar(1:2) + absdiff("wealth") + triangle)
summary(gest)
# import synthetic network that looks like a molecule
data(molecule)
# Add a attribute to it to mimic the atomic type
molecule %v% "atomic type" <- c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3)
#
# create a plot of the social network
# colored by atomic type
#
plot(molecule, vertex.col="atomic type",vertex.cex=3)
# measure tendency to match within each atomic type
gest <- ergm(molecule ~ edges + kstar(2) + triangle + nodematch("atomic type"),
MCMCsamplesize=10000)
summary(gest)
# compare it to differential homophily by atomic type
gest <- ergm(molecule ~ edges + kstar(2) + triangle
+ nodematch("atomic type",diff=TRUE),
MCMCsamplesize=10000)
summary(gest)
Run the code above in your browser using DataLab