mmsbm: Dynamic mixed-membership stochastic blockmodel with covariates

Description

The function estimates a dynamic mixed-membership stochastic blockmodel that incorporates covariates.

Usage

mmsbm(
  formula.dyad,
  formula.monad = ~1,
  senderID,
  receiverID,
  nodeID = NULL,
  timeID = NULL,
  data.dyad,
  data.monad = NULL,
  n.blocks,
  n.hmmstates = 1,
  directed = TRUE,
  mmsbm.control = list()
)

Value

Object of class mmsbm. List with named components:

MixedMembership: Matrix of variational posterior means of the mixed-membership vectors (nodes by n.blocks).
BlockModel: The blockmodel. An n.blocks by n.blocks matrix of estimated tie log-odds between members of the corresponding latent groups.
vcov_blockmodel: If hessian=TRUE, variance-covariance matrix of the blockmodel parameters, in column-major order.
MonadCoef: Array of estimated coefficient values for monadic covariates, with n.blocks columns and n.hmmstates slices.
vcov_monad: If hessian=TRUE, variance-covariance matrix of monadic coefficients.
DyadCoef: Vector of estimated coefficient values for dyadic covariates.
vcov_dyad: If hessian=TRUE, variance-covariance matrix of dyadic coefficients.
TransitionKernel: Matrix of estimated HMM transition probabilities.
Kappa: Matrix of marginal probabilities of being in each HMM state at any given point in time; n.hmmstates by number of time periods (years, or whatever interval the networks are observed at).
LowerBound: Final value of the variational lower bound (LB).
lb: Vector of LB values across all iterations; useful for diagnosing early convergence issues.
niter: Final number of VI iterations.
converged: Convergence indicator; zero indicates failure to converge.
NodeIndex: Order in which nodes are stored in all return objects.
monadic.data, dyadic.data: Model frames used during estimation (stripped of attributes).
forms: Values of selected formal arguments used by other methods.
seed: The value of the RNG seed used during estimation.
call: Original (unevaluated) function call.
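Because BlockModel is reported on the log-odds scale, a reader who wants tie probabilities can apply the inverse logit. The sketch below uses a made-up 2 by 2 matrix (illustrative values, not estimates from any fitted model). It also shows which element order "column-major" refers to in vcov_blockmodel, assuming that order matches R's as.vector().

```r
# Hypothetical 2 x 2 blockmodel on the log-odds scale (illustrative values only)
block_logodds <- matrix(c( 2.0, -3.0,
                          -3.0,  1.5),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("G1", "G2"), c("G1", "G2")))

# Inverse logit, p = 1 / (1 + exp(-x)); stats::plogis() computes exactly this
block_prob <- plogis(block_logodds)
print(round(block_prob, 3))

# Column-major order, as returned by as.vector():
# (G1,G1), (G2,G1), (G1,G2), (G2,G2)
blockmodel_vec <- as.vector(block_logodds)
print(blockmodel_vec)
```

For a fitted object, the same plogis() call would apply to its BlockModel component.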

Arguments

formula.dyad

A formula object. The variable in data.dyad that contains binary edges should be used as the LHS, and any dyadic predictors can be included on the RHS (when no dyadic covariates are available, use y ~ 1). Uses the same syntax as a glm formula.

formula.monad

An optional formula object. The LHS is ignored. The RHS contains names of nodal attributes found in data.monad.

senderID

Character string. Quoted name of the variable in data.dyad identifying the sender node. For undirected networks, the variable simply contains the name of the first node in the dyad. Cannot contain special character "`@`".

receiverID

Character string. Quoted name of the variable in data.dyad identifying the receiver node. For undirected networks, the variable simply contains the name of the second node in the dyad. Cannot contain special character "`@`".

nodeID

Character string. Quoted name of the variable in data.monad identifying a node in either data.dyad[,senderID] or data.dyad[,receiverID]. If not NULL, every node in data.dyad[,senderID] or data.dyad[,receiverID] must be present in data.monad[,nodeID]. Cannot contain special character "`@`".
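A quick pre-flight check of these requirements can save a failed call. The base-R sketch below uses hypothetical data frames and column names (dyads, nodes, sender, receiver, node). It assumes that the "@" restriction applies to the identifier values, since "@" is used to build the "nodeID@timeID" labels described under fixed_mm and mm_init_t.

```r
# Hypothetical dyadic and monadic data (all names are illustrative)
dyads <- data.frame(sender   = c("A", "A", "B"),
                    receiver = c("B", "C", "C"),
                    tie      = c(1, 0, 1),
                    stringsAsFactors = FALSE)
nodes <- data.frame(node = c("A", "B", "C"),
                    size = c(10, 20, 15),
                    stringsAsFactors = FALSE)

# Every node appearing as sender or receiver must appear in the monadic ID column
dyad_nodes    <- unique(c(dyads$sender, dyads$receiver))
missing_nodes <- setdiff(dyad_nodes, nodes$node)

# Assumed restriction: identifier values must not contain "@"
has_at <- grepl("@", c(dyad_nodes, nodes$node), fixed = TRUE)

if (length(missing_nodes) > 0) {
  warning("Nodes missing from monadic data: ",
          paste(missing_nodes, collapse = ", "))
}
if (any(has_at)) warning("Some node identifiers contain '@'")
```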

timeID

Character string. Quoted name of the variable in both data.dyad and data.monad indicating the time at which the network (and corresponding nodal attributes) was observed. The variable itself must be composed of integers. Cannot contain special character "`@`".

data.dyad

Data frame. Sociomatrix in "long" (i.e. dyadic) format. Must contain at least three variables: the sender identifier (or identifier of the first node in an undirected network dyad), the receiver identifier (or identifier of the second node in an undirected network dyad), and the value of the edge between them. Currently, only edges between zero and one (inclusive) are supported.
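If the network starts out as a square sociomatrix, the base-R sketch below converts it to this long format. The matrix, the node names, and the column names sender, receiver and tie are hypothetical. The first two names would be passed as senderID and receiverID, and tie would go on the LHS of formula.dyad. Self-ties are dropped. For undirected networks, the sketch assumes that each dyad should appear only once, so it keeps the upper triangle of a symmetric matrix.

```r
# Hypothetical 3 x 3 binary sociomatrix (rows = senders, columns = receivers)
socio <- matrix(c(0, 1, 0,
                  1, 0, 1,
                  0, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Convert to one row per dyad; row(), col() and m are indexed with the same
# logical mask, so the three columns stay aligned (column-major order)
to_long <- function(m, directed = TRUE) {
  keep <- if (directed) row(m) != col(m) else upper.tri(m)
  data.frame(sender   = rownames(m)[row(m)[keep]],
             receiver = colnames(m)[col(m)[keep]],
             tie      = m[keep],
             stringsAsFactors = FALSE)
}

long_dir <- to_long(socio)                    # 6 ordered pairs
long_und <- to_long(socio, directed = FALSE)  # 3 unordered pairs

# Edge values must lie in [0, 1]
stopifnot(all(long_dir$tie >= 0 & long_dir$tie <= 1))
```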

data.monad

Data frame. Nodal attributes. Must contain a node identifier matching the names of nodes used in the data.dyad data frame.

n.blocks

Integer value. How many latent groups should be used to estimate the model?

n.hmmstates

Integer value. How many hidden Markov states should be used in the HMM? Defaults to 1 (i.e. no HMM).

directed

Boolean. Is the network directed? Defaults to TRUE.

mmsbm.control

A named list of optional algorithm control parameters.

seed

Integer. Seed for the RNG. By default, a random seed is generated and returned for reproducibility purposes.

nstart

Integer. Number of random initialization trials. Defaults to 5.

spectral

Boolean. Type of initialization algorithm for mixed-membership vectors in the static case. If TRUE (default), use spectral clustering with degree correction; otherwise, use the kmeans algorithm.

init_gibbs

Boolean. Should a collapsed Gibbs sampler of a non-regression mmsbm be used to initialize mixed-membership vectors, instead of a spectral or simple kmeans initialization? Setting to TRUE results in slower initialization but faster model estimation. When TRUE, results are typically very sensitive to the choice of alpha (see below).

alpha

Numeric positive value. Concentration parameter for collapsed Gibbs sampler to find initial mixed-membership values when init_gibbs=TRUE. Defaults to 1.0.

missing

Means of handling missing data. One of "indicator method" (default) or "listwise deletion".
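The two options differ in whether incomplete rows are kept. The sketch below illustrates the generic versions of both strategies on a hypothetical monadic data frame. It is not the package's internal code. In particular, the fill value of 0 and the "_missing" indicator column name are assumptions made for illustration.

```r
# Hypothetical monadic covariate with one missing value
df <- data.frame(node = c("A", "B", "C", "D"),
                 size = c(10, NA, 15, 12),
                 stringsAsFactors = FALSE)

# Listwise deletion: drop every row with a missing value
df_listwise <- df[complete.cases(df), ]

# Generic missing-indicator method: flag missing entries, then fill them with
# a constant (0 here, an assumption) so that the row can be kept
df_indicator <- df
df_indicator$size_missing <- as.integer(is.na(df_indicator$size))
df_indicator$size[is.na(df_indicator$size)] <- 0
```

The indicator approach keeps all nodes in the network, at the cost of one extra covariate per variable with missing values.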

svi

Boolean; should stochastic variational inference be used? Defaults to TRUE.

vi_iter

Maximum number of iterations of stochastic variational updates. Defaults to 5e2.

batch_size

When svi=TRUE, proportion of nodes sampled in each local step. Defaults to 0.05 when svi=TRUE, and to 1.0 otherwise.
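To make the proportion concrete, the sketch below draws one minibatch of nodes without replacement for a local step. The node labels are hypothetical, and the rounding rule (ceiling, with at least one node) is an assumption about how a proportion is turned into a count.

```r
set.seed(1)
node_ids   <- paste0("n", 1:200)  # hypothetical node labels
batch_size <- 0.05                # default proportion when svi=TRUE

# Assumed rounding rule: round up, and always sample at least one node
n_batch <- max(1, ceiling(batch_size * length(node_ids)))
batch   <- sample(node_ids, n_batch)
```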

forget_rate

When svi=TRUE, a value in (0.5, 1] controlling how quickly the weight of previous parameter values decays in global steps. Defaults to 0.75 when svi=TRUE, and to 0.0 otherwise.

delay

When svi=TRUE, non-negative value controlling weight of past iterations in global steps. Defaults to 1.0 when svi=TRUE, and ignored otherwise.
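In the stochastic variational inference literature (e.g. Hoffman et al., 2013), a forgetting rate and a delay usually enter through the Robbins-Monro step size rho_t = (t + delay)^(-forget_rate). Each global step then blends the previous value with a new minibatch-based estimate. The sketch below assumes that mmsbm follows this standard schedule. The function names are hypothetical. Under this schedule, forget_rate = 0 gives rho_t = 1, meaning each update fully replaces the old value, which is consistent with the non-SVI default above.

```r
# Assumed standard SVI step size: rho_t = (t + delay)^(-forget_rate)
step_size <- function(t, forget_rate = 0.75, delay = 1.0) {
  stopifnot(forget_rate == 0 || (forget_rate > 0.5 && forget_rate <= 1),
            delay >= 0)
  (t + delay)^(-forget_rate)
}

# Global step: convex combination of the previous value and a noisy estimate
global_update <- function(old, noisy_estimate, rho_t) {
  (1 - rho_t) * old + rho_t * noisy_estimate
}

rho <- step_size(1:500)  # decreasing sequence of step sizes
```

A larger delay down-weights early, noisy iterations, and a larger forget_rate makes the step size shrink faster.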

opt_iter

Maximum number of BFGS iterations in the global step. Defaults to 10e3.

hessian

Boolean indicating whether the Hessian matrix of regression coefficients should be returned. Defaults to TRUE.

assortative

Boolean indicating whether blockmodel should be assortative (i.e. stronger connections within groups) or disassortative (i.e. stronger connections between groups). Defaults to TRUE.

mu_block

Numeric vector with two elements: prior mean of the blockmodel's main diagonal elements, and prior mean of the blockmodel's off-diagonal elements. Defaults to c(5.0, -5.0) if assortative=TRUE (default) and to c(-5.0, 5.0) otherwise.

var_block

Numeric vector with two positive elements: prior variance of the blockmodel's main diagonal elements, and prior variance of the blockmodel's off-diagonal elements. Defaults to c(5.0, 5.0).
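The two-element specification in mu_block and var_block maps onto the full blockmodel as shown below. The first value is placed on the main diagonal (within-group ties), and the second value is placed everywhere else (between-group ties). The helper function is hypothetical and only illustrates this mapping. It does not describe how the package stores the prior internally.

```r
# Hypothetical helper: expand c(diagonal, off_diagonal) into an
# n_blocks x n_blocks matrix
block_prior_matrix <- function(values, n_blocks) {
  stopifnot(length(values) == 2, n_blocks >= 1)
  m <- matrix(values[2], nrow = n_blocks, ncol = n_blocks)
  diag(m) <- values[1]
  m
}

mu_block_mat  <- block_prior_matrix(c(5.0, -5.0), n_blocks = 3)  # assortative default
var_block_mat <- block_prior_matrix(c(5.0,  5.0), n_blocks = 3)  # default variances
```

With assortative=FALSE, the default c(-5.0, 5.0) simply swaps which cells receive the high prior log-odds.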

mu_beta

Either a single numeric value, in which case the same prior mean is applied to all monadic coefficients, or an array that is npredictors by n.blocks by n.hmmstates, where npredictors is the number of monadic predictors for which a prior mean is being set (prior means need not be set for all predictors). The rows of the array should be named to identify the variables for which a prior mean is being set. Defaults to a common prior mean of 0.0 for all monadic coefficients.

var_beta

See mu_beta. Defaults to a single common prior variance of 5.0 for all (standardized) monadic coefficients.
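As an example of the array form, the sketch below sets a prior only on the Status coefficient (a monadic predictor in the Examples section) for a model with 2 blocks and 1 HMM state. The prior values are arbitrary and chosen only for illustration. The sketch assumes that var_beta accepts an array with the same shape and row names, as "See mu_beta" suggests.

```r
n_blocks    <- 2
n_hmmstates <- 1

# One named row (Status) x n_blocks columns x n_hmmstates slices;
# values are illustrative, not recommendations
mu_beta <- array(c(1.0, -1.0),
                 dim = c(1, n_blocks, n_hmmstates),
                 dimnames = list("Status", NULL, NULL))

# Matching prior variances with the same shape and row names
var_beta <- array(2.0, dim = dim(mu_beta), dimnames = dimnames(mu_beta))

ctrl <- list(mu_beta = mu_beta, var_beta = var_beta)  # candidate mmsbm.control
```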

mu_gamma

Either a single numeric value, in which case the same prior mean is applied to all dyadic coefficients, or a named vector of numeric values (with names corresponding to the name of the variable for which a prior mean is being set). Defaults to a common prior mean of 0.0 for all dyadic coefficients.

var_gamma

See mu_gamma. Defaults to a single common prior variance of 5.0 for all (standardized) dyadic coefficients.
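The named-vector form can target a single dyadic coefficient, such as the Coworkers predictor used in the Examples section. In the sketch below, the prior values are illustrative, and the named var_gamma follows the "See mu_gamma" convention (an assumption). Dyadic coefficients that are not named would keep the defaults.

```r
# Prior mean 0.5 and variance 1.0 for the Coworkers coefficient only
# (illustrative values); other dyadic coefficients keep the defaults
control <- list(seed      = 123,
                mu_gamma  = c(Coworkers = 0.5),
                var_gamma = c(Coworkers = 1.0))
```

This list would be passed as the mmsbm.control argument.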

eta

Numeric positive value. Concentration hyper-parameter for HMM. Defaults to 1.0.

se_sim

Number of samples from the variational posterior of the latent variables on which the approximations to the variance-covariance matrices are based. Defaults to 10.

dyad_vcov_samp

Maximum number of dyads to sample in computation of variance-covariance of dyadic and blockmodel parameters, when compared to ten percent of the observed dyads. Defaults to 1000.

fixed_mm

Optional character vector, with "nodeID@timeID" as elements, indicating which mixed-membership vectors should remain constant at their initial values throughout estimation. When only one year is observed, elements should be "nodeID@1". Typically used with mm_init_t.

mm_init_t

Matrix, n.blocks by nodes across years. Optional initial values for mixed-membership vectors. Although initial values need not be provided for all nodes, column names must have a nodeID@timeID format to avoid ambiguity. When only one year is observed, names should be "nodeID@1".
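The sketch below builds the "nodeID@timeID" column labels for hypothetical nodes and years. It fills an n.blocks by node-years matrix with random initial memberships and selects one node's labels for fixed_mm. It assumes that each column should be a probability vector (non-negative, summing to one). The columns are drawn from a symmetric Dirichlet(1) by normalizing Gamma(1) variates.

```r
# Hypothetical nodes observed in two years
node_ids <- c("A", "B", "C")
years    <- c(2001, 2002)
n_blocks <- 2

# "nodeID@timeID" labels, one per node-year
labels <- as.vector(outer(node_ids, years, paste, sep = "@"))

# Random initial mixed-membership vectors: normalized Gamma(1) draws,
# i.e. one Dirichlet(1, ..., 1) draw per column
set.seed(42)
mm_init <- matrix(rgamma(n_blocks * length(labels), shape = 1),
                  nrow = n_blocks, dimnames = list(NULL, labels))
mm_init <- sweep(mm_init, 2, colSums(mm_init), "/")

# Keep node A's memberships at their initial values in both years
fixed <- labels[startsWith(labels, "A@")]
```

With a single observed period, the labels would instead take the form "A@1", "B@1", and so on, as noted above.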

kappa_init_t

Matrix, n.hmmstates by number of years. Optional initial values for variational parameters for state probabilities. Columns must be named according to unique year values.
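A minimal sketch of such a matrix for two HMM states over four hypothetical years is shown below. The split (early years leaning toward state 1, later years toward state 2) is an illustrative assumption. Each column is kept as a probability vector, and the columns are named with the year values.

```r
years       <- 2001:2004
n_hmmstates <- 2

# Start from uniform state probabilities, columns named by year
kappa_init <- matrix(1 / n_hmmstates, nrow = n_hmmstates, ncol = length(years),
                     dimnames = list(NULL, as.character(years)))

# Illustrative prior guess: early years mostly state 1, later years mostly state 2
kappa_init[, c("2001", "2002")] <- c(0.9, 0.1)
kappa_init[, c("2003", "2004")] <- c(0.1, 0.9)
```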

b_init_t

Matrix, n.blocks by n.blocks. Optional initial values for blockmodel.

beta_init

Array, predictors by n.blocks by n.hmmstates. Optional initial values for monadic coefficients. If

gamma_init

Vector. Optional initial values for dyadic coefficients.

permute

Boolean. Should all permutations be tested to realign initial blockmodels in the dynamic case? If FALSE, realignment is done via a faster graph-matching algorithm, but may not be exact. Defaults to TRUE.
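Exhaustive realignment addresses label switching. Two blockmodels can describe the same structure with the group labels in a different order, so every relabeling of one is tried and the closest match to a reference is kept. The sketch below illustrates this idea using squared differences as the matching criterion, which is an assumption. The graph-matching alternative is not shown. Because there are n.blocks! permutations, the search is only feasible for small numbers of blocks.

```r
# All permutations of 1..k as rows of a matrix (k! rows)
all_perms <- function(k) {
  if (k == 1) return(matrix(1L, 1, 1))
  smaller <- all_perms(k - 1)
  out <- NULL
  for (i in seq_len(k)) {
    rest <- setdiff(seq_len(k), i)
    # put label i first, then every ordering of the remaining labels
    out <- rbind(out, cbind(i, matrix(rest[as.vector(smaller)],
                                      nrow = nrow(smaller))))
  }
  unname(out)
}

# Relabel the blocks of `b` to minimize squared distance to `b_ref`
align_blocks <- function(b_ref, b) {
  perms <- all_perms(nrow(b))
  loss  <- apply(perms, 1, function(p) sum((b_ref - b[p, p])^2))
  best  <- perms[which.min(loss), ]
  list(perm = best, aligned = b[best, best])
}

b_ref <- matrix(c( 3, -2, -1,
                  -2,  4, -3,
                  -1, -3,  2), nrow = 3, byrow = TRUE)
b_new <- b_ref[c(3, 1, 2), c(3, 1, 2)]  # same blockmodel, labels shuffled
res   <- align_blocks(b_ref, b_new)
```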

conv_tol

Numeric value. Absolute tolerance for VI convergence. Defaults to 1e-3.
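One common reading of an absolute tolerance is to stop once the lower bound changes by less than conv_tol between consecutive iterations. The sketch below applies that rule to a made-up LB trace of the kind returned in the lb component. Whether mmsbm monitors the LB or the parameters in exactly this way is an assumption.

```r
# Assumed rule: converged when |LB_t - LB_{t-1}| < conv_tol
has_converged <- function(lb, conv_tol = 1e-3) {
  n <- length(lb)
  n >= 2 && abs(lb[n] - lb[n - 1]) < conv_tol
}

# Made-up lower-bound trace (illustrative values only)
lb_trace <- c(-1500, -1320.5, -1301.2, -1300.9, -1300.8995)
```

The Examples section loosens this to conv_tol = 1e-2, which under this rule would stop earlier, trading some precision for speed.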

verbose

Boolean. Should extra information be printed as model iterates? Defaults to FALSE.

Author

Santiago Olivella (olivella@unc.edu), Adeline Lo (aylo@wisc.edu), Tyler Pratt (tyler.pratt@yale.edu), Kosuke Imai (imai@harvard.edu)

Examples


library(NetMix)
## Load datasets
data("lazega_dyadic")
data("lazega_monadic")
## Estimate model with 2 groups
## Setting to `hessian=TRUE` increases computation time
## but is needed if standard errors are to be computed. 
lazega_mmsbm <- mmsbm(SocializeWith ~ Coworkers,
                      ~  School + Practice + Status,
                      senderID = "Lawyer1",
                      receiverID = "Lawyer2",
                      nodeID = "Lawyer",
                      data.dyad = lazega_dyadic,
                      data.monad = lazega_monadic,
                      n.blocks = 2,
                      mmsbm.control = list(seed = 123,
                                           conv_tol = 1e-2,
                                           hessian = FALSE))
