estimateMissSBM: Estimation of simple SBMs with missing data

Description

Variational EM inference of a collection of Stochastic Block Models, indexed by their number of blocks, from a partially observed network.

Usage

estimateMissSBM(
  adjacencyMatrix,
  vBlocks,
  sampling,
  covariates = NULL,
  control = list()
)

Arguments

adjacencyMatrix

The N x N adjacency matrix of the network data. If adjacencyMatrix is symmetric, we assume an undirected network with no self-loops; otherwise the network is assumed to be directed.
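
As a rough illustration of the expected input, the sketch below builds a small symmetric adjacency matrix in base R and masks a few dyads. It assumes that unobserved dyads are encoded as NA (the convention understood here for the output of observeNetwork()); the network and the masked positions are invented for the example.

```r
# Toy 5-node undirected network (values are invented for illustration).
set.seed(1)
N <- 5
A <- matrix(0, N, N)
A[upper.tri(A)] <- rbinom(N * (N - 1) / 2, 1, 0.5)
A <- A + t(A)   # symmetrize: undirected network
diag(A) <- 0    # no self-loops

# Assumption: unobserved dyads are marked NA, and the mask is kept symmetric.
A_obs <- A
A_obs[1, 2] <- A_obs[2, 1] <- NA
A_obs[3, 5] <- A_obs[5, 3] <- NA

# A symmetric matrix is what signals an undirected network.
is_undirected <- isSymmetric(A_obs)
n_missing <- sum(is.na(A_obs[upper.tri(A_obs)]))
```

Masking both (i,j) and (j,i) keeps the matrix symmetric, so the partially observed network is still read as undirected.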

vBlocks

A vector giving the numbers of blocks to consider in the collection.

sampling

The model used to describe the process that generates the missing data: MAR designs ("dyad", "node", "covar-dyad", "covar-node", "snowball") and NMAR designs ("double-standard", "block-dyad", "block-node", "degree") are available. See details.

covariates

A list with M entries (the M covariates). If the covariates are node-centered, each entry of covariates must be a size-N vector; if the covariates are dyad-centered, each entry of covariates must be an N x N matrix.
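
The two accepted shapes can be sketched in base R as follows. The covariate values are made up, and the helper l1_sim is a local stand-in written for this example; it mirrors the -abs(x-y) similarity described in the details but is not the package's own function.

```r
N <- 4

# Node-centered: each covariate is a length-N vector (M = 1 here).
age <- c(23, 35, 31, 52)
covariates_node <- list(age)

# Dyad-centered: each covariate is an N x N matrix.
# Here one is derived from the node covariate with the L1 similarity -|x - y|.
l1_sim <- function(x, y) -abs(x - y)   # local helper, for illustration only
age_sim <- outer(age, age, l1_sim)
covariates_dyad <- list(age_sim)
```

Either list could then be supplied as the covariates argument; which one is appropriate depends on whether the sampling design and the model are node-centered or dyad-centered.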

control

A list of parameters controlling advanced features. See details.

Value

Returns an R6 object with class missSBM_collection.

Details

The list of parameters control tunes more advanced features, such as the initialization, how covariates are handled in the model, and the variational EM algorithm:

"useCovSBM": logical. If covariates is not NULL, should they be used for the SBM inference (or just for the sampling)? Default is TRUE.
"clusterInit": Initial method for clustering: either a character string among "hierarchical", "spectral" or "kmeans", or a list with length(vBlocks) vectors, each with size ncol(adjacencyMatrix), providing a user-defined clustering. Default is "spectral".
"similarity": An R x R -> R function to compute similarities between node covariates. Default is missSBM:::l1_similarity, that is, -abs(x-y). Only relevant when the covariates are node-centered (i.e. covariates is a list of size-N vectors).
"threshold": the V-EM algorithm stops when an optimization step changes the objective function by less than threshold. Default is 1e-3.
"maxIter": the V-EM algorithm stops when the number of iterations exceeds maxIter. Default is 100 with no covariate, 50 otherwise.
"fixPointIter": number of fixed-point iterations in the V-E step. Default is 5 with no covariate, 2 otherwise.
"cores": integer for the number of cores used. Default is 1.
"trace": integer for verbosity (0, 1, 2). Default is 1. Not useful when cores > 1.
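
A control list using the entries above might look like the following sketch. The field names come from the list above; the chosen values and the squared-difference similarity are arbitrary illustrations, and it is assumed that any entry left out keeps its default.

```r
# Illustrative settings only; not recommended values.
control <- list(
  useCovSBM    = TRUE,
  clusterInit  = "hierarchical",
  similarity   = function(x, y) -(x - y)^2,  # hypothetical alternative to the L1 default
  threshold    = 1e-4,
  maxIter      = 200,
  fixPointIter = 3,
  cores        = 1,
  trace        = 0
)

# The list would then be passed as: estimateMissSBM(..., control = control)
```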

The sampling designs fall into two families, MAR and NMAR, each containing dyad-centered and node-centered designs. See doi:10.1080/01621459.2018.1562934 for a complete description.

Missing at Random (MAR)
- "dyad": parameter = p = Prob(Dyad(i,j) is observed)
- "node": parameter = p = Prob(Node i is observed)
- "covar-dyad": parameter = beta in R^M, such that Prob(Dyad (i,j) is observed) = logistic(parameter' covarArray (i,j, .))
- "covar-node": parameter = nu in R^M such that Prob(Node i is observed) = logistic(parameter' covarMatrix (i,.))
- "snowball": parameter = number of waves with Prob(Node i is observed in the 1st wave)
Not Missing At Random (NMAR)
- "double-standard": parameter = (p0,p1) with p0 = Prob(Dyad (i,j) is observed | the dyad is equal to 0), p1 = Prob(Dyad (i,j) is observed | the dyad is equal to 1)
- "block-node": parameter = c(p(1),...,p(Q)) and p(q) = Prob(Node i is observed | node i is in cluster q)
- "block-dyad": parameter = c(p(1,1),...,p(Q,Q)) and p(q,l) = Prob(Edge (i,j) is observed | node i is in cluster q and node j is in cluster l)
- "degree": parameter = c(a,b) and logistic(a+b*degree(i)) = Prob(Node i is observed | Degree(i))
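
To make the difference between the two families concrete, the base-R sketch below simulates three of the designs on a toy undirected network. It is only an illustration of the probabilities listed above, not the package's sampler: mask_dyads is a hypothetical helper, unobserved dyads are assumed to be coded as NA, and all parameter values are arbitrary.

```r
set.seed(42)
N <- 6
A <- matrix(0, N, N)
A[upper.tri(A)] <- rbinom(N * (N - 1) / 2, 1, 0.4)
A <- A + t(A)   # symmetric, zero diagonal

# Hypothetical helper (not part of missSBM): given a matrix of per-dyad
# observation probabilities, replace unobserved dyads with NA, symmetrically.
mask_dyads <- function(A, prob) {
  up <- upper.tri(A)
  observed <- rbinom(sum(up), 1, prob[up]) == 1
  out <- A
  out[up][!observed] <- NA
  out[lower.tri(out)] <- t(out)[lower.tri(out)]  # mirror the mask
  out
}

# "dyad" (MAR): every dyad is observed with the same probability p.
p <- 0.5
A_dyad <- mask_dyads(A, matrix(p, N, N))

# "double-standard" (NMAR): the probability depends on the dyad's value.
p0 <- 0.2
p1 <- 0.9
A_ds <- mask_dyads(A, ifelse(A == 1, p1, p0))

# "degree" (NMAR): node i is observed with probability logistic(a + b * degree(i)).
a <- -2
b <- 0.8
prob_node <- plogis(a + b * rowSums(A))
```

In the MAR case the mask carries no information about the edges themselves, whereas under "double-standard" and "degree" the pattern of missingness depends on the unobserved network.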

Examples

# NOT RUN {
library(missSBM)
library(magrittr) # provides the %>% pipe used below

## SBM parameters
N <- 150 # number of nodes
Q <- 3   # number of clusters
pi <- rep(1,Q)/Q     # block proportion
theta <- list(mean = diag(.45,Q) + .05 ) # connectivity matrix

## Sampling parameters
samplingParameters <- .5 # the sampling rate
sampling  <- "dyad"      # the sampling design

## generate an undirected binary SBM with no covariate
sbm <- sbm::sampleSimpleSBM(N, pi, theta)

## Sample some dyads data + Infer SBM with missing data
collection <-
   observeNetwork(sbm$netMatrix, sampling, samplingParameters) %>%
   estimateMissSBM(vBlocks = 1:5, sampling = sampling)
collection$ICL
coef(collection$bestModel$fittedSBM, "connectivity")

myModel <- collection$bestModel
plot(myModel, "network")
coef(myModel, "sampling")
coef(myModel, "connectivity")
predict(myModel)[1:5, 1:5]
fitted(myModel)[1:5, 1:5]

# }