estimate: Estimation of SBMs with missing data

Description

Variational inference from sampled network data on a collection of Stochastic Block Models indexed by their number of blocks.

Usage

estimate(sampledNet, vBlocks, sampling, clusterInit = "hierarchical",
  useCovariates = TRUE, control = list())

Arguments

sampledNet

An object with class sampledNetwork, typically obtained with the function prepare_data (real-world data) or sample (simulation).

vBlocks

The vector of the numbers of blocks considered in the collection.

sampling

The sampling design for the modelling of missing data: MAR designs ("dyad", "node") and NMAR designs ("double-standard", "block-dyad", "block-node", "degree").

clusterInit

Initial method for clustering: either a character string among "hierarchical", "spectral" or "kmeans", or a list of length(vBlocks) vectors, each of size ncol(adjacencyMatrix), providing a user-defined clustering. Default is "hierarchical".

useCovariates

logical. If covariates are present in sampledNet, should they be used for the inference of the network sampling design, or just for the SBM inference? Default is TRUE.

control

A list of parameters controlling the variational EM algorithm. See Details.
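
As an illustration of the two optional arguments above, the following minimal sketch builds a user-defined clusterInit list and a control list in base R. The parameter names in control are those listed under Details; the values, the integer encoding of cluster labels (1..Q) and the round-robin assignment are assumptions made for the example, not recommendations from the package.

```r
N <- 300        # number of nodes, as in the Examples section
vBlocks <- 1:5  # numbers of blocks in the collection

# User-defined initial clusterings: one label vector of length N per entry of vBlocks.
# Assumption: labels are integers in 1..Q; the round-robin assignment is arbitrary.
clusterInit <- lapply(vBlocks, function(Q) rep_len(seq_len(Q), N))

# Control list using the parameter names from the Details section (illustrative values).
control <- list(threshold = 1e-3, maxIter = 100, fixPointIter = 3, cores = 1, trace = 0)

# Both would then be passed to estimate(), e.g. (not run):
# collection <- missSBM::estimate(sampledNet, vBlocks, sampling,
#                                 clusterInit = clusterInit, control = control)
```

Here clusterInit has one entry per value in vBlocks, and each entry has one label per node, which matches the shape described for the list form of clusterInit.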

Value

Returns an R6 object with class missSBM_collection.

Details

The control list tunes the optimization process and the variational EM algorithm through the following parameters:

"threshold": stop when an optimization step changes the objective function by less than threshold. Default is 1e-4.
"maxIter": the V-EM algorithm stops when the number of iterations exceeds maxIter. Default is 200.
"fixPointIter": number of fixed-point iterations in the variational E step. Default is 5.
"cores": integer, number of cores used. Default is 1.
"trace": integer for verbosity (0, 1, 2). Default is 1. Has no effect when cores > 1.

The sampling designs fall into two families, MAR and NMAR, each containing dyad-centered and node-centered designs. See <doi:10.1080/01621459.2018.1562934> for a complete description.

Missing at Random (MAR)
- "dyad": parameter = p and $$p = P(Dyad (i,j) is sampled)$$
- "node": parameter = p and $$p = P(Node i is sampled)$$
- "covar-dyad": parameter = beta in R^M and $$P(Dyad (i,j) is sampled) = logistic(parameter' covarArray (i,j, ))$$
- "covar-node": parameter = nu in R^M and $$P(Node i is sampled) = logistic(parameter' covarMatrix (i,))$$
Not Missing At Random (NMAR)
- "double-standard": parameter = (p0,p1) and $$p0 = P(Dyad (i,j) is sampled | the dyad is equal to 0)$$, $$p1 = P(Dyad (i,j) is sampled | the dyad is equal to 1)$$
- "block-node": parameter = c(p(1),...,p(Q)) and $$p(q) = P(Node i is sampled | node i is in cluster q)$$
- "block-dyad": parameter = c(p(1,1),...,p(Q,Q)) and $$p(q,l) = P(Dyad (i,j) is sampled | node i is in cluster q and node j is in cluster l)$$
- "degree": parameter = c(a,b) and $$P(Node i is sampled | Degree(i)) = logistic(a+b*Degree(i))$$
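
To make the two basic MAR designs concrete, here is a minimal base-R sketch that masks a toy adjacency matrix under "dyad" and "node" sampling with rate p. It does not call missSBM. Encoding unobserved dyads as NA, and treating a dyad as observed under node sampling when at least one of its endpoints is sampled, are assumptions of this sketch rather than statements about the package internals.

```r
set.seed(1)
N <- 6    # toy number of nodes
p <- 0.5  # sampling rate

# Toy undirected adjacency matrix (symmetric, zero diagonal)
A <- matrix(0, N, N)
A[upper.tri(A)] <- rbinom(N * (N - 1) / 2, 1, 0.3)
A <- A + t(A)

# "dyad" design: each dyad (i,j) is observed independently with probability p
obs_dyad <- matrix(FALSE, N, N)
obs_dyad[upper.tri(obs_dyad)] <- runif(N * (N - 1) / 2) < p
obs_dyad <- obs_dyad | t(obs_dyad)  # symmetrize; the diagonal stays unobserved
A_dyad <- A
A_dyad[!obs_dyad] <- NA             # assumption: unobserved dyads are coded NA

# "node" design: each node is sampled with probability p; a dyad is taken as
# observed when at least one of its two endpoints is sampled (assumption)
sampled_nodes <- runif(N) < p
obs_node <- outer(sampled_nodes, sampled_nodes, "|")
diag(obs_node) <- FALSE
A_node <- A
A_node[!obs_node] <- NA
```

In both cases the probability of observing a dyad does not depend on the value of the dyad or on the latent clusters, which is what separates these MAR designs from the NMAR designs in the list above.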

Examples

# NOT RUN {
## SBM parameters
directed <- FALSE
N <- 300 # number of nodes
Q <- 3   # number of clusters
alpha <- rep(1,Q)/Q     # mixture parameter
pi <- diag(.45,Q) + .05 # connectivity matrix

## simulate a SBM without covariates
sbm <- missSBM::simulate(N, alpha, pi, directed)

## Sample network data
samplingParameters <- .5 # the sampling rate
sampling <- "dyad"       # the sampling design
sampledNet <- missSBM::sample(sbm$adjacencyMatrix, sampling, samplingParameters)

## Inference :
vBlocks <- 1:5 # number of classes
collection <- missSBM::estimate(sampledNet, vBlocks, sampling)
collection$ICL
# }
