observeNetwork: Observe a network partially according to a given sampling design

Description

This function draws observations in an adjacency matrix according to a given network sampling design.

Usage

observeNetwork(
  adjacencyMatrix,
  sampling,
  parameters,
  clusters = NULL,
  covariates = NULL,
  similarity = missSBM:::l1_similarity,
  intercept = 0
)

Arguments

adjacencyMatrix

The N x N adjacency matrix of the network to sample.

sampling

The sampling design used to observe the adjacency matrix, see details.

parameters

The sampling parameters (adapted to each sampling, see details).

clusters

An optional clustering membership vector of the nodes. Only necessary for block samplings.

covariates

An optional list with M entries (the M covariates). If the covariates are node-centered, each entry of covariates must be a size-N vector; if the covariates are dyad-centered, each entry of covariates must be an N x N matrix.

similarity

An optional function to compute similarities between node covariates. Default is missSBM:::l1_similarity, that is, -abs(x-y). Only relevant when the covariates are node-centered.

intercept

An optional intercept term to be added in case of the presence of covariates. Default is 0.
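
The default similarity described above, -abs(x-y), can be illustrated on a single node-centered covariate. The sketch below is base R only: l1_sim is a hypothetical stand-in written for this illustration, the covariate values are made up, and applying the function to every pair of nodes with outer() is an assumption about how a node-level covariate becomes a dyad-level quantity, not a description of the package internals.

```r
# Hypothetical stand-in for the default similarity described above: -abs(x - y).
l1_sim <- function(x, y) -abs(x - y)

# One node-centered covariate for N = 4 nodes (made-up values).
x <- c(0.5, 1.0, 3.0, 3.5)

# Apply the similarity to every pair of nodes to get an N x N dyad-level matrix.
# Assumption: this pairwise expansion is only one plausible reading of the argument.
S <- outer(x, x, l1_sim)
S
```

Nodes with close covariate values get a similarity near 0, and distant ones get a large negative value.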

Value

An adjacency matrix with the same dimensions as the input, in which the unobserved dyads are replaced by NA.
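
Since the returned matrix marks unobserved dyads with NA, the achieved observation rate can be checked directly. The sketch below uses a small hand-made matrix (A_obs is made-up data, not output of observeNetwork) and ignores the diagonal, which is a choice of this illustration.

```r
# Toy 4 x 4 symmetric adjacency matrix with some dyads unobserved (NA).
A_obs <- matrix(c( 0,  1, NA,  0,
                   1,  0,  1, NA,
                  NA,  1,  0,  0,
                   0, NA,  0,  0), nrow = 4, byrow = TRUE)

observed <- !is.na(A_obs)              # TRUE where the dyad was observed
off_diag <- row(A_obs) != col(A_obs)   # ignore the diagonal
obs_rate <- mean(observed[off_diag])   # fraction of observed off-diagonal entries
obs_rate
```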

Details

The list of parameters control tunes more advanced features, such as the initialization, how covariates are handled in the model, and the variational EM algorithm:

"useCovSBM": logical. If covariates is not NULL, should they be used for the SBM inference (or just for the sampling)? Default is TRUE.
"clusterInit": Initial method for clustering: either a character in "hierarchical", "spectral" or "kmeans", or a list with length(vBlocks) vectors, each with size ncol(adjacencyMatrix), providing a user-defined clustering. Default is "spectral".
"similarity": An R x R -> R function to compute similarities between node covariates. Default is missSBM:::l1_similarity, that is, -abs(x-y). Only relevant when the covariates are node-centered (i.e. covariates is a list of size-N vectors).
"threshold": V-EM algorithm stops when an optimization step changes the objective function by less than threshold. Default is 1e-3.
"maxIter": V-EM algorithm stops when the number of iterations exceeds maxIter. Default is 100 with no covariate, 50 otherwise.
"fixPointIter": number of fixed-point iterations in the V-E step. Default is 5 with no covariate, 2 otherwise.
"cores": integer for number of cores used. Default is 1.
"trace": integer for verbosity (0, 1, 2). Default is 1. Has no effect when cores > 1.

The sampling designs are split into two families, MAR and NMAR, each of which contains dyad-centered and node-centered samplings. See doi:10.1080/01621459.2018.1562934 for a complete description.

Missing At Random (MAR)
- "dyad": parameter = p = Prob(Dyad (i,j) is observed)
- "node": parameter = p = Prob(Node i is observed)
- "covar-dyad": parameter = beta in R^M, such that Prob(Dyad (i,j) is observed) = logistic(parameter' covarArray (i,j, .))
- "covar-node": parameter = nu in R^M, such that Prob(Node i is observed) = logistic(parameter' covarMatrix (i,))
- "snowball": parameter = c(number of waves, Prob(Node i is observed in the 1st wave))
Not Missing At Random (NMAR)
- "double-standard": parameter = c(p0,p1) with p0 = Prob(Dyad (i,j) is observed | the dyad is equal to 0), p1 = Prob(Dyad (i,j) is observed | the dyad is equal to 1)
- "block-node": parameter = c(p(1),...,p(Q)) and p(q) = Prob(Node i is observed | node i is in cluster q)
- "block-dyad": parameter = c(p(1,1),...,p(Q,Q)) and p(q,l) = Prob(Dyad (i,j) is observed | node i is in cluster q and node j is in cluster l)
- "degree": parameter = c(a,b) and Prob(Node i is observed | Degree(i)) = logistic(a+b*degree(i))
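
To make the probability statements above concrete, here is a minimal base R sketch of three of the designs ("dyad", "double-standard" and "degree"). It is an illustration under stated assumptions, not the code used by observeNetwork: apply_mask and draw_dyad_mask are hypothetical helpers, the toy network and parameter values are made up, the diagonal is set to NA, and for the node-centered design a dyad is taken as observed when at least one of its two nodes is observed.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Toy undirected binary network (made-up data, not an SBM draw).
N <- 50
A <- matrix(0, N, N)
A[upper.tri(A)] <- rbinom(N * (N - 1) / 2, 1, 0.2)
A <- A + t(A)

# Hypothetical helper: replace the entries outside a logical mask by NA.
apply_mask <- function(A, mask) {
  A_obs <- A
  A_obs[!mask] <- NA
  diag(A_obs) <- NA  # self-loops are ignored in this sketch
  A_obs
}

# Hypothetical helper: symmetric mask drawn from a symmetric N x N
# matrix of per-dyad observation probabilities.
draw_dyad_mask <- function(prob) {
  mask <- matrix(FALSE, nrow(prob), ncol(prob))
  up <- upper.tri(mask)
  mask[up] <- runif(sum(up)) < prob[up]
  mask | t(mask)
}

# "dyad" (MAR): every dyad is observed with the same probability p.
p <- 0.3
obs_dyad <- apply_mask(A, draw_dyad_mask(matrix(p, N, N)))

# "double-standard" (NMAR): the probability depends on the dyad's value.
p0 <- 0.4; p1 <- 0.8
obs_ds <- apply_mask(A, draw_dyad_mask(ifelse(A == 1, p1, p0)))

# "degree" (NMAR): node i is observed with probability logistic(a + b * degree(i)).
# Assumption: a dyad is observed when at least one endpoint is observed.
a <- -2; b <- 0.1
node_seen <- runif(N) < plogis(a + b * rowSums(A))
obs_deg <- apply_mask(A, outer(node_seen, node_seen, "|"))

# Fraction of observed entries under each design.
sapply(list(dyad = obs_dyad, double_standard = obs_ds, degree = obs_deg),
       function(M) mean(!is.na(M[row(M) != col(M)])))
```

In the MAR design the mask is drawn without looking at A, whereas in the two NMAR designs the mask depends on the values (or degrees) of the network being hidden.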

Examples

## SBM parameters
N <- 300 # number of nodes
Q <- 3   # number of clusters
pi <- rep(1,Q)/Q     # block proportion
theta <- list(mean = diag(.45,Q) + .05 ) # connectivity matrix

## simulate an undirected binary SBM without covariate
sbm <- sbm::sampleSimpleSBM(N, pi, theta)

## Sample network data

# some sampling designs and their associated parameters
sampling_parameters <- list(
   "dyad" = .3,
   "node" = .3,
   "double-standard" = c(0.4, 0.8),
   "block-node" = c(.3, .8, .5),
   "block-dyad" = theta$mean,
   "degree" = c(.01, .01),
   "snowball" = c(2,.1)
 )

observed_networks <- list()

for (sampling in names(sampling_parameters)) {
  observed_networks[[sampling]] <-
     missSBM::observeNetwork(
       adjacencyMatrix = sbm$netMatrix,
       sampling        = sampling,
       parameters      = sampling_parameters[[sampling]],
       clusters        = sbm$memberships
     )
}
