sdsm: The stochastic degree sequence model (sdsm)

Description

`sdsm` computes the proportion of generated edges above or below the observed value using the stochastic degree sequence model. Once computed, pass the result to `backbone.extract` to obtain the backbone matrix for a given alpha value.

Usage

sdsm(B, trials = 0, model = "logit", sparse = TRUE, maxiter = 25,
  dyad = NULL, alpha = 0.05, tolerance = 0, progress = FALSE)

Arguments

B

Matrix: Bipartite adjacency matrix

trials

Integer: Number of random bipartite graphs generated. Default is 0.

model

String: Link function of the generalized linear model (glm) used to generate random bipartite graphs. Default is "logit".

sparse

Boolean: Whether sparse matrix manipulations should be used

maxiter

Integer: Maximum number of iterations if "model" is a glm.

dyad

Vector of length 2: two row entries i, j of B. These correspond to the row and column indices of a cell in the projected matrix, and can be given as string row names or as numeric values. The value in the i-th row and j-th column of each projected B* matrix is saved. This is useful for visualizing an example of the empirical null edge weight distribution generated by the model.

alpha

Real: proposed alpha threshold to be used for determining statistical significance of edges

tolerance

Real: tolerance for p-value computation using the RNA (Refined Normal Approximation) Poisson binomial approximation

progress

Boolean: Whether a txtProgressBar should be used to show progress

Value

list(positive, negative, dyad_values, summary).

positive: matrix of the proportion of times each entry of the projected matrix B is above the corresponding entry in the generated projection.

negative: matrix of the proportion of times each entry of the projected matrix B is below the corresponding entry in the generated projection.

dyad_values: list of edge weights for i, j in each generated projection; included if 'dyad' is not NULL and 'trials' > 0.

summary: a data frame summarizing the input matrix and the model used, including: model name, number of rows, skew of row sums, number of columns, skew of column sums, and running time.

Details

The 'model' parameter can take in a 'link' function, as described by glm and family. This can be one of c('logit', 'probit', 'cauchit', 'log', 'cloglog').
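As a rough illustration of how a link name feeds into a binomial glm, the base R sketch below fits cell probabilities for a small made-up matrix. The toy matrix `B`, the data frame `cells`, and the choice of row and column sums as predictors are assumptions made for this example; the model specification that `sdsm` uses internally may differ.

```r
# Toy bipartite matrix (hypothetical data): 4 rows by 5 columns.
B <- matrix(c(1, 1, 0, 0, 1,
              1, 0, 1, 0, 0,
              0, 1, 1, 1, 0,
              1, 1, 1, 0, 1), nrow = 4, byrow = TRUE)

# One observation per cell, in column-major order to match as.vector(B).
# Using row and column sums as predictors is an assumption of this sketch.
cells <- data.frame(
  value  = as.vector(B),
  rowsum = rowSums(B)[as.vector(row(B))],
  colsum = colSums(B)[as.vector(col(B))]
)

# The link name is passed to the binomial family, as in glm()/family().
fit <- glm(value ~ rowsum + colsum,
           family = binomial(link = "probit"),
           data = cells,
           control = glm.control(maxit = 25))

# Fitted probability that each cell of B contains a 1.
probs <- matrix(fitted(fit), nrow = nrow(B), ncol = ncol(B))
```

Swapping `"probit"` for another link name changes only the `family` argument; some links (for example `"log"`) may need starting values to converge on small data.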

If 'trials' > 0, the function uses repeated Bernoulli trials to compute the proportions. During each iteration, sdsm generates a new B* matrix using probabilities computed with the `glm`. This is a random bipartite matrix with about the same row and column sums as the original matrix B. If the 'dyad' parameter is supplied, then each time a B* matrix is projected, the projected value for the corresponding row and column is saved. This allows the user to see the distribution of the edge weights for the desired row and column.
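The repeated-trials procedure can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the toy matrix, the predictors in the glm, the use of strict inequalities for ties, and the names `positive`, `negative`, and `dyad_values` (chosen to mirror the Value section) are all illustrative.

```r
set.seed(1)

# Toy bipartite matrix (hypothetical data): 4 rows by 5 columns.
B <- matrix(c(1, 1, 0, 0, 1,
              1, 0, 1, 0, 0,
              0, 1, 1, 1, 0,
              1, 1, 1, 0, 1), nrow = 4, byrow = TRUE)

# Cell probabilities from a logit glm (predictors are an assumption).
cells <- data.frame(
  value  = as.vector(B),
  rowsum = rowSums(B)[as.vector(row(B))],
  colsum = colSums(B)[as.vector(col(B))]
)
fit <- glm(value ~ rowsum + colsum, family = binomial(link = "logit"),
           data = cells)
P <- matrix(fitted(fit), nrow = nrow(B), ncol = ncol(B))

observed <- B %*% t(B)   # observed projection: co-occurrence counts
trials   <- 200
dyad     <- c(1, 4)      # row/column of the projection to track

positive    <- matrix(0, nrow(B), nrow(B))
negative    <- matrix(0, nrow(B), nrow(B))
dyad_values <- numeric(trials)

for (k in seq_len(trials)) {
  # Draw B*: each cell is an independent Bernoulli trial.
  Bstar <- matrix(rbinom(length(P), size = 1, prob = as.vector(P)),
                  nrow = nrow(B))
  Pstar <- Bstar %*% t(Bstar)                # project B*
  positive <- positive + (observed > Pstar)  # observed above generated
  negative <- negative + (observed < Pstar)  # observed below generated
  dyad_values[k] <- Pstar[dyad[1], dyad[2]]  # save the tracked dyad
}

positive <- positive / trials   # convert counts to proportions
negative <- negative / trials
```

A call such as `hist(dyad_values)` then shows the empirical null distribution of the edge weight for the tracked dyad.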

If 'trials' = 0, the proportions of edges above or below the observed values are computed using the Poisson binomial distribution. These values are approximated using either a Discrete Fourier Transform (DFT method) or a Refined Normal Approximation (RNA method). These functions are described by ppoibin. The RNA method is used by default. If the value it computes falls between 'alpha' - 'tolerance' and 'alpha' + 'tolerance', the DFT method is used instead.
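The switching rule between the two methods can be sketched in base R. Several things here are assumptions: the helper names `pmf_poibin`, `cdf_rna`, and `cdf_switch` are hypothetical; an exact convolution stands in for the DFT method of ppoibin; the refined normal approximation follows the commonly cited skewness-corrected form; and the edge weight for two rows is treated as a sum of independent Bernoulli variables with success probabilities `p_i * p_j`, one per column.

```r
# Exact Poisson binomial pmf by convolution (stand-in for the DFT method).
# Returns a vector where element k + 1 is P(X = k).
pmf_poibin <- function(p) {
  pmf <- 1
  for (pk in p) {
    pmf <- c(pmf * (1 - pk), 0) + c(0, pmf * pk)
  }
  pmf
}

# Refined normal approximation to P(X <= k): a normal CDF with a
# continuity correction plus a skewness term, clipped to [0, 1].
cdf_rna <- function(k, p) {
  mu    <- sum(p)
  sigma <- sqrt(sum(p * (1 - p)))
  gamma <- sum(p * (1 - p) * (1 - 2 * p)) / sigma^3
  x     <- (k + 0.5 - mu) / sigma
  g     <- pnorm(x) + gamma * (1 - x^2) * dnorm(x) / 6
  min(max(g, 0), 1)
}

# Use the approximation unless it lands within alpha +/- tolerance,
# in which case fall back to the exact computation.
cdf_switch <- function(k, p, alpha = 0.05, tolerance = 0) {
  approx <- cdf_rna(k, p)
  if (abs(approx - alpha) <= tolerance) {
    sum(pmf_poibin(p)[seq_len(k + 1)])
  } else {
    approx
  }
}

# Hypothetical cell probabilities for two rows across five columns.
p_i <- c(0.8, 0.6, 0.7, 0.3, 0.5)
p_j <- c(0.9, 0.7, 0.8, 0.2, 0.6)
p   <- p_i * p_j   # probability both rows have a 1 in each column

observed    <- 3
lower_rna   <- cdf_switch(observed, p, tolerance = 0)  # approximation
lower_exact <- cdf_switch(observed, p, tolerance = 1)  # always exact
```

With `tolerance = 0` the fallback is triggered only if the approximation equals `alpha` exactly, and with `tolerance = 1` it is always triggered, which matches the two `trials = 0` calls in the Examples section.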

References

Neal, Z. P. (2014). The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship, co-attendance, and other co-behaviors. Social Networks, 39, Elsevier: 84-97. DOI: 10.1016/j.socnet.2014.06.001

Examples

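# NOT RUN {
library(backbone)  # provides sdsm()
# Repeated Bernoulli trials, saving the edge weights for one dyad
sdsm_bt <- sdsm(davis, trials = 100, dyad = c("EVELYN", "CHARLOTTE"))
# Poisson binomial computation with the RNA method (tolerance = 0)
sdsm_rna <- sdsm(davis, trials = 0, tolerance = 0)
# tolerance = 1 spans every possible value, so the DFT method is used
sdsm_dft <- sdsm(davis, trials = 0, tolerance = 1)
# }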