`sdsm` computes the proportion of generated edges above or below the observed value using the stochastic degree sequence model. Once computed, use `backbone.extract` to return the backbone matrix for a given alpha value.
sdsm(B, trials = 0, model = "logit", sparse = TRUE, maxiter = 25,
dyad = NULL, alpha = 0.05, tolerance = 0, progress = FALSE)
B: Matrix: Bipartite adjacency matrix.
trials: Integer: Number of random bipartite graphs generated. Default is 0.
model: String: Link function of the generalized linear model (glm) used to generate random bipartite graphs. Default is "logit".
sparse: Boolean: Whether sparse matrix manipulations should be used. Default is TRUE.
maxiter: Integer: Maximum number of iterations if "model" is a glm. Default is 25.
dyad: Vector of length 2: two row entries i, j. When supplied, the value in the i-th row and j-th column of each projected B* matrix is saved. This is useful for visualizing an example of the empirical null edge weight distribution generated by the model. The entries correspond to the row and column indices of a cell in the projected matrix and can be given as string row names or as numeric values. Default is NULL.
alpha: Real: Proposed alpha threshold used to determine the statistical significance of edges. Default is 0.05.
tolerance: Real: Tolerance for p-value computation using the RNA Poisson binomial approximation. Default is 0.
progress: Boolean: Whether txtProgressBar should be used to show progress. Default is FALSE.
list(positive, negative, dyad_values, summary).
positive: matrix of proportion of times each entry of the projected matrix B is above the corresponding entry in the generated projection.
negative: matrix of proportion of times each entry of the projected matrix B is below the corresponding entry in the generated projection.
dyad_values: list of edge weights for i,j in each generated projection, included if 'dyad' is not NULL and 'trials' > 0.
summary: a data frame summary of the input matrix and the model used, including: model name, number of rows, skew of row sums, number of columns, skew of column sums, and running time.
The 'model' parameter accepts the name of a 'link' function, as described by glm and family. It can be one of c('logit', 'probit', 'cauchit', 'log', 'cloglog').
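As a rough illustration of how a link function enters the model, the sketch below fits a binomial glm to the cells of a small bipartite matrix and reshapes the fitted values into a matrix of cell probabilities. The toy matrix, the choice of predictors (each cell's row sum and column sum), and the names `cells`, `fit`, and `P` are assumptions made for this example; the predictors that sdsm actually uses are not stated on this page.

```r
# Hypothetical 4 x 5 bipartite matrix (not package data).
B <- matrix(c(1, 1, 1, 1, 0,
              1, 1, 0, 0, 1,
              1, 0, 1, 0, 0,
              0, 1, 0, 0, 0), nrow = 4, byrow = TRUE)

# One record per cell, in column-major order to match as.vector(B).
cells <- data.frame(
  value   = as.vector(B),
  row_sum = rowSums(B)[as.vector(row(B))],
  col_sum = colSums(B)[as.vector(col(B))]
)

# Assumed predictors: the row sum and column sum of each cell.
# The link name plays the role of the 'model' argument; maxit plays
# the role of 'maxiter'.
fit <- glm(value ~ row_sum + col_sum,
           family = binomial(link = "logit"),
           data = cells,
           control = glm.control(maxit = 25))

# Fitted probability that each cell of a random bipartite matrix is 1.
P <- matrix(fitted(fit), nrow = nrow(B))
```

Swapping `"logit"` for another link such as `"probit"` or `"cloglog"` changes only how the linear predictor is mapped to a probability.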
If 'trials' > 0, the function uses repeated Bernoulli trials to compute the proportions. During each iteration, sdsm generates a new B* matrix using probabilities computed by the `glm`. This is a random bipartite matrix with about the same row and column sums as the original matrix B. If the 'dyad' parameter is supplied, the projected value for the corresponding row and column is saved each time a B* matrix is projected. This allows the user to see the distribution of edge weights for the desired row and column.
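The repeated-trials procedure can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the toy matrix is hypothetical, the probability matrix `P` is a simple stand-in for the glm-fitted probabilities, and the names `prop_above` and `prop_below` (with ties counted in both) are assumptions made for this example.

```r
set.seed(42)  # reproducible draws

# Hypothetical 3 x 4 bipartite matrix (not package data).
B <- matrix(c(1, 1, 0, 1,
              1, 0, 1, 1,
              0, 1, 1, 0), nrow = 3, byrow = TRUE)

# Stand-in for the glm-fitted probabilities (assumption): cell
# probabilities whose expected row and column sums match those of B.
P <- outer(rowSums(B), colSums(B)) / sum(B)

observed <- B %*% t(B)  # observed projection
trials <- 200
dyad <- c(1, 2)         # row and column of the projected cell to track

above <- matrix(0, nrow(B), nrow(B))
below <- matrix(0, nrow(B), nrow(B))
dyad_values <- numeric(trials)

for (i in seq_len(trials)) {
  # Draw B*: every cell is an independent Bernoulli trial.
  Bstar <- matrix(rbinom(length(P), size = 1, prob = P), nrow = nrow(P))
  Pstar <- Bstar %*% t(Bstar)  # projection of B*
  above <- above + (Pstar >= observed)
  below <- below + (Pstar <= observed)
  dyad_values[i] <- Pstar[dyad[1], dyad[2]]
}

# Proportion of generated weights at or above / at or below the observed one.
prop_above <- above / trials
prop_below <- below / trials
```

Calling `hist(dyad_values)` afterwards gives a quick view of the empirical null distribution of the tracked edge weight.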
If 'trials' = 0, the proportions of edges above or below the observed values are computed using the Poisson binomial distribution. These values are approximated using either a Discrete Fourier Transform (DFT method) or a Refined Normal Approximation (RNA method). These functions are described by ppoibin. The RNA method is used by default. If the computed value falls between 'alpha' - 'tolerance' and 'alpha' + 'tolerance', the DFT method is used instead.
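The switch between the two methods can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: an exact convolution stands in for the DFT method, the RNA formula is the standard refined normal approximation with a continuity correction, and the function names and example probabilities are hypothetical. It also assumes that the weight between rows i and j is a sum of independent Bernoulli variables, one per column, with success probability equal to the product of the two cell probabilities.

```r
# Exact Poisson binomial pmf by successive convolution (stand-in for DFT).
# Element k + 1 of the result is P(S = k).
pb_pmf_exact <- function(p) {
  pmf <- 1
  for (q in p) pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)
  pmf
}

# Refined normal approximation to P(S <= k).
pb_cdf_rna <- function(k, p) {
  mu    <- sum(p)
  sigma <- sqrt(sum(p * (1 - p)))
  gamma <- sum(p * (1 - p) * (1 - 2 * p)) / sigma^3
  x     <- (k + 0.5 - mu) / sigma
  cdf   <- pnorm(x) + gamma * (1 - x^2) * dnorm(x) / 6
  pmin(pmax(cdf, 0), 1)
}

# P(S >= k): RNA by default, exact when the RNA value lies within
# 'tolerance' of 'alpha'.
upper_tail <- function(k, p, alpha = 0.05, tolerance = 0) {
  tail_rna <- 1 - pb_cdf_rna(k - 1, p)
  if (abs(tail_rna - alpha) <= tolerance) {
    pmf <- pb_pmf_exact(p)
    return(sum(pmf[seq_along(pmf) > k]))  # indices > k hold values >= k
  }
  tail_rna
}

# Hypothetical cell probabilities for two rows of a bipartite matrix.
P_i <- c(0.9, 0.6, 0.3, 0.7, 0.2)
P_j <- c(0.8, 0.5, 0.4, 0.1, 0.6)
p   <- P_i * P_j  # per-column co-occurrence probabilities

rna_only <- upper_tail(2, p, tolerance = 0)  # approximation only
exact    <- upper_tail(2, p, tolerance = 1)  # always recomputed exactly
```

With `tolerance = 1` every value lies inside the margin, so the exact branch always runs; with `tolerance = 0` the approximation is kept unless it equals 'alpha' exactly.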
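# NOT RUN {
library(backbone)  # provides sdsm() and the davis example data
# Repeated Bernoulli trials, saving the EVELYN-CHARLOTTE edge weight from each trial
sdsm_bt <- sdsm(davis, trials = 100, dyad = c("EVELYN", "CHARLOTTE"))
# Poisson binomial with the RNA approximation only (tolerance = 0)
sdsm_rna <- sdsm(davis, trials = 0, tolerance = 0)
# tolerance = 1 puts every value inside alpha +/- tolerance, so the DFT method is used
sdsm_dft <- sdsm(davis, trials = 0, tolerance = 1)
# }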