gof-methods: Conduct Goodness-of-Fit Diagnostics on ERGMs, TERGMs, SAOMs, and logit models

Description

Assess goodness of fit and degeneracy of btergm and other network models.

Usage

## S3 method for class 'btergm':
gof(object, target = NULL, 
    formula = getformula(object), nsim = 100, MCMC.interval = 1000, 
    MCMC.burnin = 10000, parallel = c("no", "MPI", "SOCK"), 
    ncpus = 1, cl = NULL, classicgof = TRUE, rocprgof = TRUE, 
    checkdegeneracy = TRUE, statistics = c("dsp", "esp", "geodist", 
    "degree", "idegree", "odegree", "kstar", "istar", "ostar"), 
    pr.impute = "poly4", verbose = TRUE, ...)
## S3 method for class 'ergm':
gof(object, target = NULL, 
    formula = getformula(object), nsim = 100, MCMC.interval = 1000, 
    MCMC.burnin = 10000, parallel = c("no", "MPI", "SOCK"), 
    ncpus = 1, cl = NULL, classicgof = TRUE, rocprgof = TRUE, 
    checkdegeneracy = TRUE, statistics = c("dsp", "esp", "geodist", 
    "degree", "idegree", "odegree", "kstar", "istar", "ostar"), 
    pr.impute = "poly4", verbose = TRUE, ...)
## S3 method for class 'sienaAlgorithm':
gof(object, siena.data, siena.effects, 
    predict.period = NULL, nsim = 50, parallel = c("no", "multicore", 
    "snow"), ncpus = 1, cl = NULL, target.na = NA, 
    target.na.method = "remove", target.structzero = 10, 
    classicgof = TRUE, rocprgof = TRUE, statistics = c("dsp", "esp", 
    "geodist", "degree", "idegree", "odegree", "kstar", "istar", 
    "ostar"), pr.impute = "poly4", ...)
## S3 method for class 'sienaModel':
gof(object, siena.data, siena.effects, 
    predict.period = NULL, nsim = 50, parallel = c("no", "multicore", 
    "snow"), ncpus = 1, cl = NULL, target.na = NA, 
    target.na.method = "remove", target.structzero = 10, 
    classicgof = TRUE, rocprgof = TRUE, statistics = c("dsp", "esp", 
    "geodist", "degree", "idegree", "odegree", "kstar", "istar", 
    "ostar"), pr.impute = "poly4", ...)
## S3 method for class 'network':
gof(object, covariates, coef, target = NULL, 
    nsim = 100, mcmc = FALSE, MCMC.interval = 1000, 
    MCMC.burnin = 10000, parallel = c("no", "MPI", "SOCK"), 
    ncpus = 1, cl = NULL, classicgof = TRUE, rocprgof = TRUE, 
    statistics = c("dsp", "esp", "geodist", "degree", "idegree", 
    "odegree", "kstar", "istar", "ostar"), pr.impute = "poly4", 
    verbose = TRUE, ...)
## S3 method for class 'matrix':
gof(object, covariates, coef, target = NULL, 
    nsim = 100, mcmc = FALSE, MCMC.interval = 1000, 
    MCMC.burnin = 10000, parallel = c("no", "MPI", "SOCK"), 
    ncpus = 1, cl = NULL, classicgof = TRUE, rocprgof = TRUE, 
    statistics = c("dsp", "esp", "geodist", "degree", "idegree", 
    "odegree", "kstar", "istar", "ostar"), pr.impute = "poly4", 
    verbose = TRUE, ...)

Arguments

object

A btergm, ergm, sienaAlgorithm, or sienaModel object (for the btergm, ergm, sienaAlgorithm, and sienaModel methods, respectively). Or a network object

siena.data

An object of the class siena, which is usually created using the sienaDataCreate function in the RSiena package.

siena.effects

An object of the class sienaEffects, which is usually created using the getEffects() and includeEffects() functions in the RSiena package.

predict.period

Which time period should be predicted? By default, the last time period is predicted based on the last simulation of the second-last time period. The time period can be provided as a numeric, e.g., predict.period = 4 for predicting the fourth

target

A network or list of networks to which the simulations are compared. If left empty, the original networks from the btergm object are used as observed networks.

formula

A model formula from which networks are simulated for comparison. By default, the formula from the btergm object x is used. It is possible to hand over a formula with only a single response network and/or dyad or edge covariates

nsim

The number of networks to be simulated at each time step. Example: If there are six time steps in the formula and nsim = 100, a total of 600 new networks is simulated.

MCMC.interval

Internally, this package uses the simulation facilities of the ergm package to create new networks against which to compare the original network(s) for goodness-of-fit assessment. This argument sets the MCMC interval to be passed over to the si

MCMC.burnin

The MCMC burn-in to be passed over to the simulation facilities of the ergm package (default: 10000).

parallel

Use multiple cores in a computer or nodes in a cluster to speed up the simulations. The default value "no" means parallel computing is switched off. If "multicore" is used (only available for sienaAlgorithm and

ncpus

The number of CPU cores used for parallel simulations (only if parallel is activated). If the number of cores should be detected automatically on the machine where the code is executed, one can try the detectCores() function from

cl

An optional parallel or snow cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created temporarily.

target.na

Which value was used for missing data in the dependent variable?

target.na.method

How should missing data be handled when comparing the simulations to the empirical (= observed) network? Two options are possible: remove drops nodes with missing ties both from the simulations (after running the simulations) and from the obs

target.structzero

Which value was used for structural zeroes (usually nodes which have dropped out of the network or have not yet joined the network) in the dependent variable? These nodes are removed from the observed network and the simulations before comparison.

classicgof

If classicgof = TRUE is set, the classic statnet-style goodness-of-fit comparison is conducted. This means that shared-partner statistics, the geodesic distance distribution and the degree distribution are compared between observed and simula

rocprgof

If rocprgof = TRUE is set, the coordinates of ROC and PR curves as well as the AUC measure are stored in the resulting btergmgof object. The results can be plotted as curves or printed as tables. Note that the classicgof

checkdegeneracy

If checkdegeneracy = TRUE is set, the global statistics of the observed and simulated networks are compared for each observed time step separately. Frequent significant deviations indicate degeneracy. The results can be printed as tables. Not

statistics

A character vector of auxiliary statistics used for comparison of observed and simulated networks. Valid values are "dsp", "esp", "geodist", "degree", "idegree", "odegree",

pr.impute

In some cases, the first precision value of the precision-recall curve is undefined. The pr.impute argument serves to impute this missing value to ensure that the AUC-PR value is not severely biased.

Possible values are "no" for

covariates

A list of matrices or network objects that serve as covariates for the dependent network. The covariates in this list are automatically added to the formula as edgecov terms.

coef

A vector of coefficients.

mcmc

Should statnet's MCMC methods be used for simulating new networks? If mcmc = FALSE, new networks are simulated based on predicted tie probabilities of the regression equation.

verbose

Print details?

...

Arbitrary further arguments are handed over to the simulate.formula function or the siena07 function. For details, refer to the help page

Details

The generic gof function provides goodness-of-fit measures and degeneracy checks for btergm, ergm, SAOM, and custom dyadic-independence models. Three different types of GOF/degeneracy assessment are possible with this function:

(1) Classic statnet-type GOF assessment by comparing summary statistics of observed and simulated networks. Six built-in statistics are described here: dyad-wise shared partners (dsp), edge-wise shared partners (esp), degree (for undirected networks only), indegree (for directed networks only), outdegree (for directed networks only), and geodesic distances (geodist). The default value of the statistics argument additionally lists "kstar", "istar", and "ostar". The comparison can be plotted, using boxplots for the simulations and lines for the observed network(s), or printed using t-tests that check, for each value in the distribution of a summary statistic, whether simulated and observed networks differ significantly.

(2) An assessment of the classification performance using receiver operating characteristics (ROC) and precision-recall (PR) curves as well as the area under the curve (AUC) for the ROC curve.

(3) For bootstrapped TERGMs: A degeneracy check by comparing the global statistics of simulated networks to those of the observed networks at each observed time step. If the global statistics differ significantly, this is indicated by small p values. If there are many significant results, this indicates degeneracy.
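
For type (2), the following is a minimal base-R sketch of how ROC and PR coordinates and the area under the ROC curve could be computed from predicted tie probabilities. The scores and ties vectors are made-up data, the trapezoidal rule is just one common way to integrate the curve, and none of this is taken from the package's internals. It also shows why the first precision value is undefined: before any dyad is classified as a tie, precision is 0/0.

```r
# Hypothetical predicted tie probabilities and observed ties (one entry per dyad)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
ties   <- c(1,   1,   0,   1,   0,   0,   1,   0)

# Sort dyads by decreasing score; each prefix corresponds to one threshold
ord <- order(scores, decreasing = TRUE)
tp <- cumsum(ties[ord] == 1)            # true positives at each threshold
fp <- cumsum(ties[ord] == 0)            # false positives at each threshold

tpr <- c(0, tp / sum(ties == 1))        # recall, starting at the origin
fpr <- c(0, fp / sum(ties == 0))        # false positive rate
precision <- c(NA, tp / (tp + fp))      # first value undefined (no predicted ties)

# Area under the ROC curve via the trapezoidal rule
# (assumes no tied scores; ties would need to be grouped first)
auc_roc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc_roc
```

The pairs (fpr, tpr) trace the ROC curve and the pairs (tpr, precision) trace the PR curve.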

For all three types of GOF assessment, by default, in-sample predictive performance is assessed by comparing all observed networks to all simulations from the same networks (just like in the ergm package, but aggregated over several time steps). If an observed network or a list of observed networks is provided as the target argument, the simulations are compared to these networks instead. This is useful for out-of-sample prediction. If a formula is provided, the simulations are based on the networks and covariates specified in the formula. This is helpful in situations where complex out-of-sample predictions have to be evaluated. A usage scenario could be to simulate from a network at time t (provided through the formula argument) and compare to an observed network at time t + 1 (the target argument). This can be done, for example, to assess predictive performance between time steps of the original networks, or to check whether the model performs well with regard to a newly measured network given the old data from the previous time step.

Predictive fit can also be assessed for stochastic actor-oriented models (SAOM) as implemented in the RSiena package. After compiling the usual objects (model, data, effects), one of the time steps can be predicted based on the previous time step and the SAOM using the sienaAlgorithm (for RSiena >= 1.1-227) or sienaModel (for RSiena < 1.1-227) method of the gof function.

The gof methods for networks and matrices serve to assess the goodness of fit of a dyadic-independence model. To do this, the method requires a vector of coefficients (one coefficient for the intercept or edges term and one coefficient for each covariate), a list of covariates (in matrix or network shape), and a dependent network or matrix. This is useful for assessing the goodness of fit of QAP-adjusted logistic regression models (as implemented in the netlogit function in the sna package) or other dyadic-independence models, such as models fitted using glm. Note that this method only works with cross-sectional models and does not accept lists of networks as input data.
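
As a rough illustration of the inputs these methods expect, the sketch below fits a logistic regression with glm on made-up data, extracts the coefficient vector and covariate list, and simulates networks from the predicted tie probabilities. All object names (covariate, dep, coefs, simulate_once) are hypothetical, the coefficient order (intercept first) is an assumption, and the final gof call is only inferred from the usage section and is not run.

```r
set.seed(123)
n <- 20
covariate <- matrix(rnorm(n * n), n, n)            # hypothetical dyadic covariate
true_prob <- 1 / (1 + exp(-(-1 + 0.7 * covariate)))
dep <- matrix(rbinom(n * n, 1, true_prob), n, n)   # hypothetical dependent network
diag(dep) <- 0                                     # no self-loops

# Fit a logistic regression on the off-diagonal dyads
off_diag <- row(dep) != col(dep)
dyads <- data.frame(tie = dep[off_diag], x = covariate[off_diag])
fit <- glm(tie ~ x, data = dyads, family = binomial)

# Inputs for the network/matrix methods: coefficient vector and covariate list
coefs <- coef(fit)                # intercept (edges) and one coefficient for x
covariates <- list(covariate)

# Predicted tie probabilities from the regression equation
pred <- 1 / (1 + exp(-(coefs[1] + coefs[2] * covariate)))
diag(pred) <- 0

# Simulate networks as independent Bernoulli draws, one per dyad
simulate_once <- function(p) matrix(rbinom(length(p), 1, p), nrow(p), ncol(p))
sims <- replicate(100, simulate_once(pred), simplify = FALSE)

# Assumed call, based on the usage section (not run here):
# g <- gof(dep, covariates = covariates, coef = coefs, nsim = 100, mcmc = FALSE)
```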

See also the plot.btergmgof help page for details on the plotting and printing options for GOF assessment.
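
To make the printed comparison of type (1) more concrete, here is a small base-R sketch that compares the degree distribution of one observed network with the degree distributions of simulated networks. The networks are random toy data, the helper names (random_graph, degree_dist) are hypothetical, and the one-sample t-test per degree value is an assumed stand-in for the test the package reports.

```r
set.seed(1)
n <- 12

# Hypothetical undirected network generator (independent ties, symmetric matrix)
random_graph <- function(n, p) {
  m <- matrix(0, n, n)
  m[upper.tri(m)] <- rbinom(n * (n - 1) / 2, 1, p)
  m + t(m)
}
observed <- random_graph(n, 0.2)
sims <- replicate(50, random_graph(n, 0.2), simplify = FALSE)

# Degree distribution: number of nodes with degree 0, 1, ..., n - 1
degree_dist <- function(m) tabulate(rowSums(m) + 1, nbins = nrow(m))

obs_dist <- degree_dist(observed)
sim_dist <- t(sapply(sims, degree_dist))   # one row per simulated network

# Per degree value: do the simulated counts differ from the observed count?
p_values <- sapply(seq_len(n), function(k) {
  x <- sim_dist[, k]
  if (sd(x) == 0) return(NA)               # t-test is undefined for constant data
  t.test(x, mu = obs_dist[k])$p.value
})

data.frame(degree = 0:(n - 1), observed = obs_dist,
           sim_mean = colMeans(sim_dist), p = round(p_values, 3))
```

With these objects, boxplot(sim_dist) followed by lines(obs_dist) would give a rough version of the boxplot-and-line display described above.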
