RSiena (version 1.1-232)

sienaGOF: Functions to assess goodness of fit for SAOMs

Description

The function sienaGOF assesses goodness of fit for a model specification as represented by an estimated sienaFit object. This is done by simulating auxiliary statistics that differ from the statistics used for estimating the parameters. The auxiliary statistics must be given explicitly. The fit is good if the average values of the auxiliary statistics over many simulation runs are close to the values observed in the data. A Monte Carlo test based on the Mahalanobis distance is used to calculate frequentist \(p\)-values. Plotting functions can be used to diagnose bad fit. Basic functions for calculating auxiliary statistics are available out of the box, and users can also write custom functions.

Usage

sienaGOF(sienaFitObject, auxiliaryFunction,
         period=NULL, verbose=FALSE, join=TRUE, twoTailed=FALSE,
         cluster=NULL, robust=FALSE, groupName="Data1",
         varName, ...)
# S3 method for sienaGOF
plot(x, center=FALSE, scale=FALSE, violin=TRUE, key=NULL,
          perc=.05, period=1, main=main, ylab=ylab, ...)

Arguments

sienaFitObject

Results from a call to siena07 with returnDeps = TRUE.

auxiliaryFunction

Function to be used to calculate the auxiliary statistics; this can be a user-defined function, e.g. depending on the sna or igraph packages.

See Examples and sienaGOF-auxiliary for more information on the signature of this function. The basic signature is function(index, data, sims, period, groupName, varName, ...), where index is the index of the simulated network, or NULL if the observed variable is needed; data is the observed data object from which the relevant variables are extracted; sims is the list of simulations returned from siena07; period is the index of the period; groupName and varName are as described below; and ... stands for further arguments (like levls in the examples below and in sienaGOF-auxiliary).

period

Vector of period(s) to be used (may run from 1 to number of waves - 1). Has an effect only if join=FALSE.

verbose

Whether to print intermediate results. This may give some peace of mind to the user because calculations can take some time.

join

Boolean: should sienaGOF do tests on all of the periods individually (FALSE), or sum across periods (TRUE)?

twoTailed

Whether to use two tails for calculating \(p\)-values on the Monte Carlo test. Recommended for advanced users only, as it is probably only applicable in rare cases.

cluster

Optionally, a snow cluster to execute the auxiliary function calculations on.

robust

Whether to use robust estimation of the covariance matrix.

groupName

Name of group; relevant for multi-group data sets.

varName

Name of dependent variable.

x

Result from a call to sienaGOF.

center

Whether to center the statistics by median during plotting.

scale

Whether to scale the statistics by range during plotting.

violin

Use violin plots (vs. box plots only)?

key

Keys in the plot for the levels of the auxiliary statistic (as given by parameter levls in the examples).

perc

1 minus confidence level for the confidence bands (two sided).

main

Main title of the plot.

ylab

The y-axis label for the plot.

...

Other arguments.
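The plotting arguments center, scale, and perc can be pictured with a small base R sketch. The data below are made-up placeholders, and the exact computations inside plot.sienaGOF may differ; this only illustrates centering by the median, scaling by the range, and two-sided percentile bands as described above.

```r
set.seed(2)

# Placeholder "simulations": 100 runs of a 3-level auxiliary statistic
# (hypothetical values, not output of siena07).
sims <- matrix(rpois(100 * 3, lambda = c(5, 10, 20)), ncol = 3, byrow = TRUE)
obs  <- c(6, 9, 22)   # placeholder "observed" statistics
perc <- 0.05          # 1 minus the confidence level

# center = TRUE: subtract the median of the simulations, per level
med <- apply(sims, 2, median)
# scale = TRUE: divide by the range of the simulations, per level
rng <- apply(sims, 2, function(v) diff(range(v)))

simsStd <- sweep(sweep(sims, 2, med, "-"), 2, rng, "/")
obsStd  <- (obs - med) / rng

# Two-sided bands: the perc/2 and 1 - perc/2 quantiles for each level
bands <- apply(simsStd, 2, quantile, probs = c(perc / 2, 1 - perc / 2))
```

Centering and scaling put levels with very different magnitudes on a comparable footing, which can make a plot with many levels easier to read.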

Value

sienaGOF returns a result of class sienaGOF, which is a list of elements of class sienaGofTest. If join=TRUE, the list has length 1; if join=FALSE, each list element corresponds to a period analyzed. Each list element is itself a list that includes the following elements:

* Observations

The observed values for the auxiliary statistics.

* Simulations

The simulated auxiliary statistics.

* ObservedTestStat

The observed Mahalanobis distance in the data.

* SimulatedTestStat

The Mahalanobis distance for the simulations.

* TwoTailed

Whether the \(p\)-value corresponds to a one- or two-tailed Monte Carlo test.

* p

The \(p\)-value for the observed Mahalanobis distance in the permutation distribution of the simulated Mahalanobis distances.

* Rank

Rank of the covariance matrix of the simulated auxiliary statistics.
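How these elements relate to one another can be sketched in base R. This is a simplified illustration under stated assumptions, not RSiena's internal code: the data are random placeholders, the covariance matrix is assumed to have full rank, only the one-tailed case is shown, and mahalanobis() returns squared distances (which give the same ordering and hence the same p-value).

```r
set.seed(1)

# Placeholder data (hypothetical): 200 simulation runs of a
# 4-dimensional auxiliary statistic, plus one observed vector.
nSim <- 200
simulations  <- matrix(rpois(nSim * 4, lambda = 10), nrow = nSim)
observations <- c(9, 12, 10, 11)

# Mean and covariance of the simulated statistics
center  <- colMeans(simulations)
covSim  <- cov(simulations)
rankCov <- qr(covSim)$rank          # analogue of the Rank element

# Squared Mahalanobis distance of each simulation run, and of the
# observed data, from the centre of the simulations
simulatedTestStat <- mahalanobis(simulations, center, covSim)
observedTestStat  <- mahalanobis(observations, center, covSim)

# One-tailed Monte Carlo p-value: share of simulated distances that
# are at least as large as the observed distance
p <- mean(simulatedTestStat >= observedTestStat)
```

A small p indicates that the observed statistics lie further from the centre of the simulations than most simulation runs do, which points to poor fit on that auxiliary statistic.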

Details

This function is used to assess the goodness of fit of a stochastic actor-oriented model for an arbitrarily defined multidimensional auxiliary statistic. The auxiliary statistics are calculated for the dependent variables simulated in Phase 3 of the estimation algorithm, which are returned in sienaFitObject when siena07 is called with returnDeps = TRUE. These statistics should be chosen to represent features of the network that are not explicitly fit by the estimation procedure but are important properties that the model at hand should represent well. Some examples are:

  • Outdegree distribution

  • Indegree distribution

  • Distribution of the dependent behavior variable (if any).

  • Distribution of geodesic distances

  • Triad census

  • Edgewise homophily counts

  • Edgewise shared partner counts

  • Statistics depending on the combination of network and behavioral variables.

The function is written so that the user can easily define other functions to capture some other relevant aspects of the network, behaviors, etc. This is further illustrated in the help page sienaGOF-auxiliary.
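As a minimal sketch of such a user-defined function, the code below counts mutual, asymmetric, and null dyads. The names dyadCounts and DyadCensus are hypothetical and are not part of RSiena. The wrapper assumes that sparseMatrixExtraction from RSiena (documented in sienaGOF-auxiliary) returns the adjacency matrix for the requested simulation or for the observed data.

```r
# Hypothetical helper: dyad counts for one adjacency matrix (base R only).
dyadCounts <- function(x) {
  a <- !is.na(x) & x != 0          # treat missing entries as "no tie"
  diag(a) <- FALSE                 # ignore self-ties
  n <- nrow(a)
  mutual     <- sum(a & t(a)) / 2      # each mutual dyad is counted twice
  asymmetric <- sum(xor(a, t(a))) / 2  # each asymmetric dyad is counted twice
  c(mutual = mutual, asymmetric = asymmetric,
    null = n * (n - 1) / 2 - mutual - asymmetric)
}

# Hypothetical auxiliary function with the signature expected by sienaGOF.
# Assumption: RSiena::sparseMatrixExtraction() supplies the network for
# simulation i, or the observed network when i is NULL.
DyadCensus <- function(i, data, sims, period, groupName, varName, ...) {
  x <- as.matrix(RSiena::sparseMatrixExtraction(i, data, sims, period,
                                                groupName, varName))
  dyadCounts(x)
}

# Possible use, with ans estimated as in the Examples section:
#   gofd <- sienaGOF(ans, DyadCensus, varName = "mynet1")
```

Keeping the statistic itself in a plain matrix function makes it easy to check on a small hand-made network before running it over all simulations.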

We recommend the following heuristic approach to model checking:

  1. Check convergence of the estimation.

  2. Assess time heterogeneity with sienaTimeTest and, if there is evidence for time heterogeneity, either modify the base effects or include time dummy terms.

  3. Assess goodness of fit (primarily using join=TRUE) on auxiliary statistics, and if necessary refine the model.

The print function will display some useful information to help with model selection if some effects are set to FIX and TEST on the effects object. A rough estimator for the Mahalanobis distance that would be obtained at each proposed specification is given in the output. This can help guide model selection. This estimator is called the modified Mahalanobis distance (MMD). See Lospinoso (2012), the manual, or the references for more information.

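Several auxiliary functions are pre-fabricated for ease of use and can be passed in as the auxiliaryFunction with no extra effort, for example IndegreeDistribution, OutdegreeDistribution, and BehaviorDistribution as used in the examples below; see sienaGOF-auxiliary for details.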

References

  • See http://www.stats.ox.ac.uk/~snijders/siena/ for general information on RSiena.

  • Lospinoso, J.A. and Snijders, T.A.B. (2011). “Goodness of fit for Stochastic Actor Oriented Models.” Presentation given at Sunbelt XXXI, St. Pete Beach, FL.

  • Lospinoso, J.A. (2012). “Statistical Models for Social Network Dynamics.” Ph.D. Thesis. University of Oxford: U.K.

See Also

siena07, sienaGOF-auxiliary, sienaTimeTest

Examples

# NOT RUN {
   library(RSiena) # provides s501, s502, s503, s50a and all functions used below
   mynet1 <- sienaDependent(array(c(s501, s502, s503), dim=c(50, 50, 3)))
   mynet2 <- sienaDependent(array(c(s503, s502, s501), dim=c(50, 50, 3)))
   mybeh <- sienaDependent(s50a, type='behavior')
   mydata <- sienaDataCreate(mynet1, mynet2, mybeh)
   myeff <- getEffects(mydata)
   myeff <- includeEffects(myeff, transTrip)
   myeff <- includeEffects(myeff, recip, name="mynet2")
   myeff <- setEffect(myeff, cycle3, fix=TRUE, test=TRUE, include=TRUE)
   myeff <- setEffect(myeff, nbrDist2, fix=TRUE, test=TRUE, include=TRUE)
   myeff <- setEffect(myeff, transTies, fix=TRUE, test=TRUE, include=TRUE)
   myalgorithm <- sienaAlgorithmCreate(n3=200) # Shorter phase 3, just for example.
   ans <- siena07(myalgorithm, data=mydata, effects=myeff, returnDeps=TRUE)
   gofi <- sienaGOF(ans, IndegreeDistribution, verbose=TRUE, join=TRUE,
                    varName="mynet1")
   gofi
   plot(gofi)

   gofi2 <- sienaGOF(ans, IndegreeDistribution, verbose=TRUE, join=TRUE,
                     varName="mynet2")
   gofi2
   plot(gofi2)

   gofb <- sienaGOF(ans, BehaviorDistribution, varName = "mybeh",
                    verbose=TRUE, join=TRUE)
   plot(gofb)

   gofo <- sienaGOF(ans, OutdegreeDistribution, verbose=TRUE, join=TRUE,
                    varName="mynet1", cumulative=FALSE)
   # cumulative is an example of "...".
   gofo
   plot(gofo)
# }