simNet: Simulate network structure and data

Description

Used for generating moderated and unmoderated adjacency matrices, along with data based on those model structures.

Usage

simNet(
  N = 100,
  p = 5,
  m = FALSE,
  m2 = 0.1,
  b1 = NULL,
  b2 = NULL,
  sparsity = 0.5,
  intercepts = NULL,
  nIter = 250,
  msym = FALSE,
  onlyDat = FALSE,
  pbar = TRUE,
  div = 10,
  gibbs = TRUE,
  ordinal = FALSE,
  nLevels = 5,
  mord = FALSE,
  time = TRUE,
  mbinary = FALSE,
  minOrd = 3,
  m1 = NULL,
  m1_range = NULL,
  m2_range = c(0.1, 0.3),
  modType = "none",
  lags = NULL,
  V = 2,
  skewErr = FALSE,
  onlyNets = FALSE,
  netArgs = NULL,
  nCores = 1,
  cluster = "SOCK",
  getChains = FALSE,
  const = 1.5,
  fixedPar = NULL,
  V2 = 1,
  ...
)

Arguments

N

Numeric value. Total number of subjects.

p

Numeric value. Total number of nodes (excluding moderator).

m

If a value is provided, a moderator is generated and named M in the resultant data. If TRUE, then a normal distribution with a mean of 0 will be used to generate the initial value of m, which will serve as the population mean for m throughout the simulation. If a numeric value is provided, then this will serve as the population mean, and all subsequent draws will be taken from a normal distribution with that mean. If m = "binary", this simply sets the argument mbinary = TRUE. If m = "ordinal", this sets mord = TRUE. To simulate m from a skewed distribution, there are two options: if m = "skewed", then the alpha parameter of sn::rmsn will automatically be set to 3. Alternatively, a vector of length two can be supplied, containing the element "skewed" as well as the desired value of alpha. Lastly, a function can be provided for m if the user wishes to sample m from another distribution. The function must have only one argument and return only a single numeric value. The input of the argument should be the location parameter of the desired sampling distribution.

m2

Numeric. If m2 >= 1, then this will determine the number of interaction effects between the moderator and some node in the network. If a value between 0 and 1 is provided, then this determines the probability of any given edge being moderated by the moderator.

b1

An adjacency matrix can be provided to use for generating data.

b2

An interaction matrix can be provided for generating moderated data.

sparsity

Numeric value between 0 and 1. Determines the sparsity of sampled network matrices.

intercepts

A vector of means for sampling node values.

nIter

Number of iterations for generating each instance of a datapoint with the Gibbs sampler.

msym

If TRUE then will force the interaction matrix to be symmetric.

onlyDat

If TRUE then the function only returns the simulated data.

pbar

If TRUE then a progress bar will be shown as samples are generated.

div

A value to use as a sign that the sampler diverged. Can be increased based on expected range of values. If a datapoint is larger than div, then the sampler will stop.
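
As a rough illustration of this stopping rule, the base R sketch below runs a toy iterative update and stops as soon as any value exceeds div. The function toy_sampler and its update rule are hypothetical, and comparing absolute values against div is an assumption; this is not the sampler used by the package.

```r
# Hypothetical toy sampler: NOT the modnets implementation.
toy_sampler <- function(start, growth, nIter = 250, div = 10) {
  x <- start
  for (i in seq_len(nIter)) {
    x <- growth * x                  # toy deterministic update
    if (any(abs(x) > div)) {         # assumed form of the divergence check
      return(list(diverged = TRUE, iter = i, value = x))
    }
  }
  list(diverged = FALSE, iter = nIter, value = x)
}

stable   <- toy_sampler(start = 1, growth = 0.9)            # shrinks, never stops early
unstable <- toy_sampler(start = 1, growth = 1.5)            # stops once a value passes 10
tolerant <- toy_sampler(start = 1, growth = 1.5, div = 20)  # higher threshold, stops later
```

Raising div in the toy example lets the chain run for more iterations before it is flagged, which mirrors the advice to increase div when the sampler only diverges slightly.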

gibbs

If TRUE, then Gibbs sampling will be used. Otherwise, data are generated from the mvtnorm::rmvnorm function based on the partial correlation matrix that is created.

ordinal

Logical. Determines whether to generate ordinal values or not.

nLevels

Number of levels for the ordinal variables. Only relevant if ordinal is not FALSE.

mord

Logical. Determines whether the moderator variable should be simulated as ordinal.

time

If TRUE then the time it takes to simulate the data is printed to screen at the end of the sampling.

mbinary

Logical. Determines whether the moderator should be a binary variable.

minOrd

The minimum number of unique values allowed for each variable.

m1

Functions similarly to m2, except that this argument refers to the number/probability of main effects of the moderator on any given node.

m1_range

Numeric vector of length 2. The range of values for moderator main effect coefficients.

m2_range

Numeric vector of length 2. The range of values for moderator interaction effect coefficients.

modType

Determines the type of moderation to employ, such as "none", "full", "partial". If modType = "full", then for any interaction terms there will be full moderation, such that all pairwise relationships for moderated paths will be set to zero. If modType = "partial", then pairwise edges for moderated paths will always be nonzero. If modType = "none", no constraints will be applied (e.g., could produce a mix between full and partial moderation).
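
The constraints described above can be sketched in base R. The helper apply_modtype, the fill value, and the assumption that the interaction matrix has the same p x p shape as the adjacency matrix are all hypothetical; the package may enforce these constraints differently.

```r
# Hypothetical helper: NOT part of modnets.
# b1: p x p pairwise matrix; b2: p x p interaction matrix (assumed shape).
apply_modtype <- function(b1, b2, modType = c("none", "full", "partial"),
                          fill = 0.2) {
  modType <- match.arg(modType)
  moderated <- b2 != 0                 # paths that carry an interaction term
  if (modType == "full") {
    b1[moderated] <- 0                 # full moderation: drop the pairwise edge
  } else if (modType == "partial") {
    b1[moderated & b1 == 0] <- fill    # partial: make sure the pairwise edge is nonzero
  }
  b1                                   # "none": returned unchanged
}

b1 <- matrix(c(0,   0.3, 0,
               0.3, 0,   0,
               0,   0,   0), 3, 3, byrow = TRUE)
b2 <- matrix(c(0,   0.2, 0.1,
               0.2, 0,   0,
               0.1, 0,   0), 3, 3, byrow = TRUE)

full    <- apply_modtype(b1, b2, "full")
partial <- apply_modtype(b1, b2, "partial")
none    <- apply_modtype(b1, b2, "none")
```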

lags

If TRUE or 1, then arguments are rerouted to the mlGVARsim function to simulate temporal data for a single individual.

V

Numeric, either 1 or 2. Determines whether to randomize the order of simulating node values at each iteration of the Gibbs sampler. If V = 2, then the order is randomized at each iteration. If V = 1, then the sampler moves through the nodes from the first to the last in order at each iteration.
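
A minimal base R sketch of the two update orders follows. The helper node_order is hypothetical and only illustrates the difference between a fixed and a randomized sweep; it is not taken from the package source.

```r
# Hypothetical helper: NOT part of modnets.
node_order <- function(p, V = 2) {
  if (V == 2) {
    sample.int(p)    # new random permutation of the nodes
  } else {
    seq_len(p)       # fixed sweep from the first node to the last
  }
}

set.seed(1)
fixed_order  <- node_order(5, V = 1)
random_order <- node_order(5, V = 2)
```

In a sampler, node_order would be called once per iteration, so that with V = 2 each sweep visits the nodes in a fresh order.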

skewErr

The skewness parameter for the alpha argument in the sn::rmsn function. Only relevant when gibbs = FALSE and no moderator is specified.

onlyNets

If TRUE then only the network models are returned, without the data. Could be used to create random models and then simulate data by another method.

netArgs

Only for use by the internal function modnets:::simNet2, which serves as a wrapper for the current function to prevent it from failing.

nCores

Numeric value indicating the number of CPU cores to use for the simulation. If TRUE, then the parallel::detectCores function will be used to maximize the number of cores available.

cluster

Character vector indicating which type of parallelization to use, if nCores > 1. Options include "mclapply" and "SOCK".

getChains

Logical. Determines whether to return the data-generating chains from the Gibbs sampler.

const

Numeric. The constant to be used by the internal modnets:::simPcor function.

fixedPar

Numeric. If provided, then this will be set as the coefficient value for all edges in the network. Provides a way to standardize the parameter values while varying the sparsity of the network. If length(fixedPar) == 1, then the same value will be used for all parameters. If length(fixedPar) == 2, then the first value will be for pairwise relationships, and the second value will be for interaction terms.

V2

If V2 = 1 and m2 is between 0 and 1, the number of interaction terms in the model will be determined by multiplying m2 with the number of elements in the interaction matrix and taking the ceiling.
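
The arithmetic for the V2 = 1 case can be sketched as follows. The helper n_interactions is hypothetical, and which elements of the interaction matrix are counted is not stated here, so the count is passed in as a parameter; using the number of unique node pairs in the example is an assumption.

```r
# Hypothetical helper: NOT the modnets internals.
n_interactions <- function(m2, n_elements) {
  if (m2 >= 1) {
    as.integer(m2)                         # m2 is already a count
  } else {
    as.integer(ceiling(m2 * n_elements))   # proportion of elements, rounded up
  }
}

p <- 5
n_elements <- p * (p - 1) / 2              # assumption: count unique node pairs (10)

from_prop  <- n_interactions(0.25, n_elements)  # ceiling(0.25 * 10) = 3
from_count <- n_interactions(3, n_elements)     # 3
```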

...

Additional arguments.

Value

Simulated network models as well as data generated from those models. For GGMs, model matrices are always symmetric. For temporal networks (when lags = 1), columns predict rows.

Warning

Importantly, the Gibbs sampler can easily diverge given certain model parameters. Generating network data based on moderator variables can produce data that quickly take on large values due to the presence of multiplicative terms. If the simulation fails, first simply try re-running the function with a different seed; this will often be sufficient to solve the problem when default parameters are specified. Additionally, one can increase the value of div, in case the sampler only diverges slightly or simply produced an anomalous value. This raises the threshold of tolerated values before the sampler stops. If supplying user-generated model matrices (for the b1 and/or b2 arguments) and the function continues to fail, you will likely need to change the parameter values in those matrices, as it may not be possible to simulate data under the given values. If simulating the model matrices inside the function (as is the default) and the function continues to fail, try adjusting the following parameters:

Try reducing the value of m2 to specify fewer interactions.
Try supplying a range with a smaller maximum for m2_range, to narrow the range of interaction coefficients.
Try adjusting the corresponding main effect parameters for the moderator, m1 and m1_range.
Try setting modType = "full" to reduce the number of main effect parameters.
Try setting low value(s) for fixedPar, in order to provide parameter values that are known to be smaller.

An alternative approach could be to use the internal function simNet2, which is a wrapper designed to re-run simNet when it fails and automatically adjust simulation parameters such as div to thoroughly test a given parameterization scheme. This function can be accessed via modnets:::simNet2. There is no documentation for this function, so it is recommended to look at the source code if one wishes to use it. This wrapper is also used inside the mnetPowerSim function.
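
The first two suggestions in this section (trying a different seed and raising div) can be combined in a small retry loop. The sketch below is a hypothetical helper written in base R, not modnets:::simNet2, and it assumes that a failed simulation surfaces as an ordinary R error. A toy stand-in function is used so that the example runs without the package.

```r
# Hypothetical retry helper: NOT modnets:::simNet2.
retry_sim <- function(sim_fun, seeds = 1:5, div = 10, div_step = 5, ...) {
  for (i in seq_along(seeds)) {
    set.seed(seeds[i])                      # new seed on every attempt
    out <- tryCatch(sim_fun(div = div, ...), error = function(e) NULL)
    if (!is.null(out)) {
      return(list(result = out, seed = seeds[i], div = div))
    }
    div <- div + div_step                   # loosen the threshold before retrying
  }
  stop("simulation failed for all seeds")
}

# Toy stand-in for a simulation that fails unless div is at least 20
toy_sim <- function(div, ...) {
  if (div < 20) stop("sampler diverged")
  rnorm(3)
}

fit <- retry_sim(toy_sim, seeds = 1:5)
```

With modnets loaded, a call along the lines of retry_sim(simNet, N = 100, p = 3, m = TRUE) would follow the same pattern, provided simNet signals failure as an error.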

Details

If no moderator is specified then data can be generated directly from a partial correlation matrix by setting gibbs = FALSE, which produces fast simulation results. Alternatively, a Gibbs sampler is used to generate data, which is the default option. For moderated networks, Gibbs sampling is the only method available.
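
To show what generating data directly from a partial correlation matrix involves, here is a minimal base R sketch. It converts a small, made-up partial correlation matrix into the implied correlation matrix and draws multivariate normal data using a Cholesky factor in place of mvtnorm::rmvnorm. The matrix values and variable names are illustrative assumptions, and the package's internal steps may differ.

```r
# Hypothetical sketch in base R: NOT the modnets implementation.
pcor <- matrix(c(0,   0.3, 0.2,
                 0.3, 0,   0,
                 0.2, 0,   0), 3, 3, byrow = TRUE)

# Precision matrix with unit diagonal: off-diagonals are the negated
# partial correlations.
K <- diag(nrow(pcor)) - pcor
Sigma <- cov2cor(solve(K))           # implied correlation matrix

set.seed(1)
N <- 100
Z <- matrix(rnorm(N * nrow(pcor)), nrow = N)
X <- Z %*% chol(Sigma)               # each row is one draw from N(0, Sigma)

# Sanity check: recover the partial correlations from Sigma
P_back <- -cov2cor(solve(Sigma))
diag(P_back) <- 0
```

Because no iterative sampling is needed, this route is fast, which is consistent with the note above that gibbs = FALSE produces fast simulation results.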

Examples

# NOT RUN {
# Generate a moderated GGM along with data
set.seed(1)
x <- simNet(N = 100, p = 3, m = TRUE)

net(x) # Get data-generating adjacency matrix
netInts(x) # Get data-generating interaction matrix

plot(x) # Plot the moderated network that generated the data

# Generate a single-subject GVAR model with data
set.seed(1)
x <- simNet(N = 500, p = 3, m = TRUE, lags = 1)

net(x, n = 'temporal') # Get the data-generating time-lagged adjacency matrix
net(x, n = 'contemporaneous') # Get the data-generating standardized residual covariance matrix

plot(x, which.net = 'beta') # 'beta' is another way of referring to the temporal network
plot(x, which.net = 'pcc') # 'pcc' is another way of referring to the contemporaneous network
# }