
simsem (version 0.4-6)

sim: Run a Monte Carlo simulation with a structural equation model.

Description

This function generates and analyzes simulated data from SimSem objects created with the model function. Parameters are drawn from the specified data-generation model and used to create data, the specified missingness (if any) is imposed, and the data are analyzed using the specified SimSem analysis model object. The function returns a SimResult object, which summarizes the analyses across replications. Data can be transformed using the datafun argument, and additional output can be extracted using the outfun argument. Parallel processing can be enabled using the multicore argument. The sim function can also be used to obtain raw data using the dataOnly argument, to analyze pre-existing data using the rawData argument, and to simulate data that follow the distribution of a real data set using the realData argument.
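The generate, impose missingness, analyze, and summarize cycle described above can be sketched in base R with a much simpler model. This is only an illustration of the general Monte Carlo pattern, not simsem's implementation: the regression model, the names (n_rep, p_mcar, one_replication), and all numeric values are assumptions chosen for the example.

```r
set.seed(123321)

n_rep      <- 200   # number of replications (hypothetical value)
n          <- 100   # sample size per replication (hypothetical value)
p_mcar     <- 0.10  # proportion of y deleted completely at random
true_slope <- 0.5   # population parameter of the toy model

one_replication <- function() {
  # 1. Generate data from the population model y = 0.5 * x + e
  x <- rnorm(n)
  y <- true_slope * x + rnorm(n)
  # 2. Impose MCAR missingness on y
  y[runif(n) < p_mcar] <- NA
  # 3. Analyze the data (lm drops incomplete rows by default)
  fit <- lm(y ~ x)
  est <- coef(summary(fit))["x", ]
  ci  <- confint(fit)["x", ]
  c(estimate = est[["Estimate"]],
    se       = est[["Std. Error"]],
    covered  = unname(ci[1] <= true_slope && true_slope <= ci[2]))
}

# One row per replication
results <- t(replicate(n_rep, one_replication()))

# 4. Summarize across replications
summary_table <- c(
  mean_estimate = mean(results[, "estimate"]),
  bias          = mean(results[, "estimate"]) - true_slope,
  coverage      = mean(results[, "covered"])
)
print(round(summary_table, 3))
```

In sim, the same roles are played by the generate and model arguments (steps 1 and 3), the miss, pmMCAR, and pmMAR arguments (step 2), and the returned SimResult object (step 4).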

Usage

sim(nRep, model, n, generate = NULL, rawData = NULL, miss = NULL, datafun=NULL, outfun=NULL,
pmMCAR = NULL, pmMAR = NULL, facDist = NULL, indDist = NULL, errorDist = NULL, sequential = FALSE, 
modelBoot = FALSE, realData = NULL, maxDraw = 50, misfitType = "f0", 
misfitBounds = NULL, averageNumMisspec = FALSE, optMisfit=NULL, optDraws = 50, createOrder = c(1, 2, 3), 
aux = NULL, seed = 123321, silent = FALSE, multicore = FALSE, cluster = FALSE, numProc = NULL,  
paramOnly = FALSE, dataOnly=FALSE, smartStart=FALSE, ...)

Arguments

nRep
Number of replications. If any of the n, pmMCAR, or pmMAR arguments are specified as lists, the number of replications will default to the length of the list(s), and nRep need not be specified.
model
SimSem object created by model. If the generate argument is not specified, then the object in the model argument will be used for both data ge
n
Sample size. Either a single value, or a list of values to vary sample size across replications. The n argument can also be specified as a random distribution object; if any resulting values are non-integers, the decimal will be rounded.
generate
SimSem object created using the model function. If included, this argument will be used to generate data instead of the SimSem object s
rawData
If specified, a list of data objects to be used in simulations instead of generating data from a SimSem template.
miss
A missing data template created using the miss function.
datafun
A function to be applied to each generated data set across replications.
outfun
A function to be applied to the lavaan output at each replication. Output from this function in each replication will be saved in the simulation output (SimResult), and can be
pmMCAR
The proportion of data missing completely at random (0 <= pmMCAR <= 1). Either a single value or a vector of values in order to vary across replications (with length equal to nRep or a divisor of nRep). The objMissing argument is only r
pmMAR
The proportion of data missing at random (0 <= pmMAR <= 1). Either a single value or a vector of values in order to vary across replications (with length equal to nRep or a divisor of nRep). The objMissing argument is only required when
facDist
Factor distributions. Either a list of SimDataDist objects or a single SimDataDist object to give all factors the same distribution. Use when sequential is
indDist
Indicator distributions. Either a list of SimDataDist objects or a single SimDataDist object to give all indicators the same distribution. Use when sequential is
errorDist
An object or list of objects of type SimDataDist indicating the distribution of errors. If a single SimDataDist is specified, each error will be generated with that distribution.
sequential
If TRUE, a sequential method is used to generate data in which factor data is generated first, and is subsequently applied to a set of equations to obtain the indicator data. If FALSE, data is generated directly from model-implie
modelBoot
When specified, a model-based bootstrap is used for data generation (for use with the realData argument). See draw for further information.
realData
A data.frame containing real data. Generated data will follow the distribution of this data set.
maxDraw
The maximum number of attempts to draw a valid set of parameters (that is, a set with no negative error variances and no standardized coefficients greater than 1).
misfitType
Character vector indicating the fit measure used to assess the misfit of a set of parameters. Can be "f0", "rmsea", "srmr", or "all".
misfitBounds
Vector that contains upper and lower bounds of the misfit measure. Sets of parameters drawn that are not within these bounds are rejected.
averageNumMisspec
If TRUE, the provided fit will be divided by the number of misspecified parameters.
optMisfit
Character vector of either "min" or "max" indicating either minimum or maximum optimized misfit. If not null, the set of parameters out of the number of draws in "optDraws" that has either the maximum or minimum misfit of the given misfit type will be ret
optDraws
Number of parameter sets to draw if optMisfit is not null. The set of parameters with the maximum or minimum misfit will be returned.
createOrder
The order of 1) applying equality/inequality constraints, 2) applying misspecification, and 3) filling unspecified parameters (e.g., residual variances when total variances are specified). The specification of this argument is a vector of different orders of
aux
The names of auxiliary variables saved in a vector.
seed
Random number seed. Reproducibility across multiple cores or clusters is ensured using L'Ecuyer random number streams.
silent
If TRUE, suppress warnings.
multicore
If TRUE, multiple processors within a computer will be utilized.
cluster
Not currently used. Intended for specifying nodes in an HPC cluster so that the simulation can be run in parallel.
numProc
Number of processors to use for parallel processing. If it is NULL, the package will detect the maximum number of processors.
paramOnly
If TRUE, only the parameters from each replication will be returned.
dataOnly
If TRUE, only the raw data generated from each replication will be returned.
smartStart
Defaults to FALSE. If TRUE, population parameter values that are real numbers will be used as starting values. When tested in small models, the time elapsed when using population values as starting values was greater than the time reduced during analysis,
...
Additional arguments to be passed to lavaan.
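The seed and numProc descriptions touch on two general techniques: giving each replication its own random number stream so that results do not depend on which core runs it, and detecting the number of available processors. The sketch below shows both with base R's parallel package. It illustrates the general approach only and is not taken from simsem's source; the names n_rep, streams, and draw_with_stream are hypothetical.

```r
library(parallel)  # ships with base R

n_rep <- 4  # hypothetical number of replications

# Switch to the L'Ecuyer-CMRG generator, which supports independent streams
RNGkind("L'Ecuyer-CMRG")
set.seed(123321)

# Derive one stream per replication from the initial seed
streams <- vector("list", n_rep)
s <- .Random.seed
for (i in seq_len(n_rep)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)
}

# A replication installs its own stream before drawing random numbers
draw_with_stream <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  rnorm(3)
}

# Running the replications in a different order gives the same draws
run_a <- lapply(streams, draw_with_stream)
run_b <- lapply(rev(streams), draw_with_stream)
print(identical(run_a, rev(run_b)))

# Number of processors R can detect (may be NA on some platforms)
n_proc <- detectCores()
print(n_proc)
```

Because each replication carries its own stream, the order of execution (or the number of workers) does not change the numbers drawn within a replication.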

Value

  • A result object (SimResult)

See Also

  • SimResult for the resulting output description

Examples

library(simsem)

loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)

latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)

RTE <- binds(diag(6))

# Indicator total variances (note: VY is not passed to model() below)
VY <- bind(rep(NA,6),2)

CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# In reality, more than 5 replications are needed.
Output <- sim(5, CFA.Model,n=200)
summary(Output)

# Example of data transformation: Transforming to standard score
fun1 <- function(data) {
	temp <- scale(data)
	temp[,"group"] <- data[,"group"]
	as.data.frame(temp)
}

# Example of additional output: Extract modification indices from lavaan
fun2 <- function(out) {
	inspect(out, "mi")
}

# In reality, more than 5 replications are needed.
Output <- sim(5, CFA.Model,n=200,datafun=fun1, outfun=fun2)
summary(Output)

# Get modification indices
getExtraOutput(Output)
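
The dataOnly and miss arguments described earlier can be tried with the same CFA template. This is a minimal sketch, assuming that dataOnly = TRUE returns a list with one data set per replication and that the generated data contain a "group" column (as fun1 in the example implies), which is why ignoreCols = "group" is passed to miss. The replication counts and the 10% missingness rate are arbitrary.

```r
library(simsem)

# Same two-factor CFA template as in the example
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)

latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)

RTE <- binds(diag(6))

CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# Return only the generated data (assumed: a list, one element per replication)
dat <- sim(3, CFA.Model, n = 200, dataOnly = TRUE)
str(dat[[1]])

# Impose 10% MCAR missingness through a template created with miss();
# ignoreCols = "group" assumes the data contain a "group" column
Missing <- miss(pmMCAR = 0.1, ignoreCols = "group")

# In reality, more than 5 replications are needed.
Output <- sim(5, CFA.Model, n = 200, miss = Missing)
summary(Output)
```

Inspecting the raw data first is a quick way to check column names before writing a datafun or choosing ignoreCols.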
