sim: Run a Monte Carlo simulation with a structural equation model.

Description

This function generates and analyzes simulated data from SimSem objects created with the model function. Parameters are drawn from the specified data-generation model and used to create data, the specified missingness (if any) is imposed, and the data are analyzed using the specified SimSem analysis model object. The function returns a SimResult object, which summarizes the analyses across replications. Data can be transformed using the datafun argument, and additional output can be extracted using the outfun argument. Parallel processing can be enabled using the multicore argument. The sim function can also be used to obtain raw data using the dataOnly argument, to analyze pre-existing data using the rawData argument, and to simulate data that follow the distribution of a real data set using the realData argument.

Usage

sim(nRep, model, n, generate = NULL, rawData = NULL, miss = NULL, datafun = NULL,
    outfun = NULL, pmMCAR = NULL, pmMAR = NULL, facDist = NULL, indDist = NULL,
    errorDist = NULL, sequential = FALSE, modelBoot = FALSE, realData = NULL,
    maxDraw = 50, misfitType = "f0", misfitBounds = NULL, averageNumMisspec = FALSE,
    optMisfit = NULL, optDraws = 50, createOrder = c(1, 2, 3), aux = NULL,
    seed = 123321, silent = FALSE, multicore = FALSE, cluster = FALSE, numProc = NULL,
    paramOnly = FALSE, dataOnly = FALSE, smartStart = FALSE, ...)

Arguments

nRep

Number of replications. If any of the n, pmMCAR, or pmMAR arguments are specified as lists, the number of replications will default to the length of the list(s), and nRep need not be specified.

model

SimSem object created by model. If the generate argument is not specified, then the object in the model argument will be used for both data ge

n

Sample size. Either a single value, or a list of values to vary sample size across replications. The n argument can also be specified as a random distribution object; if any resulting values are non-integers, they will be rounded.

generate

SimSem object created using the model function. If included, this argument will be used to generate data instead of the SimSem object s

rawData

If specified, a list of data objects to be used in simulations instead of generating data from a SimSem template.
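As a minimal sketch of analyzing pre-existing data, the data list below is first produced with dataOnly = TRUE and then passed back in through rawData. Passing nRep = NULL and n = NULL explicitly is an assumption made here, so that the number of replications is taken from the length of the list; the model is the two-factor CFA from the Examples section.

```r
library(simsem)

# Two-factor CFA model, same specification as in the Examples section
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)
RTE <- binds(diag(6))
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# Step 1: obtain a list of raw data sets, one per replication
dataList <- sim(5, CFA.Model, n = 200, dataOnly = TRUE)

# Step 2: analyze the existing data sets instead of generating new ones
# (assumption: nRep and n can be left NULL when rawData is supplied)
Output <- sim(nRep = NULL, model = CFA.Model, n = NULL, rawData = dataList)
summary(Output)
```

In practice the list would usually hold data sets produced outside simsem; generating it with dataOnly here only keeps the sketch self-contained.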

miss

A missing data template created using the miss function.
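The following sketch shows one way a missing data template might be combined with sim. The 10% MCAR rate and the name missTemplate are illustrative choices, not values taken from this page; the model is the two-factor CFA from the Examples section.

```r
library(simsem)

# Two-factor CFA model, same specification as in the Examples section
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)
RTE <- binds(diag(6))
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# Missing data template: 10% of values missing completely at random
# (illustrative rate; missTemplate is a name chosen for this sketch)
missTemplate <- miss(pmMCAR = 0.1)

# Missingness is imposed on each generated data set before analysis.
# In reality, more than 5 replications are needed.
Output <- sim(5, CFA.Model, n = 200, miss = missTemplate)
summary(Output)
```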

datafun

A function to be applied to each generated data set across replications.

outfun

A function to be applied to the lavaan output at each replication. Output from this function in each replication will be saved in the simulation output (SimResult), and can be

pmMCAR

The percentage of data completely missing at random (0 <= pmMCAR < 1). Either a single value or a vector of values in order to vary across replications (with length equal to nRep or a divisor of nRep). The objMissing argument is only r

pmMAR

The percentage of data missing at random (0 <= pmMAR < 1). Either a single value or a vector of values in order to vary across replications (with length equal to nRep or a divisor of nRep). The objMissing argument is only required when

facDist

Factor distributions. Either a list of SimDataDist objects or a single SimDataDist object to give all factors the same distribution. Use when sequential is

indDist

Indicator distributions. Either a list of SimDataDist objects or a single SimDataDist object to give all indicators the same distribution. Use when sequential is

errorDist

An object or list of objects of type SimDataDist indicating the distribution of errors. If a single SimDataDist is specified, each error will be generated with that distribution.

sequential

If TRUE, a sequential method is used to generate data in which factor data is generated first, and is subsequently applied to a set of equations to obtain the indicator data. If FALSE, data is generated directly from model-implie

modelBoot

If TRUE, a model-based bootstrap is used for data generation (for use with the realData argument). See draw for further information.

realData

A data.frame containing real data. Generated data will follow the distribution of this data set.

maxDraw

The maximum number of attempts to draw a valid set of parameters (that is, a set with no negative error variances and no standardized coefficients greater than 1).

misfitType

Character vector indicating the fit measure used to assess the misfit of a set of parameters. Can be "f0", "rmsea", "srmr", or "all".

misfitBounds

Vector that contains upper and lower bounds of the misfit measure. Sets of parameters drawn that are not within these bounds are rejected.
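The accept/reject idea behind misfit bounds can be illustrated with a small base R sketch. This is not simsem's internal code: draw_within_bounds, the toy draw function, and the toy misfit measure are all hypothetical, the bounds are assumed to be ordered as c(lower, upper), and the cap on attempts is only loosely analogous to maxDraw.

```r
# Hypothetical helper (NOT part of simsem): keep drawing parameter sets
# until the misfit falls inside the bounds or the attempt cap is reached.
draw_within_bounds <- function(draw_fun, misfit_fun, bounds, max_draw = 50) {
  for (i in seq_len(max_draw)) {
    params <- draw_fun()
    misfit <- misfit_fun(params)
    # Assumption: bounds are given as c(lower, upper)
    if (misfit >= bounds[1] && misfit <= bounds[2]) {
      return(list(params = params, misfit = misfit, attempts = i))
    }
  }
  stop("No parameter set within the misfit bounds after ", max_draw, " draws")
}

set.seed(1)
res <- draw_within_bounds(
  draw_fun   = function() runif(1, 0, 0.3),  # toy misspecification size
  misfit_fun = function(p) p^2,              # toy misfit measure
  bounds     = c(0.01, 0.05)
)
res$misfit    # accepted misfit value, inside the bounds
res$attempts  # number of draws needed
```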

averageNumMisspec

If TRUE, the provided fit will be divided by the number of misspecified parameters.

optMisfit

Character vector of either "min" or "max" indicating either maximum or minimum optimized misfit. If not null, the set of parameters out of the number of draws in "optDraws" that has either the maximum or minimum misfit of the given misfit type will be ret

optDraws

Number of parameter sets to draw if optMisfit is not null. The set of parameters with the maximum or minimum misfit will be returned.
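The selection step described for optMisfit and optDraws can be sketched in base R as follows. The helper pick_by_misfit and the toy draw and misfit functions are hypothetical and only illustrate the idea; they are not simsem's implementation.

```r
# Hypothetical helper (NOT part of simsem): draw opt_draws parameter sets
# and return the one with the smallest or largest misfit.
pick_by_misfit <- function(draw_fun, misfit_fun, opt_misfit = c("min", "max"),
                           opt_draws = 50) {
  opt_misfit <- match.arg(opt_misfit)
  draws <- replicate(opt_draws, draw_fun(), simplify = FALSE)
  misfit <- vapply(draws, misfit_fun, numeric(1))
  best <- if (opt_misfit == "min") which.min(misfit) else which.max(misfit)
  list(params = draws[[best]], misfit = misfit[best], all_misfit = misfit)
}

set.seed(1)
toy_draw <- function() runif(1, 0, 0.3)  # toy misspecification size
toy_misfit <- function(p) p^2            # toy misfit measure

res_min <- pick_by_misfit(toy_draw, toy_misfit, "min")
res_max <- pick_by_misfit(toy_draw, toy_misfit, "max")
```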

createOrder

The order of 1) applying equality/inequality constraints, 2) applying misspecification, and 3) filling unspecified parameters (e.g., residual variances when total variances are specified). The specification of this argument is a vector of different orders of

aux

The names of auxiliary variables saved in a vector.

seed

Random number seed. Reproducibility across multiple cores or clusters is ensured using R's L'Ecuyer random number generator.
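A quick way to check reproducibility is to generate data twice with the same seed and compare the results. The seed value 2024 is arbitrary, and the expectation that both runs match is an assumption of this sketch that follows from how a fixed seed normally behaves; the model is the two-factor CFA from the Examples section.

```r
library(simsem)

# Two-factor CFA model, same specification as in the Examples section
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)
RTE <- binds(diag(6))
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# Generate raw data only, twice, with the same (arbitrary) seed
d1 <- sim(2, CFA.Model, n = 50, dataOnly = TRUE, seed = 2024)
d2 <- sim(2, CFA.Model, n = 50, dataOnly = TRUE, seed = 2024)

# Compare the two lists of data sets
isTRUE(all.equal(d1, d2))
```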

silent

If TRUE, suppress warnings.

multicore

If TRUE, multiple processors within a computer will be utilized.

cluster

Not currently applicable. Intended for specifying nodes on a high-performance computing (HPC) cluster so that the simulation can be parallelized.

numProc

Number of processors to use for parallel processing. If NULL, the package will find the maximum number of processors available.
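A minimal sketch of a parallel run, assuming a machine with at least two processors available; the replication count and numProc = 2 are illustrative choices, and the model is the two-factor CFA from the Examples section.

```r
library(simsem)

# Two-factor CFA model, same specification as in the Examples section
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)
RTE <- binds(diag(6))
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# Spread the replications over two processors
# (assumption: at least two processors are available)
Output <- sim(20, CFA.Model, n = 200, multicore = TRUE, numProc = 2)
summary(Output)
```

For very small runs, the overhead of starting parallel workers may outweigh the time saved.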

paramOnly

If TRUE, only the parameters from each replication will be returned.

dataOnly

If TRUE, only the raw data generated from each replication will be returned.

smartStart

Defaults to FALSE. If TRUE, population parameter values that are real numbers will be used as starting values. When tested in small models, the time elapsed when using population values as starting values was greater than the time reduced during analysis,

...

Additional arguments to be passed to lavaan.

Value

A result object (SimResult)

Examples

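library(simsem)

# Factor loadings: 6 indicators on 2 factors. NA marks a free parameter,
# and 0.7 is its population value.
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)

# Factor correlation matrix: free correlation with population value 0.5
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)

# Error correlation matrix: uncorrelated errors
RTE <- binds(diag(6))

# Indicator variances (defined here but not passed to model() below)
VY <- bind(rep(NA, 6), 2)

CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# In reality, more than 5 replications are needed.
Output <- sim(5, CFA.Model, n = 200)
summary(Output)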

# Example of data transformation: Transforming to standard score
fun1 <- function(data) {
	temp <- scale(data)
	# Restore the group column so that it is not standardized
	temp[, "group"] <- data[, "group"]
	as.data.frame(temp)
}

# Example of additional output: Extract modification indices from lavaan
fun2 <- function(out) {
	inspect(out, "mi")
}

# In reality, more than 5 replications are needed.
Output <- sim(5, CFA.Model, n = 200, datafun = fun1, outfun = fun2)
summary(Output)

# Get modification indices
getExtraOutput(Output)
