simstudy

The simstudy package is a collection of functions that allow users to generate simulated data sets in order to explore modeling techniques or better understand data generating processes. The user defines the distributions of individual variables, specifies relationships between covariates and outcomes, and generates data based on these specifications. The final data sets can represent randomized controlled trials, repeated measure designs, cluster randomized trials, or naturally observed data processes. Other complexities that can be added include survival data, correlated data, factorial study designs, stepped-wedge designs, and missing data processes.

Simulation using simstudy has two fundamental steps. The user (1) defines the data elements of a data set and (2) generates the data based on these definitions. Additional functionality exists to simulate observed or randomized treatment assignment/exposures, to create longitudinal/panel data, to create multi-level/hierarchical data, to create data sets with correlated variables based on a specified covariance structure, to merge data sets, to create data sets with missing data, and to create non-linear relationships with underlying spline curves.
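The two steps, plus one of the extensions mentioned above (longitudinal/panel data), can be sketched as follows. This is a minimal illustration rather than a recommended workflow: the variable names `b` and `y`, the seed, and the sample sizes are all assumptions made for the example, and the calls rely on the default argument names of `addPeriods` and `addColumns`.

```r
library(simstudy)

set.seed(1)  # arbitrary seed so the sketch is repeatable

# Step 1: define the data elements (a normally distributed baseline value)
def <- defData(varname = "b", formula = 5, variance = 1)

# Step 2: generate one row per individual from the definitions
dd <- genData(100, def)

# Extension: expand to 3 rows per individual (longitudinal/panel layout)
dp <- addPeriods(dd, nPeriods = 3, idvars = "id")

# Add a time-varying outcome that depends on the baseline value and the period
defA <- defDataAdd(varname = "y", formula = "b + 2 * period", variance = 1)
dp <- addColumns(defA, dp)

head(dp)
```

The definition tables (`def`, `defA`) are ordinary data tables, so they can be inspected or printed before any data are generated.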

The overarching philosophy of simstudy is to create data generating processes that mimic the typical models used to fit those types of data. So, the parameterization of some of the data generating processes may not follow the standard parameterizations for the specific distributions. For example, in simstudy gamma-distributed data are generated based on the specification of a mean \(\mu\) (or \(\log(\mu)\)) and a dispersion \(d\), rather than the shape \(\alpha\) and rate \(\beta\) parameters that more typically characterize the gamma distribution. When we estimate the parameters, we are modeling \(\mu\) (or some function of \(\mu\)), so we should explicitly recover the simstudy parameters used to generate the data, illuminating the relationship between the underlying data generating processes and the models.
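To make the gamma example concrete, the sketch below uses only base R. It assumes the dispersion is defined so that the variance equals d * mu^2, which implies shape = 1/d and rate = 1/(mu * d); that relation is an assumption of this sketch, and the package's own `gammaGetShapeRate` helper is the place to check the exact convention. A gamma GLM with a log link is then fit to show the mean and dispersion being recovered.

```r
set.seed(2020)  # arbitrary seed

mu <- 5    # target mean
d <- 0.5   # dispersion, assuming variance = d * mu^2

# Standard gamma parameters implied by (mu, d) under that assumption
shape <- 1 / d
rate <- 1 / (mu * d)

y <- rgamma(10000, shape = shape, rate = rate)

# Intercept-only gamma GLM with a log link: the intercept estimates log(mu)
fit <- glm(y ~ 1, family = Gamma(link = "log"))
mu_hat <- unname(exp(coef(fit)))
d_hat <- summary(fit)$dispersion

c(mu = mu_hat, dispersion = d_hat)
```

The estimates should land near the values of `mu` and `d` used to generate the data, which is the sense in which the generating parameters match the model parameters.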

Installation

You can install the released version of simstudy from CRAN with:

install.packages("simstudy")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("kgoldfeld/simstudy")

Example

Here is some simple sample code; there is much more in the vignettes:

library(simstudy)

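# x is normally distributed (the default) with mean 10 and variance 2
def <- defData(varname="x", formula = 10, variance = 2)
# y is normally distributed with mean 3 + 0.5 * x and variance 1
def <- defData(def, varname="y", formula = "3 + 0.5 * x", variance = 1)
dd <- genData(250, def)

# randomly assign each record to one of 4 treatment groups of similar size
dd <- trtAssign(dd, nTrt = 4, grpName = "grp", balanced = TRUE)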

dd
#>       id         x        y grp
#>   1:   1  9.614759 8.388761   3
#>   2:   2  8.286225 7.616225   2
#>   3:   3 11.035302 9.420590   1
#>   4:   4 10.333926 8.248654   1
#>   5:   5 11.199110 9.349459   3
#>  ---                           
#> 246: 246  9.276009 7.984796   2
#> 247: 247  9.086318 6.618701   2
#> 248: 248  8.628892 5.750542   1
#> 249: 249  8.823797 8.543863   4
#> 250: 250 11.527290 9.584074   3
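As a quick check on data like these, the sketch below regenerates a data set from the same definitions and then inspects the treatment group sizes and the fitted relationship between x and y. The seed is arbitrary, so the generated values will not match the listing above.

```r
library(simstudy)

set.seed(123)  # arbitrary seed, not taken from the original example

def <- defData(varname = "x", formula = 10, variance = 2)
def <- defData(def, varname = "y", formula = "3 + 0.5 * x", variance = 1)
dd <- genData(250, def)
dd <- trtAssign(dd, nTrt = 4, grpName = "grp", balanced = TRUE)

# Group sizes should be roughly equal under balanced assignment
grp_counts <- table(dd$grp)
grp_counts

# The slope estimate should be close to the 0.5 used in the definition
fit <- lm(y ~ x, data = dd)
coef(fit)
```

Because y was defined with a linear mean in x, an ordinary linear model is the natural fit here; other distributions would call for the matching generalized linear model.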

Code of Conduct

Please note that the simstudy project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 980
Version: 0.2.0
License: GPL-3

Maintainer: Keith Goldfeld
Last Published: October 6th, 2020

Functions in simstudy (0.2.0)

catProbs: Generate Categorical Formula
addMultiFac: Add multi-factorial data
addCondition: Add a single column to existing data set based on a condition
addPeriods: Create longitudinal/panel data
addCorGen: Create multivariate (correlated) data - for general distributions
addCorFlex: Create multivariate (correlated) data - for general distributions
betaGetShapes: Convert beta mean and precision parameters to two shape parameters
addColumns: Add columns to existing data set
addCorData: Add correlated data to existing data.table
addMarkov: Add Markov chain
genCorMat: Create a correlation matrix
defSurv: Add single row to survival definitions
defReadCond: Read external csv data set definitions for adding columns
genFactor: Create factor variable from an existing (non-double) variable
genFormula: Generate a linear formula
genCorOrdCat: Generate correlated ordinal categorical data
defData: Add single row to definitions table
trimData: Trim longitudinal data file once an event has occurred
defCondition: Add single row to definitions table of conditions that will be used to add data to an existing definitions table
gammaGetShapeRate: Convert gamma mean and dispersion parameters to shape and rate parameters
defRead: Read external csv data set definitions
delColumns: Delete columns from existing data set
genCorFlex: Create multivariate (correlated) data - for general distributions
defDataAdd: Add single row to definitions table that will be used to add data to an existing data.table
defMiss: Definitions for missing data
genOrdCat: Generate ordinal categorical data
mergeData: Merge two data tables
genCorGen: Create multivariate (correlated) data - for general distributions
distributions: Distributions for Data Definitions
genSpline: Generate spline curves
genCatFormula: Generate Categorical Formula
genSurv: Generate survival data
defReadAdd: Read external csv data set definitions for adding columns
genData: Calling function to simulate data
genMixFormula: Generate Mixture Formula
genMarkov: Generate Markov chain
trtObserve: Observed exposure or treatment
trtStepWedge: Assign treatment for stepped-wedge design
genNthEvent: Generate event data using longitudinal data, and restrict output to time until the nth event.
genObs: Create an observed data set that includes missing data
genCorData: Create correlated data
genCluster: Simulate clustered data
iccRE: Generate variance for random effects that produce desired intra-class coefficients (ICCs) for clustered data.
genMultiFac: Generate multi-factorial data
genDummy: Create dummy variables from a factor or integer variable
simstudy-deprecated: Deprecated functions in simstudy
negbinomGetSizeProb: Convert negative binomial mean and dispersion parameters to size and prob parameters
viewBasis: Plot basis spline functions
simstudy-package: simstudy: Simulation of Study Data
updateDef: Update definition table
updateDefAdd: Update definition table
genMiss: Generate missing data
trtAssign: Assign treatment
viewSplines: Plot spline curves
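
Several entries in this index (betaGetShapes, negbinomGetSizeProb) convert mean-based parameters into the ones base R's random number generators expect. The sketch below hand-codes the conventional moment-matching relations in base R instead of calling the package. The helper names `beta_shapes` and `nb_size_prob` are hypothetical, and the formulas are assumptions about what such conversions typically compute, so the package documentation remains the authority on the exact definitions.

```r
set.seed(11)  # arbitrary seed

# Hypothetical helper: beta mean and precision -> two shape parameters
beta_shapes <- function(mean, precision) {
  list(shape1 = mean * precision, shape2 = (1 - mean) * precision)
}

# Hypothetical helper: negative binomial mean and dispersion -> size and prob,
# assuming variance = mean + dispersion * mean^2
nb_size_prob <- function(mean, dispersion) {
  size <- 1 / dispersion
  list(size = size, prob = size / (size + mean))
}

b <- beta_shapes(mean = 0.3, precision = 8)
xb <- rbeta(50000, b$shape1, b$shape2)

nb <- nb_size_prob(mean = 4, dispersion = 0.5)
xn <- rnbinom(50000, size = nb$size, prob = nb$prob)

# Sample moments to compare with the targets (0.3, 4, and 4 + 0.5 * 4^2 = 12)
round(c(beta_mean = mean(xb), nb_mean = mean(xn), nb_var = var(xn)), 2)
```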