
metafuse (version 2.0-1)

datagenerator: simulate data

Description

Simulate a dataset that combines data from K different sources, to demonstrate metafuse.

Usage

datagenerator(n, beta0, family, seed = NA)

Arguments

n
a vector of length K (the total number of datasets being integrated) specifying the sample size of each individual dataset; can also be a scalar, in which case the function simulates K datasets of equal sample size
beta0
a coefficient matrix of dimension K * p, where K is the number of datasets being integrated and p is the number of covariates, including the intercept
family
the type of the response vector, one of c("gaussian", "binomial", "poisson", "cox"): "gaussian" for a continuous response, "binomial" for a binary response, "poisson" for a count response, "cox" for an observed time-to-event response with a censoring indicator
seed
the random seed for data generation, default is NA
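As a rough illustration of how n and beta0 interact, the base R sketch below simulates Gaussian data in which row k of beta0 holds the coefficients for dataset k, and a scalar n is recycled into K equal sample sizes. The function simulate_gaussian_sketch is hypothetical and is not part of metafuse. It assumes standard normal covariates and unit-variance noise, which may differ from what datagenerator() actually does.

```r
# Hypothetical sketch: NOT metafuse's internal implementation.
# Assumptions: covariates ~ N(0, 1), Gaussian noise with sd = 1.
simulate_gaussian_sketch <- function(n, beta0, seed = NA) {
  K <- nrow(beta0)                     # one row of coefficients per dataset
  p <- ncol(beta0)                     # p includes the intercept
  if (length(n) == 1) n <- rep(n, K)   # scalar n: K datasets of equal size
  stopifnot(length(n) == K)
  if (!is.na(seed)) set.seed(seed)     # only fix the RNG when a seed is given
  pieces <- lapply(seq_len(K), function(k) {
    X <- matrix(rnorm(n[k] * (p - 1)), n[k], p - 1)
    colnames(X) <- paste0("x", seq_len(p - 1))
    eta <- drop(cbind(1, X) %*% beta0[k, ])  # row k of beta0 drives dataset k
    y <- eta + rnorm(n[k])
    data.frame(y = y, X, group = k)          # group records the source dataset
  })
  do.call(rbind, pieces)
}

# Usage with the coefficient pattern from the Examples section
beta0 <- matrix(c(rep(0, 10),
                  rep(0, 5), rep(1, 5),
                  rep(0, 4), rep(0.5, 3), rep(1, 3)), nrow = 10, ncol = 3)
sim <- simulate_gaussian_sketch(200, beta0, seed = 123)
nrow(sim)  # 2000 rows
```

Datasets that share a row pattern in beta0 share the same true coefficients, which is the kind of heterogeneity the Examples section sets up.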

Value

Returns a data frame with n*K rows (if n is a scalar) or sum(n) rows (if n is a K-element vector). If family="gaussian", "binomial" or "poisson", the data frame contains the columns "y", "x1", ..., "x(p-1)" and "group"; if family="cox", it contains the columns "time", "status", "x1", ..., "x(p-1)" and "group".
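The row count and column names described above follow mechanically from n, K, p and family. The small helper below, expected_layout, is hypothetical (not part of metafuse) and simply restates those rules in code, which can be handy for sanity-checking the output of datagenerator(). It assumes the column order listed above.

```r
# Hypothetical helper (not part of metafuse): predicts the shape of the
# data frame described under "Value", given n, K, p and family.
expected_layout <- function(n, K, p, family) {
  family <- match.arg(family, c("gaussian", "binomial", "poisson", "cox"))
  stopifnot(length(n) %in% c(1, K))
  nrows <- if (length(n) == 1) n * K else sum(n)   # scalar n vs K-element n
  covariates <- if (p > 1) paste0("x", seq_len(p - 1)) else character(0)
  response <- if (family == "cox") c("time", "status") else "y"
  list(nrow = nrows, columns = c(response, covariates, "group"))
}

# Usage: 10 datasets of 200 observations each, intercept plus 2 covariates
expected_layout(200, 10, 3, "gaussian")
expected_layout(c(100, 150), 2, 3, "cox")
```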

Details

These datasets are artificial and are used to demonstrate the features of metafuse. When family="cox", the response consists of two vectors: a time-to-event variable, "time", and a censoring indicator, "status".
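To make the "time"/"status" pair concrete, here is a minimal base R sketch of one common way to simulate censored time-to-event data for a single dataset. The function simulate_cox_sketch is hypothetical, not metafuse's implementation. It assumes exponential event times with rate exp(eta) (a proportional-hazards model), independent exponential censoring with rate 1, and the usual coding status = 1 for an observed event and 0 for a censored one; datagenerator() may use a different mechanism or coding.

```r
# Hypothetical sketch: NOT metafuse's implementation.
# Assumptions: exponential event times with rate exp(eta),
# independent exponential censoring times with rate 1.
simulate_cox_sketch <- function(n, beta, seed = NA) {
  p <- length(beta)                    # beta includes an intercept term
  if (!is.na(seed)) set.seed(seed)
  X <- matrix(rnorm(n * (p - 1)), n, p - 1)
  colnames(X) <- paste0("x", seq_len(p - 1))
  eta <- drop(cbind(1, X) %*% beta)    # linear predictor
  event_time <- rexp(n, rate = exp(eta))   # true (possibly unobserved) event time
  censor_time <- rexp(n, rate = 1)         # independent censoring time
  data.frame(time = pmin(event_time, censor_time),               # observed time
             status = as.integer(event_time <= censor_time),     # 1 = event, 0 = censored
             X)
}

d <- simulate_cox_sketch(100, c(0, 1, 0.5), seed = 123)
head(d)
```

The observed time is whichever comes first, the event or the censoring, and status records which one it was; the pair can then be passed to survival::Surv(d$time, d$status) for a Cox model fit.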

Examples

library(metafuse)   # provides datagenerator()

########### generate data ###########
n <- 200    # sample size in each dataset (can also be a K-element vector)
K <- 10     # number of datasets for data integration
p <- 3      # number of covariates in X (including the intercept)

# the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # beta_0 of intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1 of X_1
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0),  # beta_2 of X_2
                K, p)

# generate a data set, family=c("gaussian", "binomial", "poisson", "cox")
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)
names(data)

# if family="cox", the returned dataset contains columns "time" and "status" instead of "y"
data <- datagenerator(n=n, beta0=beta0, family="cox", seed=123)
names(data)
