DGP: Data Generating Process

Description

Generate a data set consisting of:

anno: (snps x 1) annotation vector.
covar: (subjects x 6) covariate matrix.
geno: (subjects x snps) genotype matrix.
pheno: (subjects x 1) phenotype vector.
type: Either "binary" or "quantitative".

Usage

DGP(
  anno = NULL,
  beta = c(1, 2, 3),
  binary = FALSE,
  geno = NULL,
  include_residual = TRUE,
  indicator = FALSE,
  maf_range = c(0.001, 0.005),
  method = "none",
  n = 100,
  prop_anno = c(0.5, 0.4, 0.1),
  prop_causal = 1,
  random_signs = FALSE,
  random_var = 0,
  snps = 100,
  weights = c(1, 1, 1)
)

Value

List containing the annotations (anno), covariates (covar), genotypes (geno), phenotypes (pheno), and phenotype type (type).

Arguments

anno: Annotation vector, if providing genotypes. Should match the number of columns in geno.
beta: If method = "none", an (L x 1) coefficient vector of effect sizes, one per annotation category. By default, there are L = 3 annotation categories corresponding to benign missense variants (BMVs), deleterious missense variants (DMVs), and protein truncating variants (PTVs). If method != "none", a scalar effect size for the allelic series burden score.
binary: Generate binary phenotype? Default: FALSE.
geno: Genotype matrix, if providing genotypes.
include_residual: Include residual? If FALSE, returns the expected value of the phenotype. Intended for testing. Default: TRUE.
indicator: Convert raw counts to indicators? Default: FALSE.
maf_range: Range of minor allele frequencies: c(MIN, MAX).
method: Genotype aggregation method. Default: "none".
n: Sample size.
prop_anno: Proportions of annotations in each category. Length should equal the number of annotation categories. Default of c(0.5, 0.4, 0.1) is based on the approximate empirical frequencies of BMVs, DMVs, and PTVs.
prop_causal: Proportion of variants which are causal. Default: 1.0.
random_signs: Randomize signs? FALSE for burden-type genetic architecture, TRUE for SKAT-type.
random_var: Frailty variance in the case of random signs. Default: 0.
snps: Number of SNPs in the gene. Default: 100.
weights: Annotation category weights. Length should match prop_anno.
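
To make the roles of prop_anno, maf_range, and beta (with method = "none") more concrete, here is a minimal base-R sketch. It is not the package's implementation. The annotation coding 1..L, the uniform draw of minor allele frequencies, and the binomial genotype model are all assumptions made for illustration; covariates, weights, causal proportions, and residuals are left out.

```r
set.seed(101)

n <- 100                       # Subjects.
snps <- 100                    # Variants.
prop_anno <- c(0.5, 0.4, 0.1)  # Category proportions (BMV, DMV, PTV).
beta <- c(1, 2, 3)             # One effect size per category.
maf_range <- c(0.001, 0.005)   # c(MIN, MAX) minor allele frequency.

# Assumption: annotation categories are coded 1..L.
anno <- sample(seq_along(prop_anno), size = snps, replace = TRUE,
               prob = prop_anno)

# Assumption: each variant's MAF is uniform on maf_range, and each
# genotype is a Binomial(2, MAF) count of minor alleles.
maf <- runif(snps, min = maf_range[1], max = maf_range[2])
geno <- sapply(maf, function(p) rbinom(n, size = 2, prob = p))

# Per-subject minor allele counts within each annotation category.
counts <- sapply(seq_along(beta), function(l) {
  rowSums(geno[, anno == l, drop = FALSE])
})

# Genetic contribution to the expected phenotype under this sketch:
# the category counts weighted by the category effect sizes.
eta <- as.numeric(counts %*% beta)
```

In this sketch eta is zero for any subject who carries no minor alleles, which is common at the default MAF range with n = 100.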

Examples

# Generate data.
data <- DGP(n = 100)

# View components.
table(data$anno)
head(data$covar)
head(data$geno[, 1:5])
hist(data$pheno)
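
The arguments documented above also allow a binary phenotype and user-supplied genotypes. The sketch below uses only arguments from the Usage section. The library() call assumes DGP is provided by the AllelicSeries package, which this page does not state; adjust it to however DGP is loaded in your session. Passing n and snps alongside geno is a precaution here, not a documented requirement.

```r
library(AllelicSeries)  # Assumption: the package that provides DGP.

set.seed(102)

# Simulate a binary phenotype for 200 subjects and 50 variants.
data_bin <- DGP(n = 200, snps = 50, binary = TRUE)
table(data_bin$pheno)

# Reuse the simulated genotypes and annotations, and request the
# expected phenotype only (no residual), as suggested for testing.
data_fix <- DGP(
  anno = data_bin$anno,
  geno = data_bin$geno,
  include_residual = FALSE,
  n = nrow(data_bin$geno),
  snps = ncol(data_bin$geno)
)
summary(as.numeric(data_fix$pheno))
```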
