
AllelicSeries (version 0.1.1.5)

DGP: Data Generating Process

Description

Generate a data set consisting of:

  • anno: (snps x 1) annotation vector.

  • covar: (subjects x 6) covariate matrix.

  • geno: (subjects x snps) genotype matrix.

  • pheno: (subjects x 1) phenotype vector.

  • type: Either "binary" or "quantitative".

Usage

DGP(
  anno = NULL,
  beta = c(1, 2, 3),
  binary = FALSE,
  geno = NULL,
  include_residual = TRUE,
  indicator = FALSE,
  maf_range = c(0.001, 0.005),
  method = "none",
  n = 100,
  prop_anno = c(0.5, 0.4, 0.1),
  prop_causal = 1,
  random_signs = FALSE,
  random_var = 0,
  snps = 100,
  weights = c(1, 1, 1)
)

Value

List containing the components described above: anno (annotations), covar (covariates), geno (genotypes), pheno (phenotypes), and type.

Arguments

anno

Annotation vector, if providing genotypes. Its length should equal the number of columns in geno.

beta

If method = "none", an (L x 1) vector of effect sizes, one per annotation category. By default, there are L = 3 annotation categories, corresponding to BMVs, DMVs, and PTVs. If method != "none", a scalar effect size for the allelic series burden score.
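As a rough illustration of how per-category effect sizes might enter the phenotype when method = "none", the base-R sketch below sums each subject's minor-allele counts within each annotation category and takes the inner product with beta. The 0-based category coding (0, ..., L - 1) and the helper name category_counts are assumptions for illustration, not the package's internal code.

```r
# Hypothetical helper (not part of AllelicSeries): per-subject allele counts
# summed within each annotation category, assuming categories coded 0..L-1.
category_counts <- function(geno, anno, L) {
  sapply(0:(L - 1), function(l) rowSums(geno[, anno == l, drop = FALSE]))
}

# Toy data: 4 subjects, 5 variants, L = 3 categories.
geno <- matrix(c(0, 1, 0, 0, 2,
                 1, 0, 0, 0, 0,
                 0, 0, 1, 1, 0,
                 0, 0, 0, 0, 0), nrow = 4, byrow = TRUE)
anno <- c(0, 0, 1, 1, 2)
beta <- c(1, 2, 3)

counts <- category_counts(geno, anno, L = 3)       # (subjects x L) matrix
linear_predictor <- as.vector(counts %*% beta)     # one value per subject
print(linear_predictor)
```

Subject 1 carries one category-0 allele and two category-2 alleles, so its linear predictor is 1 x 1 + 2 x 3 = 7.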

binary

Generate a binary phenotype? Default: FALSE.

geno

Genotype matrix, if providing genotypes.

include_residual

Include the residual? If FALSE, returns the expected value of the phenotype. Intended for testing.
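The reason include_residual = FALSE helps with testing is that the output becomes deterministic: it equals the mean exactly and can be compared against a hand-computed value. The sketch below shows the idea for a quantitative trait; the function name simulate_quantitative and the unit residual variance are assumptions, not taken from the package.

```r
# Hypothetical sketch: the phenotype is its mean plus Gaussian noise.
# With include_residual = FALSE only the mean (expected value) is returned.
simulate_quantitative <- function(mu, include_residual = TRUE) {
  if (include_residual) {
    mu + rnorm(length(mu))   # assumed residual variance of 1
  } else {
    mu
  }
}

mu <- c(7, 1, 4, 0)
set.seed(1)
y_expected <- simulate_quantitative(mu, include_residual = FALSE)
y_observed <- simulate_quantitative(mu, include_residual = TRUE)
```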

indicator

Convert raw counts to indicators? Default: FALSE.
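Converting raw counts to indicators presumably records carrier status (at least one minor allele) rather than the number of minor alleles. Under that assumption, the transformation is a one-liner in base R:

```r
# Minor-allele counts (0, 1, 2) for 3 subjects and 3 variants.
geno <- matrix(c(0, 1, 2,
                 2, 0, 1,
                 0, 0, 0), nrow = 3, byrow = TRUE)

# Indicator coding: 1 if the subject carries any minor allele, else 0.
geno_ind <- (geno > 0) * 1
print(geno_ind)
```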

maf_range

Range of minor allele frequencies: c(MIN, MAX).
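One plausible way to use maf_range is to draw a minor allele frequency for each variant within c(MIN, MAX) and then sample genotypes from it. The sketch below assumes a uniform MAF distribution and Hardy-Weinberg equilibrium (genotype ~ Binomial(2, MAF)); both choices, and the name simulate_geno, are illustrative assumptions rather than a description of DGP's internals.

```r
# Hypothetical sketch: one MAF per variant, uniform within maf_range,
# then genotypes drawn as Binomial(2, maf) under Hardy-Weinberg equilibrium.
simulate_geno <- function(n, snps, maf_range = c(0.001, 0.005)) {
  maf <- runif(snps, min = maf_range[1], max = maf_range[2])
  geno <- sapply(maf, function(p) rbinom(n, size = 2, prob = p))
  matrix(geno, nrow = n, ncol = snps)   # (subjects x snps)
}

set.seed(101)
geno <- simulate_geno(n = 1000, snps = 20)
dim(geno)
```

With MAFs this low, most entries are 0, which is typical of rare-variant data.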

method

Genotype aggregation method. Default: "none".

n

Sample size.

prop_anno

Proportion of variants in each annotation category. Its length should equal the number of annotation categories. The default of c(0.5, 0.4, 0.1) reflects the approximate empirical frequencies of BMVs, DMVs, and PTVs.
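A minimal sketch of how prop_anno could be used to assign variants to categories, assuming categories are coded 0, ..., L - 1 and each variant is sampled independently. The helper name simulate_anno is hypothetical.

```r
# Hypothetical sketch: assign each variant to category 0..L-1 with
# probabilities given by prop_anno.
simulate_anno <- function(snps, prop_anno = c(0.5, 0.4, 0.1)) {
  L <- length(prop_anno)
  sample(0:(L - 1), size = snps, replace = TRUE, prob = prop_anno)
}

set.seed(2024)
anno <- simulate_anno(snps = 10000)
round(prop.table(table(anno)), 2)
```

The realized proportions fluctuate around prop_anno. A deterministic allocation (for example, rounding prop_anno * snps) would be an equally reasonable design; this sketch does not claim which one DGP uses.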

prop_causal

Proportion of variants which are causal. Default: 1.0.
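A simple interpretation of prop_causal is that a fixed fraction of variants carry nonzero effects and the rest are null. The sketch below selects round(prop_causal * snps) causal variants at random; the rounding rule and the helper name select_causal are assumptions.

```r
# Hypothetical sketch: flag round(prop_causal * snps) variants as causal.
select_causal <- function(snps, prop_causal = 1) {
  n_causal <- round(prop_causal * snps)
  causal <- rep(FALSE, snps)
  causal[sample.int(snps, n_causal)] <- TRUE
  causal
}

set.seed(7)
causal <- select_causal(snps = 100, prop_causal = 0.25)
sum(causal)   # number of causal variants
```

Effects of non-causal variants would then be set to zero before forming the phenotype.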

random_signs

Randomize signs? FALSE for burden-type genetic architecture, TRUE for SKAT-type.
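The distinction between the two architectures can be sketched as follows: under a burden-type architecture all causal variants push the phenotype in the same direction, whereas under a SKAT-type architecture each variant's sign is random. This is an illustrative sketch with a hypothetical function name; it does not model the frailty variance (random_var).

```r
# Hypothetical sketch: per-variant effects with shared or random signs.
variant_effects <- function(magnitude, random_signs = FALSE) {
  if (random_signs) {
    signs <- sample(c(-1, 1), length(magnitude), replace = TRUE)  # SKAT-type
  } else {
    signs <- rep(1, length(magnitude))                            # burden-type
  }
  signs * magnitude
}

magnitude <- rep(0.5, 8)
burden_eff <- variant_effects(magnitude, random_signs = FALSE)
set.seed(3)
skat_eff <- variant_effects(magnitude, random_signs = TRUE)
```

Burden tests gain power when effects share a sign, while variance-component tests such as SKAT remain powerful when signs are mixed.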

random_var

Frailty variance in the case of random signs. Default: 0.

snps

Number of SNPs in the gene. Default: 100.

weights

Annotation category weights. Length should match that of prop_anno.
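To illustrate what category weights do, the sketch below computes a weighted burden score in which each variant's allele count is multiplied by the weight of its annotation category and then summed per subject. The 0-based category coding and the function weighted_burden are assumptions; this is not the package's aggregation code.

```r
# Hypothetical sketch: weighted burden score per subject.
# anno is assumed coded 0..L-1, so anno + 1 indexes into weights.
weighted_burden <- function(geno, anno, weights) {
  as.vector(geno %*% weights[anno + 1])
}

geno <- matrix(c(1, 0, 0, 1,
                 0, 2, 1, 0,
                 0, 0, 0, 0), nrow = 3, byrow = TRUE)
anno <- c(0, 1, 1, 2)

weighted_burden(geno, anno, weights = c(1, 2, 4))
weighted_burden(geno, anno, weights = c(1, 1, 1))
```

With equal weights, as in the default c(1, 1, 1), the score reduces to each subject's total minor-allele count.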

Examples

# Generate data.
data <- DGP(n = 100)

# View components.
table(data$anno)
head(data$covar)
head(data$geno[, 1:5])
hist(data$pheno)

# Generate data with L != 3 categories.
data <- DGP(
  beta = c(1, 2, 3, 4),
  prop_anno = c(0.25, 0.25, 0.25, 0.25),
  weights = c(1, 1, 1, 1)
)
