DGP: Data Generating Process

Description

Generate a data set consisting of:

"anno": A SNP-length annotation vector.
"covar": A subject by 6 covariate matrix.
"geno": A subject by SNP genotype matrix.
"pheno": A subject-length phenotype vector.

Usage

DGP(
  anno = NULL,
  beta = c(0, 1, 2),
  binary = FALSE,
  geno = NULL,
  include_residual = TRUE,
  indicator = FALSE,
  maf_range = c(0.005, 0.01),
  method = "none",
  n = 100,
  p_dmv = 0.4,
  p_ptv = 0.1,
  prop_causal = 1,
  random_signs = FALSE,
  random_var = 0,
  snps = 100,
  weights = c(1, 2, 3)
)

Value

List containing: genotypes (geno), annotations (anno), covariates (covar), and phenotypes (pheno).

Arguments

anno: Annotation vector, if providing genotypes. Should match the number of columns in geno.
beta: If method = "none", a (3 x 1) coefficient vector for BMVs, DMVs, and PTVs respectively. If method != "none", a scalar effect size.
binary: Generate binary phenotype? Default: FALSE.
geno: Genotype matrix, if providing genotypes.
include_residual: Include residual? If FALSE, returns the expected value. Intended for testing.
indicator: Convert raw counts to indicators? Default: FALSE.
maf_range: Range of minor allele frequencies: c(MIN, MAX).
method: Genotype aggregation method. Default: "none".
n: Sample size.
p_dmv: Frequency of deleterious missense variants. Default of 40% is based on the frequency of DMVs among rare coding variants in the UK Biobank.
p_ptv: Frequency of protein truncating variants. Default of 10% is based on the frequency of PTVs among rare coding variants in the UK Biobank.
prop_causal: Proportion of variants which are causal. Default: 1.0.
random_signs: Randomize signs? FALSE for burden-type genetic architecture, TRUE for SKAT-type.
random_var: Frailty variance in the case of random signs. Default: 0.
snps: Number of SNPs in the gene. Default: 100.
weights: Aggregation weights.
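
The anno and geno arguments are meant to be supplied together. As a minimal sketch, assuming the AllelicSeries package is installed and that DGP accepts the genotype matrix and annotation vector returned by an earlier call, the following reuses one set of genotypes to generate phenotypes under different effect sizes. The object names base and sim and the beta values are illustrative only.

```r
library(AllelicSeries)

set.seed(101)

# Simulate an initial data set with 50 subjects and 20 SNPs.
base <- DGP(n = 50, snps = 20)

# Reuse the simulated genotypes and annotations, changing only the
# per-category effect sizes. With include_residual = FALSE the
# phenotype is the expected value, with no residual added.
sim <- DGP(
  anno = base$anno,
  geno = base$geno,
  beta = c(1, 2, 3),
  include_residual = FALSE
)

# The annotation vector should have one entry per genotype column.
length(sim$anno) == ncol(sim$geno)
```

Holding the genotypes fixed in this way makes it easier to compare phenotypes across effect-size settings, since any difference is not due to resampled variants.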

Examples

# Generate data.
data <- DGP(n = 100)

# View components.
table(data$anno)
head(data$covar)
head(data$geno[, 1:5])
hist(data$pheno)
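
The remaining arguments can be combined in the same call. The sketch below, which again assumes the AllelicSeries package is installed, requests a binary phenotype with random effect signs and only half of the variants causal. The specific values are arbitrary choices for illustration, not recommendations.

```r
library(AllelicSeries)

set.seed(102)

# Binary phenotype, SKAT-type architecture (random signs),
# with 50% of variants causal.
data_bin <- DGP(
  n = 200,
  snps = 50,
  binary = TRUE,
  prop_causal = 0.5,
  random_signs = TRUE
)

# Tabulate the simulated binary phenotype.
table(data_bin$pheno)

# Check that the dimensions match the requested n and snps.
dim(data_bin$geno)
```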
