CSeQTL (version 1.0.0)

CSeQTL_dataGen: CSeQTL_dataGen

Description

Simulates a gene/SNP pair with baseline covariates XX, cell type compositions true_RHO, phased SNP genotypes true_SNP, and total read counts (TReC) and allele-specific read counts (ASReC), both contained in dat.

Usage

CSeQTL_dataGen(
  NN,
  MAF,
  true_BETA0 = log(1000),
  true_KAPPA,
  true_ETA,
  true_PHI = 0.1,
  true_PSI = 0.05,
  prob_phased = 0.05,
  true_ALPHA = NULL,
  batch = 1,
  RHO = NULL,
  cnfSNP = FALSE,
  show = TRUE
)
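
A minimal call might look like the sketch below. It assumes three cell types, so true_KAPPA and true_ETA each have three elements; the specific values are illustrative assumptions, not recommendations from the package. The call is guarded so the snippet does nothing if CSeQTL is not installed.

```r
# Illustrative argument values. Only true_BETA0, true_PHI, true_PSI,
# prob_phased, true_ALPHA, batch, RHO and cnfSNP match the defaults in
# the Usage block; the rest are assumptions.
sim_args <- list(
  NN = 300,                 # sample size
  MAF = 0.3,                # minor allele frequency
  true_BETA0 = log(1000),
  true_KAPPA = c(1, 2, 3),  # first element is 1 (reference cell type)
  true_ETA = c(1, 1, 2),    # eQTL fold change in the third cell type only
  true_PHI = 0.1,
  true_PSI = 0.05,
  prob_phased = 0.05,
  true_ALPHA = NULL,
  batch = 1,
  RHO = NULL,               # let the function simulate proportions
  cnfSNP = FALSE,
  show = FALSE              # suppress verbose output and plots
)

if (requireNamespace("CSeQTL", quietly = TRUE)) {
  sim <- do.call(CSeQTL::CSeQTL_dataGen, sim_args)
  str(sim, max.level = 1)   # inspect the top-level elements of the returned list
}
```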

Value

An R list containing the true parameters governing the simulated dataset, the simulated covariate matrix XX, and the observed outcomes in dat.

Arguments

NN

Positive integer for sample size.

MAF

A numeric value strictly between 0 and 1 giving the minor allele frequency used to simulate phased SNP genotypes under Hardy-Weinberg equilibrium.
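
To illustrate what simulating phased genotypes under Hardy-Weinberg equilibrium means, the sketch below draws two haplotypes independently for each sample. It is not the package's internal code; the variable names and the "0|1" coding are assumptions made for illustration.

```r
set.seed(1)
NN <- 10000   # hypothetical sample size
MAF <- 0.3    # hypothetical minor allele frequency

# Under Hardy-Weinberg equilibrium, each haplotype carries the minor
# allele (coded 1) independently with probability MAF.
hap1 <- rbinom(NN, size = 1, prob = MAF)
hap2 <- rbinom(NN, size = 1, prob = MAF)

# A phased genotype keeps the haplotype order: "0|0", "0|1", "1|0", "1|1".
phased <- paste(hap1, hap2, sep = "|")

# The unphased minor-allele dosage is 0, 1, or 2.
dosage <- hap1 + hap2

# Compare observed genotype frequencies with the HWE expectations.
observed <- as.numeric(table(factor(dosage, levels = 0:2))) / NN
expected <- c((1 - MAF)^2, 2 * MAF * (1 - MAF), MAF^2)
round(rbind(observed, expected), 3)
```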

true_BETA0

A positive numeric value denoting the log of twice the expression of the reference base in the reference cell type. For example, if the TReC for the reference base and reference cell type is 500, then true_BETA0 = log(2 * 500).
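
The arithmetic in this example can be checked directly. The variable name per_allele_trec is hypothetical and used only for illustration.

```r
per_allele_trec <- 500                   # TReC for the reference base and cell type
true_BETA0 <- log(2 * per_allele_trec)   # equals log(1000), the default in Usage

# Inverting the transform recovers the reference expression.
recovered <- exp(true_BETA0) / 2
all.equal(recovered, per_allele_trec)
```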

true_KAPPA

A numeric vector denoting the baseline fold change in TReC between each cell type and the reference cell type. By definition, the first element is 1.

true_ETA

A numeric vector where each element denotes the fold change in TReC between the non-reference and reference base in a cell type.
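
One way to read the definitions of true_BETA0, true_KAPPA, and true_ETA together is the following sketch of the expected TReC for a single sample, ignoring covariates and batch effects. The formula is an assumption inferred from the argument descriptions, not code taken from the package, and rho and all numeric values are hypothetical.

```r
true_BETA0 <- log(1000)
true_KAPPA <- c(1, 2, 3)   # baseline fold change per cell type
true_ETA <- c(1, 1, 2)     # non-reference vs reference fold change per cell type
rho <- c(0.5, 0.3, 0.2)    # hypothetical cell type proportions for one sample

# Assumed per-cell-type multiplier by genotype (A = reference base,
# B = non-reference base): AA -> 1, AB -> (1 + eta) / 2, BB -> eta.
geno_mult <- rbind(
  AA = rep(1, length(true_ETA)),
  AB = (1 + true_ETA) / 2,
  BB = true_ETA
)

# Expected TReC: exp(BETA0) times the proportion-weighted sum over cell types.
expected_trec <- exp(true_BETA0) * as.numeric(geno_mult %*% (rho * true_KAPPA))
names(expected_trec) <- rownames(geno_mult)
expected_trec
```

Under this reading, a cell type with true_ETA equal to 1 contributes the same expression regardless of genotype, so only the third cell type drives the difference between genotypes here.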

true_PHI

A non-negative numeric value denoting the over-dispersion term associated with TReC. If true_PHI > 0, TReC is simulated from the negative binomial distribution. If true_PHI = 0, TReC is simulated from the Poisson distribution.
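
The switch between the two distributions can be sketched as follows. The helper sim_trec is hypothetical, and the parameterization (variance = mu + phi * mu^2, so size = 1 / phi in rnbinom) is an assumption rather than something this page states.

```r
set.seed(1)

# Hypothetical helper: draw n total read counts with mean mu.
sim_trec <- function(n, mu, phi) {
  if (phi > 0) {
    # Assumed parameterization: variance = mu + phi * mu^2.
    rnbinom(n, size = 1 / phi, mu = mu)
  } else {
    rpois(n, lambda = mu)
  }
}

n <- 50000
mu <- 1000
trec_nb <- sim_trec(n, mu, phi = 0.1)
trec_pois <- sim_trec(n, mu, phi = 0)

# The negative binomial draws are far more variable than the Poisson draws.
c(var_nb = var(trec_nb), var_pois = var(trec_pois))
```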

true_PSI

A non-negative numeric value denoting the over-dispersion term associated with ASReC. If true_PSI > 0, ASReC is simulated from the beta-binomial distribution; otherwise, it is simulated from the binomial distribution.
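
A comparable sketch for the allele-specific counts is shown below. Base R has no beta-binomial sampler, so the draw is composed from rbeta and rbinom. The helper sim_asrec and the parameterization Beta(prob / psi, (1 - prob) / psi) are assumptions made for illustration.

```r
set.seed(1)

# Hypothetical helper: draw n allele-specific counts out of `total` reads,
# where `prob` is the expected fraction from one haplotype.
sim_asrec <- function(n, total, prob, psi) {
  if (psi > 0) {
    # Assumed parameterization: success probability ~ Beta(prob/psi, (1-prob)/psi).
    p <- rbeta(n, shape1 = prob / psi, shape2 = (1 - prob) / psi)
  } else {
    p <- rep(prob, n)
  }
  rbinom(n, size = total, prob = p)
}

n <- 50000
as_bb <- sim_asrec(n, total = 50, prob = 0.5, psi = 0.05)
as_bin <- sim_asrec(n, total = 50, prob = 0.5, psi = 0)

# Same mean, but the beta-binomial draws are over-dispersed.
c(mean_bb = mean(as_bb), mean_bin = mean(as_bin),
  var_bb = var(as_bb), var_bin = var(as_bin))
```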

prob_phased

A positive numeric value denoting the proportion of simulated TReC that are ASReC.
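
One plausible reading of this argument is that each read is independently allele-specific with probability prob_phased. The sketch below follows that reading; it is an assumption, not the package's documented mechanism, and the TReC values are hypothetical.

```r
set.seed(1)
prob_phased <- 0.05

# Hypothetical total read counts for 1000 samples.
trec <- rnbinom(1000, size = 10, mu = 1000)

# Assumed mechanism: each read is allele-specific with probability prob_phased.
asrec_total <- rbinom(length(trec), size = trec, prob = prob_phased)

# Overall fraction of reads that are allele-specific.
sum(asrec_total) / sum(trec)
```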

true_ALPHA

Defaults to NULL, which sets every cell type with an eQTL to be a cis-eQTL. Otherwise, a positive numeric vector of fold changes between TReC eQTL effect sizes and ASReC eQTL effect sizes.

batch

A numeric value set to 1 by default to allow underlying batch effects. Set to zero to eliminate batch effects.

RHO

A numeric matrix of cell type proportions where each row sums to one. If set to NULL, a matrix of cell type proportions will be simulated.
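
When supplying RHO directly, the rows need to sum to one. The sketch below builds such a matrix by normalizing gamma draws, which is equivalent to sampling each row from a Dirichlet distribution. The dimensions and shape parameters are hypothetical, and this is not necessarily how the function simulates proportions when RHO = NULL.

```r
set.seed(1)
NN <- 300   # hypothetical number of samples
QQ <- 3     # hypothetical number of cell types

# Gamma draws, one column per cell type, with shapes 5, 3 and 2.
raw <- matrix(
  rgamma(NN * QQ, shape = rep(c(5, 3, 2), each = NN)),
  nrow = NN, ncol = QQ
)

# Normalize each row so the proportions sum to one.
RHO <- raw / rowSums(raw)

# Sanity check before passing RHO to the simulator.
stopifnot(all(abs(rowSums(RHO) - 1) < 1e-8))
colMeans(RHO)
```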

cnfSNP

A boolean value where TRUE re-arranges simulated SNPs to correlate with baseline bulk expression. When the marginal model (which does not account for cell type proportions) is fitted in the presence of cell type-specific differential expression, a marginal eQTL may be incorrectly inferred.

show

A boolean value. If TRUE, displays verbose output and plots intermediate simulated results.