simureads_poly: simureads_poly

Description

Simulate read counts from count data

Usage

.simureads_poly(
  y_count,
  n_count,
  lambda,
  overdisp,
  min_rc,
  min_maf,
  eps,
  eps_exp
)

Value

Returns an integer matrix with nsnp rows and 2*npop columns: columns 1:npop give the read counts for the reference allele and columns (npop+1):(2*npop) give the coverages.
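The column layout of the returned matrix can be illustrated with a minimal Python sketch. It assumes the matrix is held as a list of rows. The helper names split_simulated_reads and ref_read_frequencies are hypothetical and not part of the package, and the matrix values are made up. Python slicing is 0-based, so R columns 1:npop become row[:npop].

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def split_simulated_reads(mat: Matrix, npop: int) -> Tuple[Matrix, Matrix]:
    """Split an nsnp x (2*npop) matrix into reference read counts and coverages."""
    if any(len(row) != 2 * npop for row in mat):
        raise ValueError("each row must have 2*npop columns")
    ref_counts = [row[:npop] for row in mat]          # R columns 1..npop
    coverages = [row[npop:2 * npop] for row in mat]   # R columns (npop+1)..(2*npop)
    return ref_counts, coverages


def ref_read_frequencies(ref_counts: Matrix, coverages: Matrix) -> List[List[Optional[float]]]:
    """Reference allele read frequency per SNP and pool (None when coverage is 0)."""
    return [
        [r / c if c > 0 else None for r, c in zip(ref_row, cov_row)]
        for ref_row, cov_row in zip(ref_counts, coverages)
    ]


# Made-up 2-SNP x 2-pool matrix in the documented layout (hypothetical values).
sim = [[12, 7, 30, 25],
       [0, 18, 22, 20]]
ref, cov = split_simulated_reads(sim, npop=2)
freqs = ref_read_frequencies(ref, cov)
```

Here ref holds the reference allele read counts and cov the coverages, one row per SNP and one column per pool.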

Arguments

y_count: Integer matrix with nsnp rows and npop columns giving the allele counts for the reference allele
n_count: Integer matrix with nsnp rows and npop columns giving the total allele counts
lambda: Numeric vector of length npop giving the expected coverage of each pool
overdisp: Numeric value controlling the overdispersion, and hence the distribution, of the coverages (see Details)
min_rc: Integer giving the minimum read count for an allele to be considered a true allele
min_maf: Numeric value giving the minor allele frequency (MAF) threshold used for SNP filtering
eps: Numeric value giving the sequencing error rate
eps_exp: Numeric value giving the experimental error leading to unequal contributions of individuals to the pool reads

Details

The function implements a simulation approach similar to that described in Gautier et al. (2021). Read coverages are sampled from a distribution specified by lambda and overdisp. Note that overdisp is the same for all pool samples, whereas lambda (the expected coverages) may vary across pools. If overdisp=1 (the default in the R function), coverages are assumed to be Poisson distributed, so the mean and the variance of the coverages of each pool are both equal to the corresponding value in lambda. If overdisp>1, coverages follow a Negative Binomial distribution with mean equal to lambda and variance equal to overdisp*lambda. Finally, if overdisp<1, no variation in coverage is introduced: all coverages are set to the value specified in lambda, although they may vary slightly in the output when eps>0 because error reads are removed.

The eps parameter controls the sequencing error rate. Sequencing errors are modeled following Gautier et al. (2021), i.e. the read counts for the four possible bases are sampled from the multinomial distribution Multinom(c, {f*(1-eps)+(1-f)*eps/3, f*eps/3+(1-f)*(1-eps), eps/3, eps/3}), where c is the read coverage and f is the reference allele frequency (obtained from the count data). The four probabilities correspond, in order, to the reference base, the alternative base and the two remaining bases.

The experimental error eps_exp controls the contributions of the individuals (assumed diploid) to the pools, following the model described in Gautier et al. (2013). The parameter eps_exp corresponds to the coefficient of variation of the individual contributions. When eps_exp tends toward 0, all individuals contribute equally to the pool and there is no experimental error. For example, with 10 individuals, eps_exp=0.5 corresponds to a situation where 5 individuals contribute 2.8x more reads than the 5 others. Note that the number of (diploid) individuals for each SNP and pool sample is deduced from the input total count (i.e. n_count/2), so it may differ across SNPs when the total counts are not the same.
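The coverage model described in Details can be sketched in Python for a single SNP in one pool. This is a minimal sketch under stated assumptions, not the package's implementation. The Negative Binomial case is drawn here as a gamma-Poisson mixture, a standard construction that gives mean lambda and variance overdisp*lambda. Rounding lambda when overdisp<1 is an assumption for non-integer lambda. The names rpois and sample_coverage are hypothetical.

```python
import math
import random
import statistics


def rpois(lam: float, rng: random.Random) -> int:
    """Poisson draw using Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_coverage(lam: float, overdisp: float, rng: random.Random) -> int:
    """Draw one coverage value for a pool with expected coverage lam."""
    if overdisp < 1:
        # No coverage variation; rounding handles non-integer lam (assumption).
        return int(round(lam))
    if overdisp == 1:
        # Poisson: mean = variance = lam.
        return rpois(lam, rng)
    # Negative Binomial as a gamma-Poisson mixture:
    # E = shape*scale = lam, Var = lam + shape*scale**2 = lam*(1 + scale),
    # so scale = overdisp - 1 gives Var = overdisp*lam.
    scale = overdisp - 1.0
    shape = lam / scale
    return rpois(rng.gammavariate(shape, scale), rng)


rng = random.Random(42)
nb = [sample_coverage(50.0, 3.0, rng) for _ in range(20000)]
nb_mean = statistics.mean(nb)      # expected to be close to 50
nb_var = statistics.variance(nb)   # expected to be close to 3 * 50 = 150
```

With lambda=50 and overdisp=3, the empirical mean and variance should land near 50 and 150. Knuth's Poisson sampler is chosen only because the Python standard library has none. It becomes slow and numerically unreliable for very large expected coverages.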
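The sequencing-error model amounts to one multinomial draw per SNP and pool, using the four base probabilities given in Details. The Python sketch below uses the hypothetical helpers base_probabilities and sample_base_counts. It draws the c reads one by one with random.choices, which is equivalent to a multinomial draw but slow for very large coverages. It does not reproduce the package's subsequent filtering of error reads (min_rc, min_maf).

```python
import random
from collections import Counter
from typing import List


def base_probabilities(f: float, eps: float) -> List[float]:
    """Probabilities of (ref, alt, other1, other2) bases for ref allele frequency f."""
    return [
        f * (1 - eps) + (1 - f) * eps / 3,   # reference base: correct ref read or misread alt
        f * eps / 3 + (1 - f) * (1 - eps),   # alternative base: correct alt read or misread ref
        eps / 3,                             # first of the two remaining bases
        eps / 3,                             # second of the two remaining bases
    ]


def sample_base_counts(c: int, f: float, eps: float, rng: random.Random) -> List[int]:
    """One Multinom(c, probs) draw, obtained by sampling c reads independently."""
    probs = base_probabilities(f, eps)
    draws = rng.choices(range(4), weights=probs, k=c)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(4)]


rng = random.Random(7)
probs = base_probabilities(f=0.3, eps=0.01)
counts = sample_base_counts(c=40, f=0.3, eps=0.01, rng=rng)
```

The four probabilities always sum to 1. With eps=0 they reduce to (f, 1-f, 0, 0), so every read then carries either the reference or the alternative base.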
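The 2.8x figure given for eps_exp=0.5 with 10 individuals can be checked with a short calculation. The check assumes the coefficient of variation uses the sample standard deviation (n - 1 denominator). With the population standard deviation, the same two-group split would give a ratio of exactly 3. The helper name two_group_ratio is hypothetical, and the formula assumes an even number of individuals split into two equal groups.

```python
import math
import statistics


def two_group_ratio(cv: float, n: int) -> float:
    """Ratio a/b such that n individuals, half contributing a and half b,
    have a coefficient of variation (sample SD / mean) equal to cv."""
    # Sample SD of the two-group split: sqrt(n/(n-1)) * (a-b)/2, mean (a+b)/2,
    # so cv = sqrt(n/(n-1)) * r with r = (a-b)/(a+b).
    r = cv * math.sqrt((n - 1) / n)
    return (1 + r) / (1 - r)


ratio = two_group_ratio(0.5, 10)                                  # about 2.80
contrib = [ratio] * 5 + [1.0] * 5
cv_check = statistics.stdev(contrib) / statistics.mean(contrib)  # back to 0.5
```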
