simureads_mono: simureads_mono

Description

Simulate read counts for monomorphic position when there is sequencing error

Usage

.simureads_mono(npos, npop, lambda, overdisp, min_rc, min_maf, eps)

Value

Return an Integer matrix with nsnp rows and 2*npop columns (1:npop=ref allele readcount; (npop+1):2*npop=coverage)

Arguments

npos: Integer giving the number of positions (close to genome size)
npop: Integer giving the number of population samples
lambda: Numeric Vector of length npop giving the expected coverage of each pool
overdisp: Numeric value giving overdispersion of coverages and their distribution (see details)
min_rc: Integer giving the minimal read count for an allele to be considered as true allele
min_maf: Float giving the MAF threshold for SNP filtering
eps: Numeric value giving the sequencing error

Details

The function implements a simulation approach similar to that described in Gautier et al. (2021). Read coverages are sampled from a distribution specified by the lambda and overdisp vectors. Note that overdisp is the same for all pop sample but lambda (expected coverages) may vary across pool. If overdisp=1 (default in the R function), coverages are assumed Poisson distributed and the mean and variance of the coverages for the pool are both equal to the value specified in the lambda vector. If overdisp>1, coverages follows a Negative Binomial distribution with a mean equal the lamda but a variance equal to overdisp*lambda. Finally, if overdisp<1, no variation in coverage is introduced and all coverages are equal to the value specified in the lambda vector although they may (slightly) vary in the output when eps>0 due to the removal of error reads. The eps parameter control sequencing error rate. Sequencing errors are modeled following Gautier et al. (2021) i.e. read counts for the four possible bases are sampled from a multinomial distribution Multinom(c,{1-eps;eps/3,eps/3,eps/3}) where c is the read coverage. Only bi-allelic SNPs (after considering min_rc) satisfying with MAF>min_maf are included in the output.

Examples

Run this code

Run the code above in your browser using DataLab