popsync2pooldata: Convert Popoolation Sync files into a pooldata object

Description

Convert Popoolation Sync files into a pooldata object
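
For orientation, a sync file stores one genomic position per line: contig, position, reference base, then one colon-separated count field per pool in the order A:T:C:G:N:del. The sketch below parses a single made-up line in base R. `parse_sync_line` is a hypothetical helper written for illustration only; it is not part of the package, and the package's own parser may work quite differently.

```r
# Hypothetical helper (not part of poolfstat): split one sync line into
# its positional fields and a pools x bases matrix of read counts.
parse_sync_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  # Fields 4 onward hold one "A:T:C:G:N:del" count string per pool.
  per_pool <- strsplit(fields[-(1:3)], ":", fixed = TRUE)
  counts <- do.call(rbind, lapply(per_pool, as.integer))
  colnames(counts) <- c("A", "T", "C", "G", "N", "del")
  list(contig = fields[1], position = as.integer(fields[2]),
       ref = fields[3], counts = counts)
}

# Made-up line describing one position in two pools.
example_line <- "chr1\t1042\tA\t30:0:12:0:0:0\t25:0:20:0:0:0"
parsed <- parse_sync_line(example_line)
parsed$counts
```

Each row of `parsed$counts` corresponds to one pool, so the number of rows should match the length of the poolsizes vector passed to the function.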

Usage

popsync2pooldata(
  sync.file = "",
  poolsizes = NA,
  poolnames = NA,
  min.rc = 1,
  min.cov.per.pool = -1,
  max.cov.per.pool = 1e+06,
  min.maf = 0.01,
  noindel = TRUE,
  nlines.per.readblock = 1e+06,
  nthreads = 1
)

Value

A pooldata object containing 7 elements:

"refallele.readcount": a matrix with nsnp rows and npools columns containing read counts for the reference allele (chosen arbitrarily) in each pool
"readcoverage": a matrix with nsnp rows and npools columns containing read coverage in each pool
"snp.info": a matrix with nsnp rows and four columns containing respectively the contig (or chromosome) name (1st column) and position (2nd column) of the SNP; the allele taken as reference in the refallele.readcount matrix (3rd column); and the alternative allele (4th column)
"poolsizes": a vector of length npools containing the haploid pool sizes
"poolnames": a vector of length npools containing the names of the pools
"nsnp": a scalar corresponding to the number of SNPs
"npools": a scalar corresponding to the number of pools
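
As a quick illustration of how the two count matrices relate, the sketch below builds a plain list with the same element names and made-up numbers, then derives per-pool reference allele frequencies. The list `toy` is only a stand-in for illustration; it is not a real pooldata object created by the package.

```r
# Stand-in list mimicking the documented elements; all values are made up.
toy <- list(
  refallele.readcount = matrix(c(30, 25, 5, 40), nrow = 2, byrow = TRUE),
  readcoverage        = matrix(c(42, 45, 50, 40), nrow = 2, byrow = TRUE),
  snp.info            = matrix(c("chr1", "1042", "A", "C",
                                 "chr1", "2310", "G", "T"),
                               nrow = 2, byrow = TRUE),
  poolsizes           = c(50, 50),
  poolnames           = c("Pool1", "Pool2"),
  nsnp                = 2,
  npools              = 2
)

# Per-pool frequency of the reference allele: element-wise ratio of the
# two nsnp x npools matrices.
ref_freq <- toy$refallele.readcount / toy$readcoverage
dimnames(ref_freq) <- list(NULL, toy$poolnames)
ref_freq
```

Rows index SNPs and columns index pools, so `ref_freq[i, j]` is the estimated reference allele frequency of SNP i in pool j.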

Arguments

sync.file: The name (or path) of the Popoolation sync file (may be compressed)
poolsizes: A numeric vector with haploid pool sizes
poolnames: A character vector with the names of the pools
min.rc: Minimal allowed read count per base. Bases covered by min.rc reads or fewer over all the pools are discarded and treated as sequencing errors. For instance, if nucleotides A, C, G and T are covered by 100, 15, 0 and 1 reads respectively over all the pools, setting min.rc to 0 leads to the position being discarded (the polymorphism being considered tri-allelic), while setting min.rc to 1 (or 2, 3, ..., 14) makes the position be treated as a SNP with two alleles, A and C (the single read for allele T being disregarded).
min.cov.per.pool: Minimal allowed read count (per pool). If at least one pool is not covered by at least min.cov.per.pool reads, the position is discarded
max.cov.per.pool: Maximal allowed read count (per pool). If at least one pool is covered by more than max.cov.per.pool reads, the position is discarded
min.maf: Minimal allowed Minor Allele Frequency (computed from the ratio of the overall read count for the reference allele to the overall read coverage)
noindel: If TRUE, positions with at least one indel count are discarded
nlines.per.readblock: Number of lines read simultaneously. Should be adapted to the available RAM.
nthreads: Number of available threads for parallelization of some parts of the parsing (default=1, i.e., no parallelization)
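
The filtering arguments can be sketched in base R as follows. This illustrates the rules as described in the argument list, not the package's implementation: `n_alleles`, `coverage_ok` and `overall_maf` are hypothetical helpers, and the comparison `counts > min.rc` is an assumption chosen to reproduce the worked example given for min.rc.

```r
# Hypothetical helpers (not part of poolfstat), assuming the rules above.

# Number of alleles retained at a position, given overall read counts per
# base. Assumption: a base is kept only if its count exceeds min.rc, which
# matches the worked example (A = 100, C = 15, G = 0, T = 1).
n_alleles <- function(counts, min.rc = 1) sum(counts > min.rc)

# TRUE if every pool's coverage lies within [min.cov, max.cov].
coverage_ok <- function(cov, min.cov = -1, max.cov = 1e6) {
  all(cov >= min.cov & cov <= max.cov)
}

# Minor allele frequency from overall reference-allele reads and coverage.
overall_maf <- function(ref_counts, cov) {
  p <- sum(ref_counts) / sum(cov)
  min(p, 1 - p)
}

counts <- c(A = 100, C = 15, G = 0, T = 1)
n_alleles(counts, min.rc = 0)   # 3 alleles: position discarded as tri-allelic
n_alleles(counts, min.rc = 1)   # 2 alleles: retained as an A/C SNP
n_alleles(counts, min.rc = 15)  # 1 allele: no longer polymorphic

coverage_ok(c(40, 55, 12), min.cov = 20)  # FALSE: third pool is too shallow
overall_maf(ref_counts = c(38, 50, 12), cov = c(40, 55, 12))  # about 0.065
```

With the default min.cov.per.pool of -1 and max.cov.per.pool of 1e+06, the coverage check passes for essentially any realistic position, so coverage filtering only takes effect once those arguments are set explicitly.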

Examples

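 library(poolfstat)
 # Write the example files to a temporary directory, then parse the sync file
 make.example.files(writing.dir = tempdir())
 pooldata <- popsync2pooldata(sync.file = file.path(tempdir(), "ex.sync.gz"), poolsizes = rep(50, 15))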
