Convert Popoolation Sync files into a pooldata object
popsync2pooldata(
sync.file = "",
poolsizes = NA,
poolnames = NA,
min.rc = 1,
min.cov.per.pool = -1,
max.cov.per.pool = 1e+06,
min.maf = 0.01,
noindel = TRUE,
nlines.per.readblock = 1e+06,
nthreads = 1
)
A pooldata object containing 7 elements:
"refallele.readcount": a matrix with nsnp rows and npools columns containing read counts for the reference allele (chosen arbitrarily) in each pool
"readcoverage": a matrix with nsnp rows and npools columns containing read coverage in each pool
"snp.info": a matrix with nsnp rows and four columns containing respectively the contig (or chromosome) name (1st column) and position (2nd column) of the SNP; the allele taken as reference in the refallele.readcount matrix (3rd column); and the alternative allele (4th column)
"poolsizes": a vector of length npools containing the haploid pool sizes
"poolnames": a vector of length npools containing the names of the pools
"nsnp": a scalar corresponding to the number of SNPs
"npools": a scalar corresponding to the number of pools
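To make the layout of these elements concrete, the sketch below builds a small stand-in with the same seven element names and derives per-pool reference allele frequencies from it. The plain list `toy` and all of its counts are invented for illustration. It is not produced by popsync2pooldata, and the way elements are accessed on a real pooldata object may differ.

```r
# Hypothetical stand-in mimicking the seven elements described above.
# All values are invented; a real object comes from popsync2pooldata().
toy <- list(
  refallele.readcount = matrix(c(30, 12, 45, 20), nrow = 2, byrow = TRUE),
  readcoverage        = matrix(c(60, 48, 50, 40), nrow = 2, byrow = TRUE),
  snp.info            = matrix(c("chr1", "101", "A", "C",
                                 "chr1", "250", "G", "T"),
                               nrow = 2, byrow = TRUE),
  poolsizes           = c(50, 50),
  poolnames           = c("P1", "P2"),
  nsnp                = 2,
  npools              = 2
)

# Per-pool reference allele frequency estimate: reference reads / coverage.
ref.freq <- toy$refallele.readcount / toy$readcoverage

# Label rows by "contig:position" and columns by pool name.
dimnames(ref.freq) <- list(
  paste(toy$snp.info[, 1], toy$snp.info[, 2], sep = ":"),
  toy$poolnames
)
print(ref.freq)
```

Rows of both count matrices line up with rows of snp.info, and columns line up with poolnames and poolsizes, which is why an element-wise division is enough here.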
"sync.file": The name of (or path to) the Popoolation sync file (may be compressed)
"poolsizes": A numeric vector with the haploid pool sizes
"poolnames": A character vector with the names of the pools
"min.rc": Minimal allowed read count per base. Bases covered by fewer than min.rc reads are discarded and treated as sequencing errors. For instance, if nucleotides A, C, G and T are covered by 100, 15, 0 and 1 reads respectively over all the pools, setting min.rc to 0 will discard the position (the polymorphism being considered tri-allelic), while setting min.rc to 1 (or 2, 3, ..., 14) will treat the position as a SNP with two alleles, A and C (the single read for allele T being disregarded).
"min.cov.per.pool": Minimal allowed read count (per pool). If at least one pool is not covered by at least min.cov.per.pool reads, the position is discarded
"max.cov.per.pool": Maximal allowed read count (per pool). If at least one pool is covered by more than max.cov.per.pool reads, the position is discarded
"min.maf": Minimal allowed Minor Allele Frequency (computed as the ratio of the overall read count for the reference allele to the overall read coverage)
"noindel": If TRUE, positions with at least one indel count are discarded
"nlines.per.readblock": Number of lines read simultaneously. Should be adapted to the available RAM.
"nthreads": Number of available threads for parallelization of some parts of the parsing (default=1, i.e., no parallelization)
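The filters controlled by min.rc, min.cov.per.pool, max.cov.per.pool and min.maf can be illustrated on the per-pool counts of a single position. This is a minimal sketch, not the package's parser: the function `classify.position` and its return strings are hypothetical, and indel and N counts are ignored. A base is disregarded here when its overall count is at most min.rc, which is what the worked example for min.rc implies even though the one-sentence description reads "fewer than", so that boundary is an assumption. Computing coverage from the two retained alleles only, and folding the frequency to the minor allele, are assumptions as well.

```r
# Hypothetical helper (not part of poolfstat): decide whether one position
# passes the filters. `counts` is an npools x 4 matrix of A, C, G, T counts.
classify.position <- function(counts, min.rc = 1, min.cov.per.pool = -1,
                              max.cov.per.pool = 1e6, min.maf = 0.01) {
  total <- colSums(counts)
  # Assumption: bases with an overall count <= min.rc are treated as errors.
  kept <- which(total > min.rc)
  if (length(kept) != 2) return("discarded: not bi-allelic")
  # Assumption: coverage per pool counts only the two retained alleles.
  cov <- rowSums(counts[, kept, drop = FALSE])
  if (any(cov < min.cov.per.pool)) return("discarded: pool below min.cov.per.pool")
  if (any(cov > max.cov.per.pool)) return("discarded: pool above max.cov.per.pool")
  # Overall frequency of the first retained allele, folded to the minor allele.
  freq <- sum(counts[, kept[1]]) / sum(cov)
  maf <- min(freq, 1 - freq)
  if (maf < min.maf) return("discarded: MAF below min.maf")
  paste("SNP with alleles", paste(colnames(counts)[kept], collapse = "/"))
}

# Counts from the min.rc example, split arbitrarily over two pools:
# overall A = 100, C = 15, G = 0, T = 1.
counts <- matrix(c(60, 10, 0, 1,
                   40,  5, 0, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("P1", "P2"), c("A", "C", "G", "T")))

res.rc0 <- classify.position(counts, min.rc = 0)  # A, C and T all retained
res.rc1 <- classify.position(counts, min.rc = 1)  # only A and C retained
res.cov <- classify.position(counts, min.rc = 1, min.cov.per.pool = 50)
print(c(res.rc0, res.rc1, res.cov))
```

With min.rc = 0 the single T read keeps the position tri-allelic and it is dropped. With min.rc = 1 the position becomes an A/C SNP, and raising min.cov.per.pool to 50 drops it again because the second pool has only 45 reads.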
library(poolfstat)
make.example.files(writing.dir = tempdir())
pooldata <- popsync2pooldata(sync.file = file.path(tempdir(), "ex.sync.gz"), poolsizes = rep(50, 15))