
haploReconstruct (version 0.1.2)

sync_to_frequencies: Data input from a sync file

Description

Reads SNP time series data from a file in .sync format.

Usage

sync_to_frequencies(file, base.pops, header, mincov = 15)

Arguments

file
the name of the ".sync" file to read the data from. Sync files are specified in Kofler et al. (2011). A sync file contains 3 + n columns, where n is the number of populations (libraries): col 1: chromosome (reference contig); col 2: position (in the reference contig); col 3: reference allele; cols > 3: one sync entry per population giving its allele counts in the form A-count:T-count:C-count:G-count:N-count:deletion-count. Sync files normally have no header, but a header is accepted when header=TRUE is specified.
base.pops
logical vector with the same length as the number of libraries present in the sync file. Libraries marked TRUE are used to identify the two main alleles (the minor and the major allele). Allele frequencies of all libraries are then polarized for the minor allele of this specified subset.
header
logical value specifying whether a header is present in the provided sync file.
mincov
minimum coverage required to calculate an allele frequency. If the summed counts of the minor and major allele in a library fall below this threshold, the respective frequency is encoded as NA (default = 15).
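To make the sync entry format concrete, here is a minimal base-R sketch that splits one line of a sync file into its three fixed columns and a count matrix. The helper parse_sync_line and the example line are hypothetical (not part of haploReconstruct), and tab-separated columns are assumed, as in PoPoolation2 output.

```r
# Hypothetical helper, not part of haploReconstruct: split one line of a
# .sync file (assumed tab-separated) into the three fixed columns and an
# integer count matrix with one row per library and one column per allele.
parse_sync_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  entries <- fields[-(1:3)]  # columns > 3: one sync entry per library
  # each entry "A:T:C:G:N:del" becomes six integers; vapply returns a
  # 6 x n matrix, which is transposed to n x 6
  counts <- t(vapply(strsplit(entries, ":", fixed = TRUE),
                     as.integer, integer(6)))
  colnames(counts) <- c("A", "T", "C", "G", "N", "del")
  list(chr = fields[1], pos = as.integer(fields[2]), ref = fields[3],
       counts = counts)
}

# made-up example line with three libraries
line <- "2L\t5762\tT\t0:7:0:0:0:0\t4:11:0:0:0:0\t10:6:0:0:0:0"
snp <- parse_sync_line(line)
snp$counts
```

Each row of snp$counts holds the six counts of one library, in the column order given by the sync format.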

Value

a data.table with 6 + N columns, where N is the number of libraries: col 1: chr (chromosome); col 2: pos (position on the respective chromosome); col 3: ref (reference allele); col 4: minallele (minor allele across all specified base populations); col 5: majallele (major allele across all specified base populations); col 6: weighted mean frequency of all specified base populations, polarized for the minor allele; cols > 6: frequency of the minor allele in each library.
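The documentation does not state the weights behind column 6. One plausible reading, assumed here, is that each base library's minor-allele frequency is weighted by its coverage (minor + major count), which is equivalent to pooling the counts of all base libraries. The function weighted_base_freq is hypothetical and only illustrates that assumption.

```r
# Hypothetical sketch, not taken from the haploReconstruct source: weight each
# base library's minor-allele frequency by its coverage (minor + major count).
weighted_base_freq <- function(minor, major, base.pops) {
  cov <- minor + major          # per-library coverage of the two main alleles
  freq <- minor / cov           # per-library minor-allele frequency
  weighted.mean(freq[base.pops], w = cov[base.pops])
}

minor <- c(4, 6, 10)
major <- c(11, 10, 5)
weighted_base_freq(minor, major, base.pops = c(TRUE, TRUE, FALSE))
# equals (4 + 6) / (15 + 16) = 10 / 31
```

Under this assumption, libraries with higher coverage contribute more, and libraries outside base.pops do not contribute at all.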

Details

Time series data are read in from a file in sync format. The sync format is specified in Kofler et al. 2011 (PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq)). For each library and SNP, allele counts are read in and transformed into allele frequencies. Allele frequencies are polarized for the minor and major allele of a specified (sub)set of libraries, i.e. the libraries of the experimental founder population. Frequencies are computed only from the counts of the two most common alleles in the specified base populations base.pops. Please note: this procedure is not a substitute for proper SNP calling. Provided sync files are expected to contain only positions of previously called SNPs, and at least two alleles should be present in the specified base populations.
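The polarization and mincov logic described above can be sketched for a single SNP as follows. This is a hypothetical illustration, not the package's implementation: polarize_snp is an invented name, and ignoring the N and deletion counts when picking the two main alleles is an assumption.

```r
# Hypothetical sketch of the polarization step for one SNP; not the package's
# actual code. `counts` is a libraries x alleles matrix with columns
# A, T, C, G, N, del; `base.pops` is a logical vector over the libraries.
polarize_snp <- function(counts, base.pops, mincov = 15) {
  # assumption: only nucleotide alleles are candidates for the two main alleles
  nuc <- counts[, c("A", "T", "C", "G"), drop = FALSE]
  # sum counts over the base populations and take the two most common alleles
  base_totals <- colSums(nuc[base.pops, , drop = FALSE])
  top2 <- names(sort(base_totals, decreasing = TRUE))[1:2]
  majallele <- top2[1]
  minallele <- top2[2]
  # frequency of the base-population minor allele in every library,
  # using only the counts of the two main alleles
  minor <- nuc[, minallele]
  cov <- minor + nuc[, majallele]
  freq <- ifelse(cov < mincov, NA_real_, minor / cov)
  list(minallele = minallele, majallele = majallele, freq = unname(freq))
}

counts <- rbind(c(0, 7, 0, 0, 0, 0),
                c(4, 11, 0, 0, 0, 0),
                c(10, 6, 0, 0, 0, 0))
colnames(counts) <- c("A", "T", "C", "G", "N", "del")
res <- polarize_snp(counts, base.pops = c(TRUE, TRUE, FALSE))
res
```

In this made-up example, T is the major and A the minor allele in the two base libraries. The third library is not used to identify the alleles but is still polarized for A, and the first library has a coverage of 7, below mincov, so its frequency is NA.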

References

Franssen, Barton & Schloetterer (2016). Reconstruction of haplotype-blocks selected during experimental evolution. Molecular Biology and Evolution (MBE).