Computes the frequency of the minor allele across all populations and removes sites where that frequency is below a certain threshold.
filterData(rMajor, rMinor, coverage, info, threshold = NA)
a list with the following elements:
rMajor: a matrix with the number of major-allele reads. Each row of this matrix is a different site and each column a different population.
rMinor: a matrix with the number of minor-allele reads. Each row of this matrix is a different site and each column a different population.
coverage: a matrix with the total coverage. Each row of this matrix is a different site and each column a different population.
info: a data frame with 5 different columns containing: the contig name, the SNP position, the reference character of the SNP and the reference character of the major and minor allele for each of the populations. Each row of this data frame corresponds to a different site.
The rMajor, rMinor and coverage elements are the same as the corresponding inputs, but without the sites where the frequency of the minor allele is below the threshold.
rMajor is a matrix containing the number of observed major-allele reads. Each row of the matrix should be a different site and each column should contain information for a single population.
rMinor is a matrix containing the number of observed minor-allele reads. Each row of the matrix should be a different site and each column should contain information for a single population.
coverage is a matrix containing the total depth of coverage. Each row of the matrix should be a different site and each column should contain information for a single population.
info is a data frame containing the remaining relevant information, such as the contig name and the position of each SNP. Each row of the data frame should be a different site.
threshold is the minimum allowed frequency for the minor allele. Sites where the minor-allele frequency is below this threshold are removed from the data.
The frequency of the minor allele is computed by dividing the total number of
minor-allele reads at each site and across all populations by the total
coverage of that site. The total coverage is obtained by adding the depth of
coverage of each population at each site. If a threshold is supplied, the
computed frequency is compared to that threshold and sites where the
frequency is below the threshold are removed from the dataset. If no
threshold is supplied, the threshold is assumed to be 1/total coverage, meaning that a site should have at least one minor-allele read.
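
The computation described above can be sketched in a few lines of R. This is only an illustrative re-implementation under stated assumptions, not the package's source: the function name filterSketch, the toy data, and the choice to drop sites with zero total coverage are all hypothetical.

```r
# Illustrative sketch only; filterSketch is a hypothetical name, not the real function.
filterSketch <- function(rMajor, rMinor, coverage, info, threshold = NA) {
  # total coverage of each site: sum of the depth of coverage across populations
  totalCov <- rowSums(coverage)
  # minor-allele frequency across all populations
  freq <- rowSums(rMinor) / totalCov
  # default threshold: 1/total coverage, i.e. at least one minor-allele read per site
  if (is.na(threshold)) threshold <- 1 / totalCov
  # keep sites at or above the threshold; zero-coverage sites are dropped (assumption)
  keep <- !is.na(freq) & freq >= threshold
  list(rMajor = rMajor[keep, , drop = FALSE],
       rMinor = rMinor[keep, , drop = FALSE],
       coverage = coverage[keep, , drop = FALSE],
       info = info[keep, , drop = FALSE])
}

# toy data: 3 sites (rows) and 2 populations (columns)
rMinor <- matrix(c(0, 0, 1, 0, 5, 4), ncol = 2, byrow = TRUE)
rMajor <- matrix(c(10, 12, 9, 10, 5, 6), ncol = 2, byrow = TRUE)
coverage <- rMajor + rMinor
info <- data.frame(contig = c("c1", "c1", "c2"), position = c(10, 55, 7))

# no threshold: only the site without any minor-allele read is removed
defaultFilter <- filterSketch(rMajor, rMinor, coverage, info)
# explicit threshold: sites with a minor-allele frequency below 0.1 are removed
strictFilter <- filterSketch(rMajor, rMinor, coverage, info, threshold = 0.1)
```

With the toy data, the three sites have minor-allele frequencies of 0, 0.05 and 0.45, so the default call keeps the last two sites and the call with threshold = 0.1 keeps only the last one.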