poolABC (version 1.0.0)

filterData: Filter the data by the frequency of the minor allele

Description

Computes the frequency of the minor allele at each site across all populations and removes sites where that frequency is below a given threshold.
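The computation described here and under Details can be written compactly as follows. The notation is introduced only for illustration: $m_{ij}$ is the number of minor-allele reads at site $i$ in population $j$, $c_{ij}$ is the depth of coverage of population $j$ at site $i$, $P$ is the number of populations, $C_i$ is the total coverage of site $i$ and $t$ is the threshold. The second line shows why the default threshold of 1/total coverage amounts to requiring at least one minor-allele read.

```latex
C_i = \sum_{j=1}^{P} c_{ij}, \qquad
\hat{f}_i = \frac{\sum_{j=1}^{P} m_{ij}}{C_i}, \qquad
\text{site } i \text{ is kept if } \hat{f}_i \ge t

\text{default } t = \frac{1}{C_i}: \qquad
\hat{f}_i \ge \frac{1}{C_i} \iff \sum_{j=1}^{P} m_{ij} \ge 1
```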

Usage

filterData(rMajor, rMinor, coverage, info, threshold = NA)

Value

a list with the following elements:

rMajor

a matrix with the number of major-allele reads. Each row of this matrix is a different site and each column a different population.

rMinor

a matrix with the number of minor-allele reads. Each row of this matrix is a different site and each column a different population.

coverage

a matrix with the total coverage. Each row of this matrix is a different site and each column a different population.

info

a data frame with 5 columns containing the contig name, the SNP position, the reference character of the SNP, and the characters of the major and minor alleles. Each row of this data frame corresponds to a different site.

The rMajor, rMinor and coverage matrices are the same as the corresponding inputs, except that sites where the frequency of the minor allele is below the threshold have been removed.

Arguments

rMajor

is a matrix containing the number of observed major-allele reads. Each row of the matrix should be a different site and each column should contain information for a single population.

rMinor

is a matrix containing the number of observed minor-allele reads. Each row of the matrix should be a different site and each column should contain information for a single population.

coverage

is a matrix containing the total depth of coverage. Each row of the matrix should be a different site and each column should contain information for a single population.

info

is a data frame containing the remaining relevant information, such as the contig name and the position of each SNP. Each row of the data frame should be a different site.

threshold

is the minimum allowed frequency for the minor allele. Sites where the frequency of the minor allele is below this threshold are removed from the data. The default, NA, is explained under Details.
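To illustrate the input layout described above, the sketch below builds a small invented dataset (three sites, two populations) and checks that the four inputs agree in shape. The helper check_inputs() and the column names of info are hypothetical and not part of poolABC; setting coverage to the sum of major- and minor-allele reads is also an assumption made only for this toy example.

```r
# Invented toy data: rows are sites, columns are populations.
rMajor <- matrix(c(10, 8,
                   12, 9,
                    5, 6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("pop1", "pop2")))
rMinor <- matrix(c(2, 1,
                   0, 0,
                   5, 4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("pop1", "pop2")))
# Assumption for this example only: coverage = major + minor reads
coverage <- rMajor + rMinor

# One row per site; these column names are hypothetical
info <- data.frame(
  contig   = c("ctg1", "ctg1", "ctg2"),
  position = c(101L, 250L, 37L),
  ref      = c("A", "C", "G"),
  major    = c("A", "C", "G"),
  minor    = c("T", "G", "A")
)

# Hypothetical helper: stops with an error if the inputs do not share
# the same sites (rows) and, for the matrices, the same populations (columns)
check_inputs <- function(rMajor, rMinor, coverage, info) {
  stopifnot(
    is.matrix(rMajor), is.matrix(rMinor), is.matrix(coverage),
    identical(dim(rMajor), dim(rMinor)),
    identical(dim(rMajor), dim(coverage)),
    is.data.frame(info),
    nrow(info) == nrow(rMajor)
  )
  invisible(TRUE)
}

check_inputs(rMajor, rMinor, coverage, info)
```

Passing an info data frame with a different number of rows, such as info[1:2, ], makes check_inputs() stop with an error.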

Details

The frequency of the minor allele at each site is computed by dividing the number of minor-allele reads at that site, summed across all populations, by the total coverage of that site. The total coverage is obtained by adding the depth of coverage of each population at that site. If a threshold is supplied, sites where the computed frequency is below the threshold are removed from the dataset. If no threshold is supplied, the threshold is set to 1/total coverage, which means that a site is kept only if it has at least one minor-allele read.
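The procedure described above can be sketched in base R as follows. filter_by_maf() is a hypothetical re-implementation written for illustration, not the poolABC source code. Dropping sites with zero total coverage (where the frequency is undefined) is an assumption that the documentation does not address.

```r
# Hypothetical sketch of the filtering logic described under Details
filter_by_maf <- function(rMajor, rMinor, coverage, info, threshold = NA) {
  # Total coverage per site: sum the depth of coverage across populations
  total <- rowSums(coverage)
  # Minor-allele frequency per site, pooled across populations
  freq <- rowSums(rMinor) / total
  # Default threshold 1/total coverage: keep sites with >= 1 minor-allele read
  if (is.na(threshold)) threshold <- 1 / total
  # Assumption: sites with zero coverage (freq is NaN) are dropped
  keep <- !is.na(freq) & freq >= threshold
  list(
    rMajor   = rMajor[keep, , drop = FALSE],
    rMinor   = rMinor[keep, , drop = FALSE],
    coverage = coverage[keep, , drop = FALSE],
    info     = info[keep, , drop = FALSE]
  )
}

# Invented toy data: 3 sites x 2 populations
rMajor   <- matrix(c(10, 8, 12, 9, 5, 6), nrow = 3, byrow = TRUE)
rMinor   <- matrix(c(2, 1, 0, 0, 5, 4), nrow = 3, byrow = TRUE)
coverage <- rMajor + rMinor
info     <- data.frame(contig   = c("ctg1", "ctg1", "ctg2"),
                       position = c(101L, 250L, 37L))

# Default threshold: site 2 has no minor-allele reads and is removed
res <- filter_by_maf(rMajor, rMinor, coverage, info)
res$info$position   # 101 37

# threshold = 0.2: site 1 (frequency 3/21, about 0.14) is also removed
res2 <- filter_by_maf(rMajor, rMinor, coverage, info, threshold = 0.2)
res2$info$position  # 37
```

Using drop = FALSE keeps each filtered element a matrix (or data frame) even when only one site survives, so downstream code can rely on a consistent shape.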