cellfrequency_pdf: Computes the probability distribution of cellular frequencies for a single mutation.
Description
Calculates $P$, the probability density of cellular frequencies for a single mutation. For each $f$, the value of $P(f)$ is the probability that the mutation is present in a fraction $f$ of cells.
Usage
cellfrequency_pdf(af, cnv, pnb, freq, max_PM=6)
Arguments
af
The allelic frequency at which the mutation has been observed.
cnv
The ploidy of the locus in which the mutation is embedded.
pnb
The ploidy of the B-allele in normal cells (binary variable: 1 if the mutation is a germline variant, 0 if somatic).
freq
Array of cellular frequencies at which the probabilities will be calculated.
max_PM
Upper threshold for the number of amplicons per mutated cell (default: 6). Solutions in which the number of amplicons exceeds $max\_PM$ are rejected in the cell-frequency estimation step described below, i.e. $PM \leq max\_PM$ is required.
Value
List with three components:
p
The probability that the mutation is present in a fraction $f$ of cells, for each input frequency $f$.
bestF
The cellular frequency that best explains the observed allele frequency and ploidies.
errors
Errors encountered during the density estimation step.
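Examples
The call below is a minimal sketch of how the arguments and return components described above fit together. The input values (`af = 0.26`, `cnv = 2`, `pnb = 0`) and the frequency grid are illustrative assumptions rather than values taken from a real data set, and the call is skipped when the expands package is not installed.

```r
# Illustrative grid of cellular frequencies (assumed range and step size).
freq <- seq(0.1, 1.0, by = 0.01)

# Only call the package function when expands is available.
if (requireNamespace("expands", quietly = TRUE)) {
  # Hypothetical somatic mutation (pnb = 0) observed at allele frequency 0.26
  # in a locus with a total ploidy of 2.
  cfd <- expands::cellfrequency_pdf(af = 0.26, cnv = 2, pnb = 0,
                                    freq = freq, max_PM = 6)
  print(cfd$bestF)   # cellular frequency that best explains af and cnv
  print(cfd$errors)  # errors reported by the density estimation step, if any
}
```

If the call succeeds, `cfd$p` holds one probability per entry of `freq`, so the two vectors can be plotted against each other.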
Details
We consider two types of molecular mechanisms that convert a locus into its mutated state: events that induce copy number variation (CNV) and events that induce single nucleotide variation (SNV). We assume that a normal state is defined by a total ploidy of two and a B-allele ploidy below two, whereas a mutated state has an increased fraction of B alleles. The conditions defining these states for each locus $l$ are as follows: i) $PM^B, PN^B, PM, PN \in \mathbb{N}$; ii) $PM^B \geq 1; PN^B \leq 1; PN = 2$; iii) $\frac{PM^B}{PM} \geq \frac{PN^B}{PN}$.
$PM^B$ and $PN^B$ denote the ploidy of the B allele in mutated cells and normal cells, respectively. The value of $PN^B$ is one if $l$ has a germline variant, zero otherwise. $PM$ and $PN$ are the total ploidies of mutated cells and normal cells. $PM$ is required to be between one and $max\_PM$, that is, we exclude solutions for which the number of amplicons per cell exceeds the user-defined constant $max\_PM$.
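To make conditions i) to iii) concrete, the following sketch enumerates the integer ploidy states $(PM, PM^B)$ they admit for a given $PN^B$ and $max\_PM$. The helper `feasible_states` is hypothetical and not part of the package, and the extra bound $PM^B \leq PM$ is an assumption added here (the B-allele ploidy cannot exceed the total ploidy); the package's internal search may be organized differently.

```r
# Hypothetical helper: list the (PM, PM_B) pairs allowed by conditions i)-iii).
# pnb    : ploidy of the B allele in normal cells (0 = somatic, 1 = germline)
# max_PM : upper bound on the total ploidy of mutated cells
# PN     : total ploidy of normal cells, fixed at 2 by condition ii)
feasible_states <- function(pnb, max_PM = 6, PN = 2) {
  stopifnot(pnb %in% c(0, 1), max_PM >= 1)
  # All integer pairs with 1 <= PM <= max_PM and 1 <= PM_B <= max_PM.
  grid <- expand.grid(PM = seq_len(max_PM), PM_B = seq_len(max_PM))
  # Assumption: the B-allele ploidy cannot exceed the total ploidy.
  within_total <- grid$PM_B <= grid$PM
  # Condition iii), PM_B / PM >= pnb / PN, cross-multiplied to stay in integers.
  enriched <- grid$PM_B * PN >= pnb * grid$PM
  states <- grid[within_total & enriched, ]
  rownames(states) <- NULL
  states
}

somatic  <- feasible_states(pnb = 0)
germline <- feasible_states(pnb = 1)
```

A germline variant ($PN^B = 1$) admits fewer states than a somatic one, because condition iii) then requires at least half of the copies in mutated cells to carry the B allele.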
The function returns the probability distribution, $P_l(f)$, that the mutation at locus $l$ is present in a fraction $f$ of cells, where $f \in [min\_CellFreq, 1.1]$. At default settings the interval starts at 0.1 because cellular frequencies below 0.1 are typically detected at very low allele frequencies (
References
Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.