cellfrequency_pdf: Computes the probability distribution of cellular frequencies for a single mutation.

Description

Calculates $P$, the probability density distribution of cellular frequencies for a single point mutation or CNV. For each cell frequency $f$, the value of $P(f)$ reflects the probability that the mutation is present in a fraction $f$ of cells.

Usage

cellfrequency_pdf(af, cnv, pnb, freq, max_PM=6, snv_cnv_flag=3, SP_cnv = NA, PM_cnv = NA)

Arguments

af

The allelic frequency at which the point mutation has been observed.

cnv

The average copy number of the locus in which the mutation is embedded.

pnb

The ploidy of the B-allele in normal cells (binary variable: 1 if the mutation is a germline variant, 0 if somatic). B-alleles with normal cell ploidy>1 are not modeled.

freq

Array of cellular frequencies at which the probabilities will be calculated.

max_PM

Upper threshold for the number of amplicons per mutated cell (default: 6). $max\_PM$ is the maximum number of amplicons above which solutions are rejected in the cell-frequency estimation step described below, i.e. $PM$

snv_cnv_flag

Flag indicating the evolutionary scenario under which frequency should be estimated: 1 - cellular frequency of SNV only; 2 - cellular frequency of CNV only (parameters AF and pnb are ignored); 3 - cellular frequency of SNV and CNV simultaneously, under th

SP_cnv

Size of the subpopulation that harbors a copy number variation (CNV) at this locus. This variable is only relevant if a CNV and an SNV have overlapping genomic locations, yet have been propagated during distinct clonal expansions (snv_cnv_flag=4).

PM_cnv

Total ploidy in the subpopulation that harbors a copy number variation (CNV) at this locus. This variable is only relevant if a CNV and an SNV have overlapping genomic locations, yet have been propagated during distinct clonal expansions (snv_cnv_flag=4).

Value

List with four components:
p

The probability that the point mutation/CNV is present in a fraction $f$ of cells, for each input frequency $f$ in parameter $freq$.

bestF

The cellular frequency that best explains the observed allele frequency and/or copy number.

fit

Matrix with each row containing one alternative solution, (PM, PM_B, f), as well as an assessment of how well the solution fits the equations given under Details (column "dev").

errors

Errors encountered during the density estimation step.

Details

We consider two types of molecular mechanisms that convert a locus into its mutated state: copy number variation (CNV) inducing events and single nucleotide variation (SNV) inducing events. We assume that a normal state is defined by a total ploidy of two and a B-allele ploidy below two, whereas a mutated state has an increased fraction of B alleles. The conditions defining these states for each locus $l$ are as follows:

i) $PM^B, PN^B, PM, PN \in \mathbb{N}$;
ii) $PM^B \geq 1$; $PN^B \leq 1$; $PN = 2$;
iii) $\frac{PM^B}{PM} \geq \frac{PN^B}{PN}$.

$PM^B$ and $PN^B$ denote the ploidy of the B allele in mutated cells and normal cells, respectively. The value of $PN^B$ is one if $l$ has a germline variant, zero otherwise. $PM$ and $PN$ are the total ploidy of mutated cells and normal cells. $PM$ is required to be between one and $max\_PM$, that is, we exclude solutions for which the number of amplicons per cell exceeds the user-defined constant $max\_PM$.

The function returns the probability distribution, $P(f)$, that the mutation at locus $l$ is present in a fraction $f$ of cells, where $f \in [0,1]$. Four alternative cell-frequency probability distributions, $P(f)$, can be obtained for each allele-frequency and copy-number pair (AF, CN).

1. $P_s(f_{cnv})$ separately models the size $f_{cnv}$ of the subpopulation propagating a CNV: $PM * f_{cnv} + PN * (1-f_{cnv}) = CN$.

2. $P_s(f_{snv})$ and $P_p(f_{snv})$ model the size $f_{snv}$ of the subpopulation propagating an SNV:

2a) $P_s(f_{snv})$: $PM^B * f_{snv} + PN^B * (1-f_{snv}) = AF*CN$, where $PM^B \leq max(2, PM)$. Here $f_{snv}$ is calculated separately from $f_{cnv}$, under the assumption that i) the SNV and the CNV occur independently of each other (i.e. they are never co-propagated during the same clonal expansion) or ii) the SNV occurred in a descendant of the subpopulation with the CNV.

2b) $P_p(f_{snv})$: $PM^B * (f_{snv}-f_{cnv}) + pm^B * f_{cnv} + PN^B * (1-f_{snv}) = AF*CN$, where $pm^B \neq PM^B$ and $pm^B \neq 2$. Here $f_{snv}$ is calculated partially dependent on $f_{cnv}$, under the assumption that the SNV occurred in an ancestor of the subpopulation with the CNV.

3. $P_j(f)$ jointly models the size $f$ of the subpopulation propagating both the SNV and the CNV simultaneously, enforcing both equations, 1) and 2a), with the additional constraints $f := f_{snv} = f_{cnv}$ and $PM^B \leq PM$.

In 1) and 2) the SNV is present in a subpopulation different from the CNV-harboring subpopulation. In 3) both the SNV and a CNV at $l$ were propagated during the same clonal expansion.
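Equations 1) and 2a) are linear in the cellular frequency, so for a fixed choice of ploidies they can be rearranged in closed form. The following base R sketch illustrates that rearrangement only; it is not how the package computes $P(f)$, which is a probability distribution rather than a single exact solution. The helper names solve_f_cnv and solve_f_snv are hypothetical and not part of the package, and the input values are borrowed from the Examples section.

```r
# Hypothetical helpers (NOT part of the package): closed-form rearrangements
# of equations 1) and 2a) for one fixed choice of ploidies.
solve_f_cnv <- function(cn, pm, pn = 2) {
  # PM * f + PN * (1 - f) = CN  =>  f = (CN - PN) / (PM - PN)
  (cn - pn) / (pm - pn)
}

solve_f_snv <- function(af, cn, pm_b, pn_b = 0) {
  # PM^B * f + PN^B * (1 - f) = AF * CN
  #   =>  f = (AF * CN - PN^B) / (PM^B - PN^B)
  # Undefined when PM^B equals PN^B (division by zero).
  (af * cn - pn_b) / (pm_b - pn_b)
}

# Enumerate candidate total ploidies 1..max_PM. PM = PN = 2 is skipped because
# it makes equation 1) degenerate. Keep only frequencies inside [0, 1].
max_PM <- 6
pm <- setdiff(seq_len(max_PM), 2)
f_cnv <- solve_f_cnv(cn = 1.95, pm = pm)
cnv_solutions <- data.frame(PM = pm, f = f_cnv)[f_cnv >= 0 & f_cnv <= 1, ]
print(cnv_solutions)

# Somatic SNV (PN^B = 0) carried on one B allele per mutated cell (PM^B = 1).
f_snv <- solve_f_snv(af = 0.26, cn = 1.95, pm_b = 1, pn_b = 0)
print(f_snv)
```

With a copy number slightly below two, only a single-copy loss (PM = 1) yields a frequency in $[0,1]$ for equation 1), whereas equation 2a) places the SNV in roughly half of the cells under the stated ploidy assumption.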

References

Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.

Examples

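library(expands)  # package that provides cellfrequency_pdf
# Cellular frequencies at which P(f) is evaluated
freq <- seq(0.1, 1.0, by = 0.01)
cfd <- cellfrequency_pdf(af = 0.26, cnv = 1.95, pnb = 0, freq = freq, max_PM = 6)
plot(freq, cfd$p, type = "l", xlab = "f", ylab = "P(f)")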
