
expands (version 1.2)

cellfrequency_pdf: Computes the probability distribution of cellular frequencies for a single mutation.

Description

Calculates $P$ - the probability density distribution of cellular frequencies for one single mutation. For each $f$, the value of $P(f)$ reflects the probability that the mutation is present in a fraction $f$ of cells.

Usage

cellfrequency_pdf(af, cnv, pnb, freq, max_PM=6)

Arguments

af
The allelic frequency at which the mutation has been observed.
cnv
The ploidy of the locus in which the mutation is embedded.
pnb
The ploidy of the B-allele in normal cells (binary variable: 1 if the mutation is a germline variant, 0 if somatic).
freq
The array of cellular frequencies at which the probabilities will be calculated.
max_PM
Upper threshold for the number of amplicons per mutated cell (default: 6). See Details for more information on this parameter.

Value

List with three components:

  • p: The probability that the mutation is present in a fraction $f$ of cells, for each input frequency $f$.
  • bestF: The cellular frequency that best explains the observed allele frequency and ploidies.
  • errors: Errors encountered during the density estimation step.

Details

We consider two types of molecular mechanisms that convert a locus into its mutated state: copy number variation (CNV) inducing events and single nucleotide variation (SNV) inducing events. We assume that a normal state is defined by a total ploidy of two and B allele ploidy below two, whereas a mutated state has an increased fraction of B alleles. The conditions defining these states for each locus $l$ are as follows:

i) $PM^B, PN^B, PM, PN \in \mathbb{N}$;
ii) $PM^B \geq 1; PN^B \leq 1; PN = 2$;
iii) $\frac{PM^B}{PM} \geq \frac{PN^B}{PN}$.

$PM^B$ and $PN^B$ denote the ploidy of the B allele in each cell type: mutated cells and normal cells, respectively. The value of $PN^B$ is one if $l$ has a germline variant, zero otherwise. $PM$ and $PN$ are the total ploidy of mutated cells and normal cells. $PM$ is required to be between one and $max\_PM$; that is, we exclude solutions for which the maximum number of amplicons per cell exceeds the user-defined constant $max\_PM$.

The function returns the probability distribution, $P_l(f)$, that the mutation at locus $l$ is present in a fraction $f$ of cells, where $f \in [0.1,1.1]$. The interval starts at 0.1 because cellular frequencies below 0.1 are typically detected at very low allele-frequencies (
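
To make conditions i) to iii) concrete, the following base-R sketch enumerates the integer ploidy states they admit for a given $PN^B$ and $max\_PM$. It is only an illustration: `enumerate_states` is a hypothetical helper, not a function of the expands package, and it additionally assumes that $PM^B \leq PM$ (the B allele cannot have more copies than the locus), which the conditions above do not state explicitly. It does not reproduce how `cellfrequency_pdf` computes $P_l(f)$.

```r
# Hypothetical helper (not part of expands): list the (PM, PM_B) pairs that
# satisfy conditions i)-iii) for a given B-allele ploidy in normal cells.
enumerate_states <- function(pnb, max_PM = 6) {
  stopifnot(pnb %in% c(0, 1), max_PM >= 1)
  PN <- 2  # condition ii): normal cells have a total ploidy of two

  # Conditions i) and ii): positive integer ploidies, PM in 1..max_PM
  grid <- expand.grid(PM = seq_len(max_PM), PM_B = seq_len(max_PM))

  # Assumption (not stated in the conditions): PM_B cannot exceed PM
  grid <- grid[grid$PM_B <= grid$PM, ]

  # Condition iii): PM_B / PM >= pnb / PN, cross-multiplied to stay in integers
  states <- grid[grid$PM_B * PN >= pnb * grid$PM, ]
  rownames(states) <- NULL
  states
}

somatic  <- enumerate_states(pnb = 0, max_PM = 6)  # PN^B = 0
germline <- enumerate_states(pnb = 1, max_PM = 6)  # PN^B = 1
print(nrow(somatic))
print(nrow(germline))
```

For a somatic mutation condition iii) holds trivially, so every pair with $1 \leq PM^B \leq PM \leq max\_PM$ is admitted; for a germline variant only states in which at least half of the copies carry the B allele remain. Lowering `max_PM` shrinks both sets.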

References

Noemi Andor, Julie Harness, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics. In Review.

Examples

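library(expands)
# Cellular frequencies at which P(f) is evaluated
freq <- seq(0.1, 1.1, by = 0.01)
cfd <- cellfrequency_pdf(af = 0.26, cnv = 2.13, pnb = 0, freq = freq, max_PM = 6)
# Plot the estimated probability distribution over cellular frequencies
plot(freq, cfd$p, type = "l", xlab = "f", ylab = "P(f)")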
