assignMutations: Mutation Assignment

Description

Assigns mutations to previously predicted subpopulations.

Usage

assignMutations(dm, finalSPs, max_PM=6, peakselection='localsum')

Arguments

dm

Matrix in which each row corresponds to a mutation. Has to contain at least the following column names: chr - the chromosome on which each mutation is located; startpos - the genomic position of each mutation; AF_Tumor - the allele-fre

finalSPs

Matrix in which each row corresponds to a subpopulation, as computed by clusterCellFrequencies.

max_PM

Upper threshold for the number of amplicons per mutated cell (default: 6). See also cellfrequency_pdf.

peakselection

Strategy used to select mutation-specific cell-frequency probability peaks. Options: 'maximum', 'localsum'.

Value

A list with two fields:
dm

The input matrix with seven additional columns:
SP - the subpopulation to which the point mutation has been assigned;
PM_B - the ploidy of the B-allele at the mutated genomic locus, in the assigned subpopulation (SP);
PM - the total ploidy of all alleles, in the assigned subpopulation (SP);
SP_cnv - if the point mutation lies within an amplified or deleted region: the subpopulation to which the copy number variation has been assigned. This entry has the same value as SP if and only if: i) the SNV and the CNV were propagated during the same clonal expansion or ii) the SNV lies within a copy-neutral region;
PM_cnv - the total ploidy of all alleles, in the CNV-harboring subpopulation (SP_cnv);
%maxP - confidence of the point mutation assignment to SP;
scenario - the evolutionary scenario under which the subpopulation configurations for this genomic locus have been solved (see also parameter "snv_cnv_flag" in cellfrequency_pdf).

finalSPs

The input matrix of subpopulations with column nMutations updated according to the total number of mutations assigned to each subpopulation.
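
The returned dm matrix can be summarized with base R alone. The sketch below uses a small hand-made matrix as a stand-in for the real output; all values are invented, and it is an assumption here that the SP and SP_cnv columns hold the size of the assigned subpopulation. It counts the mutations per subpopulation and flags mutations whose overlapping CNV was assigned to a different subpopulation.

```r
# Hypothetical stand-in for the 'dm' field of the returned list.
# All values are invented for illustration; SP and SP_cnv are assumed
# to hold the size (cell frequency) of the assigned subpopulation.
dm <- cbind(
  chr      = c(1, 1, 7, 12),
  startpos = c(1500, 98000, 5600, 420),
  SP       = c(0.8, 0.8, 0.35, 0.35),
  SP_cnv   = c(0.8, 0.35, 0.35, 0.35),
  PM_B     = c(1, 1, 2, 1),
  PM       = c(2, 2, 3, 2)
)

# Number of mutations assigned to each subpopulation.
n_per_sp <- table(dm[, "SP"])

# Mutations whose CNV was assigned to a different subpopulation than the SNV,
# i.e. SNV and CNV were not propagated during the same clonal expansion.
differs <- dm[, "SP"] != dm[, "SP_cnv"]

print(n_per_sp)
print(dm[differs, c("chr", "startpos", "SP", "SP_cnv"), drop = FALSE])
```

On real output, the per-subpopulation counts would be expected to agree with the nMutations column of the returned finalSPs matrix.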

Details

Each mutated locus $l$ is assigned to the subpopulation $C$ whose size $f_C$ best explains the allele frequency (AF) and copy number (CN) observed at $l$. Three alternative cell-frequency probabilities, $P_x(f_C)$, are calculated for the SNV at locus $l$, where $x$ denotes one of three alternative evolutionary scenarios (see also cellfrequency_pdf):

1. $x:=s$: Separate fit of SNV and CNV. The CNV does not influence the ploidy of the SNV, either because the CNV occurs before the SNV or because SNV and CNV occur independently of each other (i.e. they are never co-propagated during the same clonal expansion).
2. $x:=p$: Partial dependency of SNV ploidy on CNV. The SNV is propagated during the expansion of $C$. Subsequently, the CNV is propagated during a clonal expansion of a cell-member of $C$.
3. $x:=j$: Joint fit of SNV and CNV, assuming they co-occur in the same cell and are propagated during the exact same clonal expansion.

If peakselection is set to 'maximum', then the SNV is assigned to subpopulation:
$C := \mathrm{argmax}_C \, (P_s(f_C), P_p(f_C), P_j(f_C))$.

If peakselection is set to 'localsum', then the SNV is assigned to subpopulation:
$C := \mathrm{argmax}_C \, (L(P_s, f_C), L(P_p, f_C), L(P_j, f_C))$,
where $L(P, f_C) := \sum_{f \in \mathrm{peak}(f_C)} P(f)$ is the sum of all probabilities within the peak around $f_C$.

The mutated loci assigned to each subpopulation cluster represent the genetic profile of each predicted subpopulation. The assignment between subpopulation $C$ and locus $l$ only implies that the SNV at $l$ was first propagated during the clonal expansion that gave rise to $C$. SNVs present in $C$ are therefore not necessarily exclusive to $C$; they may also be present in subpopulations smaller than $C$. Whether or not this is the case can sometimes be inferred from the phylogenetic structure of the subpopulation composition. See also buildPhylo.
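
The difference between the two peak-selection strategies can be illustrated with a minimal base-R sketch. This is not the package's implementation: the function select_sp, the frequency grid, the toy probabilities, the subpopulation sizes, and the definition of peak(f_C) as a fixed window of half-width 0.1 around $f_C$ are all assumptions made for illustration.

```r
# Hypothetical illustration of 'maximum' versus 'localsum' peak selection.
# Not the expands implementation; grid, probabilities and window are made up.

# Candidate cell frequencies at which each probability is evaluated.
freq <- seq(0.1, 1, by = 0.1)

# Toy probabilities P_s, P_p, P_j over the grid (one row per scenario).
P <- rbind(
  s = c(0.01, 0.02, 0.30, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01),
  p = c(0.01, 0.01, 0.01, 0.01, 0.01, 0.15, 0.20, 0.15, 0.01, 0.01),
  j = c(0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.05, 0.05)
)

# Sizes f_C of the predicted subpopulations (assumed values).
sp_sizes <- c(0.3, 0.7)

select_sp <- function(P, freq, sp_sizes,
                      peakselection = c("localsum", "maximum"),
                      halfwidth = 0.1) {
  peakselection <- match.arg(peakselection)
  # score[x, C]: support of scenario x for subpopulation C.
  score <- sapply(sp_sizes, function(fc) {
    if (peakselection == "maximum") {
      # P_x(f_C): probability at the grid point closest to f_C.
      idx <- which.min(abs(freq - fc))
    } else {
      # L(P_x, f_C): sum over the grid points in the assumed peak window.
      idx <- which(abs(freq - fc) <= halfwidth + 1e-9)
    }
    rowSums(P[, idx, drop = FALSE])
  })
  # Pick the (scenario, subpopulation) pair with the highest score.
  best <- which(score == max(score), arr.ind = TRUE)[1, ]
  list(SP = sp_sizes[best[["col"]]], scenario = rownames(P)[best[["row"]]])
}

by_max <- select_sp(P, freq, sp_sizes, "maximum")
by_sum <- select_sp(P, freq, sp_sizes, "localsum")
print(by_max)
print(by_sum)
```

With these toy values the two strategies disagree: 'maximum' follows the single tall, narrow peak of the separate-fit scenario, whereas 'localsum' favors the broader peak of the partial-dependency scenario because its summed probability is larger.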

References