runExPANdS: Main Function

Description

Given a set of mutations, ExPANdS predicts the number of clonal expansions in a tumor, the size of the resulting subpopulations in the tumor bulk, and which mutations accumulate in a cell prior to its clonal expansion. The input parameters SNV and CBS hold the paths to tab-delimited files containing the point mutations and the copy numbers, respectively. Alternatively, SNV and CBS can be read into the workspace and passed to runExPANdS as numeric matrices. The robustness of the subpopulation predictions increases with the number of mutations provided. SNV should contain at least 200 point mutations to obtain stable results.
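The matrix route described above can be sketched in base R as follows. This is a minimal illustration, not part of the package: the helper name read_mutation_matrix, the temporary file, and its toy values are hypothetical, and the toy table carries only the chr and startpos columns, whereas a real SNV input needs every required column.

```r
# Hypothetical helper: read a tab-delimited table into a numeric matrix.
read_mutation_matrix <- function(path) {
  tab <- read.table(path, header = TRUE, sep = "\t")
  as.matrix(tab)
}

# Toy input written to a temporary file (illustrative values only).
toy <- data.frame(chr = c(1, 1, 2), startpos = c(1000, 2500, 400))
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

snv_matrix <- read_mutation_matrix(path)

# Flag inputs below the recommended minimum of 200 point mutations.
enough_mutations <- nrow(snv_matrix) >= 200
if (!enough_mutations) {
  message("Only ", nrow(snv_matrix),
          " mutations; at least 200 are recommended.")
}
```

A matrix built this way could then be passed as the SNV argument in place of a file path.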

Usage

runExPANdS(SNV, CBS, maxScore=2.5, max_PM=6, min_CellFreq=0.1, precision=NA,
 plotF=2, snvF=NULL, maxN=8000, region=NA, peakselection='localsum')

Arguments

SNV

Matrix in which each row corresponds to a point mutation. Only mutations located on autosomes should be included. Columns in SNV must be labeled and must include: chr - the chromosome on which each mutation is located; startpos - the genomi
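The autosome restriction can be applied before calling runExPANdS. The sketch below assumes human data with autosomes numbered 1 to 22 and sex chromosomes encoded as larger numbers; that encoding, the object names, and the toy values are assumptions made for illustration.

```r
# Toy mutation matrix; only the chr and startpos columns are shown.
snv_toy <- cbind(chr = c(1, 7, 23, 22, 24),
                 startpos = c(100, 200, 300, 400, 500))

# Assumption: human data, autosomes numbered 1 to 22,
# sex chromosomes encoded as larger numbers (here 23 and 24).
autosomes <- 1:22

# Keep only rows on autosomes; drop = FALSE preserves the matrix shape.
snv_autosomal <- snv_toy[snv_toy[, "chr"] %in% autosomes, , drop = FALSE]
```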

CBS

Matrix in which each row corresponds to a copy number segment. CBS is typically the output of a circular binary segmentation algorithm. Columns in CBS must be labeled and must include: chr - chromosome; startpos - the first genomic position

maxScore

Upper threshold for the noise score of subpopulation detection. Only subpopulations identified at a score below $maxScore$ (default 2.5) are kept.
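The effect of this threshold can be illustrated on a toy table. The sketch below is not the package's internal code: the candidates matrix and its values are made up, and a strict comparison is assumed for "below".

```r
maxScore <- 2.5

# Toy table of candidate subpopulations (made-up sizes and noise scores).
candidates <- cbind(size = c(0.9, 0.45, 0.2),
                    score = c(0.8, 2.4, 3.1))

# Keep only candidates detected at a noise score below the threshold.
kept <- candidates[candidates[, "score"] < maxScore, , drop = FALSE]
```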

max_PM

Upper threshold for the number of amplicons per mutated cell (default: 6). Increasing the value of this variable is not recommended unless extensive depth and breadth of coverage underlie the measurements of copy numbers and allele frequencies. See also

min_CellFreq

Lower boundary for the cellular prevalence interval of a mutated cell. In default settings the interval starts at 0.1 because cellular frequencies below 0.1 typically correspond to low allele-frequencies (often

precision

Precision with which subpopulation size is predicted. A small value reflects a high resolution and can lead to a higher number of predicted subpopulations.

plotF

Option for displaying a visual representation of the identified subpopulations (0 - no display; 1 - display subpopulation size; 2 - display subpopulation size and phylogeny; default: 2).

snvF

Prefix of the file to which the predicted subpopulation composition will be saved. Default: the name of the file from which mutations have been read, or "out.expands" if the input mutations are not passed as a file path.

maxN

Upper limit for number of point mutations used during clustering (default: 8000; increasing value of this parameter not recommended). If number of user supplied point mutations exceeds $maxN$, the clustering of cellular frequency distributions will be res

region

Regional boundary for mutations included during clustering. Matrix in which each row corresponds to a genomic segment. Columns must include: chr - the chromosome of the segment; start - the first genomic position of the segment; end
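A region matrix with the chr, start and end columns named above might be built as follows. The sketch also shows one plausible reading of the restriction, keeping a mutation when its startpos lies inside a segment on the same chromosome; that inclusive-interval rule is an assumption, and the toy values and object names are hypothetical.

```r
# Toy mutations (chr and startpos only) and a two-segment region matrix.
snv_toy <- cbind(chr = c(1, 1, 2, 3),
                 startpos = c(150, 900, 50, 10))
region <- cbind(chr = c(1, 2), start = c(100, 1), end = c(500, 100))

# Assumed rule: a mutation is inside the region if any segment on the
# same chromosome spans its start position (boundaries inclusive).
in_region <- vapply(seq_len(nrow(snv_toy)), function(i) {
  any(region[, "chr"] == snv_toy[i, "chr"] &
      region[, "start"] <= snv_toy[i, "startpos"] &
      region[, "end"] >= snv_toy[i, "startpos"])
}, logical(1))

snv_in_region <- snv_toy[in_region, , drop = FALSE]
```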

peakselection

Strategy used when assigning mutations to subpopulations, to select mutation-specific cell-frequency probability peaks. Options: 'maximum', 'localsum' (see also assignMutations).

Value

List with fields:
finalSPs

Matrix of predicted subpopulations. Each row corresponds to a subpopulation and each column contains information about that subpopulation, such as the size in the sequenced tumor bulk (column Mean Weighted) and the noise score at which the subpopulation has been detected (column score).

dm

Matrix containing the input mutations with at least five additional columns:
SP - the subpopulation to which the point mutation has been assigned;
SP_cnv - the subpopulation to which the CNV has been assigned (if a CNV exists at this locus);
%maxP - the confidence of point mutation assignment;
f - Deprecated. The maximum likelihood cellular prevalence of this point mutation, before it has been assigned to SP. This value is based exclusively on the copy number and allele frequency of the mutation and is independent of other point mutations. Column SP is less sensitive to noise and is considered the more accurate estimate of cellular mutation prevalence;
PM - the total ploidy of all alleles at the mutated genomic locus, in the subpopulation harboring the point mutation (SP);
PM_B - the ploidy of the B-allele at the mutated genomic locus, in the subpopulation harboring the point mutation (SP);
PM_cnv - the total ploidy of all alleles at the mutated genomic locus, in the subpopulation harboring a CNV (SP_cnv).
If phylogeny reconstruction was successful, the matrix includes one additional column for each subpopulation from the phylogeny, indicating whether or not the point mutation is present in the corresponding subpopulation.

densities

Matrix as obtained by computeCellFrequencyDistributions. Each row corresponds to a mutation and each column corresponds to a cellular frequency. Each value $densities[i,j]$ represents the probability that mutation $i$ is present in a fraction $f$ of cells, where $f$ is given by $colnames(densities)[j]$.

ploidy

Matrix as obtained by assignQuantityToSP. Each row corresponds to a copy number segment, e.g. as obtained from a circular binary segmentation algorithm. Includes one additional column for each predicted subpopulation, containing the ploidy of each segment in the corresponding subpopulation.

tree

An object of class "phylo" (library ape) as obtained by buildPhylo. Contains the inferred phylogenetic relationships between subpopulations.
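The layout of the densities field can be explored with a toy stand-in. The matrix below is made up for illustration and is not output of computeCellFrequencyDistributions; it only mirrors the described structure, with mutations in rows and cellular frequencies stored as column names.

```r
# Toy stand-in for the densities field (values are made up).
freqs <- c(0.1, 0.5, 0.9)
densities <- matrix(c(0.2, 0.7, 0.1,
                      0.6, 0.3, 0.1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("mut1", "mut2"), as.character(freqs)))

# Cellular frequency f associated with column j.
j <- 2
f <- as.numeric(colnames(densities)[j])

# Probability that mutation "mut1" is present in a fraction f of cells.
p <- densities["mut1", j]

# Most probable cellular frequency for each mutation in this toy matrix.
peak_freq <- as.numeric(colnames(densities))[apply(densities, 1, which.max)]
```

With a real result object, the same indexing would apply to the densities field of the returned list.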

References

Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.

Examples

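library(expands)
data(snv)
data(cbs)
maxScore <- 2.5
# Subsample 60 mutations reproducibly.
set.seed(4)
idx <- sample(seq_len(nrow(snv)), 60, replace = FALSE)
# Predict subpopulations from the subsampled mutations:
# out <- runExPANdS(snv[idx, ], cbs, maxScore)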