clusterCellFrequencies: Clustering of cellular frequency probability distributions

Description

Calculates overrepresented cell frequencies using a two-step approach. Based on the assumption that passenger mutations occur within a cell prior to the driver event that initiates the expansion, each clonal expansion should be marked by multiple mutations. Thus, mutations and copy number variations that took place in a cell prior to a clonal expansion should be present in a similar fraction of cells and leave a similar "frequency-trace" in the subsequent clonal expansion.

Usage

clusterCellFrequencies(densities, precision, nrep=30, min_CellFreq=0.1)

Arguments

densities

Matrix as obtained by computeCellFrequencyDistributions. Each row corresponds to a mutation and each column corresponds to a cellular frequency. Each value $densities[i,j]$ represents the probability that mutation i is present in the fraction of cells given by column j.
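The layout of such a matrix can be illustrated with a small, purely hypothetical example in base R. The mutation names, the frequency grid, the Gaussian shapes and the row normalization below are all assumptions made for illustration; they are not output of computeCellFrequencyDistributions.

```r
# Hypothetical toy input: 3 mutations (rows) x 5 candidate cellular
# frequencies (columns). Real input comes from
# computeCellFrequencyDistributions; this only mimics the layout.
freqs <- c(0.1, 0.3, 0.5, 0.7, 0.9)
toy_densities <- rbind(
  mutA = dnorm(freqs, mean = 0.3, sd = 0.10),
  mutB = dnorm(freqs, mean = 0.3, sd = 0.15),
  mutC = dnorm(freqs, mean = 0.7, sd = 0.10)
)
colnames(toy_densities) <- freqs

# Assumption for this sketch: each row is scaled to sum to 1.
toy_densities <- toy_densities / rowSums(toy_densities)

# Most probable cellular frequency of each mutation (column of the row maximum).
peak_freq <- as.numeric(colnames(toy_densities))[apply(toy_densities, 1, which.max)]
names(peak_freq) <- rownames(toy_densities)
```

In this toy matrix, mutA and mutB peak at the same column and would be candidates for the same subpopulation, while mutC peaks elsewhere.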

precision

Precision with which subpopulation size is predicted. A small value reflects a high resolution and can lead to a higher number of predicted subpopulations.

nrep

Positive integer indicating the number of algorithm repetitions (default: 30).

min_CellFreq

Lower threshold for the prevalence of a mutated cell (default: 0.1).

Value

SPs

Matrix of predicted subpopulations. Each row corresponds to a subpopulation and each column contains information about that subpopulation, such as the size in the sequenced tumor bulk (column Mean Weighted) and the noise score at which the subpopulation has been detected (column score; lower values indicate higher confidence in the detection of the subpopulation).

Details

In the first step, mutations with similar cellular frequencies are grouped together by hierarchical cluster analysis of the probability distributions, using the Kullback-Leibler divergence as a distance measure. The cell frequency at each cluster maximum denotes the size of the subpopulation that harbors the clustered mutations. In the second step, each cluster is extended by members with similar distributions in an interval around the cluster maximum.
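The first step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper sym_kl, the toy matrix dens, the symmetrised form of the divergence (hclust expects a symmetric dissimilarity), the average linkage and the choice of two clusters are all hypothetical.

```r
# Symmetrised Kullback-Leibler divergence between two discrete distributions.
# eps guards against log(0); its value is an arbitrary choice for this sketch.
sym_kl <- function(p, q, eps = 1e-12) {
  p <- (p + eps) / sum(p + eps)
  q <- (q + eps) / sum(q + eps)
  sum(p * log(p / q)) + sum(q * log(q / p))
}

# Hypothetical densities: two mutations peaking near 0.3, two near 0.8.
freqs <- seq(0.1, 1, by = 0.1)
dens <- rbind(
  m1 = dnorm(freqs, 0.3, 0.08),
  m2 = dnorm(freqs, 0.3, 0.10),
  m3 = dnorm(freqs, 0.8, 0.08),
  m4 = dnorm(freqs, 0.8, 0.10)
)
dens <- dens / rowSums(dens)

# Pairwise divergence matrix between mutations.
n <- nrow(dens)
d <- matrix(0, n, n, dimnames = list(rownames(dens), rownames(dens)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- sym_kl(dens[i, ], dens[j, ])
  }
}

# Hierarchical clustering on the divergences (linkage is an assumption).
tree <- hclust(as.dist(d), method = "average")
groups <- cutree(tree, k = 2)

# Cluster maximum: frequency at the peak of each group's mean distribution.
cluster_peak <- sapply(split(seq_len(n), groups), function(idx) {
  freqs[which.max(colMeans(dens[idx, , drop = FALSE]))]
})
```

Here m1 and m2 fall into one group and m3 and m4 into the other, and cluster_peak holds one candidate subpopulation size per group. The second step, extending each cluster around its maximum, is not covered by this sketch.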

References

Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.