Usage
runExPANdS(SNV, CBS, maxScore=2.5, max_PM=6, min_CellFreq=0.1, precision=NA, plotF=2, snvF=NULL, maxN=8000, region=NA, peakselection='localsum')
Arguments
SNV
Matrix in which each row corresponds to a point mutation. Only mutations located on autosomes should be included. Columns in SNV must be labeled and must include:
chr - the chromosome on which each mutation is located;
startpos - the genomic position of each mutation;
AF_Tumor - the allele-frequency of each mutation;
PN_B - ploidy of the B-allele in normal cells. A value of 0 indicates that the mutation has only been detected in the tumor sample (i.e. a somatic mutation). A value of 1 indicates that the mutation is also present in the normal (control) sample, albeit at reduced allele frequency (i.e. the mutation is a consequence of LOH). Mutations for which the allele frequency in the tumor sample is lower than the corresponding allele frequency in the normal sample should not be included.
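The column requirements above can be sketched with a toy input. This is a minimal illustration in base R, not taken from the package: all values are made up, and the AF_Normal column is a hypothetical helper used only to apply the last filtering rule.

```r
# Toy mutation table; every value here is invented for illustration.
snv <- data.frame(
  chr       = c(1, 1, 7, 17, 23),
  startpos  = c(1500000, 2300000, 55000000, 7500000, 100000),
  AF_Tumor  = c(0.32, 0.12, 0.48, 0.75, 0.20),
  AF_Normal = c(0.00, 0.00, 0.00, 0.50, 0.00),  # hypothetical helper column
  PN_B      = c(0, 0, 0, 1, 0)                  # 0 = somatic, 1 = LOH
)

# Keep autosomal mutations only (chromosomes 1 to 22).
snv <- snv[snv$chr %in% 1:22, ]

# Drop mutations whose tumor allele frequency is below the normal one.
snv <- snv[snv$AF_Tumor >= snv$AF_Normal, ]

# Convert to a labeled numeric matrix holding the required columns.
SNV <- as.matrix(snv[, c("chr", "startpos", "AF_Tumor", "PN_B")])
```

The resulting matrix carries the four required column labels and excludes the mutation on the non-autosomal chromosome.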
CBS
Matrix in which each row corresponds to a copy number segment. CBS is typically the output of a circular binary segmentation algorithm. Columns in CBS must be labeled and must include:
chr - chromosome;
startpos - the first genomic position of a copy number segment;
endpos - the last genomic position of a copy number segment;
CN_Estimate - the absolute copy number estimated for each segment.
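As a rough sketch of the expected layout, a copy number matrix with the four required columns could be assembled as follows. The segments and copy numbers are invented, and the lookup at the end is only an illustration of how a mutation position relates to a segment, not the package's own procedure.

```r
# Toy copy number segments; values are invented for illustration.
CBS <- cbind(
  chr         = c(1, 1, 7, 17),
  startpos    = c(1, 50000001, 1, 1),
  endpos      = c(50000000, 249000000, 159000000, 81000000),
  CN_Estimate = c(2.0, 3.1, 2.0, 1.4)
)

# Basic sanity checks on labels and segment boundaries.
required <- c("chr", "startpos", "endpos", "CN_Estimate")
stopifnot(all(required %in% colnames(CBS)))
stopifnot(all(CBS[, "startpos"] <= CBS[, "endpos"]))

# Illustration: copy number of the segment covering chr 1, position 60,000,000.
pos <- 60000000
hit <- CBS[, "chr"] == 1 & CBS[, "startpos"] <= pos & CBS[, "endpos"] >= pos
cn_at_mutation <- unname(CBS[hit, "CN_Estimate"])
```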
maxScore
Upper threshold for the noise score of subpopulation detection. Only subpopulations identified at a score below $maxScore$ (default 2.5) are kept.
max_PM
Upper threshold for the number of amplicons per mutated cell (default: 6). Increasing the value of this variable is not recommended unless extensive depth and breadth of coverage underlie the measurements of copy numbers and allele frequencies. See also cellfrequency_pdf.
min_CellFreq
Lower boundary for the cellular prevalence interval of a mutated cell. In default settings the interval starts at 0.1 because cellular frequencies below 0.1 typically correspond to low allele-frequencies (often
precision
Precision with which subpopulation size is predicted. A small value reflects a high resolution and can lead to a higher number of predicted subpopulations.
plotF
Option for displaying a visual representation of the identified subpopulations (0 - no display; 1 - display subpopulation size; 2 - display subpopulation size and phylogeny; default: 2).
snvF
Prefix of the file to which the predicted subpopulation composition will be saved. Default: the name of the file from which mutations have been read, or "out.expands" if the input mutations are not supplied as a file path.
maxN
Upper limit for the number of point mutations used during clustering (default: 8000; increasing the value of this parameter is not recommended). If the number of user-supplied point mutations exceeds $maxN$, the clustering of cellular frequency distributions will be restricted to point mutations found within $region$.
region
Regional boundary for mutations included during clustering.
Matrix in which each row corresponds to a genomic segment. Columns must include:
chr - the chromosome of the segment;
start - the first genomic position of the segment;
end - the last genomic position of the segment.
Default: SureSelectExome_hg19, comprising ca. 468 MB centered on the human exome. Alternative user-supplied regions should also be coding regions, because selective pressure is higher there than in non-coding regions.
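The interplay between maxN and region can be illustrated with a small sketch. It assumes nothing about the package internals: in_region is a hypothetical helper, the coordinates are invented, and maxN is set to a tiny value purely so that the restriction triggers on toy data.

```r
# Toy region matrix with the three required columns (invented coordinates).
region <- cbind(
  chr   = c(1, 7),
  start = c(1000000, 55000000),
  end   = c(2000000, 55300000)
)

# Toy mutation positions (invented).
SNV <- cbind(
  chr      = c(1, 1, 7, 17),
  startpos = c(1500000, 2300000, 55100000, 7500000)
)

# Hypothetical helper: TRUE for each mutation that falls inside any segment.
in_region <- function(snv, region) {
  vapply(seq_len(nrow(snv)), function(i) {
    any(region[, "chr"] == snv[i, "chr"] &
        region[, "start"] <= snv[i, "startpos"] &
        region[, "end"] >= snv[i, "startpos"])
  }, logical(1))
}

maxN <- 3  # tiny value for illustration only; the documented default is 8000

# Restrict to mutations inside the region only when the limit is exceeded.
SNV_used <- if (nrow(SNV) > maxN) {
  SNV[in_region(SNV, region), , drop = FALSE]
} else {
  SNV
}
```

With these toy inputs, only the mutations at chr 1:1500000 and chr 7:55100000 lie inside the region.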
peakselection
Strategy used to select mutation-specific cell-frequency probability peaks when assigning mutations to subpopulations. Options: 'maximum', 'localsum' (see also assignMutations).
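For orientation, the optional arguments can be collected in a named list that mirrors the Usage line. This is a sketch only: the values repeat the documented defaults except plotF, which is set to 0 to suppress display, and snvF, where "toy_sample" is a hypothetical output prefix. The call itself is left commented out because it needs the expands package and real SNV and CBS inputs.

```r
# Optional arguments, named as in the Usage line.
args <- list(
  maxScore      = 2.5,
  max_PM        = 6,
  min_CellFreq  = 0.1,
  precision     = NA,
  plotF         = 0,             # 0 = no display
  snvF          = "toy_sample",  # hypothetical output prefix
  maxN          = 8000,
  region        = NA,
  peakselection = "localsum"
)

# Guard against a misspelled peak selection strategy.
stopifnot(args$peakselection %in% c("maximum", "localsum"))

## Not run:
# library(expands)
# result <- do.call(runExPANdS, c(list(SNV, CBS), args))
```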