pbcmcPackage: Permutation-Based Confidence for Molecular Classification (pbcmc)

Description

Gene expression-based classifiers, known as molecular signatures (MS), consist of a set of coordinately expressed genes and an algorithm that uses these data to predict disease subtype, response to therapy, disease risk or clinical outcome (Andre and Pusztai 2006). They are especially important in breast cancer (BC), where several MS are currently on the market, such as PAM50 (Perou et al. 2000, 2010), Prosigna (www.prosigna.com), Oncotype DX (www.oncotypedx.com) and MammaPrint (www.agendia.com). As far as the authors know, these classifiers do not report any measure of the uncertainty of the classification.

This package characterizes MS classification uncertainty. To achieve this, synthetic subjects are simulated by permuting the gene labels. Each synthetic subject is then tested against the corresponding classifier subtype to build the null distribution, so that a classification confidence measurement can be provided for each subject. In this context, subjects belonging to the null distribution (random or noisy individuals) are not assigned (NA) to any class. Conversely, if reliable results are obtained, subjects are either assigned (A) to the most reliable subtype or marked as ambiguous (AMB) if they are close to two or more reliable subtypes. In the latter case, the combination of classes is given. At present, the approach is only implemented for the PAM50 classifier of the genefu package (Haibe-Kains et al. 2014), but it can easily be extended to other MS. This package includes the following features:

Implemented classifier:
1. PAM50.
Single subject classification:
1. No pilot study needs to be carried out to obtain classification uncertainty.
2. No normalization is required. If desired, external database normalization, genefu normalization alternatives (scale/robust) or even the gene median can be applied before the simulations.
Classification:
1. The original PAM50 calls obtained by genefu.
2. The proposed classification scheme: Assigned (PAM50 call), Not Assigned (NA) or Ambiguous (reliable PAM50 class combinations).
3. Classification significance p-value or False Discovery Rate (FDR).
4. Observed subject Spearman's correlation for each breast cancer subtype.
Physician treatment decision support:
1. A user-friendly subject report is provided. It includes summary data such as the Spearman's correlation with each subtype centroid, the p-value and FDR for each subtype, the original PAM50 classification and the recommended strategy (assigned, not assigned or ambiguous classes).
2. Scatter plot of the observed gene-expression (subject) versus PAM50 centroids panel, plus the corresponding linear regression fit.
3. Null distribution boxplot, plus observed (subject) value.
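
The permutation scheme outlined in the Description can be illustrated with a minimal sketch. It is written in plain Python rather than R, and it is not the package's implementation: the function names (permutation_pvalues, classify), the cut-offs alpha and cor_gap, the one-sided test and the toy centroid values are all assumptions made for illustration. The FDR adjustment is omitted for brevity.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalues(subject, centroids, n_perm=999, seed=0):
    """Observed correlations and one-sided permutation p-values per subtype."""
    rng = random.Random(seed)
    observed = {name: spearman(subject, c) for name, c in centroids.items()}
    exceed = dict.fromkeys(centroids, 0)
    shuffled = list(subject)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # synthetic subject: gene labels permuted
        for name, c in centroids.items():
            if spearman(shuffled, c) >= observed[name]:
                exceed[name] += 1
    # The +1 terms count the observed subject as part of the null sample.
    pvalues = {name: (exceed[name] + 1) / (n_perm + 1) for name in centroids}
    return observed, pvalues


def classify(observed, pvalues, alpha=0.01, cor_gap=0.1):
    """Return ("A" | "AMB" | "NA", subtypes). Both cut-offs are assumed values."""
    reliable = [s for s in observed if pvalues[s] <= alpha]
    if not reliable:
        return "NA", []
    best = max(observed[s] for s in reliable)
    close = sorted(s for s in reliable if best - observed[s] <= cor_gap)
    return ("A" if len(close) == 1 else "AMB"), close


# Toy data for 10 genes; the centroid values are invented, not PAM50 centroids.
subject = [2.1, 0.3, 1.7, 0.9, 3.2, 1.1, 2.8, 0.5, 1.9, 2.5]
centroids = {
    "LumA": [2 * v for v in subject],          # same gene ordering as the subject
    "Basal": [-v for v in subject],            # opposite ordering
    "Her2": [float(i) for i in range(1, 11)],  # weakly related ordering
}
observed, pvalues = permutation_pvalues(subject, centroids)
print(classify(observed, pvalues))
```

In this sketch a subtype counts as reliable when its permutation p-value does not exceed alpha, and a subject is flagged as ambiguous when more than one reliable subtype lies within cor_gap of the best correlation.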

References

Andre F, Pusztai L, 2006, Molecular classification of breast cancer: implications for selection of adjuvant chemotherapy. Nature Clinical Practice Oncology 3(11):621-632.
Haibe-Kains B, Schroeder M, Bontempi G, Sotiriou C, Quackenbush J, 2014, genefu: Relevant Functions for Gene Expression Analysis, Especially in Breast Cancer. R package version 1.16.0, www.pmgenomics.ca/bhklab/
Perou CM, Sorlie T, Eisen MB, et al., 2000, Molecular portraits of human breast tumors. Nature 406:747-752.
Perou CM, Parker JS, Prat A, Ellis MJ, Bernard PS, 2010, Clinical implementation of the intrinsic subtypes of breast cancer. The Lancet Oncology 11(8):718-719.