pbcmcPackage: Permutation-Based Confidence for Molecular Classification (pbcmc)
Description
Gene expression-based classifiers, known as molecular signatures (MS),
consist of a set of coordinately expressed genes and an algorithm that uses
these data to predict disease subtype, response to therapy, disease risk or
clinical outcome (Andre and Pusztai 2006). They are especially important in
breast cancer (BC), where several MS are currently on the market, such as
PAM50 (Perou et al. 2000, 2010), Prosigna (www.prosigna.com), Oncotype DX
(www.oncotypedx.com) and MammaPrint (www.agendia.com).
As far as the authors know, these classifiers do not report the
uncertainty of their classification. This package characterizes MS
classification uncertainty. To achieve this, synthetic subjects are
simulated by permuting gene labels. Each synthetic subject is then
tested against the classifier's corresponding subtype to build a null
distribution, so that a classification confidence measure can be
provided for each subject. In this context, subjects belonging to the
null distribution (random or noisy individuals) are not assigned (NA)
to any class. Otherwise, if reliable results are obtained, subjects are
either assigned (A) to the most reliable subtype or marked as ambiguous
(AMB) if they are close to two or more reliable subtypes. In the latter
case, the combination of classes is given.
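The permutation idea described above can be illustrated with a minimal Python sketch. It is not the package's implementation (pbcmc is an R package); the function names, the one-sided test, the `n_perm` default and the `+1` correction in the p-value are all assumptions made for illustration. The sketch shuffles a subject's expression values across gene labels, correlates each shuffled profile with a subtype centroid using Spearman's correlation, and compares the observed correlation with the resulting null distribution.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation, computed as Pearson's correlation of the ranks.

    Both inputs must contain at least two distinct values.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(subject, centroid, n_perm=1000, seed=0):
    """Hypothetical helper: one-sided permutation p-value for one subtype.

    subject and centroid are equally long lists of expression values,
    one entry per gene, in the same gene order.
    """
    rng = random.Random(seed)
    observed = spearman(subject, centroid)
    shuffled = list(subject)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute gene labels of the subject
        null.append(spearman(shuffled, centroid))
    # Assumption: count null correlations at least as large as the observed
    # one, with a +1 correction so the p-value is never exactly zero.
    exceed = sum(1 for r in null if r >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, null, p_value
```

A subject whose profile closely follows a centroid yields an observed correlation far in the upper tail of the null distribution and therefore a small p-value, whereas a random profile yields a correlation that is indistinguishable from the permuted ones.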
At present, it is only implemented for the PAM50 classifier of the
genefu package (Haibe-Kains et al. 2014), but it can easily be extended
to other MS. This package includes the following features:
- Implemented classifier:
  - PAM50.
- Single subject classification:
  - No pilot study needs to be carried out to obtain
    classification uncertainty.
  - Normalization is not mandatory. If desired, external
    database normalization, genefu normalization
    alternatives (scale/robust) or gene median normalization
    can be applied before simulations.
- Classification:
  - The original PAM50 calls obtained by genefu.
  - The proposed classification scheme: Assigned
    (PAM50 call), Not Assigned (NA) or Ambiguous (reliable
    PAM50 class combinations).
  - Classification significance: p-value or False
    Discovery Rate (FDR).
  - Observed subject Spearman's correlation for each
    breast cancer subtype.
- Physician treatment decision support:
  - A friendly subject report is provided, which includes
    summary data such as subtype centroid Spearman's
    correlation, p-value and FDR for each subtype, the
    original PAM50 classification and the recommended
    strategy (assigned, not assigned or ambiguous
    classes).
  - Scatter plot of the observed gene expression
    (subject) versus the PAM50 centroids panel, plus the
    corresponding linear regression fit.
  - Null distribution boxplot, plus the observed (subject)
    value.
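The classification scheme listed above (Assigned, Not Assigned or Ambiguous) can be sketched as a small decision rule. The following Python sketch is only an illustration under stated assumptions, not the package's code. It assumes that the FDR is obtained with the Benjamini-Hochberg adjustment, that a subtype is "reliable" when its FDR is at or below a cutoff, and that two reliable subtypes are "close" when their correlations differ by no more than a second cutoff. The names `bh_adjust`, `classify_subject`, `p_cutoff` and `cor_cutoff`, and their default values, are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_subject(correlations, p_values, p_cutoff=0.01, cor_cutoff=0.1):
    """Hypothetical decision rule for a single subject.

    correlations and p_values are dicts keyed by subtype name.
    Returns (label, subtypes), where label is "Assigned", "Not Assigned"
    or "Ambiguous".
    """
    subtypes = list(correlations)
    fdr = dict(zip(subtypes, bh_adjust([p_values[s] for s in subtypes])))
    # Assumption: a subtype is reliable when its FDR passes the cutoff.
    reliable = [s for s in subtypes if fdr[s] <= p_cutoff]
    if not reliable:
        return "Not Assigned", []
    best = max(reliable, key=lambda s: correlations[s])
    # Assumption: reliable subtypes whose correlation is within cor_cutoff
    # of the best one are treated as equally plausible.
    close = [s for s in reliable
             if correlations[best] - correlations[s] <= cor_cutoff]
    if len(close) > 1:
        return "Ambiguous", sorted(close)
    return "Assigned", [best]
```

With this rule, a subject with no reliable subtype is left unassigned, a subject with a single clearly best reliable subtype is assigned to it, and a subject with several reliable subtypes of similar correlation is reported as ambiguous together with the competing classes.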
References
- Andre F, Pusztai L, 2006, Molecular classification of
breast cancer: implications for selection of adjuvant
chemotherapy. Nature Clinical Practice Oncology
3(11):621-632.
- Haibe-Kains B, Schroeder M, Bontempi G, Sotiriou C,
Quackenbush J, 2014, genefu: Relevant Functions for Gene
Expression Analysis, Especially in Breast Cancer. R package
version 1.16.0, www.pmgenomics.ca/bhklab/
- Perou CM, Sorlie T, Eisen MB, et al., 2000, Molecular
portraits of human breast tumors. Nature 406:747-752.
- Perou CM, Parker JS, Prat A, Ellis MJ, Bernard PS, 2010,
Clinical implementation of the intrinsic subtypes of breast
cancer. The Lancet Oncology 11(8):718-719.