sigCheckKnown: Check signature performance against a panel of known signatures.

Description

Performance of a signature is compared to performance of a panel of known (previously identified) signature.

Usage

sigCheckKnown(check, known="cancer")

Arguments

check

A SigCheckObject, as returned by sigCheck.

known

Either a character string specifying which set of signatures to use from the included sets in knownSignatures, or a list of previously identified signatures to compare performance against. Each element in the list should be a vector of feature labels. Default is to use the "cancer" signatures from the included k knownSignatures data set, taken from Venet et. al.

Value

A result list with the following elements:

$checkType is equal to "Known".
$knownSigs is the number of tests run (equal to the number of known signatures indicated where at least one gene matches a feature).
$rank is the performance rank of the primary signature within the performance of the known signatures.
$checkPval is the empirical p-value computed using the scores of the known signature as a null distribution. A value of zero indicates that no known signatures performed as good or better than the primary signature.
$survivalPval represents the performance of the primary signature, if survival data were provided.
$survivalPvalsKnown is a vector of performance scores (p-values) for each known signature on the validation samples, if survival data were provided.
$trainingPvalsKnown is a vector of performance scores (p-values) for each known signature on the training samples, if survival data and separate validation samples were provided.
$sigPerformance is the proportion of validation samples correctly classified by the primary signature if a classifier was used.
$modePerformance is the proportion of validation samples correctly classified using a mode classifier.
$performanceKnown is a vector of classification performance scores for each known signature, each indicating the proportion of validation samples correctly classified is a classifier was used.

Details

Each specified known signature will be evaluated in the same manner as the primary signature. If survival data were supplied, a survival analysis will be carried out on the validation samples, and a p-value computed as a performance measure. If no survival data are available, the training samples will be used to train a classifier, and the performance score will be percentage of validation samples correctly classified. (If no validation samples are provided, leave-one-out cross validation will be used to calculate the classification performance for each known signature).

An empirical p-value will be computed based on the percentile rank of the performance of the primary signature compared to a null distribution of the performance of the known signatures.

References

Venet, David, Jacques E. Dumont, and Vincent Detours. "Most random gene expression signatures are significantly associated with breast cancer outcome." PLoS Computational Biology 7.10 (2011): e1002240.

Examples

Run this code

#Disable parallel so Bioconductor build won't hang
library(BiocParallel)
register(SerialParam())

library(breastCancerNKI)
data(nki)
nki <- nki[,!is.na(nki$e.dmfs)]
data(knownSignatures)

## survival analysis
check <- sigCheck(nki, classes="e.dmfs", survival="t.dmfs",
                  signature=knownSignatures$cancer$VANTVEER,
                  annotation="HUGO.gene.symbol",
                  validationSamples=150:319)

knownResult <- sigCheckKnown(check)
knownResult$checkPval
knownResult$survivalPvalsKnown[knownResult$survivalPvalsKnown <
                               knownResult$checkPval]
sigCheckPlot(knownResult)

Run the code above in your browser using DataLab