Create a SigCheckObject and establish baseline performance.
Creates a SigCheckObject. Also establishes baseline survival analysis and/or classification performance.
sigCheck(expressionSet, classes, survival, signature, annotation,
         validationSamples, scoreMethod="PCA1", threshold=median,
         classifierMethod=svmI, modeVal, survivalLabel, timeLabel,
         plotTrainingKM=TRUE, plotValidationKM=TRUE, impute=TRUE)
expressionSet: An ExpressionSet object containing the data to be checked, including an expression matrix, feature labels, and samples. expressionSet can also be an existing SigCheckObject, in which case everything is inherited from the passed object except the values of any parameters specified in the call.
classes: Name of the phenotype variable (one of varLabels(expressionSet)) that holds the class labels. There should be only two unique values in expressionSet$classes.
survival: Name of the phenotype variable (one of varLabels(expressionSet)) that holds the survival times. This may be missing if only classification is being checked.
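signature: A vector of feature labels identifying the features in the signature being checked. The labels should match the values in the annotation field named by the annotation parameter. Alternatively, this can be an integer vector of feature indexes.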
annotation: Name of the fvarLabels field that should be used as the annotation. If missing, the row names of the expressionSet are used as the feature names.
validationSamples: Optional specification (for example, a vector of sample indexes) of which samples in the expressionSet should be considered validation samples. If present, the main checks are performed using only these samples. If the scoreMethod parameter is equal to "classifier", the remaining samples are used as a training set to construct a classifier that is then used to separate the validation samples. If a classifier is used and validationSamples is not specified, a leave-one-out (LOO) validation method is used, where a separate classifier is trained to classify each sample using all the remaining samples.
scoreMethod: Method used to compute the score that separates the validation samples into groups. One of:
"PCA1": the default. The score for each sample is the value of the first principal component of the expression values in the signature.
"High": the score for each sample is the mean of all the expression values in the signature for that sample.
"classifier": the score is determined by the classifier specified in the classifierMethod parameter. If the survival parameter is specified, the classifier method must return a real-valued score for each predicted sample.
scoreMethod can also be a user-defined function that computes a score. The function should take a single parameter, an ExpressionSet, and return a vector of scores, one for each sample.
If the survival parameter is missing, scoreMethod must be "classifier".
threshold: Threshold used to divide the validation samples into groups based on the score computed by scoreMethod. It can be either a function (default median) or a number between zero and one indicating a percentile. Validation samples are divided into a group whose percentile scores are less than this value and a group whose percentile scores are greater than or equal to this value.
threshold may also be a vector of two percentiles, in which case samples are divided into High, Low, and Mid groups. The survival p-value is then computed using only the High and Low groups, with the Mid samples excluded.
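classifierMethod: The classifier used when scoreMethod is "classifier" (default svmI).
modeVal: Which class (one of the values of the variable named by the classes parameter) should be considered the default value when computing the performance of a "mode" classifier. If missing, the actual mode (most commonly occurring value) of the training set is used.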
plotTrainingKM: If the survival and validationSamples parameters are provided and this is TRUE, a Kaplan-Meier plot is generated automatically for the training set samples. A value of FALSE suppresses the plot.
plotValidationKM: If the survival parameter is provided and this is TRUE, a Kaplan-Meier plot is generated automatically for the validation set samples. A value of FALSE suppresses the plot. Note that if the validationSamples parameter is missing, the plot covers all samples.
impute: If TRUE, missing data values in the expressionSet are imputed. If FALSE, any features with missing values are removed from the dataset.
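The impute flag chooses between filling in missing values and dropping incomplete features. The sketch below illustrates both behaviours on a plain numeric matrix (features in rows, samples in columns) rather than an ExpressionSet. The helper name handle_missing is hypothetical, and the row-mean imputation it uses is an assumption chosen for simplicity; this page does not say which imputation method sigCheck actually applies.

```r
# Hypothetical helper illustrating the two behaviours of the `impute` flag.
# Rows are features, columns are samples.
# Assumption: row-mean imputation is used here for illustration only and is
# not necessarily the method used by sigCheck.
handle_missing <- function(x, impute = TRUE) {
  if (impute) {
    # Replace each missing value with the mean of the observed values
    # of the same feature (row).
    for (i in seq_len(nrow(x))) {
      miss <- is.na(x[i, ])
      if (any(miss)) x[i, miss] <- mean(x[i, !miss])
    }
    x
  } else {
    # Drop every feature (row) that has at least one missing value.
    x[stats::complete.cases(x), , drop = FALSE]
  }
}

m <- matrix(c(1, NA, 3,
              4,  5, 6), nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), paste0("s", 1:3)))
handle_missing(m, impute = TRUE)   # the NA for geneA becomes mean(1, 3) = 2
handle_missing(m, impute = FALSE)  # only geneB remains
```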
A SigCheckObject is returned.
This function creates a SigCheckObject and carries out a baseline analysis, which varies depending on which parameters are specified. If the survival parameter is specified, a survival analysis is carried out. If the validationSamples parameter is specified, this is done separately on the validation samples and on the remaining (training/discovery) samples.
The main result is a p-value testing whether the samples can be separated into groups with distinct survival outcomes. This value is obtained using the survdiff function in the survival package (which by default performs a log-rank test), by applying pchisq to the $chisq component of the result. The samples are separated into groups using the scoreMethod and threshold parameters (and, when scoreMethod is "classifier", the classifierMethod parameter).
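The grouping and p-value steps can be sketched with base R and the survival package. The code below computes a "PCA1"-style score (first principal component of the signature features) and a "High"-style score (per-sample mean), splits samples at a percentile threshold or into Low/Mid/High groups with two percentiles, and derives a log-rank p-value from survdiff via pchisq. It uses a plain matrix and simulated survival data instead of an ExpressionSet, and the helper names (score_pca1, score_mean, split_by_threshold, logrank_pval) are hypothetical. This is an illustration under those assumptions, not sigCheck's implementation, which may differ in details such as how the principal-component sign is oriented or how ties at the cut point are handled.

```r
library(survival)

# Hypothetical "PCA1"-style score: first principal component of the
# signature features, with samples as observations. The sign of a
# principal component is arbitrary, so "high" vs "low" may be flipped.
score_pca1 <- function(x) {
  # x: features in rows, samples in columns; prcomp expects samples in rows.
  prcomp(t(x), center = TRUE, scale. = FALSE)$x[, 1]
}

# Hypothetical "High"-style score: mean signature expression per sample.
score_mean <- function(x) colMeans(x)

# Split scores using a function (e.g. median) or percentile(s).
# One cut gives Low/High; two cuts give Low/Mid/High.
split_by_threshold <- function(score, threshold = median) {
  cuts <- if (is.function(threshold)) threshold(score)
          else quantile(score, probs = threshold, names = FALSE)
  if (length(cuts) == 1) {
    factor(ifelse(score < cuts, "Low", "High"), levels = c("Low", "High"))
  } else {
    grp <- ifelse(score < cuts[1], "Low",
                  ifelse(score >= cuts[2], "High", "Mid"))
    factor(grp, levels = c("Low", "Mid", "High"))
  }
}

# Log-rank p-value for a factor of groups; Mid samples are excluded.
logrank_pval <- function(time, event, group) {
  keep <- as.character(group) != "Mid"
  tt <- time[keep]
  ee <- event[keep]
  grp <- droplevels(group[keep])
  fit <- survdiff(Surv(tt, ee) ~ grp)
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

# Simulated data: 10 signature features measured on 60 samples.
set.seed(1)
n_samples <- 60
expr <- matrix(rnorm(10 * n_samples), nrow = 10)
time <- rexp(n_samples, rate = exp(colMeans(expr)))
event <- rbinom(n_samples, 1, 0.8)

g1 <- split_by_threshold(score_mean(expr))              # median split
p1 <- logrank_pval(time, event, g1)
g2 <- split_by_threshold(score_pca1(expr), c(0.33, 0.67))
p2 <- logrank_pval(time, event, g2)                     # Mid excluded
```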
If the survival parameter is not specified, then the scoreMethod
parameter must be equal to "classifier", and a pure classification
analysis is completed (as was done in SigCheck 1.0).
If the validationSamples parameter is specified, the remaining samples
are used as a training set to construct a classifier that is used
to classify the validation samples. If validationSamples is not
specified, leave-one-out cross-validation is used whereby a separate
classifier is trained to predict each sample using all of the others.
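The leave-one-out procedure can be sketched as follows. To stay self-contained, the sketch swaps in a simple nearest-centroid classifier in place of the default svmI, works on a plain matrix with simulated labels, and also computes the accuracy of a "mode" classifier that predicts the same class for every sample (the baseline that modeVal controls). All function names are hypothetical, and for simplicity the mode is taken over all labels rather than over each training set.

```r
# Hypothetical nearest-centroid classifier (a stand-in for svmI).
# train: features x samples matrix; labels: class label of each training sample.
nearest_centroid_predict <- function(train, labels, newsample) {
  classes <- unique(labels)
  # Euclidean distance from the new sample to each class centroid.
  d <- sapply(classes, function(cl) {
    centroid <- rowMeans(train[, labels == cl, drop = FALSE])
    sqrt(sum((newsample - centroid)^2))
  })
  classes[which.min(d)]
}

# Leave-one-out: for each sample, train on all the others and predict it.
loo_accuracy <- function(x, labels) {
  preds <- vapply(seq_len(ncol(x)), function(i) {
    nearest_centroid_predict(x[, -i, drop = FALSE], labels[-i], x[, i])
  }, character(1))
  mean(preds == labels)
}

# "Mode" classifier baseline: predict one class for every sample.
# If modeVal is NULL, the most common label is used.
mode_accuracy <- function(labels, modeVal = NULL) {
  if (is.null(modeVal)) modeVal <- names(which.max(table(labels)))
  mean(labels == modeVal)
}

# Simulated data: 5 features, 40 samples, the "poor" class shifted upward.
set.seed(42)
labels <- rep(c("good", "poor"), each = 20)
x <- matrix(rnorm(5 * 40), nrow = 5)
x[, labels == "poor"] <- x[, labels == "poor"] + 1.5

acc_loo <- loo_accuracy(x, labels)
acc_mode <- mode_accuracy(labels)   # balanced classes, so 0.5
```

Comparing the leave-one-out accuracy with the mode-classifier accuracy shows whether the signature does better than simply predicting the most common class.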
See also: sigCheckAll, sigCheckRandom, sigCheckKnown, sigCheckPermuted.
library(SigCheck)
library(breastCancerNKI)
data(nki)
## keep only samples whose e.dmfs value is not missing
nki <- nki[,!is.na(nki$e.dmfs)]
data(knownSignatures)
## survival analysis
check <- sigCheck(nki, classes="e.dmfs", survival="t.dmfs",
signature=knownSignatures$cancer$VANTVEER,
annotation="HUGO.gene.symbol")
check@survivalPval
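## repeat the survival analysis on the same object, using the mean-expression
## ("High") score and splitting the samples at the 33rd percentile
check <- sigCheck(check, classes="e.dmfs", survival="t.dmfs",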
signature=knownSignatures$cancer$VANTVEER,
annotation="HUGO.gene.symbol",
scoreMethod="High", threshold=.33)
check@survivalPval
## survival analysis with separate training and validation using SVM
check <- sigCheck(nki, classes="e.dmfs", survival="t.dmfs",
signature=knownSignatures$cancer$VANTVEER,
annotation="HUGO.gene.symbol",
validationSamples=150:319,
scoreMethod="classifier")
check