Create a SigCheckObject and establish baseline performance.
Creates a SigCheckObject. Also establishes baseline survival analysis and/or classification performance.
sigCheck(expressionSet, classes, survival, signature, annotation, validationSamples, scoreMethod="PCA1", threshold=median, classifierMethod=svmI, modeVal, survivalLabel, timeLabel, plotTrainingKM=TRUE, plotValidationKM=TRUE, impute=TRUE)
expressionSet: An ExpressionSet object containing the data to be checked, including an expression matrix, feature labels, and samples. expressionSet can also be an existing SigCheckObject, in which case everything will be inherited from the passed object except the values for any specified parameters.
classes: The label that indicates the phenotypic classes of interest (one of varLabels(expressionSet)). There should be only two unique values in expressionSet$classes.
survival: The label that indicates the survival times (one of varLabels(expressionSet)). This may be missing if only classification is being checked.
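signature: The features that make up the signature to be checked, given as feature labels that match the values specified by the annotation parameter. Alternatively, this can be an integer vector of feature indexes.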
annotation: Specifies which fvarLabels field should be used as the annotation. If missing, the row names of the expressionSet are used as the feature names.
validationSamples: Specifies which samples in the expressionSet should be considered validation samples (for example, a vector of sample indices). If present, the main checks will be performed using only these samples. If the scoreMethod parameter is equal to "classifier", the remaining samples will be used as a training set to construct a classifier that will be used to separate the validation samples. If a classifier is used and validationSamples is not specified, a leave-one-out (LOO) validation method will be used, where a separate classifier will be trained to classify each sample using all the remaining samples.
scoreMethod: Specifies how samples are scored before being separated into groups. One of:
"PCA1": default scoring method. The score used for separating validation samples into groups is the value of the first principal component of the expression values in the signature for each sample.
"High": the score used for separating validation samples into groups is the mean value over all the expression values in the signature for each sample.
"classifier": the score used for separating validation samples into groups is determined by a classifier specified in the classifierMethod parameter. If the survival parameter is specified, the classifier method must return a real-valued score for each predicted sample.
scoreMethod can also be a user-defined function that computes a score. The function should take a single parameter, an ExpressionSet, and return a vector of scores, one for each row.
If the survival parameter is missing, the scoreMethod value must be "classifier".
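As a rough illustration of the "PCA1" and "High" scores described above, the sketch below computes both on a small random matrix with features in rows and samples in columns. The matrix `expr`, the index vector `sig`, and the use of `prcomp` are assumptions made for illustration; SigCheck's internal computation may differ in centering, scaling, or sign.

```r
set.seed(1)
# Toy expression matrix: 20 features (rows) x 10 samples (columns); hypothetical data
expr <- matrix(rnorm(200), nrow = 20,
               dimnames = list(paste0("g", 1:20), paste0("s", 1:10)))
sig <- 1:5  # hypothetical signature: the first five features

# "High"-style score: mean expression over the signature features, per sample
scoreHigh <- colMeans(expr[sig, ])

# "PCA1"-style score: first principal component of the signature features.
# prcomp() expects observations in rows, so the matrix is transposed.
pca <- prcomp(t(expr[sig, ]))
scorePCA1 <- pca$x[, 1]
```

Both vectors hold one score per sample, which is the shape a later thresholding step would need.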
threshold: Specifies the cutoff used to separate samples into groups based on the scores computed by scoreMethod. Can be either a function (the default is median) or a number between zero and one indicating a percentile. Validation samples will be divided into a group whose percentile scores are less than this value, and another group with percentile scores greater than or equal to this value. threshold may also be a vector of two percentiles, in which case samples will be divided into High, Low, and Mid groups. The survival p-value will be computed using only the High and Low groups, with the Mid group samples excluded.
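The grouping rules above can be sketched in base R. This is only an approximation under stated assumptions: the scores are simulated, percentiles are turned into cutoffs with `quantile`, and the handling of samples that fall exactly on a boundary of the Mid group is a guess, since the text does not specify it.

```r
set.seed(2)
scores <- rnorm(12)  # hypothetical per-sample scores

# Function threshold (the default): split at the median score
byMedian <- ifelse(scores < median(scores), "Low", "High")

# Single percentile threshold: scores below the 33rd percentile form one group
cutoff <- quantile(scores, 0.33)
twoGroups <- ifelse(scores < cutoff, "Low", "High")

# Two percentiles: Low / Mid / High, using half-open intervals [a, b)
cuts <- quantile(scores, c(0.33, 0.66))
threeGroups <- cut(scores, breaks = c(-Inf, cuts[1], cuts[2], Inf),
                   labels = c("Low", "Mid", "High"), right = FALSE)

# Only the Low and High samples would enter the survival comparison
keep <- threeGroups != "Mid"
```

With three groups, `keep` marks the samples that remain after the Mid group is excluded.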
modeVal: Specifies which of the two class values (taken from the classes parameter) should be considered as the default value when computing the performance of a "mode" classifier. If missing, the actual mode (most commonly occurring) value of the training set will be used.
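A "mode" classifier baseline is simple enough to sketch directly. The class vectors below are made up, and the accuracy measure is an assumption about what "performance" means here.

```r
# Hypothetical class labels for a training set and a validation set
trainClasses <- c(0, 0, 0, 1, 1, 0, 1, 0)
validClasses <- c(0, 1, 1, 0, 0)

# The mode (most commonly occurring) class value in the training set
modeVal <- as.numeric(names(which.max(table(trainClasses))))

# A "mode" classifier predicts modeVal for every validation sample
modePerformance <- mean(validClasses == modeVal)
```

Any trained classifier would need to beat `modePerformance` to be doing better than always guessing the most common class.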
plotTrainingKM: If the survival and validationSamples parameters are provided, a Kaplan-Meier plot is plotted automatically for the training set samples if this is TRUE. A value of FALSE suppresses the automatically generated plot.
plotValidationKM: If the survival parameter is provided, a Kaplan-Meier plot is plotted automatically for the validation set samples if this is TRUE. A value of FALSE suppresses the automatically generated plot. Note that if the validationSamples parameter is missing, the resulting plot will be over all samples.
impute: If TRUE, missing data values in the expressionSet will be imputed. If FALSE, any features with any missing values will be removed from the dataset.
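The two behaviours can be contrasted on a tiny matrix. The removal step follows the description directly; the imputation step uses a row-mean fill purely as a stand-in, because the text does not say which imputation method SigCheck applies.

```r
# Toy matrix with missing values: 3 features (rows) x 4 samples (columns)
expr <- matrix(c(1,  2, NA,  4,
                 5,  6,  7,  8,
                 9, NA, 11, 12),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))

# impute=FALSE analogue: drop every feature (row) that has a missing value
exprDropped <- expr[complete.cases(expr), , drop = FALSE]

# impute=TRUE stand-in (assumption): fill each NA with the mean of its row
exprImputed <- expr
for (i in seq_len(nrow(exprImputed))) {
  isMissing <- is.na(exprImputed[i, ])
  exprImputed[i, isMissing] <- mean(exprImputed[i, ], na.rm = TRUE)
}
```

Dropping keeps only `g2` here, which shows how quickly features can be lost when missing values are scattered across many rows.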
A SigCheckObject is returned.
sigCheck creates a SigCheckObject and carries out a baseline analysis, which will vary depending on which parameters are specified. If the survival parameter is specified, a survival analysis is carried out. If the validationSamples parameter is specified, this will be done separately on the validation samples and the remaining (training/discovery) samples.
The main result is a p-value indicating the confidence that the samples are separable into groups with distinct survival outcomes. This value is obtained using the survdiff function in the survival package (and applying pchisq to the $chisq component of the result). The samples are separated into groups using the scoreMethod and threshold parameters (and possibly the classifierMethod parameter).
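The survdiff/pchisq step can be sketched on simulated data. The group labels, times, and event indicators below are made up, and the choice of degrees of freedom (number of groups minus one) and the upper tail are assumptions about how the p-value is derived, not a copy of SigCheck's code.

```r
library(survival)

set.seed(3)
n <- 40
group <- rep(c("Low", "High"), each = n / 2)  # hypothetical score-based groups
# Simulated survival times: the "High" group has shorter times on average
time <- c(rexp(n / 2, rate = 0.05), rexp(n / 2, rate = 0.2))
event <- rbinom(n, 1, 0.8)                    # 1 = event observed, 0 = censored

# Log-rank test for a difference in survival between the two groups
fit <- survdiff(Surv(time, event) ~ group)

# p-value from the chi-squared statistic, with (number of groups - 1) df
pval <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
```

A small `pval` suggests the two groups have different survival curves under the log-rank test.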
If the survival parameter is not specified, then the scoreMethod parameter must be equal to "classifier", and a pure classification analysis is completed (as was done in SigCheck 1.0). If the validationSamples parameter is specified, the remaining samples are used as a training set to construct a classifier that is used to classify the validation samples. If validationSamples is not specified, leave-one-out cross-validation is used whereby a separate classifier is trained to predict each sample using all of the others.
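Leave-one-out cross-validation as described above can be sketched in base R. A nearest-centroid rule is swapped in for the classifier to keep the example self-contained; it is not the SVM named in the usage line, and the data and the helper `predictOne` are hypothetical.

```r
set.seed(4)
# Toy data: 5 signature features (rows) x 20 samples (columns)
classes <- rep(c(0, 1), each = 10)
expr <- matrix(rnorm(100), nrow = 5)
expr[, classes == 1] <- expr[, classes == 1] + 2  # shift class 1 so classes differ

# Nearest-centroid stand-in: assign the class whose mean profile is closest
predictOne <- function(train, trainClasses, sample) {
  centroids <- sapply(c(0, 1), function(k)
    rowMeans(train[, trainClasses == k, drop = FALSE]))
  dists <- colSums((centroids - sample)^2)  # squared distance to each centroid
  c(0, 1)[which.min(dists)]
}

# Leave-one-out: train on all samples except i, then predict sample i
predicted <- sapply(seq_len(ncol(expr)), function(i)
  predictOne(expr[, -i, drop = FALSE], classes[-i], expr[, i]))

looAccuracy <- mean(predicted == classes)
```

Each sample is predicted by a classifier that never saw it, so `looAccuracy` is an estimate of out-of-sample performance.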
See also: sigCheckAll, sigCheckRandom, sigCheckKnown, sigCheckPermuted.
library(SigCheck)
library(breastCancerNKI)
data(nki)
## drop samples with a missing e.dmfs value
nki <- nki[,!is.na(nki$e.dmfs)]
data(knownSignatures)
## survival analysis
check <- sigCheck(nki, classes="e.dmfs", survival="t.dmfs",
                  signature=knownSignatures$cancer$VANTVEER,
                  annotation="HUGO.gene.symbol")
check@survivalPval
## re-run on the existing SigCheckObject with a different score and threshold
check <- sigCheck(check, classes="e.dmfs", survival="t.dmfs",
                  signature=knownSignatures$cancer$VANTVEER,
                  annotation="HUGO.gene.symbol",
                  scoreMethod="High", threshold=.33)
check@survivalPval
## survival analysis with separate training and validation using SVM
check <- sigCheck(nki, classes="e.dmfs", survival="t.dmfs",
                  signature=knownSignatures$cancer$VANTVEER,
                  annotation="HUGO.gene.symbol",
                  validationSamples=150:319,
                  scoreMethod="classifier")
check