
messina (version 1.8.2)

messinaSurv: Find optimal prognostic features using the Messina algorithm

Description

Run the MessinaSurv algorithm to find features (e.g. genes) that can define groups of patients with very different survival times.

Usage

messinaSurv(x, y, obj_min, obj_func, min_group_frac = 0.1, f_train = 0.8,
  n_boot = 50, seed = NULL, parallel = NULL, silent = FALSE)

Arguments

x
feature expression values, either supplied as an ExpressionSet, or as an object that can be converted to a matrix by as.matrix. In the latter case, features should be in rows and samples in columns, with feature names taken from the rows of the object.
y
a Surv object containing survival times and censoring status for each sample.
obj_min
the minimum acceptable value of the objective metric. The metric used is specified by the parameter obj_func.
obj_func
the metric function that measures the difference in survival between patients with feature values above, and below, the threshold. Valid values are "tau", "reltau", or "coxcoef"; see details for more information.
min_group_frac
the size of the smallest sample group that thresholding is allowed to generate, as a fraction of the total number of samples. The default value of 0.1 means that no threshold will be selected if the resulting sample split yields a group smaller than 10% of the samples. A modest value of this parameter increases the stability of the "reltau" and "coxcoef" objectives, which tend to become unstable as the number of samples in a group becomes very low; see details.
f_train
the fraction of samples to be used in the training splits of the bootstrap rounds.
n_boot
the number of bootstrap rounds to use.
seed
an optional random seed for the analysis. If NULL, the R PRNG is used as-is.
parallel
should calculations be parallelized using the doMC framework? If NULL, parallel mode is used if the doMC library is loaded and more than one core has been registered with registerDoMC(). Note that no progress bar is displayed in parallel mode.
silent
be completely silent (except for error and warning messages)?
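As a minimal sketch of the plain-matrix input form described above, the following prepares simulated data (the object names expr, surv_time, event, and y are hypothetical): features in rows, samples in columns, feature names as row names, and a right-censored Surv object from the survival package with one record per sample. The messinaSurv() call is left commented out and uses only the arguments listed in Usage.

```r
library(survival)

set.seed(1)
n_features <- 5
n_samples <- 40

# Hypothetical expression matrix: features in rows, samples in columns,
# with feature names stored as row names
expr <- matrix(rnorm(n_features * n_samples), nrow = n_features,
               dimnames = list(paste0("gene", seq_len(n_features)),
                               paste0("patient", seq_len(n_samples))))

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored)
surv_time <- rexp(n_samples, rate = 0.1)
event <- rbinom(n_samples, size = 1, prob = 0.7)
y <- Surv(surv_time, event)

# One survival record per sample (column of expr), and named features
stopifnot(ncol(expr) == nrow(y), !is.null(rownames(expr)))

# The analysis call itself is not run here; parallel = FALSE requests serial mode
# fit <- messinaSurv(expr, y, obj_min = 0.6, obj_func = "tau", parallel = FALSE)
```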

Value

  • an object of class "MessinaSurvResult" containing the results of the analysis.

Minimum group fraction

The parameter min_group_frac limits the size of the smallest subgroups that messinaSurv can select. As groups become smaller, the "reltau" and "coxcoef" objective functions become unstable and can generate spurious results. In the diagnostic plots produced by the messina plot functions, these appear as very high objective values at very low and very high threshold values. To control such results, set min_group_frac high enough that the objective functions fit reliably. Generally, max(0.1, 10/N), where N is the total number of patients, is sufficient. Keep in mind that setting this parameter too high limits messinaSurv's ability to identify small subsets of patients whose survival differs dramatically from the rest: the smallest subset that can be reliably identified is a fraction min_group_frac of all patients.
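The rule of thumb max(0.1, 10/N) can be computed directly. The helper name suggest_min_group_frac below is hypothetical and not part of messina; the "smallest_group" column is simply min_group_frac times N, an approximation of the minimum group size the threshold search may produce.

```r
# Hypothetical helper: rule of thumb max(0.1, 10 / N), N = total number of patients
suggest_min_group_frac <- function(n_patients) {
  stopifnot(is.numeric(n_patients), n_patients > 0)
  max(0.1, 10 / n_patients)
}

n <- c(40, 100, 250)
frac <- vapply(n, suggest_min_group_frac, numeric(1))

# Approximate size of the smallest group permitted by each fraction
data.frame(n_patients = n, min_group_frac = frac, smallest_group = frac * n)
```

For 40, 100, and 250 patients this gives 0.25, 0.1, and 0.1, i.e. smallest groups of about 10, 10, and 25 patients. The value could then be passed as min_group_frac, with N taken as the number of samples (columns of x).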

Details

The MessinaSurv algorithm aims to identify features for which patients with high signal and patients with low signal have very different survival outcomes. It does so by defining an objective function that assigns a numerical value to how strongly survival differs between two groups of patients, and then evaluating this objective at different signal levels (thresholds) of each feature. Features for which, at a given signal level, the objective is consistently above a user-supplied minimum (obj_min) are selected by MessinaSurv as single-feature survival predictors.
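The threshold scan described above can be sketched for a single feature as follows. This is not messina's implementation: the helper scan_thresholds is hypothetical, the bootstrap resampling controlled by n_boot and f_train is omitted, and the objective is a stand-in (the absolute Cox coefficient of a binary high/low indicator, fitted with survival::coxph); whether it matches messina's "coxcoef" objective exactly is an assumption. Thresholds that leave either group smaller than min_group_frac of the samples are skipped.

```r
library(survival)

# Hypothetical helper, not part of messina: evaluate a stand-in objective
# (absolute Cox coefficient of a high/low group indicator) at each candidate
# threshold of a single feature.
scan_thresholds <- function(values, y, min_group_frac = 0.1) {
  n <- length(values)
  sorted <- sort(unique(values))
  # Candidate thresholds: midpoints between consecutive distinct values
  cuts <- (head(sorted, -1) + tail(sorted, -1)) / 2
  min_size <- ceiling(min_group_frac * n)
  objective <- vapply(cuts, function(cut) {
    high <- as.integer(values > cut)
    # Reject splits that leave either group below the minimum size
    if (min(sum(high), n - sum(high)) < min_size) return(NA_real_)
    fit <- coxph(y ~ high)
    abs(unname(coef(fit)))
  }, numeric(1))
  data.frame(threshold = cuts, objective = objective)
}

# Simulated example: high-expression patients have shorter survival
set.seed(2)
n <- 60
expr_value <- rnorm(n)
time <- rexp(n, rate = ifelse(expr_value > 0.5, 0.3, 0.05))
y <- Surv(time, rep(1, n))

res <- scan_thresholds(expr_value, y, min_group_frac = 0.1)
res[which.max(res$objective), ]
```

With 60 samples and min_group_frac = 0.1, only splits leaving at least 6 patients in each group are scored, so the extreme thresholds at either end return NA rather than the unstable values discussed under Minimum group fraction.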

MessinaSurv has applications as an algorithm to identify features that are survival-related, as well as a principled method to identify threshold signal values to separate a cohort into poor- and good-prognosis subgroups. It can also be used as a feature filter, selecting and discretising survival-related features before they are input into a multivariate predictor.
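A minimal sketch of the discretising step, assuming per-feature thresholds are already available. The threshold values, the matrix, and the helper discretise_features are all hypothetical (extracting thresholds from a MessinaSurvResult is not shown here), and treating "above the threshold" as a strict > comparison is an assumption.

```r
# Hypothetical thresholds for two selected features
thresholds <- c(geneA = 1.5, geneB = -0.2)

# Hypothetical expression matrix: features in rows, samples in columns
expr <- rbind(geneA = c(0.3, 2.1, 1.7, 0.9),
              geneB = c(-1.0, 0.4, -0.3, 0.5))
colnames(expr) <- paste0("patient", 1:4)

discretise_features <- function(expr, thresholds) {
  feats <- names(thresholds)
  stopifnot(all(feats %in% rownames(expr)))
  # Compare each feature's row against its own threshold
  high <- sweep(expr[feats, , drop = FALSE], 1, thresholds[feats], FUN = ">")
  # One row per sample, one low/high factor column per feature
  out <- as.data.frame(t(high))
  out[] <- lapply(out, function(col)
    factor(ifelse(col, "high", "low"), levels = c("low", "high")))
  out
}

groups <- discretise_features(expr, thresholds)
groups
```

The resulting data frame of low/high factors can be supplied as covariates to a multivariate model, such as survival::coxph.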

See Also

MessinaSurvResult-class

ExpressionSet

messina

messinaDE

Examples

library(messina)

## Load a subset of the TCGA renal clear cell carcinoma data
## as an example.
data(tcga_kirc_example)

## Run the messinaSurv analysis on these data, using a tau
## objective with a minimum performance of 0.6.  Note that
## messinaSurv analyses are very computationally intensive,
## so in actual use, running on multiple cores with doMC and
## parallel = TRUE is strongly recommended.
fit = messinaSurv(kirc.exprs, kirc.surv, obj_func = "tau", obj_min = 0.6)

fit
plot(fit)
