distribution: Get Frequencies of Feature Selection and Sample Errors

Description

There are two modes. When aggregating feature selection results, the function counts the number of times each feature was selected across all cross validations. When aggregating classification results, it calculates the error rate of each sample, which is useful for identifying outlier samples that are difficult to classify.
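
The two aggregations can be illustrated in base R without the package. This is only a sketch of the idea, not the function's implementation: the names `selected` and `predictions` and all of the values are hypothetical toy data rather than output of ClassifyR.

```r
# Hypothetical toy data: features selected in each of four cross validations.
selected <- list(c("geneA", "geneB"),
                 c("geneA", "geneC"),
                 c("geneA", "geneB"),
                 c("geneB", "geneD"))

# Feature mode: count how often each feature was selected overall.
featureCounts <- table(unlist(selected))

# Hypothetical toy data: each sample predicted in two resamples.
predictions <- data.frame(
  sample    = c("S1", "S2", "S3", "S1", "S2", "S3"),
  predicted = c("Poor", "Good", "Good", "Poor", "Poor", "Good"),
  actual    = c("Poor", "Good", "Poor", "Poor", "Good", "Poor"),
  stringsAsFactors = FALSE
)

# Sample mode: proportion of wrong predictions for each sample.
sampleErrors <- tapply(predictions$predicted != predictions$actual,
                       predictions$sample, mean)

print(featureCounts)
print(sampleErrors)
```

In this toy data, a sample that is misclassified in every resample gets an error rate of 1, which is the kind of outlier the sample mode is meant to reveal.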

Usage

distribution(result, type = c("features", "samples"), summary = c("density", "frequency"), plot = TRUE, xMax = NULL, ...)

Arguments

result

An object of class ClassifyResult.

type

Whether to calculate the number of times each feature was selected ("features") or the sample-wise error rate ("samples").

summary

Whether to plot frequencies or densities. If the feature distribution is analysed, this setting also affects the returned vector, which can be a decimal representing the percentage.

plot

Whether to draw a histogram of the aggregation.

xMax

Maximum bin value for histogram to plot.

...

Further parameters, such as colour and fill, passed to geom_histogram.

Value

If type is "features", a vector with one element for each feature that was chosen at least once, containing the number of times the feature was chosen in cross validations. If type is "samples", a vector as long as the number of samples, containing the cross validation error rate of each sample.
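
A short sketch of how such vectors might be used afterwards. The vectors below are hypothetical stand-ins for the returned values, and the sketch assumes the vectors are named by sample or feature, which this page does not state. The 0.5 cutoff is likewise an arbitrary choice for illustration.

```r
# Hypothetical stand-in for the vector returned when type is "samples".
# Assumption: elements are named by sample identifier.
sampleErrors <- c(S1 = 0, S2 = 0.5, S3 = 1, S4 = 0.25)

# Flag samples misclassified at least half of the time (arbitrary cutoff).
hardSamples <- names(sampleErrors)[sampleErrors >= 0.5]

# Hypothetical stand-in for the vector returned when type is "features".
# Assumption: elements are named by feature identifier.
featureCounts <- c(geneA = 4, geneB = 3, geneC = 1)

# Rank features by how often they were chosen, and find the most stable one.
rankedFeatures <- sort(featureCounts, decreasing = TRUE)
mostStable <- names(featureCounts)[featureCounts == max(featureCounts)]

print(hardSamples)
print(rankedFeatures)
print(mostStable)
```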

Examples


  # Run only if the example data and classifier packages are available.
  if(require(curatedOvarianData) && require(sparsediscrim))
  {
    data(TCGA_eset)
    # Define two outcome groups from the clinical annotation.
    badOutcome <- which(pData(TCGA_eset)[, "vital_status"] == "deceased" & pData(TCGA_eset)[, "days_to_death"] <= 365)
    goodOutcome <- which(pData(TCGA_eset)[, "vital_status"] == "living" & pData(TCGA_eset)[, "days_to_death"] >= 365 * 5)
    # Keep only the samples in either group and label them.
    TCGA_eset <- TCGA_eset[, c(badOutcome, goodOutcome)]
    classes <- factor(rep(c("Poor", "Good"), c(length(badOutcome), length(goodOutcome))))
    pData(TCGA_eset)[, "class"] <- classes
    # Small cross validation, to keep the example quick.
    result <- runTests(TCGA_eset, "Ovarian Cancer", "Differential Expression", resamples = 2, fold = 2)
    # binwidth is passed on to geom_histogram through the ... argument.
    sampleDistribution <- distribution(result, "samples", binwidth = 0.1)
    featureDistribution <- distribution(result, "features", binwidth = 1)
    print(head(sampleDistribution))
    print(head(featureDistribution))
  }
