
ClassifyR (version 1.6.2)

KolmogorovSmirnovSelection: Selection of Differential Distributions with Kolmogorov-Smirnov Distance

Description

Ranks features by decreasing Kolmogorov-Smirnov distance between the classes and chooses the number of top-ranked features that gives the best resubstitution performance.
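The ranking step can be illustrated in base R. This is a minimal sketch, not ClassifyR's own code: the helper name `rankByKSDistance` is hypothetical, and it assumes exactly two classes and a matrix with features in rows and samples in columns.

```r
# Hypothetical helper (not ClassifyR code): rank the rows (features) of a
# matrix by the two-sample Kolmogorov-Smirnov statistic D between two classes.
rankByKSDistance <- function(measurements, classes) {
  classes <- factor(classes)
  stopifnot(nlevels(classes) == 2)
  inFirst <- classes == levels(classes)[1]
  distances <- apply(measurements, 1, function(feature) {
    # ks.test may warn about ties; the statistic D is still computed
    unname(suppressWarnings(ks.test(feature[inFirst], feature[!inFirst]))$statistic)
  })
  order(distances, decreasing = TRUE)  # largest distance first
}

classes <- factor(rep(c("A", "B"), each = 4))
measurements <- rbind(c(1:4, 5:8),        # fully separated: D = 1
                      rep(c(1, 2), 4),    # same values in both classes: D = 0
                      c(1:4, 3:6))        # partly overlapping: D = 0.5
rankByKSDistance(measurements, classes)   # 1 3 2
```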

Usage

  ## S4 method for signature 'matrix'
  KolmogorovSmirnovSelection(expression, classes, ...)

  ## S4 method for signature 'ExpressionSet'
  KolmogorovSmirnovSelection(expression, datasetName, trainParams, predictParams,
                             resubstituteParams, ..., selectionName, verbose = 3)

Arguments

expression
Either a matrix or ExpressionSet containing the training data. For a matrix, the rows are features, and the columns are samples.
classes
A vector of class labels.
datasetName
A name for the dataset used. Stored in the result.
trainParams
A container of class TrainParams describing the classifier to use for training.
predictParams
A container of class PredictParams describing how prediction is to be done.
resubstituteParams
An object of class ResubstituteParams describing the performance measure to consider and the numbers of top features to try for resubstitution classification.
...
For the matrix method, variables passed to the ExpressionSet method. For the ExpressionSet method, the options to be passed to function ks.test.
selectionName
A name to identify this selection method by. Stored in the result.
verbose
A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3.
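For the ExpressionSet method, the extra arguments in `...` are passed to `ks.test`. The following sketch shows how that kind of forwarding works in R, using a hypothetical wrapper `ksDistance`, and how one `ks.test` option, `alternative`, changes the reported statistic. Whether ClassifyR ranks features on the statistic in exactly this way is an assumption.

```r
# Hypothetical wrapper (not part of ClassifyR): any extra arguments in ...
# are forwarded unchanged to stats::ks.test, and only the statistic is kept.
ksDistance <- function(x, y, ...) {
  unname(ks.test(x, y, ...)$statistic)
}

x <- c(1, 2, 3, 4)
y <- c(2.5, 3.5, 5, 6)
ksDistance(x, y)                          # two-sided D: 0.5
ksDistance(x, y, alternative = "greater") # D+ = max(F_x - F_y): 0.5
ksDistance(x, y, alternative = "less")    # D- = max(F_y - F_x): 0
```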

Value

An object of class SelectResult, or a list of such objects if the classifier used for determining the resubstitution error rate made more than one variety of prediction.

Details

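Features are sorted from the largest distance to the smallest. For each number of top-ranked features given in resubstituteParams, the classifier described by trainParams and predictParams is trained and evaluated on the training samples, and the number of features with the best resubstitution performance is chosen.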
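The resubstitution step can be sketched as follows. This is an illustrative assumption, not ClassifyR's implementation: a nearest-centroid rule (a swapped-in technique) stands in for the classifier given by trainParams, the helper names are hypothetical, and the balanced error rate, for which lower is better, is taken as the mean of the per-class error rates.

```r
# Hypothetical sketch (not ClassifyR code): choose how many top-ranked
# features to keep by resubstitution, i.e. training and predicting on the
# same samples. A nearest-centroid rule stands in for trainParams' classifier.

# Balanced error rate: the mean of the per-class error rates (lower is better).
balancedError <- function(actual, predicted) {
  mean(tapply(predicted != actual, actual, mean))
}

# Nearest-centroid resubstitution predictions.
# measurements: features in rows, samples in columns; classes: a factor.
nearestCentroidPredict <- function(measurements, classes) {
  distances <- sapply(levels(classes), function(level) {
    centroid <- rowMeans(measurements[, classes == level, drop = FALSE])
    colSums((measurements - centroid)^2)  # squared distance of each sample
  })
  factor(levels(classes)[apply(distances, 1, which.min)], levels = levels(classes))
}

# ranking: feature indices ordered from largest to smallest KS distance.
# nFeatures: candidate numbers of top features to try.
chooseNumberOfFeatures <- function(measurements, classes, ranking, nFeatures) {
  errors <- sapply(nFeatures, function(n) {
    selected <- measurements[ranking[seq_len(n)], , drop = FALSE]
    balancedError(classes, nearestCentroidPredict(selected, classes))
  })
  best <- nFeatures[which.min(errors)]  # first number with the lowest error
  list(errors = setNames(errors, nFeatures), best = best,
       features = ranking[seq_len(best)])
}

# Toy data: feature 1 separates the classes; feature 2 is misleading.
measurements <- rbind(c(0, 0, 0, 0, 1, 1, 1, 1),
                      c(100, 0, 0, 0, 0, 0, 0, 0))
classes <- factor(rep(c("A", "B"), each = 4))
result <- chooseNumberOfFeatures(measurements, classes,
                                 ranking = c(1L, 2L), nFeatures = c(1, 2))
result$errors  # 1: 0, 2: 0.375
result$best    # 1
```

With these toy data the single separating feature gives a balanced error of 0, while adding the misleading second feature raises it to 0.375, so one feature is kept.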

Examples

  if(require(sparsediscrim))
  {
    library(ClassifyR)
    # The first 20 features have a bimodal distribution (means 8 and 12) for the Poor class.
    # All other values (the remaining 80 Poor features and all 100 Good features) are normal with mean 10.
    genesMatrix <- sapply(1:25, function(sampleNumber) c(rnorm(20, sample(c(8, 12), 20, replace = TRUE), 1), rnorm(80, 10, 1)))
    genesMatrix <- cbind(genesMatrix, sapply(1:25, function(sampleNumber) rnorm(100, 10, 1)))
    classes <- factor(rep(c("Poor", "Good"), each = 25))
    KolmogorovSmirnovSelection(genesMatrix, classes, "Example",
                               trainParams = TrainParams(naiveBayesKernel, FALSE, doesTests = TRUE),
                               predictParams = PredictParams(function(){}, FALSE, getClasses = function(result) result),
                               resubstituteParams = ResubstituteParams(nFeatures = seq(10, 100, 10), performanceType = "balanced", better = "lower"))
  }
