likelihoodRatioSelection: Selection of Differential Distributions with Likelihood Ratio Statistic

Description

Ranks features by likelihood ratio statistic, largest first, and chooses the number of top-ranked features that gives the best resubstitution performance.

Usage

likelihoodRatioSelection(expression, classes, ...)

likelihoodRatioSelection(expression, datasetName, trainParams, predictParams, resubstituteParams, alternative = c(location = "different", scale = "different"), ..., selectionName = "Likelihood Ratio Test (Normal)", verbose = 3)

Arguments

expression

Either a matrix or ExpressionSet containing the training data. For a matrix, the rows are features, and the columns are samples.

classes

A vector of class labels.

datasetName

A name for the dataset used. Stored in the result.

trainParams

A container of class TrainParams describing the classifier to use for training.

predictParams

A container of class PredictParams describing how prediction is to be done.

resubstituteParams

An object of class ResubstituteParams describing the performance measure to consider and the numbers of top features to try for resubstitution classification.

alternative

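A vector of length 2. The first element, location, specifies whether the location differs between the classes under the alternative hypothesis. The second element, scale, specifies whether the scale differs between the classes under the alternative hypothesis. Acceptable values for each element are "same" or "different".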

...

Either variables passed from the matrix method to the ExpressionSet method or variables passed to getLocationsAndScales from the ExpressionSet method.

selectionName

A name to identify this selection method by. Stored in the result.

verbose

A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3.

Value

A list of length 2. The first element has the features ranked from most important to least important. The second element has the features that were selected to be used for classification.

Details

A likelihood ratio test of the null hypothesis that the location and scale are the same for both groups, against an alternative hypothesis that is specified by the alternative parameter. The location and scale of each feature are calculated by getLocationsAndScales. The distribution fitted is the normal distribution.
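The calculation described above can be sketched in base R for a single feature. This is only an illustration, not the package's implementation: logLikRatio is a hypothetical helper, the mean and standard deviation are assumed as the location and scale estimates (getLocationsAndScales may be configured differently), and the sign convention (alternative minus null, so that larger values indicate stronger evidence of a difference) is also an assumption.

```r
# Hypothetical helper, not part of ClassifyR: log-likelihood ratio of one
# feature under a normal model, alternative minus null.
logLikRatio <- function(values, classes,
                        alternative = c(location = "different", scale = "different"))
{
  # Null hypothesis: one location and one scale shared by all samples.
  nullLogLik <- sum(dnorm(values, mean(values), sd(values), log = TRUE))

  # Alternative hypothesis: location and/or scale estimated within each class.
  altLogLik <- sum(sapply(levels(classes), function(className)
  {
    classValues <- values[classes == className]
    location <- if(alternative[["location"]] == "different") mean(classValues) else mean(values)
    scale <- if(alternative[["scale"]] == "different") sd(classValues) else sd(values)
    sum(dnorm(classValues, location, scale, log = TRUE))
  }))

  altLogLik - nullLogLik
}

set.seed(1)
classes <- factor(rep(c("Poor", "Good"), each = 25))
# Rows are features, columns are samples. Only feature 1 differs in location.
measurements <- rbind(c(rnorm(25, 8, 1), rnorm(25, 12, 1)),
                      rnorm(50, 10, 1))
ratios <- apply(measurements, 1, logLikRatio, classes = classes)
ranking <- order(ratios, decreasing = TRUE)  # Feature indices, largest ratio first.
```

Setting both elements of alternative to "same" makes the two hypotheses identical, so the statistic is zero for every feature; at least one element needs to be "different" for the ranking to be informative.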

Examples


  if(require(sparsediscrim))
  {
    # First 20 features have bimodal distribution for Poor class. Other 80 features have normal distribution for
    # both classes.
    genesMatrix <- sapply(1:25, function(sample) c(rnorm(20, sample(c(8, 12), 20, replace = TRUE), 1), rnorm(80, 10, 1)))
    genesMatrix <- cbind(genesMatrix, sapply(1:25, function(sample) rnorm(100, 10, 1)))
    classes <- factor(rep(c("Poor", "Good"), each = 25))
    likelihoodRatioSelection(genesMatrix, classes, "Example",
                             trainParams = TrainParams(naiveBayesKernel, FALSE, TRUE),
                             predictParams = PredictParams(function(){}, FALSE, getClasses = function(result) result),
                             resubstituteParams = ResubstituteParams(nFeatures = seq(10, 100, 10), performanceType = "balanced", better = "lower"))
  }
