mixmodels: Selection of Differential Distributions with Mixtures of Normals

Description

Fits mixtures of normals for every gene, separately for each class.

Usage

  mixModelsTrain(expression, classes, ...)
  mixModelsTrain(expression, ..., verbose = 3)
  mixModelsTest(models, test, ...)
  mixModelsTest(models, test, weighted = c("both", "unweighted", "weighted"),
                weight = c("all", "height difference", "crossover distance", "sum differences"),
                densityXvalues = 1024, minDifference = 0,
                returnType = c("label", "score", "both"), verbose = 3)

Arguments

expression

Either a matrix or ExpressionSet containing the training data. For a matrix, the rows are features, and the columns are samples.

test

Either a matrix or ExpressionSet containing the test data. For a matrix, the rows are features, and the columns are samples.

classes

A vector of class labels.

weighted

In weighted mode, the difference in densities is summed over all features. In unweighted mode, each feature's vote is worth the same. To save computational time, both can be calculated simultaneously.

weight

The type of weight to calculate. For "height difference", the weight of each prediction is equal to the sum of the vertical distances for all of the mixture components within one class subtracted from the sum of the components of the other class, summed for each value of x. For "crossover distance", the x positions where the two mixture densities cross are calculated first. The predicted class is the class with the highest mixture sum at the particular value of x, and the weight is the distance of x from the nearest density crossover point.

densityXvalues

Only relevant when weight is "crossover distance". The number of equally-spaced locations at which to calculate y values for each mixture density.

minDifference

The minimum difference in sums of mixture densities within each class for a feature to be allowed to vote. Can be a vector of cutoffs. If no features for a particular sample have a difference large enough, the class predicted is simply the largest class.

...

For the training or testing function with matrix dispatch, arguments passed to the function with ExpressionSet dispatch. For the training function with ExpressionSet dispatch, extra arguments passed to mixmodCluster. The argument nbCluster is mandatory.

models

A list of length 2 of models generated by the training function. The first element contains the mixture models for one class, one model per feature in the expression data. The second element contains the same information for the other class.

returnType

Either "label", "score", or "both". Sets the return value from the prediction to a vector of class labels, a score for each sample belonging to the second class (as determined by the factor levels), or both labels and scores in a data.frame.

verbose

A number between 0 and 3 controlling how many progress messages are shown. A higher number will produce more messages.

Value

For mixModelsTrain, a list of trained models of class MixmodCluster. For mixModelsTest, a vector or list of class prediction information, as long as the number of samples in the test data, or lists of such information, if both weighted and unweighted voting or a range of minDifference values was provided.

Details

If weighted is "weighted", a sample's predicted class is the class with the largest sum of weights, scaled for the number of samples in the training data of each class. If weighted is "unweighted", each feature has an equal vote, and votes for the class with the largest weight, scaled for class sizes in the training set.
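The difference between the two voting modes can be illustrated with a small base R sketch. This is not the package's implementation: the per-feature differences in `diffs` are invented numbers, the variable names are hypothetical, and the scaling for training class sizes is left out for brevity. A positive difference is taken to favour class "Good" and a negative one class "Poor".

```r
# Hypothetical per-feature differences in mixture density for one test sample
# (class "Good" minus class "Poor"). These values are invented for illustration.
diffs <- c(0.40, -0.05, -0.10, 0.02, -0.03)
minDifference <- 0.04  # features with a smaller absolute difference do not vote

voters <- diffs[abs(diffs) >= minDifference]

# Unweighted voting: every voting feature counts once,
# regardless of how large its difference is.
votesGood <- sum(voters > 0)
votesPoor <- sum(voters < 0)
unweightedClass <- if (votesGood > votesPoor) "Good" else "Poor"

# Weighted voting: the differences themselves are summed,
# so one strong feature can outweigh several weak ones.
weightedScore <- sum(voters)
weightedClass <- if (weightedScore > 0) "Good" else "Poor"

print(c(unweighted = unweightedClass, weighted = weightedClass))
```

With these numbers, two weak features outvote one strong feature in unweighted mode, while the strong feature dominates the weighted sum. Ties, and samples for which no feature passes the cutoff, are not handled in this sketch.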

If weight is "crossover distance", the crossover points are computed by considering the difference between the y values of the two densities at every x value. The x values at which the sign of the difference changes, compared to the difference at the closest lower value of x, are used as the crossover points. Setting weight to "sum differences" is intended to find a mix of features which are strongly differentially expressed and differentially variable.
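The crossover computation described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: the two mixture densities, their parameters, the grid range, and all variable names are invented for the example.

```r
# Two illustrative mixture densities. The means, standard deviations and
# mixing proportions are assumptions made for this sketch.
densityA <- function(x) 0.5 * dnorm(x, 5, 1) + 0.5 * dnorm(x, 15, 1)
densityB <- function(x) dnorm(x, 9, 3)

# Evaluate both densities on an equally spaced grid (compare densityXvalues).
xValues <- seq(0, 20, length.out = 1024)
differences <- densityA(xValues) - densityB(xValues)

# A crossover is recorded at each x value where the sign of the difference
# changes relative to the closest lower x value.
signChange <- which(diff(sign(differences)) != 0) + 1
crossovers <- xValues[signChange]

# Crossover-distance weight for one test value: the class with the higher
# density is predicted, weighted by the distance to the nearest crossover.
testValue <- 5
predicted <- if (densityA(testValue) > densityB(testValue)) "A" else "B"
weight <- min(abs(testValue - crossovers))
```

A finer grid locates the crossover points more precisely at the cost of more density evaluations.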

Examples

  # First 25 samples are mixtures of two normals. Last 25 samples are one normal.
  genesMatrix <- sapply(1:25, function(geneColumn) c(rnorm(50, 5, 1), rnorm(50, 15, 1)))
  genesMatrix <- cbind(genesMatrix, sapply(1:25, function(geneColumn) rnorm(100, 9, 3)))
  classes <- factor(rep(c("Poor", "Good"), each = 25))
  trained <- mixModelsTrain(genesMatrix, classes, nbCluster = 1:3)
  mixModelsTest(trained, genesMatrix, minDifference = 1:3)