CMA (version 1.30.0)

classification: General method for classification with various methods

Description

The most general function in the package, providing an interface to perform variable selection, hyperparameter tuning and classification in one step. Alternatively, the first two steps can be performed separately and then plugged into this function. For S4 method information, see classification-methods.

Usage

classification(X, y, f, learningsets, genesel, genesellist = list(), nbgene, classifier, tuneres, tuninglist = list(), trace = TRUE, models = FALSE, ...)

Arguments

X
Gene expression data. Can be one of the following:
  • A matrix. Rows correspond to observations, columns to variables.
  • A data.frame, when f is not missing (see below).
  • An object of class ExpressionSet.

y
Class labels. Can be one of the following:
  • A numeric vector.
  • A factor.
  • A character string specifying the phenotype variable, if X is an ExpressionSet.
  • missing, if X is a data.frame and a proper formula f is provided.

WARNING: The class labels will be re-coded to range from 0 to K-1, where K is the total number of different classes in the learning set.
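
As an illustration of this recoding, a minimal sketch using the equivalent base-R operation (not CMA internals):

### a factor with K = 2 levels is mapped to the labels 0 and 1:
y <- factor(c("ALL", "AML", "ALL"))
as.numeric(y) - 1   ## 0 1 0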

f
A two-sided formula, if X is a data.frame. The left part corresponds to the class labels, the right part to the variables (see the sketch below).
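
For illustration, a minimal runnable sketch of the matrix + factor form, with simulated data (all object names are hypothetical):

library(CMA)
set.seed(1)
Xsim <- matrix(rnorm(40 * 100), nrow = 40)   ## 40 observations, 100 variables
ysim <- factor(rep(c("A", "B"), each = 20))  ## two classes
res <- classification(Xsim, ysim, classifier = dldaCMA)
### for a data.frame, pass a two-sided formula instead, e.g.
### classification(X = somedataframe, f = cl ~ gene1 + gene2, classifier = dldaCMA)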
learningsets
An object of class learningsets. If missing, the complete dataset is used as the learning set.
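
For example, a sketch of an alternative resampling scheme to the five-fold CV used in the Examples section (argument names as documented for GenerateLearningsets):

### Monte-Carlo cross-validation: 20 random splits, 30 training observations each
lset <- GenerateLearningsets(y = golubY, method = "MCCV",
                             niter = 20, ntrain = 30)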
genesel
Optional (but usually recommended) object of class genesel containing variable importance information for the argument learningsets.
genesellist
In the case that the argument genesel is missing, this is an argument list passed to GeneSelection. If both genesel and genesellist are missing, no variable selection is performed.
nbgene
Number of best genes to be kept for classification, based on either genesel or the call to GeneSelection using genesellist. If both are missing, this argument is not necessary. Note:
  • If the gene selection method has been one of "lasso", "elasticnet" or "boosting", nbgene will be reset to min(s, nbgene), where s is the number of nonzero coefficients (see the sketch below).
  • If the gene selection scheme has been "one-vs-all" or "pairwise" in the multiclass case, several rankings exist. The top nbgene genes of each ranking are kept, so the effective number of genes used will sometimes be much larger.
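
A sketch of the nbgene capping from the first note above, reusing the golub objects created in the Examples section (the specific numbers are for illustration only):

### lasso-based selection; suppose it yields s nonzero coefficients
sellasso <- GeneSelection(golubX, golubY, learningsets = lset, method = "lasso")
### if s < 20, nbgene is silently reset to s:
reslasso <- classification(golubX, golubY, learningsets = lset,
                           genesel = sellasso, nbgene = 20, classifier = dldaCMA)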

classifier
Name of a function ending with "CMA" indicating the classifier to be used (e.g. knnCMA; see the 'See Also' section).
tuneres
Analogous to the argument genesel: an object of class tuningresult containing information about the best hyperparameter choice for the argument learningsets.
tuninglist
Analogous to the argument genesellist. In the case that the argument tuneres is missing, this is an argument list passed to tune. If both tuneres and tuninglist are missing, no hyperparameter tuning is performed. Warning: If a user-defined hyperparameter grid is passed, this results in a list within a list: tuninglist = list(grids = list(argname = c())), see the example below. Warning: Contrary to tune, if tuninglist is an empty list (the default), no hyperparameter tuning will be performed at all. To use the pre-defined hyperparameter grids, the argument is tuninglist = list(grids = list()).
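
A sketch of the two forms described above, reusing the golub objects from the Examples section (the grid values are chosen for illustration):

### user-defined grid for knnCMA -- note the list within a list:
classification(golubX, golubY, learningsets = lset,
               genesellist = list(method = "t.test"), nbgene = 20,
               classifier = knnCMA, tuninglist = list(grids = list(k = 1:10)))
### pre-defined hyperparameter grids:
classification(golubX, golubY, learningsets = lset,
               genesellist = list(method = "t.test"), nbgene = 20,
               classifier = knnCMA, tuninglist = list(grids = list()))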
trace
Should progress be traced? Default is TRUE.
models
A logical value indicating whether the fitted model objects should be returned. Default is FALSE.
...
Further arguments passed to the function classifier.

Value

A list of objects of classes cloutput and clvarseloutput, respectively; its length equals the number of different learning sets. The single elements of the list can be conveniently combined using the join function. The results can be analyzed and evaluated by various measures using the method evaluation (see the Examples below).

Details

For details about hyperparameter tuning, consult tune.

References

Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - a comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics 9: 439.

See Also

GeneSelection, tune, evaluation, compBoostCMA, dldaCMA, ElasticNetCMA, fdaCMA, flexdaCMA, gbmCMA, knnCMA, ldaCMA, LassoCMA, nnetCMA, pknnCMA, plrCMA, pls_ldaCMA, pls_lrCMA, pls_rfCMA, pnnCMA, qdaCMA, rfCMA, scdaCMA, shrinkldaCMA, svmCMA

Examples

### a simple k-nearest neighbour example
### datasets
library(CMA)
data(golub)
golubY <- golub[,1]
golubX <- as.matrix(golub[,-1])
### learningsets
set.seed(111)
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
### 1. GeneSelection
selttest <- GeneSelection(golubX, golubY, learningsets = lset, method = "t.test")
### 2. tuning
tunek <- tune(golubX, golubY, learningsets = lset, genesel = selttest,
              nbgene = 20, classifier = knnCMA)
### 3. classification
knn1 <- classification(golubX, golubY, learningsets = lset, genesel = selttest,
                       tuneres = tunek, nbgene = 20, classifier = knnCMA)
### steps 1.-3. combined into one step:
knn2 <- classification(golubX, golubY, learningsets = lset,
                       genesellist = list(method = "t.test"), classifier = knnCMA,
                       tuninglist = list(grids = list(k = c(1:8))), nbgene = 20)
### show and analyze results:
knnjoin <- join(knn2)
show(knn2)
eval <- evaluation(knn2, measure = "misclassification")
show(eval)
summary(eval)
boxplot(eval)
