For different learningsets, this method ranks the genes from the most relevant to the least relevant using one of various 'filter' criteria, or provides a sparse collection of variables (Lasso, ElasticNet, Boosting). The results are typically used for variable selection in the classification procedure that follows.

For S4 class information, see GeneSelection-methods.
Usage

GeneSelection(X, y, f, learningsets,
              method = c("t.test", "welch.test", "wilcox.test", "f.test",
                         "kruskal.test", "limma", "rfe", "rf", "lasso",
                         "elasticnet", "boosting", "golub", "shrinkcat"),
              scheme, trace = TRUE, ...)

Arguments

X: Gene expression data. Can be one of the following:
   - A matrix. Rows correspond to observations, columns to variables.
   - A data.frame, when f is not missing (see below).
   - An object of class ExpressionSet.

y: Class labels. Can be one of the following:
   - A numeric vector.
   - A factor.
   - A character if X is an ExpressionSet.
   - missing, if X is a data.frame and a proper formula f is provided.

f: A two-sided formula, if X is a data.frame. The left part corresponds to the class labels, the right to the variables (see the examples below).

learningsets: An object of class learningsets. May be missing; then the complete dataset is used as learning set.

method: A character specifying the gene selection method. One of the following:

"t.test": ordinary two-sample t test (two classes).
"welch.test": Welch modification of the t test (two classes).

"wilcox.test": Wilcoxon rank sum test (two classes).

"f.test": F test for the linear hypothesis that the mean is the same in all classes; equivalent to method = "t.test" in the two-class case.

"kruskal.test": Kruskal-Wallis test, the multi-class generalization of the Wilcoxon rank sum test.

"limma": 'moderated t' statistic. Requires the package limma.

"rfe": recursive feature elimination based on the support vector machine (Guyon et al., 2002). Requires the package e1071. Take care that appropriate hyperparameters are passed by the ... argument.

"rf": random forest variable importance measure. Requires the package randomForest.

"lasso": L1-penalized logistic regression, which leads to sparsity with respect to the variables used. Calls the function LassoCMA, which requires the package glmpath. Warning: take care that appropriate hyperparameters are passed by the ... argument (see the examples below).

"elasticnet": penalized logistic regression with a combined L1 and L2 penalty, claimed by Zou and Hastie (2005) to select 'variable groups'. Calls the function ElasticNetCMA, which requires the package glmpath. Warning: take care that appropriate hyperparameters are passed by the ... argument.

"boosting": componentwise boosting (Buehlmann and Yu, 2003). Calls compBoostCMA. Take care that appropriate hyperparameters are passed by the ... argument.

"golub": the selection criterion implemented in the function golub; see golub.

"shrinkcat": the shrinkage correlation-adjusted t score.
"pairwise","one-vs-all" or "multiclass". The
last case only makes sense if method is one of f.test, limma, rf, boosting,
which can directly be applied to the multi class case.TRUE.method.Guyon, I., Weston, J., Barnhill, S., Vapnik, V. (2002). Gene Selection for Cancer Classification using support vector machines. Journal of Machine Learning Research, 46, 389-422
Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301-320.
Buehlmann, P., Yu, B. (2003). Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324-339.

Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407-499.
Buehlmann, P., Yu, B. (2006). Sparse boosting. Journal of Machine Learning Research, 7, 1001-1024.

Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - a comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics, 9, 439.
See Also

filter, GenerateLearningsets, tune, classification

Examples

# load Golub AML/ALL data
data(golub)
### extract class labels
golubY <- golub[,1]
### extract gene expression (all remaining columns)
golubX <- as.matrix(golub[,-1])
### Generate five different learningsets
set.seed(111)
five <- GenerateLearningsets(y=golubY, method = "CV", fold = 5, strat = TRUE)
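### Sparse selection via the lasso: hyperparameters for the underlying
### LassoCMA call are passed through '...'. A sketch only; the value
### norm.fraction = 0.1 is illustrative (an assumption, not a
### recommendation), and the package glmpath must be installed.
sellasso <- GeneSelection(golubX, golubY, learningsets = five,
                          method = "lasso", norm.fraction = 0.1)
toplist(sellasso, k = 10, iter = 1)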
### simple t-test:
selttest <- GeneSelection(golubX, golubY, learningsets = five, method = "t.test")
### show result:
show(selttest)
toplist(selttest, k = 10, iter = 1)
plot(selttest, iter = 1)
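### The scheme argument matters for more than two classes. A sketch using
### the multiclass khan data; it assumes that, as for golub, the first
### column of the data.frame holds the class labels. Since learningsets
### is missing, the complete dataset is used as learning set.
data(khan)
khanY <- khan[,1]
khanX <- as.matrix(khan[,-1])
selmulti <- GeneSelection(khanX, khanY, method = "f.test",
                          scheme = "multiclass")
toplist(selmulti, k = 10, iter = 1)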
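### The formula interface: with X a data.frame and y missing, the class
### labels are given by a two-sided formula f. A sketch, assuming the
### usual formula conventions ('yy ~ .' uses all remaining columns as
### variables):
golubDF <- data.frame(yy = golubY, golubX)
selform <- GeneSelection(X = golubDF, f = yy ~ ., learningsets = five,
                         method = "t.test")
show(selform)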