`sda.ranking` determines a ranking of predictors by computing CAT scores (correlation-adjusted t-scores) between the group centroids and the pooled mean. `plot.sda.ranking` provides a graphical visualization of the top-ranking features.
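To make the ranked quantity concrete, here is a minimal base-R sketch for the simplest setting: no correlation adjustment (the DDA case) and no shrinkage, so the CAT scores reduce to ordinary t-scores of each class centroid against the pooled mean. The helper `centroid_tscores()` is hypothetical and not part of sda. Scaling by a pooled within-class standard deviation and by the factor sqrt(1/n_k - 1/n) is an assumption about how the scores are standardized.

```r
# Hypothetical helper, not part of the sda package.
# Assumes: diagonal case (no decorrelation) and no shrinkage, so each entry is
#   t[j, k] = (mean of feature j in class k - pooled mean of feature j) /
#             (pooled within-class SD of feature j * sqrt(1/n_k - 1/n))
centroid_tscores <- function(X, L) {
  L <- as.factor(L)
  n <- nrow(X)
  nk <- as.vector(table(L))                 # class sizes n_k
  mu <- colMeans(X)                         # pooled mean of each feature
  # class centroids, one column per class (p x K)
  mu.k <- vapply(levels(L), function(k) colMeans(X[L == k, , drop = FALSE]),
                 numeric(ncol(X)))
  mu.k <- matrix(mu.k, nrow = ncol(X), dimnames = list(colnames(X), levels(L)))
  # pooled within-class standard deviation of each feature
  resid <- X - t(mu.k[, as.integer(L), drop = FALSE])
  sc <- sqrt(colSums(resid^2) / (n - nlevels(L)))
  # centre, divide by the SD, then divide each class column by sqrt(1/n_k - 1/n)
  sweep((mu.k - mu) / sc, 2, sqrt(1 / nk - 1 / n), "/")
}

# Simulated data: 3 classes of 10 samples, 4 features;
# feature 1 is shifted upwards in class "a".
set.seed(1)
X <- matrix(rnorm(30 * 4), nrow = 30)
L <- factor(rep(c("a", "b", "c"), each = 10))
X[L == "a", 1] <- X[L == "a", 1] + 2
tsc <- centroid_tscores(X, L)   # 4 x 3 matrix: features x classes
round(tsc, 2)
```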

```
sda.ranking(Xtrain, L, lambda, lambda.var, lambda.freqs, ranking.score=c("entropy", "avg", "max"), diagonal=FALSE, fdr=TRUE, plot.fdr=FALSE, verbose=TRUE)
## S3 method for class 'sda.ranking'
plot(x, top=40, arrow.col="blue", zeroaxis.col="red", ylab="Features", main, ...)
```

Xtrain

A matrix containing the training data set. Note that
the rows correspond to observations and the columns
to variables.

L

A factor with the class labels of the training samples.

lambda

Shrinkage intensity for the correlation matrix. If not specified, it is estimated from the data. `lambda=0` implies no shrinkage and `lambda=1` complete shrinkage.

lambda.var

Shrinkage intensity for the variances. If not specified, it is estimated from the data. `lambda.var=0` implies no shrinkage and `lambda.var=1` complete shrinkage.

lambda.freqs

Shrinkage intensity for the frequencies. If not specified, it is estimated from the data. `lambda.freqs=0` implies no shrinkage (i.e. empirical frequencies) and `lambda.freqs=1` complete shrinkage (i.e. uniform frequencies).

diagonal

Chooses between LDA (default, `diagonal=FALSE`) and DDA (`diagonal=TRUE`).

ranking.score

How to compute the summary score for each variable from the CAT scores of all classes (`"entropy"`, `"avg"` or `"max"`); see Details.

fdr

Compute local FDR values and higher criticism (HC) scores for each feature.

plot.fdr

Show plot with estimated FDR values.

verbose

Print out some info while computing.

x

An `sda.ranking` object, as produced by the `sda.ranking()` function.

top

The number of top-ranking features shown in the plot (default: 40).

arrow.col

Color of the arrows in the plot (default is `"blue"`).

zeroaxis.col

Color for the center zero axis (default is `"red"`).

ylab

Label written next to the feature list (default is `"Features"`).

main

Main title (if missing, a title built from `"The"`, the value of `top` and `"Top Ranking Features"` is used, e.g. "The 40 Top Ranking Features").

...

Other options passed on to the generic `plot()`.
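The shrinkage arguments interpolate between an empirical estimate (intensity 0) and a fixed target (intensity 1). For `lambda` this can be illustrated with a minimal base-R sketch, assuming linear shrinkage of the empirical correlation matrix toward the identity, and CAT scores obtained by multiplying a vector of t-scores by the inverse matrix square root of that correlation matrix (as in Zuber and Strimmer 2009). The helpers `shrink_cor()` and `inv_sqrt()` and the t-scores are hypothetical; in sda the intensity is estimated from the data when it is not given.

```r
# Hypothetical helpers, not part of the sda package.
# Linear shrinkage of the empirical correlation matrix toward the identity:
#   R(lambda) = (1 - lambda) * R + lambda * I
shrink_cor <- function(X, lambda) {
  stopifnot(lambda >= 0, lambda <= 1)
  (1 - lambda) * cor(X) + lambda * diag(ncol(X))
}

# Inverse matrix square root R^(-1/2) via the symmetric eigendecomposition
inv_sqrt <- function(R) {
  e <- eigen(R, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

set.seed(2)
X <- matrix(rnorm(50 * 5), nrow = 50)
tvec <- c(2.1, -0.3, 0.8, 1.5, -1.2)   # made-up t-scores for 5 features

R0 <- shrink_cor(X, 0)                 # no shrinkage: empirical correlations
R1 <- shrink_cor(X, 1)                 # complete shrinkage: identity matrix
cat.partial <- drop(inv_sqrt(shrink_cor(X, 0.3)) %*% tvec)  # CAT scores, lambda = 0.3
cat.full    <- drop(inv_sqrt(R1) %*% tvec)                  # identical to tvec
```

Under these assumptions, complete correlation shrinkage leaves the t-scores unchanged, which is why `lambda=1` makes the correlation adjustment behave like the diagonal (DDA) case.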

`sda.ranking` returns a matrix with the following columns:

- idx: original feature number
- score: summary of the squared CAT scores across groups, as selected by `ranking.score`; this determines the overall ranking of a feature
- cat: for each group and feature, the CAT score of the centroid versus the pooled mean

If `fdr=TRUE`, then additionally local false discovery rate (FDR) values as well as higher criticism (HC) scores are computed for each feature (using `fdrtool`), stored in the columns `lfdr` and `HC`.

The overall ranking of a feature is determined by computing a summary score from the CAT scores.
This is controlled by the option `ranking.score`. The default setting (`ranking.score="entropy"`) uses the mutual information between the response and the respective predictor for ranking. This is equivalent to a weighted sum of squared CAT scores across the classes. Another possibility is to employ the average of the squared CAT scores for ranking (as suggested in Ahdesmäki and Strimmer 2010) by setting `ranking.score="avg"`. A third option is to use the maximum of the squared CAT scores across groups (similar to the PAM algorithm) by setting `ranking.score="max"`.
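The three choices differ only in how the squared CAT scores of the K classes are collapsed into one number per feature. The base-R sketch below starts from a made-up p x K matrix of CAT scores and class frequencies `freqs`; the helper `summary_score()` is hypothetical. For `"entropy"` it assumes weights proportional to 1 - p_k (p_k being the frequency of class k), motivated by the fact that for unshrunk t-scores this weighted sum of squares is proportional to the between-group variance of the feature. The exact weighting and scaling used by sda may differ, although a constant factor does not change the ranking.

```r
# Hypothetical helper, not part of the sda package.
# cats: p x K matrix of CAT scores; freqs: class frequencies (length K).
# The "entropy" weights (1 - freqs) are an assumption.
summary_score <- function(cats, freqs, method = c("entropy", "avg", "max")) {
  method <- match.arg(method)
  cat2 <- cats^2
  switch(method,
    entropy = drop(cat2 %*% (1 - freqs)),  # weighted sum of squared scores
    avg     = rowMeans(cat2),              # average over classes
    max     = apply(cat2, 1, max))         # maximum over classes
}

# Made-up CAT scores for 3 features and 3 classes
cats <- rbind(f1 = c(4.0, -1.0, -1.5),
              f2 = c(2.5,  2.5, -3.0),
              f3 = c(0.5, -0.2,  0.1))
freqs <- c(0.5, 0.3, 0.2)

order(summary_score(cats, freqs, "entropy"), decreasing = TRUE)  # 2 1 3
order(summary_score(cats, freqs, "avg"),     decreasing = TRUE)  # 2 1 3
order(summary_score(cats, freqs, "max"),     decreasing = TRUE)  # 1 2 3
```

Here `"max"` ranks `f1` first because of its single large score in class 1, whereas `"entropy"` and `"avg"` prefer `f2`, whose scores are moderately large in every class.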
Note that in the case of two classes all three options are equivalent and lead to identical scores. Thus, the choice of `ranking.score` is important only in the multi-class setting. In the two-class case the features are simply ranked according to the (shrinkage) squared CAT scores (or t-scores, if there is no correlation among predictors).
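The two-class equivalence can be checked directly for unshrunk t-scores, assuming each score is standardized by the factor sqrt(1/n_k - 1/n) and writing n = n_1 + n_2, p_k = n_k / n, d for the difference of the two class means of a feature and s for its pooled within-class standard deviation:

```latex
\begin{aligned}
\bar{x} &= p_1 \bar{x}_1 + p_2 \bar{x}_2
  \;\Rightarrow\; \bar{x}_1 - \bar{x} = p_2 d, \qquad \bar{x}_2 - \bar{x} = -p_1 d,\\
\frac{1}{n_1} - \frac{1}{n} &= \frac{p_2}{n p_1}, \qquad
\frac{1}{n_2} - \frac{1}{n} = \frac{p_1}{n p_2},\\
t_1 &= \frac{p_2 d}{s\sqrt{p_2/(n p_1)}} = \frac{d\sqrt{n p_1 p_2}}{s}, \qquad
t_2 = \frac{-p_1 d}{s\sqrt{p_1/(n p_2)}} = -\frac{d\sqrt{n p_1 p_2}}{s} = -t_1 .
\end{aligned}
```

Since n p_1 p_2 = n_1 n_2 / n, both squared scores equal d^2 / (s^2 (1/n_1 + 1/n_2)), the squared pooled two-sample t-statistic. Assuming, as in Zuber and Strimmer (2009), that the CAT scores of every class are obtained from the t-scores with the same linear decorrelation matrix, the second class's CAT score vector is likewise the negative of the first, so the average, the maximum, and any weighted sum with weights summing to one all coincide.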

The current default approach is to rank by mutual information (i.e. the relative entropy between the full model and the model without the predictor) and to use shrinkage estimators of the frequencies. To reproduce exactly the ranking computed by previous versions (1.1.0 to 1.3.0) of the `sda` package, set `ranking.score="avg"` and `lambda.freqs=0`.
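The role of `lambda.freqs` can be sketched in base R, assuming linear shrinkage of the empirical class frequencies toward the uniform distribution 1/K; this matches the two endpoints documented for `lambda.freqs=0` (empirical) and `lambda.freqs=1` (uniform), but the helper `shrink_freqs()` is hypothetical and is not how sda chooses the intensity when it is left unspecified.

```r
# Hypothetical helper, not part of the sda package.
# Linear shrinkage of empirical class frequencies toward 1/K.
shrink_freqs <- function(L, lambda.freqs) {
  stopifnot(lambda.freqs >= 0, lambda.freqs <= 1)
  counts <- table(L)
  p.emp <- as.vector(counts) / sum(counts)    # empirical frequencies
  K <- length(p.emp)
  p <- lambda.freqs / K + (1 - lambda.freqs) * p.emp
  names(p) <- names(counts)
  p
}

L <- factor(c(rep("tumor", 6), rep("normal", 4)))
shrink_freqs(L, 0)     # empirical: normal 0.4, tumor 0.6
shrink_freqs(L, 1)     # uniform:   normal 0.5, tumor 0.5
shrink_freqs(L, 0.5)   # halfway:   normal 0.45, tumor 0.55
```

With `lambda.freqs=0`, the setting used to reproduce older rankings, the empirical class frequencies are therefore left untouched.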

Calling `sda.ranking` is step 1 in a classification analysis with the sda package. Steps 2 and 3 are `sda` and `predict.sda`.

See Zuber and Strimmer (2009) for CAT scores in general, and Ahdesmäki and Strimmer (2010) for details on multi-class CAT scores. For shrinkage t-scores see Opgen-Rhein and Strimmer (2007).
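The three steps can be chained as in the sketch below, which reuses the `singh2002` data from the examples and assumes the sda package is installed. The cut-off of 40 features and the variable names `sel`, `fit` and `pred` are illustrative choices only, and predicting on the training data gives an optimistic (resubstitution) error estimate.

```r
# Sketch of the three-step sda workflow; the top-40 cut-off is arbitrary.
library("sda")
data(singh2002)
Xtrain <- singh2002$x
Ytrain <- singh2002$y

# Step 1: rank features (DDA here, for speed) and keep the top 40
ranking <- sda.ranking(Xtrain, Ytrain, diagonal = TRUE, verbose = FALSE)
sel <- ranking[1:40, "idx"]

# Step 2: train the classifier on the selected features
fit <- sda(Xtrain[, sel], Ytrain, diagonal = TRUE, verbose = FALSE)

# Step 3: predict class labels (here on the training data itself)
pred <- predict(fit, Xtrain[, sel], verbose = FALSE)
table(predicted = pred$class, observed = Ytrain)
```

In practice the test samples would be held out from step 1 as well, so that feature selection does not see them.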

Opgen-Rhein, R., and K. Strimmer. 2007. Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Statist. Appl. Genet. Mol. Biol. 6:9.

Zuber, V., and K. Strimmer. 2009. Gene ranking and biomarker discovery under correlation. Bioinformatics 25: 2700-2707. Preprint available from http://arxiv.org/abs/0902.0751.

`catscore`, `sda`, `predict.sda`.

```
# load sda library
library("sda")

#################
# training data #
#################

# prostate cancer set
data(singh2002)

# training data
Xtrain = singh2002$x
Ytrain = singh2002$y

#########################################
# feature ranking (diagonal covariance) #
#########################################

# ranking using t-scores (DDA)
ranking.DDA = sda.ranking(Xtrain, Ytrain, diagonal=TRUE)
ranking.DDA[1:10,]

# plot t-scores for the top 40 genes
plot(ranking.DDA, top=40)

# number of features with local FDR < 0.8
# (i.e. features useful for prediction)
sum(ranking.DDA[,"lfdr"] < 0.8)

# number of features with local FDR < 0.2
# (i.e. significant non-null features)
sum(ranking.DDA[,"lfdr"] < 0.2)

# optimal feature set according to HC score
plot(ranking.DDA[,"HC"], type="l")
which.max( ranking.DDA[1:1000,"HC"] )

#####################################
# feature ranking (full covariance) #
#####################################

# ranking using CAT-scores (LDA)
ranking.LDA = sda.ranking(Xtrain, Ytrain, diagonal=FALSE)
ranking.LDA[1:10,]

# plot CAT scores for the top 40 genes
plot(ranking.LDA, top=40)

# number of features with local FDR < 0.8
# (i.e. features useful for prediction)
sum(ranking.LDA[,"lfdr"] < 0.8)

# number of features with local FDR < 0.2
# (i.e. significant non-null features)
sum(ranking.LDA[,"lfdr"] < 0.2)

# optimal feature set according to HC score
plot(ranking.LDA[,"HC"], type="l")
which.max( ranking.LDA[1:1000,"HC"] )
```