limma (version 3.28.14)

plotRLDF: Plot of regularized linear discriminant functions for microarray data

Description

Plot regularized linear discriminant functions for classifying samples based on expression data.

Usage

plotRLDF(y, design=NULL, z=NULL, labels.y=NULL, labels.z=NULL, col.y="black", col.z="black", show.dimensions=c(1,2), ndim=max(show.dimensions), nprobes=100, plot=TRUE, var.prior=NULL, df.prior=NULL, trend=FALSE, robust=FALSE, ...)

Arguments

y
the training dataset. Can be any data object which can be coerced to a matrix, such as ExpressionSet or EList.
design
design matrix defining the training groups to be distinguished. The first column is assumed to represent the intercept. Defaults to model.matrix(~factor(labels.y)).
z
the dataset to be classified. Can be any data object which can be coerced to a matrix, such as ExpressionSet or EList. Rows must correspond to rows of y.
labels.y
character vector of sample names or labels in y. Defaults to colnames(y) or failing that to 1:n.
labels.z
character vector of sample names or labels in z. Defaults to colnames(z) or failing that to letters[1:n].
col.y
colors for plotting labels.y.
col.z
colors for plotting labels.z.
show.dimensions
integer vector of length two indicating which two discriminant functions to plot. Functions are in decreasing order of discriminatory power.
ndim
number of discriminant functions to compute.
nprobes
number of probes to be used for the calculations. The probes will be selected by moderated F statistic.
plot
logical, should a plot be created?
var.prior
prior variances for regularizing the within-group covariance matrix. By default, estimated by squeezeVar.
df.prior
prior degrees of freedom for regularizing the within-group covariance matrix. By default, estimated by squeezeVar.
trend
logical, should a trend be estimated for var.prior? See eBayes for details. Only used if var.prior or df.prior is NULL.
robust
logical, should var.prior and df.prior be estimated robustly? See eBayes for details. Only used if var.prior or df.prior is NULL.
...
any other arguments are passed to plot.

Value

If plot=TRUE, a plot is created on the current graphics device. A list containing the following components is (invisibly) returned:
training
numeric matrix with ncol(y) rows and ndim columns containing discriminant functions evaluated for the training data.
predicting
numeric matrix with ncol(z) rows and ndim columns containing discriminant functions evaluated on the classification data.
top
integer vector of length nprobes giving indices of probes used.
metagenes
numeric matrix with nprobes rows and ndim columns containing probe weights defining each discriminant function.
singular.values
numeric vector of singular values showing the predictive power of each discriminant function.
rank
maximum number of discriminant functions with singular.values greater than zero.
var.prior
numeric vector of prior variances.
df.prior
numeric vector of prior degrees of freedom.
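
As a minimal sketch of how the returned list might be inspected, assuming the limma package is installed. The simulated data and the object name out are illustrative only and are not part of the package.

```r
library(limma)

set.seed(42)
# Simulated training (y) and test (z) data: 500 probes, 6 samples, two groups.
# Probe-wise standard deviations vary so that squeezeVar has something to estimate.
sd <- 0.3*sqrt(4/rchisq(500, df=4))
y <- matrix(rnorm(500*6, sd=sd), 500, 6)
y[1:50, 4:6] <- y[1:50, 4:6] + 2
z <- matrix(rnorm(500*6, sd=sd), 500, 6)
z[1:50, 4:6] <- z[1:50, 4:6] + 2
design <- cbind(Grp1=1, Grp2vs1=c(0,0,0,1,1,1))

# plot=FALSE skips the plot; the list is returned invisibly, so assign it
out <- plotRLDF(y, design, z=z, nprobes=100, plot=FALSE)

names(out)            # component names listed above
dim(out$training)     # one row per training sample
dim(out$predicting)   # one row per test sample
head(out$top)         # indices of the selected probes
out$rank
```

Because the value is returned invisibly, nothing is printed unless the result is assigned or wrapped in print().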

Details

The function builds discriminant functions from the training data (y) and applies them to the test data (z). The method is a variation on classical linear discriminant functions (LDFs), in that the within-group covariance matrix is regularized to ensure that it is invertible, with eigenvalues bounded away from zero. The within-group covariance matrix is squeezed towards a diagonal matrix with empirical Bayes posterior variances as diagonal elements.
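
The regularization idea can be illustrated in base R. The sketch below is not limma's internal code: it assumes one plausible form of the shrinkage, a weighted average of the sample within-group covariance matrix S and a diagonal prior matrix, and uses hypothetical fixed values for var.prior and df.prior rather than estimates from squeezeVar.

```r
set.seed(1)
p <- 10                                  # number of probes
n <- 6                                   # number of samples
group <- factor(c(1,1,1,2,2,2))
y <- matrix(rnorm(p*n), p, n)
design <- model.matrix(~group)

# Residuals from the group-means model, one column per probe
res <- lm.fit(design, t(y))$residuals    # n x p matrix
df.residual <- n - ncol(design)

# Sample within-group covariance: rank at most df.residual, so singular when p > df.residual
S <- crossprod(res) / df.residual

# Hypothetical prior values (assumptions for illustration only)
var.prior <- 1
df.prior <- 4

# Assumed shrinkage: squeeze S towards a diagonal matrix
W <- (df.prior * diag(var.prior, p) + df.residual * S) / (df.prior + df.residual)

min.eig.S <- min(eigen(S, symmetric=TRUE, only.values=TRUE)$values)
min.eig.W <- min(eigen(W, symmetric=TRUE, only.values=TRUE)$values)
c(min.eig.S, min.eig.W)
```

Under this assumed form, the diagonal of W equals the posterior variance (df.prior*var.prior + df.residual*s2)/(df.prior + df.residual), and every eigenvalue of W is at least df.prior*var.prior/(df.prior + df.residual), so W is invertible even though S is not.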

The calculations are based on a filtered list of probes. The nprobes probes with largest moderated F statistics are used to discriminate.
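
Probe filtering by moderated F statistic can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: var.prior and df.prior are hypothetical fixed values here, whereas plotRLDF estimates them with squeezeVar by default, and the names F.mod and top are chosen for this example.

```r
set.seed(2)
ntotal <- 200                            # total number of probes
n <- 6                                   # number of samples
group <- factor(rep(c("A","B"), each=3))
y <- matrix(rnorm(ntotal*n, sd=0.3), ntotal, n)
y[1:20, 4:6] <- y[1:20, 4:6] + 2         # first 20 probes differ between groups
design <- model.matrix(~group)

# Probe-wise fit of the group-means model
fit <- lm.fit(design, t(y))
df.residual <- n - fit$rank
s2 <- colSums(fit$residuals^2) / df.residual   # residual variance per probe

# Hypothetical prior values (assumptions for illustration only)
var.prior <- 0.09
df.prior <- 4
s2.post <- (df.prior*var.prior + df.residual*s2) / (df.prior + df.residual)

# Between-group mean square per probe, relative to each probe's grand mean
fitted <- t(fit$fitted.values)           # probes x samples
ms.between <- rowSums((fitted - rowMeans(y))^2) / (fit$rank - 1)

# Moderated F: between-group mean square over the posterior variance
F.mod <- ms.between / s2.post

# Keep the nprobes probes with the largest moderated F statistics
nprobes <- 20
top <- order(F.mod, decreasing=TRUE)[1:nprobes]
sort(top)
```

Replacing the ordinary residual variance with the posterior variance keeps probes with very small sample variances from dominating the ranking.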

The ndim argument allows all required LDFs to be computed even though only two are plotted.

See Also

lda in package MASS

Examples

# Simulate gene expression data for 1000 probes and 6 microarrays.
# Samples are in two groups
# First 50 probes are differentially expressed in second group
sd <- 0.3*sqrt(4/rchisq(1000,df=4))
y <- matrix(rnorm(1000*6,sd=sd),1000,6)
rownames(y) <- paste("Gene",1:1000)
y[1:50,4:6] <- y[1:50,4:6] + 2

z <- matrix(rnorm(1000*6,sd=sd),1000,6)
rownames(z) <- paste("Gene",1:1000)
z[1:50,4:6] <- z[1:50,4:6] + 1.8
z[1:50,1:3] <- z[1:50,1:3] - 0.2

design <- cbind(Grp1=1,Grp2vs1=c(0,0,0,1,1,1))
options(digits=3)

# Samples 1-6 are training set, samples a-f are test set:
plotRLDF(y, design, z=z, col.y="black", col.z="red")
legend("top", pch=16, col=c("black","red"), legend=c("Training","Predicted"))
