clValid (version 0.6-2)

BSI: Biological Stability Index

Description

Calculates the biological stability index (BSI) for a given statistical clustering partition and biological annotation.

Usage

BSI(statClust, statClustDel, annotation, names = NULL, category = "all",
goTermFreq = 0.05, dropEvidence=NULL)

Arguments

statClust
An integer vector indicating the statistical cluster partitioning
statClustDel
An integer vector indicating the statistical cluster partitioning obtained with one column of the data removed
annotation
Either a character string naming the Bioconductor annotation package for mapping genes to GO categories, or a matrix where each column is a logical vector indicating which genes belong to the biological functional class. See details below.
names
An optional vector of names for the observations
category
Indicates the GO categories to use for biological validation. Can be one of "BP", "MF", "CC", or "all".
goTermFreq
The threshold frequency of GO terms to use for functional annotation.
dropEvidence
Which GO evidence codes to omit. Either NULL or a character vector, see 'Details' below.
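
The matrix form of the annotation argument can be illustrated with a small base R sketch. The gene IDs and class names below are hypothetical, and the construction uses only base R; it is not the package's annotationListToMatrix, just an illustration of the expected shape (one logical column per functional class, one row per gene).

```r
# Hypothetical gene IDs and functional classes, for illustration only
genes <- c("g1", "g2", "g3", "g4", "g5")
classes <- list(
  growth    = c("g1", "g2"),
  transport = c("g2", "g4", "g5")
)

# One column per functional class; TRUE marks genes belonging to that class
fcMat <- sapply(classes, function(members) genes %in% members)
rownames(fcMat) <- genes
fcMat
```

A gene may belong to several classes (here "g2"), and a gene with no annotation (here "g3") is simply FALSE in every column.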

Value

  • Returns the BSI value corresponding to the particular column that was removed.

Details

The BSI inspects the consistency of clustering for genes with similar biological functionality. Each sample is removed in turn, and the cluster membership for genes with similar functional annotation is compared with the cluster membership using all available samples. The BSI is in the range [0,1], with larger values corresponding to more stable clusters of the functionally annotated genes. For details see the package vignette.

The dropEvidence argument indicates which GO evidence codes to omit. For example, "IEA" is a relatively weak association based only on electronic information, and users may wish to omit this evidence when determining the functional annotation classes.

When inputting the biological annotation and functional classes directly, the BSI function expects the input in "matrix" format, where each column is a logical vector indicating which genes belong to the biological class. For details on how to input the biological annotation from an Excel file see readAnnotationFile and for converting from list to matrix format see annotationListToMatrix.

NOTE: The BSI function only calculates the measure for one particular removed column. To get the overall score, the user must average the values obtained for each removed column.
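
The comparison and the final averaging described above can be sketched in base R. This is a minimal sketch under a stated assumption: that the per-column index averages, over ordered pairs of distinct genes (i, j) in each functional class, the proportion of gene i's full-data cluster that falls into gene j's cluster in the reduced data. The function name bsi_sketch and the toy partitions are hypothetical, and the code is not the clValid implementation; consult the package vignette for the authoritative definition.

```r
# Hypothetical sketch of a BSI-style score for ONE removed column.
# clust:    cluster labels from the full data
# clustDel: cluster labels with one column removed
# fcMat:    logical matrix, one column per functional class
bsi_sketch <- function(clust, clustDel, fcMat) {
  overlap <- table(clust, clustDel)   # genes shared by each pair of clusters
  sizes   <- rowSums(overlap)         # sizes of the full-data clusters
  perClass <- apply(fcMat, 2, function(inClass) {
    idx <- which(inClass)
    if (length(idx) < 2) return(NA_real_)   # no pairs to compare
    total <- 0
    for (i in idx) for (j in idx) {
      if (i != j) {
        ci <- as.character(clust[i])
        cj <- as.character(clustDel[j])
        total <- total + overlap[ci, cj] / sizes[[ci]]
      }
    }
    total / (length(idx) * (length(idx) - 1))
  })
  mean(perClass, na.rm = TRUE)
}

# Toy data: four genes, two functional classes
clust <- c(1, 1, 2, 2)
fcMat <- cbind(A = c(TRUE, TRUE, FALSE, FALSE),
               B = c(FALSE, FALSE, TRUE, TRUE))

bsi_sketch(clust, clust, fcMat)   # identical partitions give 1

# Overall score: average over the partitions from each removed column
delPartitions <- list(clust, c(1, 2, 1, 2))
overall <- mean(sapply(delPartitions,
                       function(d) bsi_sketch(clust, d, fcMat)))
```

The last two statements mirror the NOTE above: one value is computed per removed column, and the overall score is their mean.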

References

Datta, S. and Datta, S. (2006). Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes. BMC Bioinformatics 7:397.

See Also

For a description of the function 'clValid' see clValid. For a description of the class 'clValid' and all available methods see clValidObj or clValid-class. For additional help on the other validation measures see connectivity, dunn, stability, and BHI.

Examples

library(clValid)  ## provides BSI, annotationListToMatrix and the mouse data
data(mouse)
express <- mouse[1:25,c("M1","M2","M3","NC1","NC2","NC3")]
rownames(express) <- mouse$ID[1:25]
## hierarchical clustering
Dist <- dist(express,method="euclidean")
clusterObj <- hclust(Dist, method="average")
nc <- 4 ## number of clusters      
cluster <- cutree(clusterObj,nc)

## first way - functional classes predetermined
fc <- tapply(rownames(express),mouse$FC[1:25], c)
fc <- fc[-match( c("EST","Unknown"), names(fc))]
fc <- annotationListToMatrix(fc, rownames(express))
bsi <- numeric(ncol(express))
## Need loop over all removed samples
for (del in 1:ncol(express)) {
  matDel <- express[,-del]               
  DistDel <- dist(matDel,method="euclidean")
  clusterObjDel <- hclust(DistDel, method="average")
  clusterDel <- cutree(clusterObjDel,nc)
  bsi[del] <- BSI(cluster, clusterDel, fc)
}
mean(bsi)

## second way - using Bioconductor
if(require("Biobase") && require("annotate") && require("GO.db") &&
   require("moe430a.db")) {
  bsi <- numeric(ncol(express))
  for (del in 1:ncol(express)) {
    matDel <- express[,-del]               
    DistDel <- dist(matDel,method="euclidean")
    clusterObjDel <- hclust(DistDel, method="average")
    clusterDel <- cutree(clusterObjDel,nc)
    bsi[del] <- BSI(cluster, clusterDel, annotation="moe430a.db",
                    names=rownames(express), category="all")
  }
  mean(bsi)
}
