clusterRepro: Gene expression clusters reproducibility and validation

Description

Validate gene expression clusters by testing whether they are reproducible in an independent dataset.

Usage

clusterRepro(Centroids, New.data, Number.of.permutations)

Arguments

Centroids

The matrix of centroids with annotated rows. The columns are the centroids. The labeled rows must correspond to the rows of New.data: samples for gene clusters, genes for sample clusters.

New.data

The matrix of gene expression data (with annotated rows) independent of the dataset used to form the centroids. For gene clusters, the rows are samples and the columns are genes. For sample clusters, the rows are genes and the columns are samples.

Number.of.permutations

The number of times the centroids will be permuted to generate the null distribution.
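
As a rough illustration of the expected input shapes, the base R sketch below builds two small labeled matrices and restricts them to the row labels they share. The object names (`centroids`, `new_data`, `shared`) and the toy labels are made up for this example and are not part of the package; how the package itself aligns rows internally is not shown here.

```r
# Hypothetical toy inputs: both matrices carry row labels.
# Columns of `centroids` are centroids; columns of `new_data` are the
# items to be classified.
set.seed(1)
centroids <- matrix(rnorm(15), nrow = 5,
                    dimnames = list(paste0("g", 1:5), paste0("c", 1:3)))
new_data <- matrix(rnorm(48), nrow = 6,
                   dimnames = list(paste0("g", 3:8), paste0("s", 1:8)))

# Row labels present in both matrices.
shared <- intersect(rownames(centroids), rownames(new_data))

# Restrict both matrices to the shared rows, in the same order.
centroids_shared <- centroids[shared, , drop = FALSE]
new_data_shared <- new_data[shared, , drop = FALSE]

dim(centroids_shared)
dim(new_data_shared)
```

Only the shared rows can be compared, so it is worth checking `length(shared)` before running a long permutation test.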

Value

Actual.Size

The number of columns of New.data assigned to each centroid.

Actual.IGP

The in-group proportions of the groups formed when New.data is classified using Centroids.

p.value

The p-values for each of the groups represented by the centroids.

Number

The number of permutations used to compute the corresponding p-value.

Details

This function tests whether gene expression clusters found in one dataset are also present in a second, independent dataset. The centroids from the first dataset are used to classify the independent data, and the in-group proportion (IGP) of each resulting group is computed. Each IGP is then compared with a null distribution of IGPs to produce a p-value. The null distributions are generated by repeatedly permuting the centroids within the box aligned with the principal components, classifying the independent data with the permuted centroids, and calculating the corresponding IGPs.
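
The classification and IGP steps can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes Pearson correlation is used both to assign columns to centroids and to find each column's nearest neighbour, and `igp_sketch` is a hypothetical helper name. The permutation step that produces the null distributions is not reproduced here.

```r
# Sketch of the in-group proportion (IGP), ASSUMING Pearson correlation
# as the similarity measure. `igp_sketch` is a hypothetical helper,
# not a function from the clusterRepro package.
igp_sketch <- function(data, centroids) {
  # Assign each column of `data` to the centroid it correlates with best.
  class_cor <- cor(data, centroids)   # (columns of data) x (centroids)
  assigned <- apply(class_cor, 1, which.max)

  # For each column, find the most correlated *other* column.
  col_cor <- cor(data)
  diag(col_cor) <- -Inf               # exclude self-matches
  nearest <- apply(col_cor, 1, which.max)

  # IGP of group k: fraction of its members whose nearest neighbour
  # was assigned to the same group.
  same_group <- assigned == assigned[nearest]
  k <- seq_len(ncol(centroids))
  list(
    size = as.vector(table(factor(assigned, levels = k))),
    igp  = sapply(k, function(g) mean(same_group[assigned == g]))
  )
}

set.seed(42)
centroids <- matrix(rnorm(30, sd = 10), 10)
# Three tight groups of 4, 5 and 6 columns scattered around the centroids.
data <- centroids[, rep(1:3, times = c(4, 5, 6))] +
  matrix(rnorm(150, sd = 0.1), 10)

res <- igp_sketch(data, centroids)
res$size
res$igp
```

With noise this small relative to the spread of the centroids, each column should fall into the group it was generated from. An IGP close to 1 indicates a group whose members' nearest neighbours mostly share its label.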

References

Amy Kapp and Robert Tibshirani. Are clusters in one dataset present in another dataset? To be published.

Examples

# NOT RUN {
library(clusterRepro)

### Generate centroids with annotated rows
Centroids <- matrix(rnorm(30, sd = 10), 10)
rownames(Centroids) <- letters[1:nrow(Centroids)]

### Generate data with annotated rows
Data <- cbind(matrix(rep(Centroids[,1], 10), 10),
              matrix(rep(Centroids[,2], 15), 10),
              matrix(rep(Centroids[,3], 20), 10))
Data <- Data + matrix(rnorm(length(Data), sd = 10), nrow(Data))
rownames(Data) <- letters[1:nrow(Data)]

### Classify the data and calculate the corresponding in-group
### proportions and group size
Result <- clusterRepro(Centroids, Data, Number.of.permutations = 1)
Result$Actual.IGP
Result$Actual.Size

### Generate null distributions and compare them to the actual in-group
### proportions to obtain p-values
Result2 <- clusterRepro(Centroids, Data, Number.of.permutations = 1000)
Result2$p.value

### If the number of rows in the centroid matrix does not match the
### number of rows in the data matrix and the row labels are unique, this
### function will only use the rows that the two matrices have in common. 
Data <- matrix(rnorm(200), 20)
rownames(Data) <- letters[(nrow(Data)+6):7]
Result <- IGP.clusterRepro(Data, Centroids)
Result2 <- clusterRepro(Centroids, Data, Number.of.permutations = 1000)
# }
