
HCsnip (version 1.12.0)

TwoHC_assign: Function to assign new samples to one of the two given hierarchical clustering trees in a semi-supervised way

Description

For given molecular data sets from two non-overlapping groups of patients, this function constructs two independent HC trees and assigns new samples to one of them in a semi-supervised way. See Details.

Usage

TwoHC_assign(X, index1, index2, new.X, dis.method = "cor", link.method = "ward", minclus = 4, maxmiss = 30, surv.time, status, method1 = "BIC", method2 = "g2")

Arguments

X
An object of class ExpressionSet or a data matrix from which the two HC trees are derived. Columns are assumed to represent samples, and rows to represent the samples' features. Missing values are allowed.
index1
Column indices of the patients in X that belong to the first group.
index2
Column indices of the patients in X that belong to the second group.
new.X
An object of class ExpressionSet or a data matrix containing the new samples. Columns are assumed to represent samples, and rows to represent the samples' features. Missing values are allowed.
dis.method
The distance measure to be used. This must be one of the methods accepted by the dist function, or "cor" for the Pearson correlation (default).
link.method
The agglomeration method to be used. This should be one of "ward" (default), "single", "complete", "average", "mcquitty", "median" or "centroid".
minclus
The minimum number of samples allowed to form a cluster. This parameter is inversely related to the number of partitions returned from an HC tree: a large value returns a small number of partitions, and vice versa.
maxmiss
Maximum percentage of missing values per row in X.
surv.time
A numeric vector containing the follow-up times of the patients in X.
status
A binary vector containing the survival status of the patients in X, normally 0 = alive, 1 = dead.
method1
Type of partition evaluation measure used to assess the relationship between follow-up and a partition. Default is "BIC".
method2
Type of partition evaluation measure used to assess the relationship between the data matrix X and a partition. Default is the Goodman and Kruskal index, "g2".
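
As a rough illustration of how maxmiss, dis.method = "cor", link.method and minclus interact, the following base R sketch builds one tree on simulated data. It is not the package's implementation: the data are random, "ward.D" is the current hclust name for the "ward" linkage, and cutree is used here only as a simple stand-in for the partition extraction that HCsnip performs. Whether the package applies maxmiss as "less than" or "at most" is also an assumption.

```r
# Illustrative sketch only; simulated data, not the HCsnip internals.
set.seed(1)
X <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)  # 50 features x 20 samples
X[sample(length(X), 30)] <- NA                     # sprinkle in missing values
maxmiss <- 30
minclus <- 4

# Drop features (rows) with more than maxmiss percent missing values
keep <- rowMeans(is.na(X)) * 100 <= maxmiss
X <- X[keep, , drop = FALSE]

# Pearson-correlation distance between samples (columns), then Ward linkage
d  <- as.dist(1 - cor(X, use = "pairwise.complete.obs"))
hc <- hclust(d, method = "ward.D")

# Candidate partitions: rows are partitions, columns are samples
parts <- t(sapply(2:5, function(k) cutree(hc, k = k)))

# Keep only partitions whose smallest cluster has at least minclus samples
ok <- apply(parts, 1, function(p) min(table(p)) >= minclus)
parts[ok, , drop = FALSE]
```

Raising minclus in this sketch discards more of the candidate partitions, which mirrors the inverse relationship described for that argument.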

Value

hc1
HC tree derived from the data of the first group.
hc2
HC tree derived from the data of the second group.
partitions.hc1
A matrix of the partitions extracted from hc1. Rows represent partitions and columns represent samples.
partitions.hc2
A matrix of the partitions extracted from hc2. Rows represent partitions and columns represent samples.
best.hc1
Optimal partition found on hc1.
best.hc2
Optimal partition found on hc2.
score.hc1
A matrix with two columns. The first column contains the quality scores of partitions.hc1 calculated using the follow-up data. The second column contains the quality scores of partitions.hc1 calculated using X.
score.hc2
The same as score.hc1, but for partitions.hc2.
Assign
A matrix with three columns. The first column contains the indices of HC trees to which a test sample was assigned. The second column contains the indices of clusters in best.hc1 to which a test sample was most similar. The third column contains the indices of clusters in best.hc2 to which a test sample was most similar.
surv.time
The same as input
status
The same as input
index1
The same as input
index2
The same as input
new.X
The same as input
X
The same as input
method1
The same as input
method2
The same as input
minclus
The same as input
id1
Indices of the partitions obtained from hc1 in which the minimum cluster size is equal to or larger than minclus.
id2
Indices of the partitions obtained from hc2 in which the minimum cluster size is equal to or larger than minclus.
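
To make the layout of the Assign component concrete, here is a small sketch that reads a three-column matrix of that shape. The values are made up for illustration and do not come from a real run, and the coding of the first column as 1 for hc1 and 2 for hc2 is an assumption based on the description above.

```r
# Made-up stand-in for result$Assign (four new samples); values are illustrative only.
Assign <- cbind(tree        = c(1, 2, 2, 1),   # assumed coding: 1 = hc1, 2 = hc2
                cluster.hc1 = c(2, 1, 3, 1),   # most similar cluster in best.hc1
                cluster.hc2 = c(1, 2, 2, 3))   # most similar cluster in best.hc2

tree <- Assign[, 1]

# Cluster each sample ends up in: column 2 if assigned to hc1, column 3 otherwise
chosen <- ifelse(tree == 1, Assign[, 2], Assign[, 3])

# Number of new samples assigned to each tree
table(tree)
```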

Details

Suppose molecular profiles are available for two non-overlapping groups of patients, treated with two different drugs or with the same drugs in different combinations, together with their follow-up information. When a new patient arrives, for whom only a molecular profile is available, the question is to which group this patient should be assigned so that he or she benefits most from the type of treatment that group received.

This function is designed for this problem. It works as follows: first, two independent HC trees are derived from the given data; second, partitions are extracted and the optimal partition is selected from each HC tree separately; third, the new patient's molecular profile is compared with each cluster in each optimal partition to calculate the average similarity and identify the two most similar clusters (competing clusters), one from each HC tree; finally, the new sample is assigned to whichever of the two competing clusters has the better overall survival.
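
The third and fourth steps can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the data are simulated, the optimal partitions are replaced by simple two-cluster cuts, similarity is taken to be Pearson correlation, and "better overall survival" is approximated by the median follow-up time of a cluster, which ignores censoring. The helper avg_sim is hypothetical.

```r
# Illustrative sketch only; simulated data and simplified survival comparison.
set.seed(42)
n.feat <- 40
X1 <- matrix(rnorm(n.feat * 12), nrow = n.feat)  # group 1: 12 samples
X2 <- matrix(rnorm(n.feat * 12), nrow = n.feat)  # group 2: 12 samples
new.x <- rnorm(n.feat)                           # profile of one new sample
time1 <- rexp(12, rate = 1 / 20)                 # follow-up times, group 1
time2 <- rexp(12, rate = 1 / 30)                 # follow-up times, group 2

# Stand-ins for the optimal partitions of the two trees
best1 <- cutree(hclust(as.dist(1 - cor(X1)), method = "ward.D"), k = 2)
best2 <- cutree(hclust(as.dist(1 - cor(X2)), method = "ward.D"), k = 2)

# Hypothetical helper: average correlation between the new sample and
# the members of each cluster in a partition
avg_sim <- function(X, part, x) tapply(as.vector(cor(X, x)), part, mean)

# Competing clusters: the most similar cluster in each tree
c1 <- as.integer(names(which.max(avg_sim(X1, best1, new.x))))
c2 <- as.integer(names(which.max(avg_sim(X2, best2, new.x))))

# Crude survival comparison (assumption): larger median follow-up wins
surv1 <- median(time1[best1 == c1])
surv2 <- median(time2[best2 == c2])
assigned.tree <- if (surv1 >= surv2) 1L else 2L

c(tree = assigned.tree, cluster.hc1 = c1, cluster.hc2 = c2)
```

The final vector has the same three fields as one row of the Assign matrix described under Value.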

References

Harrell,F.E. et al., (1982). "Evaluating the yield of medical tests", JAMA, 247, 2543-2546.

Obulkasim,A. et al., (2011). "Stepwise classification of cancer samples using clinical and molecular data", BMC Bioinformatics, 12, 422.

Troyanskaya,O. et al., (2001). "Missing value estimation methods for DNA microarrays". Bioinformatics, 17, 520-525.

Obulkasim,A. et al., (2013). "Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree", submitted.

See Also

TwoHC_perm, cluster_pred

Examples

library(HCsnip)
data(TcgaGBM)
attach(TcgaGBM)
# Column indices of the patients treated with each drug
id1 <- which(drugs == "Avastin")
id2 <- which(drugs == "Temodar")
# The first 30 patients per drug build the two trees;
# the next 30 per drug serve as the new samples
train <- c(id1[1:30], id2[1:30])
result <- TwoHC_assign(X = em[, train], index1 = 1:30, index2 = 31:60,
                       new.X = em[, c(id1[31:60], id2[31:60])], minclus = 4,
                       surv.time = surv.time[train],
                       status = status[train])
