cluster_pred(X, partition, surv.time = NULL, status = NULL, te.index, minclus = 4,
te.surv.time = NULL, te.status = NULL, method = "conc", maxmiss = 30,
plot.it = FALSE, ...)

X: an ExpressionSet or data matrix in which columns represent the samples and rows represent the samples' features. X can also be a square distance matrix or an object of class dist. It must include both the data set from which partition was obtained and the data set of test samples.

impute.knn from the

The second option is to use Harrell's concordance index (Harrell et al., 1982). For this option, both the main data and the follow-up data corresponding to the given partition are required. The main data are used to find the pseudo nearest neighbours (PNN) of a test sample (Obulkasim et al., 2011), and the follow-up data are used to measure how concordant the PNNs' follow-up information is with that of the samples in each cluster. The test sample is assigned to the cluster with the highest average concordance.
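The concordance-based option rests on Harrell's concordance index. As a minimal, self-contained illustration (not the package's own code), the function below computes the C-index for right-censored data in base R; `harrell_c` and its arguments are hypothetical names chosen for this sketch. A pair of samples is comparable when the sample with the shorter follow-up time had an observed event, and concordant when that sample also has the higher risk score; tied risk scores count as one half.

```r
# Harrell's concordance index (C-index) for right-censored follow-up data.
# Hypothetical helper for illustration; not part of the package.
# time:   observed follow-up times
# status: 1 = event observed, 0 = censored
# risk:   risk score; a higher score should mean a shorter expected survival
harrell_c <- function(time, status, risk) {
  stopifnot(length(time) == length(status), length(time) == length(risk))
  concordant <- 0
  comparable <- 0
  for (i in seq_along(time)) {
    # Only a sample with an observed event can be the earlier member of a pair
    if (status[i] != 1) next
    for (j in seq_along(time)) {
      # The pair is comparable if sample j was still event-free after time[i]
      if (time[j] > time[i]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5  # tied risk scores count as half
        }
      }
    }
  }
  if (comparable == 0) return(NA_real_)  # no usable pairs
  concordant / comparable
}

# Toy data: risk falls as follow-up time grows, so the pairs are all concordant
fu.time   <- c(2, 4, 6, 8)
fu.status <- c(1, 0, 1, 1)
harrell_c(fu.time, fu.status, risk = c(0.9, 0.7, 0.4, 0.1))  # returns 1
# Reversing the risk ordering makes every comparable pair discordant
harrell_c(fu.time, fu.status, risk = c(0.1, 0.4, 0.7, 0.9))  # returns 0
```

How cluster_pred derives scores from the PNNs and averages concordance per cluster is not reproduced here; the sketch covers only the concordance measure itself.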
Before choosing between the two options, we recommend that the user check the association between the main data and the follow-up information (e.g. using the global test). If the association is relatively strong, we recommend the 'conc' option; if it is weak, the other option is preferable.
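The global test itself is available in the Bioconductor package globaltest. As a much cruder stand-in that needs only base R, the sketch below checks whether the first principal component of the main data tracks the follow-up time, using Spearman's rank correlation. This is an assumed substitute, not the global test: it ignores censoring, and `crude_association` and the simulated data are hypothetical.

```r
# Crude stand-in for the global test (hypothetical helper, base R only):
# does the leading principal component of the main data track follow-up time?
# Columns of X are samples and rows are features, as in cluster_pred.
# NOTE: censoring is ignored, so this is only a rough first look.
crude_association <- function(X, surv.time) {
  pc1 <- prcomp(t(X), center = TRUE, scale. = FALSE)$x[, 1]
  cor.test(pc1, surv.time, method = "spearman", exact = FALSE)
}

set.seed(1)
n <- 30
surv.time <- rexp(n, rate = 0.1)
# Simulated main data: 50 features, of which the first 10 shift with survival time
X <- matrix(rnorm(50 * n), nrow = 50, ncol = n)
shift <- 3 * scale(surv.time)[, 1]
X[1:10, ] <- X[1:10, ] + matrix(rep(shift, each = 10), nrow = 10)

res <- crude_association(X, surv.time)
res$estimate  # Spearman's rho between PC1 and follow-up time
res$p.value
```

Because the sign of a principal component is arbitrary, only the magnitude of rho is informative; a strong association would, following the recommendation above, point towards the 'conc' option.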
Harrell, F.E. et al. (1982). "Evaluating the yield of medical tests". JAMA, 247, 2543-2546.
Obulkasim, A. et al. (2011). "Stepwise classification of cancer samples using clinical and molecular data". BMC Bioinformatics, 12, 422.
Troyanskaya, O. et al. (2001). "Missing value estimation methods for DNA microarrays". Bioinformatics, 17, 520-525.
TwoHC_assigndata(BullingerLeukemia)
attach(BullingerLeukemia)
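# Partition the first 30 samples (the training set)
cl <- HCsnipper(em[, 1:30], min = 5)
cl <- cl$partitions[cl$id, ]
# Assign samples 31-50 (the test set) to the clusters of the first partition,
# using the follow-up data of the 30 training samples
result <- cluster_pred(X = em[, 1:50], partition = cl[1, ], surv.time = surv.time[1:30],
  status = status[1:30], te.index = 31:50)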
names(result)