
Performs unsupervised Random Forests clustering based on dissimilarity, optionally returning neighbor distances.
rf.unsupervised(x, n = 2, proximity = FALSE, silhouettes = FALSE,
clara = FALSE, ...)
x A matrix or data.frame object to cluster
n Number of clusters
proximity (FALSE/TRUE) Return matrix of neighbor distances based on proximity
silhouettes (FALSE/TRUE) Return adjusted silhouette values
clara (FALSE/TRUE) Use clara partitioning, intended for large data
... Additional arguments passed to randomForest
A vector of cluster labels, or a list object of class "unsupervised" containing the following components:
distances Scaled proximity matrix representing dissimilarity neighbor distances
k Vector of cluster labels using adjusted silhouettes
silhouette.values Adjusted silhouette cluster labels and silhouette values
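The general technique described above (Random Forests proximities converted to dissimilarities, then partitioned around medoids) can be sketched as follows. This is an illustration only and is not the internals of rf.unsupervised: the sqrt(1 - proximity) conversion, the choice of three clusters, and the object names rf, rf.dist, fit and cl are assumptions made for the example.

```r
# Sketch only: generic unsupervised Random Forests clustering,
# not the implementation used by rf.unsupervised.
library(randomForest)
library(cluster)

set.seed(42)
x <- iris[, 1:4]

# With no response supplied, randomForest runs in unsupervised mode;
# proximity = TRUE stores the pairwise proximity matrix.
rf <- randomForest(x, proximity = TRUE, ntree = 500)

# Assumed conversion from proximity (a similarity) to a dissimilarity
rf.dist <- as.dist(sqrt(1 - rf$proximity))

# Partition around medoids on the dissimilarities (k = 3 is arbitrary here)
fit <- pam(rf.dist, k = 3)
cl <- fit$clustering

# Average silhouette width as a rough check on the choice of k
fit$silinfo$avg.width

# Compare the unsupervised clusters with the known species
table(cluster = cl, species = iris$Species)
```

Repeating the pam step over a range of k and comparing the average silhouette widths is one common way to choose the number of clusters.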
Rand, W.M. (1971) Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66:846-850.
Shi, T., Seligson, D., Belldegrun, A.S., Palotie, A., and Horvath, S. (2005) Tumor Classification by Tissue Microarray Profiling: Random Forest Clustering Applied to Renal Cell Carcinoma. Modern Pathology, 18:547-557.
randomForest for ... options
pam for details on Partitioning Around Medoids (PAM)
clara for details on Clustering Large Applications (clara)
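To make the difference between the two partitioning routines concrete, the sketch below runs pam and clara from the cluster package on the same data. It only illustrates those two functions. How rf.unsupervised calls them is not shown on this page, and the values of k and samples are assumptions chosen for the example.

```r
# Sketch only: pam versus clara on raw data, using the cluster package.
library(cluster)

set.seed(42)
x <- as.matrix(iris[, 1:4])

# pam searches for medoids using all observations
pam.fit <- pam(x, k = 3)

# clara runs pam on repeated subsamples and keeps the best medoids,
# which scales better to large data sets
clara.fit <- clara(x, k = 3, samples = 20)

# Cross-tabulate the two sets of cluster labels
table(pam = pam.fit$clustering, clara = clara.fit$clustering)
```

Note that clara expects a data matrix rather than a precomputed dissimilarity object, whereas pam accepts either.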
# NOT RUN {
library(randomForest)
library(rfUtilities)  # assumed source package for rf.unsupervised
data(iris)

# Cluster the four numeric iris measurements into n groups
n <- 4
clust.iris <- rf.unsupervised(iris[, 1:4], n = n, proximity = TRUE,
                              silhouettes = TRUE)
clust.iris$k

# Project the dissimilarity matrix onto n dimensions with classical MDS
mds <- stats::cmdscale(clust.iris$distances, eig = TRUE, k = n)
colnames(mds$points) <- paste("Dim", 1:n)

# One color per cluster label; labels outside 1..n become NA
mds.col <- rainbow(n)[match(clust.iris$k, 1:n)]
plot(mds$points[, 1:2], col = mds.col, pch = 20)
pairs(mds$points, col = mds.col, pch = 20)
# }