rf.unsupervised: Unsupervised Random Forests

Description

Performs unsupervised Random Forests clustering based on dissimilarity, optionally returning neighbor distances.
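The general idea can be sketched directly with the randomForest and cluster packages. This is a minimal illustration, assuming that dissimilarity is taken as 1 - proximity and that the partitioning step uses cluster::pam; it is not necessarily how rf.unsupervised is implemented internally.

```r
library(randomForest)
library(cluster)

set.seed(42)
x <- iris[, 1:4]

# Omitting the response puts randomForest in unsupervised mode;
# proximity = TRUE requests the case-by-case proximity matrix.
rf <- randomForest(x, ntree = 500, proximity = TRUE)

# Assumption: use 1 - proximity as the dissimilarity between observations.
d <- as.dist(1 - rf$proximity)

# Partition around medoids into 3 clusters (k chosen for illustration).
cl <- pam(d, k = 3)$clustering

# Compare the clusters with the known species labels.
table(cl, iris$Species)
```

The cluster count and ntree value here are arbitrary choices for the iris data.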

Usage

rf.unsupervised(x, n = 2, proximity = FALSE, silhouettes = FALSE,
  clara = FALSE, ...)

Arguments

x

A matrix or data.frame object to cluster

n

Number of clusters

proximity

(FALSE/TRUE) Return matrix of neighbor distances based on proximity

silhouettes

(FALSE/TRUE) Return adjusted silhouette values

clara

(FALSE/TRUE) Use clara partitioning, for large data

...

Additional Random Forests arguments
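As a hedged illustration of the arguments above, the call below forwards ntree through ... to the underlying Random Forests fit. It assumes the rfUtilities package (which provides rf.unsupervised) is installed and that ntree is accepted as one of the additional arguments; the structure of the result is inspected rather than assumed.

```r
library(randomForest)
library(rfUtilities)   # assumed to provide rf.unsupervised

set.seed(1)

# Three clusters, with a larger forest requested via the ... arguments.
res <- rf.unsupervised(iris[, 1:4], n = 3, ntree = 1000)

# Inspect what was returned instead of relying on a particular shape.
str(res)
```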

Value

A vector of cluster labels, or a list object of class "unsupervised" containing the following components:

distances Scaled proximity matrix representing dissimilarity neighbor distances

k Vector of cluster labels using adjusted silhouettes

silhouette.values Adjusted silhouette cluster labels and silhouette values
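For readers unfamiliar with silhouette values, the sketch below computes plain silhouette widths with cluster::silhouette. It uses Euclidean distance on scaled data as a stand-in dissimilarity so that it runs without a fitted forest, and it does not reproduce whatever adjustment rf.unsupervised applies.

```r
library(cluster)

# Stand-in dissimilarity (assumption): Euclidean distance on scaled data.
d <- dist(scale(iris[, 1:4]))

# Partition into 3 clusters and compute per-observation silhouette widths.
cl <- pam(d, k = 3)$clustering
sil <- silhouette(cl, d)

# Each row gives the assigned cluster, the nearest other cluster,
# and the silhouette width (between -1 and 1).
head(sil[, c("cluster", "neighbor", "sil_width")])
mean(sil[, "sil_width"])
```

Widths near 1 indicate observations that sit well inside their cluster; values near 0 or below suggest borderline or misassigned observations.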

References

Rand, W.M. (1971) Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66:846-850.

Shi, T., Seligson, D., Belldegrun, A.S., Palotie, A., and Horvath, S. (2005) Tumor Classification by Tissue Microarray Profiling: Random Forest Clustering Applied to Renal Cell Carcinoma. Modern Pathology, 18:547-557.

Examples

# NOT RUN {
 library(randomForest)
 library(rfUtilities)   # provides rf.unsupervised
 data(iris)
 n <- 4   # number of clusters, also reused as the number of MDS dimensions

 # Cluster on the four measurements, returning distances and silhouettes
 clust.iris <- rf.unsupervised(iris[,1:4], n = n, proximity = TRUE,
                               silhouettes = TRUE)
 clust.iris$k

 # Classical multidimensional scaling of the dissimilarities for plotting
 mds <- stats::cmdscale(clust.iris$distances, eig = TRUE, k = n)
 colnames(mds$points) <- paste("Dim", 1:n)

 # One color per cluster label
 mds.col <- rainbow(n)[clust.iris$k]
 plot(mds$points[,1:2], col = mds.col, pch = 20)
 pairs(mds$points, col = mds.col, pch = 20)
# }
