
clustering.sc.dp (version 1.0)

backtracking.sc.dp: Backtracking Clustering for a Specific Cluster Number

Description

Creates a clustering with k clusters from the backtrack data produced by findwithinss.sc.dp.

Usage

backtracking.sc.dp(x, k, backtrack)

Arguments

x
a multi-dimensional array containing the input data to be clustered; this should be the same data that was passed to findwithinss.sc.dp
k
the number of clusters; it should not exceed the maximum number of clusters used in the call to findwithinss.sc.dp
backtrack
the backtrack data computed by findwithinss.sc.dp (the backtrack component of its return value)

Value

An object of class clustering.sc.dp, which has a print method. It is a list with the following components:
cluster
A vector of integers (from 1 to k) indicating the cluster to which each point is allocated.
centers
A matrix whose rows represent cluster centres.
withinss
The within-cluster sum of squares for each cluster.
size
The number of points in each cluster.
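To make the meaning of these components concrete, the sketch below recomputes centers, withinss and size from a data matrix and a cluster vector in base R. The helper name cluster_summary is hypothetical and is not part of clustering.sc.dp; it assumes one row per data point and that every cluster 1..k is non-empty. Applied to x and result$cluster from backtracking.sc.dp, its output would be expected to agree with result$centers, result$withinss and result$size up to floating-point rounding, if those components are defined as described above.

```r
# Hypothetical helper (NOT part of clustering.sc.dp): recompute the
# centers, withinss and size components from data x and a cluster vector.
# Assumes one row per point and that clusters 1..k are all non-empty.
cluster_summary <- function(x, cluster) {
  x <- as.matrix(x)
  k <- max(cluster)
  # size: number of points in each cluster 1..k
  size <- tabulate(cluster, nbins = k)
  # centers: per-cluster column sums divided by cluster size (one row per cluster)
  centers <- unname(rowsum(x, cluster) / size)
  # withinss: squared distances of each point to its own centre, summed per cluster
  resid <- x - centers[cluster, , drop = FALSE]
  withinss <- unname(rowSums(rowsum(resid^2, cluster)))
  list(cluster = cluster, centers = centers, withinss = withinss, size = size)
}

# Toy data: two groups of consecutive points in one dimension
x <- matrix(c(1, 2, 3, 10, 12), ncol = 1)
s <- cluster_summary(x, c(1, 1, 1, 2, 2))
s$size      # 3 2
s$centers   # centres 2 and 11, one row per cluster
s$withinss  # 2 2
```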

Details

If the number of clusters is unknown, clustering can be performed by calling findwithinss.sc.dp followed by backtracking.sc.dp. Under the constraint that only consecutive elements of the input data may form a cluster, findwithinss.sc.dp calculates the exact minimum of the within-cluster sum of squares (withinss), that is, the sum of squared distances from each element to the centre (mean) of its cluster, for every number of clusters from 1 up to the given maximum.

The user can then analyse these withinss values to select a suitable number of clusters; in this case backtracking.sc.dp needs to be run only once, with the selected number. Alternatively, run findwithinss.sc.dp once, repeat the backtracking.sc.dp step for a range of candidate cluster numbers, and then evaluate the optimal solutions obtained for each of them. This requires much less time than repeating the whole clustering algorithm for each cluster number, because the minimisation performed by findwithinss.sc.dp is done only once.
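The internals of findwithinss.sc.dp and the layout of its backtrack data are not documented here. As an illustration only, the base-R sketch below shows one standard way to compute an exact optimum when clusters must be contiguous runs of elements: a dynamic program over prefix sums, costing O(k n^2) segment evaluations, that also records where each last cluster starts, followed by a cheap O(n) backtracking pass. The names seq_withinss_dp and seq_backtrack and the matrix bt are hypothetical and are not the clustering.sc.dp API. The sketch shows why repeating only the backtracking step for several cluster numbers is inexpensive.

```r
# Hypothetical sketch (NOT the clustering.sc.dp implementation).
# Exact minimum withinss for splitting the rows of x, in order, into
# 1..kmax contiguous clusters, plus backtrack data to rebuild them.
seq_withinss_dp <- function(x, kmax) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(kmax >= 1, kmax <= n)
  # prefix sums of coordinates and squared norms (row/entry 1 = empty prefix)
  cs  <- rbind(0, apply(x, 2, cumsum))
  cs2 <- c(0, cumsum(rowSums(x^2)))
  # withinss of the contiguous segment x[i:j, ]: sum ||x||^2 - ||sum x||^2 / m
  seg_cost <- function(i, j) {
    s <- cs[j + 1, ] - cs[i, ]
    (cs2[j + 1] - cs2[i]) - sum(s^2) / (j - i + 1)
  }
  D  <- matrix(Inf, kmax, n)           # D[k, j]: best withinss of x[1:j] in k clusters
  bt <- matrix(NA_integer_, kmax, n)   # bt[k, j]: first index of the k-th cluster
  for (j in 1:n) {
    D[1, j] <- seg_cost(1, j)
    bt[1, j] <- 1L
  }
  if (kmax >= 2) {
    for (k in 2:kmax) {
      for (j in k:n) {
        for (i in k:j) {               # last cluster is x[i:j, ]
          cand <- D[k - 1, i - 1] + seg_cost(i, j)
          if (cand < D[k, j]) {
            D[k, j] <- cand
            bt[k, j] <- i
          }
        }
      }
    }
  }
  list(twithinss = D[, n], backtrack = bt)
}

# Rebuild the k-cluster assignment by walking the backtrack data from the end
seq_backtrack <- function(x, k, bt) {
  n <- nrow(as.matrix(x))
  cluster <- integer(n)
  j <- n
  for (cl in k:1) {
    i <- bt[cl, j]                     # first element of cluster cl
    cluster[i:j] <- cl
    j <- i - 1
  }
  cluster
}

# Usage: two well-separated runs of 10 points each
set.seed(1)
x <- rbind(matrix(rnorm(20, 0, 0.1), ncol = 2),
           matrix(rnorm(20, 5, 0.1), ncol = 2))
r <- seq_withinss_dp(x, 4)
r$twithinss                            # non-increasing in the number of clusters
seq_backtrack(x, 2, r$backtrack)       # first 10 points in cluster 1, last 10 in cluster 2
# the same backtrack data serves every cluster number up to 4
all_k <- lapply(1:4, function(k) seq_backtrack(x, k, r$backtrack))
```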

See Also

findwithinss.sc.dp, clustering.sc.dp

Examples

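# Example: clustering data generated from a random walk with small withinss
library(clustering.sc.dp)
set.seed(1)  # make the random walk reproducible

# 100 points in 2 dimensions following a random walk from the origin
x <- matrix(NA_real_, nrow = 100, ncol = 2)
x[1, ] <- c(0, 0)
for (i in 2:100) {
  x[i, 1] <- x[i - 1, 1] + rnorm(1, 0, 0.1)
  x[i, 2] <- x[i - 1, 2] + rnorm(1, 0, 0.1)
}

# compute withinss and backtrack data for up to k clusters
k <- 10
r <- findwithinss.sc.dp(x, k)

# select the first cluster number whose total withinss drops below a threshold
thres <- 5.0
k_th <- 1
while (k_th < k && r$twithinss[k_th] > thres) {
  k_th <- k_th + 1
}

# backtrack to obtain the clustering for k_th clusters, then plot it
result <- backtracking.sc.dp(x, k_th, r$backtrack)
plot(x, type = 'b', col = result$cluster)
points(result$centers, pch = 24, bg = 1:k_th)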
