clarachunk: Clustering Large Chunks

Description

Clusters data that has been split into several chunks into k clusters.

Usage

clarasub(x, k, samples = 50)
claramerge(subclusters, k, samples = 50)

Arguments

x

data matrix or data frame in which each row is an observation and each column is a variable. All variables must be numeric. Missing values (NAs) are allowed.

k

integer, the number of clusters. It is required that 0 < k < n, where n is the number of observations in each chunk (i.e., n = nrow(x)).

samples

integer, number of samples to be drawn from the dataset.

subclusters

list of objects returned by clarasub, one per chunk.

Value

A list with the following values (see clara):

number of rows of the data set.

sample

labels or case numbers of the observations in the best sample, that is, the sample used by the clara algorithm for the final partition.

medoids

the medoids or representative objects of the clusters. It is a matrix in which each row holds the coordinates of one medoid.

tablefreq

a frequency table that approximates the number of cases in each cluster.
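Because tablefreq is only an approximation, it can be useful to compare it with exact counts obtained by assigning every observation to its nearest medoid. The following is a minimal sketch in base R, not part of this package: the medoid rows are picked by hand purely for illustration (in practice they would come from the medoids component), and Euclidean distance is assumed.

```r
# Exact cluster sizes by nearest-medoid assignment (illustrative sketch).
x <- as.matrix(iris[, 1:4])

# Hypothetical medoids: three arbitrary rows, standing in for b$medoids.
medoids <- x[c(8, 79, 113), , drop = FALSE]

# Squared Euclidean distance from every observation to every medoid;
# the result has one row per observation and one column per medoid.
d <- sapply(seq_len(nrow(medoids)), function(j) {
  rowSums(sweep(x, 2, medoids[j, ])^2)
})

# Index of the closest medoid for each observation.
nearest <- max.col(-d, ties.method = "first")

# Exact number of cases per cluster, keeping empty clusters as zero.
counts <- table(factor(nearest, levels = seq_len(nrow(medoids))))
print(counts)
```

With chunked data the same assignment can be applied chunk by chunk and the counts added up, so the full data set never has to be in memory at once.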

Details

See clara for further details. See Examples.
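The general idea of clustering chunks separately and then merging the results can be sketched with the cluster package alone. This is only an illustration under stated assumptions and not the implementation of clarasub or claramerge: it clusters each chunk with clara, pools the chunk medoids, clusters the pooled medoids with pam (unweighted), and sums the chunk cluster sizes to approximate the final frequencies.

```r
library(cluster)  # provides clara() and pam()

set.seed(1)
k <- 3
x <- iris[, 1:4]

# Two chunks, as if the data had been read in separate pieces.
chunks <- list(x[1:75, ], x[76:150, ])

# Step 1: cluster each chunk and keep only its medoids and cluster sizes.
sub <- lapply(chunks, function(chunk) {
  fit <- clara(chunk, k, samples = 50)
  list(medoids = fit$medoids, size = fit$clusinfo[, "size"])
})

# Step 2: pool the chunk medoids (2 * k rows) and cluster them into k groups.
pooled <- do.call(rbind, lapply(sub, `[[`, "medoids"))
sizes  <- unlist(lapply(sub, `[[`, "size"))
final  <- pam(pooled, k)

# Step 3: approximate final cluster sizes by summing the sizes of the
# chunk clusters whose medoids fell into the same final group.
freq <- tapply(sizes, final$clustering, sum)

print(final$medoids)
print(freq)
```

The frequencies are approximate because observations are counted with the final group of their chunk medoid, not reassigned individually.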

References

Antonio Piccolboni. mclust.mr. https://github.com/RevolutionAnalytics/rmr2/blob/master/pkg/examples/mclust.mr.R

Examples

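if (require(cluster)) {
  k <- 3

  # Cluster the first chunk (rows 1 to 30 of the numeric iris columns)
  chunk1 <- iris[1:30, 1:4]
  clus1 <- clarasub(chunk1, k)

  # Cluster the second chunk (the remaining rows)
  chunk2 <- iris[-c(1:30), 1:4]
  clus2 <- clarasub(chunk2, k)

  # Merge the per-chunk results into k final clusters
  subclusters <- list(clus1, clus2)
  b <- claramerge(subclusters, k)

  print(b$medoids)
  print(nrow(b$tablefreq))
  print(b$tablefreq)
}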