freqweights (version 1.0.1)

clarachunk: Clustering Large Chunks

Description

Cluster data that is split across several chunks into k clusters.

Usage

clarasub(x, k, samples = 50)

claramerge(subclusters, k, samples = 50)

Arguments

x
data matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed.
k
integer, the number of clusters. It is required that 0 < k < n, where n is the number of observations in each chunk (i.e., n = nrow(x)).
samples
integer, number of samples to be drawn from the dataset.
subclusters
list of objects returned by clarasub, one per chunk.

Value

A list with the following values (see clara):

  • n: number of rows of the data set.
  • sample: labels or case numbers of the observations in the best sample, that is, the sample used by the clara algorithm for the final partition.
  • medoids: the medoids or representative objects of the clusters. It is a matrix in which each row holds the coordinates of one medoid.
  • tablefreq: a frequency table. It approximates the number of cases in each group.
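
Because the sample and medoids values follow clara, it may help to look at the corresponding components of a plain clara fit. The sketch below uses only cluster::clara on the full iris data. It does not call clarasub or claramerge, and reading the "size" column of clusinfo as the counterpart of tablefreq is an assumption made for illustration.

```r
library(cluster)  # provides clara()

# Fit clara directly on the four numeric iris columns.
fit <- clara(iris[, 1:4], k = 3, samples = 50)

fit$medoids               # matrix: one row per medoid, one column per variable
fit$sample                # labels or case numbers of the best sample
fit$clusinfo[, "size"]    # number of observations assigned to each cluster
```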

Details

See clara for further details.

See Examples.
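
The general chunk-then-merge idea can be pictured with cluster::clara alone. The following is a simplified sketch under stated assumptions, not the implementation of clarasub or claramerge: the helper names (chunks, stage1, pooled, final, approx_sizes) are hypothetical, and the second stage here ignores the frequencies when re-clustering the pooled medoids.

```r
library(cluster)  # provides clara()

k <- 3
# Hypothetical split of iris into two chunks.
chunks <- list(iris[1:75, 1:4], iris[76:150, 1:4])

# Stage 1: cluster each chunk; keep its medoids and the cluster sizes.
stage1 <- lapply(chunks, function(chunk) {
  fit <- clara(chunk, k, samples = 50)
  data.frame(fit$medoids, freq = fit$clusinfo[, "size"])
})

# Stage 2: pool the chunk medoids and cluster them again (unweighted).
pooled <- do.call(rbind, stage1)
final <- clara(pooled[, 1:4], k, samples = 50)

# Approximate group sizes: sum the stage-1 frequencies per final cluster.
approx_sizes <- tapply(pooled$freq, final$clustering, sum)

final$medoids
approx_sizes
```

The summed frequencies are only an approximation of the group sizes, because every observation inherits the final cluster of its chunk-level medoid.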

References

Antonio Piccolboni, mclust.mr. https://github.com/RevolutionAnalytics/rmr2/blob/master/pkg/examples/mclust.mr.R

See Also

clara, make.readchunk

Examples

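# clarasub() and claramerge() come from freqweights; clara() comes from cluster
if (require(freqweights) && require(cluster)) {
  k <- 3

  # Cluster the first chunk (rows 1 to 30, numeric columns only)
  chunk1 <- iris[1:30, 1:4]
  clus1 <- clarasub(chunk1, k)

  # Cluster the second chunk (the remaining rows)
  chunk2 <- iris[-c(1:30), 1:4]
  clus2 <- clarasub(chunk2, k)

  # Merge the per-chunk results into k final clusters
  subclusters <- list(clus1, clus2)
  b <- claramerge(subclusters, k)

  print(b$medoids)
  print(nrow(b$tablefreq))
  print(b$tablefreq)
}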
