freqweights (version 1.0.3)

clarachunk: Clustering Large Chunks

Description

Cluster data that has been split into several chunks into k clusters.

Usage

clarasub(x, k, samples = 50)

claramerge(subclusters, k, samples = 50)

Arguments

x
data matrix or data frame in which each row corresponds to an observation and each column to a variable. All variables must be numeric. Missing values (NAs) are allowed.
k
integer, the number of clusters. It is required that 0 < k < n where n is the number of observations of each chunk (i.e., n = nrow(x)).
samples
integer, number of samples to be drawn from the dataset.
subclusters
list of objects returned by clarasub, one per chunk.
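
As an illustration of the constraint on k, the sketch below splits a data frame into chunks and checks that every chunk has more than k rows before any clustering is attempted. The helper split_into_chunks and the chunk size of 40 are hypothetical choices for this example and are not part of freqweights.

```r
# Hypothetical helper (not part of freqweights): split the rows of a data
# frame into consecutive chunks of at most `chunk_size` rows.
split_into_chunks <- function(x, chunk_size) {
  groups <- ceiling(seq_len(nrow(x)) / chunk_size)
  split(x, groups)
}

k <- 3
chunks <- split_into_chunks(iris[, 1:4], chunk_size = 40)

# Every chunk must contain more than k observations (0 < k < n).
chunk_rows <- vapply(chunks, nrow, integer(1))
stopifnot(k > 0, all(chunk_rows > k))
print(chunk_rows)
```

Each element of chunks could then be passed to clarasub in turn.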

Value

A list with the following values (see clara):
n
number of rows of the data set.
sample
labels or case numbers of the observations in the best sample, that is, the sample used by the clara algorithm for the final partition.
medoids
the medoids or representative objects of the clusters. A matrix in which each row holds the coordinates of one medoid.
tablefreq
a frequency table. It approximates the number of cases in each cluster.
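
A medoids matrix of this shape can be used to assign observations to their nearest medoid. The following is a minimal sketch in base R, assuming Euclidean distance (the default metric in clara). The three iris rows used as medoids are an arbitrary stand-in for b$medoids, not output of claramerge.

```r
# Hypothetical medoids matrix: three iris rows stand in for b$medoids.
x <- as.matrix(iris[, 1:4])
medoids <- x[c(8, 79, 113), , drop = FALSE]

# Squared Euclidean distance from every observation to every medoid
# (rows = observations, columns = medoids).
d2 <- sapply(seq_len(nrow(medoids)), function(j) {
  rowSums(sweep(x, 2, medoids[j, ])^2)
})

# Index of the closest medoid for each observation, and the cluster sizes.
nearest <- max.col(-d2, ties.method = "first")
sizes <- table(nearest)
print(sizes)
```

Counts obtained this way are exact for the data at hand, so they can serve as a rough check against the approximate frequencies.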

Details

See clara for further details. See Examples.
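
The general idea of clustering chunks separately and then merging can be sketched with the cluster package alone. This is an illustrative simplification and not the implementation used by clarasub or claramerge: it assumes that the per-chunk medoids are re-clustered with an unweighted pam and that the subcluster sizes are then summed per final cluster.

```r
library(cluster)  # clara() and pam()

k <- 3
x <- iris[, 1:4]
chunks <- list(x[1:30, ], x[-(1:30), ])

# Step 1: cluster each chunk on its own; keep medoids and cluster sizes.
sub <- lapply(chunks, function(chunk) {
  fit <- clara(chunk, k, samples = 50)
  list(medoids = fit$medoids, size = fit$clusinfo[, "size"])
})

# Step 2: stack the per-chunk medoids and cluster them again into k groups.
all_medoids <- do.call(rbind, lapply(sub, `[[`, "medoids"))
all_sizes <- unlist(lapply(sub, `[[`, "size"))
merged <- pam(all_medoids, k)

# Step 3: add up the sizes of the subclusters assigned to each final cluster.
final_sizes <- tapply(all_sizes, merged$clustering, sum)
print(merged$medoids)
print(final_sizes)
```

Because the second step ignores how many observations each subcluster represents, a weighted variant may give different medoids.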

References

Antonio Piccolboni, mclust.mr. https://github.com/RevolutionAnalytics/rmr2/blob/master/pkg/examples/mclust.mr.R

See Also

clara, make.readchunk

Examples

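library(freqweights)  # provides clarasub() and claramerge()

if (require(cluster)) {
  k <- 3

  # Cluster the first chunk: rows 1-30 of the numeric iris columns
  chunk1 <- iris[1:30, 1:4]
  clus1 <- clarasub(chunk1, k)

  # Cluster the second chunk: the remaining 120 rows
  chunk2 <- iris[-c(1:30), 1:4]
  clus2 <- clarasub(chunk2, k)

  # Merge the per-chunk results into k final clusters
  subclusters <- list(clus1, clus2)
  b <- claramerge(subclusters, k)

  print(b$medoids)
  print(nrow(b$tablefreq))
  print(b$tablefreq)
}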
