Group the documents (the columns of M) into k clusters so that documents in
the same cluster are as similar as possible according to their text distance.
Documents with no terms are assigned to the last cluster.
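The description does not say which distance or update rule is used. As a rough illustration only, the Python sketch below assumes a k-means-style loop that uses cosine similarity between document vectors and initializes the centroids from k randomly chosen non-empty documents. The names `textcluster_sketch` and `cosine` are hypothetical and are not the package's R function. Documents with no terms are set aside and given the last cluster id, k, as the description states.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def textcluster_sketch(M, k, mx=100, seed=0):
    """Hypothetical sketch: cluster the columns (documents) of a term-document matrix.

    M is a list of rows (one row per term). Returns a 1-based cluster id per column.
    Assumption: cosine similarity and mean centroids (k-means style); the real
    package may use a different distance or update rule.
    """
    docs = [list(col) for col in zip(*M)]          # transpose: one vector per document
    nonempty = [j for j, d in enumerate(docs) if any(d)]
    rng = random.Random(seed)
    # Assumption: seed the centroids with k distinct non-empty documents
    # (raises ValueError if there are fewer than k non-empty documents).
    centroids = [docs[j][:] for j in rng.sample(nonempty, k)]
    assign = {}
    for _ in range(mx):                              # at most mx iterations
        # Assign every non-empty document to its most similar centroid.
        new = {j: max(range(k), key=lambda c: cosine(docs[j], centroids[c]))
               for j in nonempty}
        if new == assign:
            break                                    # assignments stable: converged
        assign = new
        for c in range(k):
            members = [docs[j] for j in nonempty if assign[j] == c]
            if members:                              # keep the old centroid if a cluster empties
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    # Documents with no terms go to the last cluster, k.
    return [assign[j] + 1 if j in assign else k for j in range(len(docs))]
```

For example, with `M = [[2, 1, 0, 0, 0], [1, 2, 0, 0, 0], [0, 0, 3, 1, 0], [0, 0, 1, 2, 0]]` and `k = 2`, the first two documents and the next two documents end up in different clusters, and the empty fifth document is placed in cluster 2. Which pair receives label 1 depends on the random initialization.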
Value
A textcluster object with three items: cluster, centroids, and size. Here,
cluster is a vector giving, for each column of M, the cluster that document
has been assigned to; centroids is a matrix in which each column is the
centroid of one cluster; and size is a named vector giving the number of
documents in each cluster.
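To make the shape of the returned object concrete, the hypothetical Python helper `summarize_clusters` below builds the three items from a 1-based cluster assignment. It assumes each centroid is the arithmetic mean of its member columns and uses a dict keyed by cluster number as a stand-in for R's named vector; the actual element names and any normalization in the package may differ.

```python
from collections import Counter

def summarize_clusters(M, cluster, k):
    """Hypothetical helper: build cluster/centroids/size from a 1-based assignment.

    M is a term-document matrix as a list of rows (terms x documents).
    Assumption: a centroid is the mean of the columns assigned to that cluster.
    """
    n_terms = len(M)
    centroids = [[0.0] * k for _ in range(n_terms)]  # terms x clusters: one column per centroid
    size = Counter(cluster)                           # documents per cluster
    for j, c in enumerate(cluster):
        for i in range(n_terms):
            centroids[i][c - 1] += M[i][j] / size[c]  # accumulate the column mean
    return {
        "cluster": list(cluster),
        "centroids": centroids,
        # Stand-in for a named vector: cluster label -> number of documents.
        "size": {str(c): size.get(c, 0) for c in range(1, k + 1)},
    }
```

For instance, `summarize_clusters([[2, 4, 1], [0, 2, 3]], [1, 1, 2], 2)` gives centroids `[[3.0, 1.0], [1.0, 3.0]]` (column 1 is the mean of documents 1 and 2) and size `{"1": 2, "2": 1}`.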
Arguments
M
A term document matrix with terms on the rows and documents on
the columns.
k
A positive integer giving the number of clusters.
mx
Maximum number of iterations (default 100).
md
Maximum number of documents used for the initial setup of the clusters
(default 10*k).
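The hypothetical Python helper `plan_run` below shows one way the k, mx, and md arguments could be resolved before clustering starts: k is checked to be a positive integer, md falls back to the documented default of 10*k and is capped at the number of documents, and a random pool of that many document indices is drawn for the initial setup. The random sampling is an assumption, since how the package actually chooses and uses these documents is not documented here; mx is simply passed through as the cap on the main loop.

```python
import random

def plan_run(n_docs, k, mx=100, md=None, seed=0):
    """Hypothetical helper: resolve defaults and pick the initial document pool."""
    if not (isinstance(k, int) and k >= 1):
        raise ValueError("k must be a positive integer")
    if md is None:
        md = 10 * k                                   # documented default: 10*k
    rng = random.Random(seed)
    # Assumption: the initial setup uses a random subset of at most md documents.
    pool = sorted(rng.sample(range(n_docs), min(md, n_docs)))
    return {"k": k, "mx": mx, "pool": pool}
```

With 1000 documents and `k = 3`, the pool holds 30 distinct indices; with only 5 documents, every document is used.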