Using either a previously calculated similarity matrix or the original categorical data frame, BossaClust performs both overlap clustering and hierarchical clustering. After the cluster-merging step, it returns results for several recommended numbers of clusters (k).
BossaClust(data, data.pre = NULL, alpha = 1, p = c(0.9, 0.75, 0.5),
  lin = 0.25, is.pca = TRUE, pca.sum.prop = 0.95, n.comp = 50,
  fix.pca.comp = FALSE, cri = 1, lintype = "ward.D2", perplexity = 30)
data: The original categorical data, with n observations and p variables.
data.pre: A list obtained from BossaSimi, containing the original categorical data, the similarity matrix, the dissimilarity matrix, the transformed data and the Bossa scores. Computing data.pre first and then passing it to BossaClust is recommended, because it saves time when you re-run this function with different parameters.
alpha: A power scaling for the Bossa scores, representing the weight given to the variable sigma value.
p: A set of quantiles of the similarity matrix (0.9, 0.75 and 0.5 by default), used to form clusters at different levels of within-cluster similarity.
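As a rough illustration of what quantiles of a similarity matrix look like, the base R sketch below builds a toy similarity matrix and reads off its 0.9, 0.75 and 0.5 quantiles. The toy data, the similarity formula and the variable names are assumptions made for illustration; this is not the package's internal code.

```r
# Toy data: 10 observations, 4 numeric columns (illustrative only)
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)

# A stand-in similarity in (0, 1]; BossaClust computes its own similarity
simi <- 1 / (1 + as.matrix(dist(x)))

# Quantiles of the pairwise similarities, returned in the order given by p
p <- c(0.9, 0.75, 0.5)
cutoffs <- quantile(simi[upper.tri(simi)], probs = p)
print(cutoffs)
```

A higher quantile gives a stricter cutoff, so fewer pairs of observations exceed it.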
lin: A tuning parameter that controls the size of each overlap cluster before merging; a smaller lin leads to larger clusters.
is.pca: A logical value indicating whether the Bossa scores should be transformed to principal components before the similarity matrix is calculated. This is recommended when processing ultra-high-dimensional data.
pca.sum.prop: A number giving the proportion of variance to retain; enough components are kept to reach this proportion. The default is pca.sum.prop = 0.95.
n.comp: The number of PCA components to keep. The default is n.comp = 50.
fix.pca.comp: A flag indicating whether to keep a fixed number of components (n.comp) or a fixed proportion of variance (pca.sum.prop). The default, FALSE, keeps a fixed proportion of variance.
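The choice between a fixed proportion of variance and a fixed number of components can be sketched with base R's prcomp. The random `scores` matrix stands in for the Bossa scores, and the selection logic is an assumption about how the options interact, not the package's own implementation.

```r
# Stand-in for a matrix of Bossa scores (200 observations, 10 columns)
set.seed(42)
scores <- matrix(rnorm(200 * 10), nrow = 200)

pc <- prcomp(scores, center = TRUE, scale. = FALSE)

# Cumulative proportion of variance explained by the leading components
cum_prop <- cumsum(pc$sdev^2) / sum(pc$sdev^2)

pca.sum.prop <- 0.95
n.comp <- 50
fix.pca.comp <- FALSE

# Either a fixed count (capped at what is available) or the smallest
# count whose cumulative proportion reaches pca.sum.prop
k <- if (fix.pca.comp) {
  min(n.comp, ncol(pc$x))
} else {
  which(cum_prop >= pca.sum.prop)[1]
}

reduced <- pc$x[, seq_len(k), drop = FALSE]
dim(reduced)
```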
cri: A tuning parameter for the merge test. If the p-value is smaller than cri, the null hypothesis is rejected and the overlap sub-clusters are merged. cri can be any number less than 1. If cri = 1, the criterion is reset to 0.05/N, where N is the number of all overlap sub-clusters; if cri = 2, the criterion is 0.05/N(N-1).
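The mapping from cri to the threshold actually used can be written as a small helper. The function name `merge_threshold` is hypothetical, and reading 0.05/N(N-1) as 0.05 divided by N(N-1) is an assumption, since the text does not show the grouping explicitly.

```r
# Hypothetical helper: threshold implied by cri for N overlap sub-clusters
merge_threshold <- function(cri, N) {
  if (cri == 1) {
    0.05 / N                  # reset to 0.05/N
  } else if (cri == 2) {
    0.05 / (N * (N - 1))      # assumed reading of 0.05/N(N-1)
  } else {
    cri                       # any value below 1 is used directly
  }
}

merge_threshold(1, N = 10)
merge_threshold(2, N = 10)
merge_threshold(0.01, N = 10)
```

Under this reading, cri = 2 is the strictest of the three settings once there are more than two sub-clusters.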
lintype: The agglomeration method to be used in hclust. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" and so on. The default is "ward.D2".
perplexity: The perplexity parameter of t-SNE. The default is 30.
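To show what the agglomeration method named by lintype controls, here is a minimal base R sketch that runs hclust with method = "ward.D2" on toy data. The data and the Euclidean distance are assumptions for illustration; BossaClust derives its own dissimilarity.

```r
# Two well-separated toy groups of 20 points each (illustrative only)
set.seed(7)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

# Hierarchical clustering with the same default method as lintype
hc <- hclust(dist(x), method = "ward.D2")

# Cut the tree into k = 2 groups
groups <- cutree(hc, k = 2)
table(groups)
```

Replacing "ward.D2" with "single", "complete" or "average" changes how distances between clusters are computed during merging, and so can change the resulting tree.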
Returns an object containing the overlap clusters after merging and the non-overlap clusters, which can be displayed with the function bossa_interactive.
## Not run: 
data(bo.simu.data)
object <- BossaClust(bo.simu.data)
## End(Not run)
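Following the recommendation for data.pre, a possible workflow is to compute the similarity information once and reuse it across calls. This sketch assumes that the package is named Bossa and is installed, and that BossaSimi accepts the data as its first argument with usable defaults; check the BossaSimi help page before relying on it.

```r
# Assumes the Bossa package is installed; the block is skipped otherwise
has_bossa <- requireNamespace("Bossa", quietly = TRUE)

if (has_bossa) {
  library(Bossa)
  data(bo.simu.data)

  # Compute the similarity information once (default settings assumed)
  pre <- BossaSimi(bo.simu.data)

  # Reuse it while changing a tuning parameter, here the merge criterion
  fit_cri1 <- BossaClust(bo.simu.data, data.pre = pre, cri = 1)
  fit_cri2 <- BossaClust(bo.simu.data, data.pre = pre, cri = 2)
}
```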