Performs consensus (weighted) clustering. The underlying algorithm (e.g. hierarchical clustering) is run with different numbers of clusters nc. In consensus weighted clustering, weighted distances are calculated using the cosa2 algorithm with different penalty parameters Lambda. The hyper-parameters are calibrated by maximisation of the consensus score. This function uses a serial implementation and requires the grids of hyper-parameters as input (for internal use only).
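The resampling idea behind consensus clustering can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the variable names and the single fixed value of nc are assumptions made for illustration, and no weighting (Lambda) or score calibration is attempted.

```r
# Minimal sketch of consensus clustering using base R only (illustrative, unweighted)
set.seed(1)

# Toy data: two groups of 10 observations in 2 dimensions (made up for illustration)
xdata <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 5), ncol = 2)
)
x <- scale(xdata) # put variables on a comparable scale
n <- nrow(x)

K <- 100   # number of resampling iterations
tau <- 0.5 # proportion of observations in each subsample
nc <- 2    # number of clusters (a single value here, not a grid)

comembership_count <- matrix(0, n, n) # times a pair was clustered together
sampled_count <- matrix(0, n, n)      # times a pair was sampled together

for (k in seq_len(K)) {
  ids <- sample(n, size = floor(tau * n))
  hc <- hclust(dist(x[ids, , drop = FALSE]), method = "complete")
  membership <- cutree(hc, k = nc)
  together <- outer(membership, membership, "==") * 1
  comembership_count[ids, ids] <- comembership_count[ids, ids] + together
  sampled_count[ids, ids] <- sampled_count[ids, ids] + 1
}

# Co-membership proportions; set to 0 for pairs never sampled together
coprop <- ifelse(sampled_count > 0, comembership_count / sampled_count, 0)
diag(coprop) <- 0
```

Each entry of coprop is the proportion of subsamples containing both items in which the two items were assigned to the same cluster.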
SerialClustering(
  xdata,
  nc,
  eps,
  Lambda,
  K = 100,
  tau = 0.5,
  seed = 1,
  n_cat = 3,
  implementation = HierarchicalClustering,
  scale = TRUE,
  linkage = "complete",
  row = TRUE,
  output_data = FALSE,
  verbose = TRUE,
  ...
)
A list with:
a matrix of the best stability scores for different (sets of) parameters controlling the number of clusters and penalisation of attribute weights.
a matrix of numbers of clusters.
a matrix of regularisation parameters for attribute weights.
a matrix of the average number of selected attributes by the underlying algorithm with different regularisation parameters.
an array of consensus matrices. Rows and columns correspond to items. Indices along the third dimension correspond to different parameters controlling the number of clusters and penalisation of attribute weights.
an array of selection proportions. Columns correspond to attributes. Rows correspond to different parameters controlling the number of clusters and penalisation of attribute weights.
a list with type="clustering" and values used for arguments implementation, linkage, and resampling.
a list with values used for arguments K, tau, pk, n (number of observations in xdata), and seed.
The rows of Sc, nc, Lambda, Q, selprop and indices along the third dimension of coprop are ordered in the same way and correspond to parameter values stored in nc and Lambda.
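Because the rows are aligned, the row with the highest score can be used to look up the corresponding parameters. The sketch below uses a toy stand-in for the returned list: the numeric values are made up, and the single-column matrix shape is an assumption rather than something stated on this page.

```r
# Toy stand-in for the returned list (values are made up for illustration)
out <- list(
  Sc = matrix(c(0.21, 0.87, 0.45), ncol = 1),
  nc = matrix(c(2, 3, 4), ncol = 1),
  Lambda = matrix(c(0.1, 0.1, 0.1), ncol = 1)
)

# Rows are aligned, so one index identifies the calibrated parameters
best <- which.max(out$Sc)
calibrated <- c(nc = out$nc[best, 1], Lambda = out$Lambda[best, 1])
print(calibrated)
```

The same index would select the matching row of Q and selprop, and the matching slice along the third dimension of coprop.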
data matrix with observations as rows and variables as columns.
matrix of parameters controlling the number of clusters in the underlying algorithm specified in implementation. If nc is not provided, it is set to seq(1, tau*nrow(xdata)).
radius in density-based clustering, see dbscan. Only used if implementation=DBSCANClustering.
vector of penalty parameters for weighted distance calculation. Only used for distance-based clustering, including for example implementation=HierarchicalClustering, implementation=PAMClustering, or implementation=DBSCANClustering.
number of resampling iterations.
subsample size, expressed as a proportion of the number of observations.
value of the seed to initialise the random number generator and ensure reproducibility of the results (see set.seed).
computation options for the stability score. Use NULL for the score based on a z test, or 2 or 3 for the score based on the negative log-likelihood. The usage above sets n_cat = 3.
function to use for clustering. Possible functions include HierarchicalClustering (hierarchical clustering), PAMClustering (Partitioning Around Medoids), KMeansClustering (k-means) and GMMClustering (Gaussian Mixture Models). Alternatively, a user-defined function taking xdata and Lambda as arguments and returning a binary and symmetric matrix for which diagonal elements are equal to zero can be used.
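A user-defined function of the kind described above might look like the following sketch. The function name MyClustering and the nc argument are hypothetical, Lambda is accepted but ignored, and the exact interface and return format expected by the package may differ, so this only illustrates the "binary, symmetric, zero diagonal" requirement.

```r
# Hypothetical user-defined clustering function (name and nc argument are assumptions)
MyClustering <- function(xdata, nc = 2, Lambda = NULL, ...) {
  # Lambda is accepted to match the described arguments but ignored in this unweighted sketch
  hc <- hclust(dist(xdata), method = "average")
  membership <- cutree(hc, k = nc)
  # 1 if two items share a cluster, 0 otherwise
  adjacency <- outer(membership, membership, "==") * 1
  diag(adjacency) <- 0
  adjacency
}

# Usage on toy data (made up for illustration)
set.seed(1)
x <- matrix(rnorm(40), ncol = 4)
A <- MyClustering(x, nc = 3)
```

Setting the diagonal to zero means an item is not counted as co-clustered with itself.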
logical indicating if the data should be scaled to ensure that all variables contribute equally to the clustering of the observations.
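The effect of scaling can be seen on toy data where one variable has a much larger spread than the other. This sketch uses base R's scale(); whether the package calls scale() internally is an assumption.

```r
set.seed(1)
# Toy data: the second variable has a much larger spread than the first
xdata <- cbind(rnorm(50, sd = 1), rnorm(50, sd = 100))

# Without scaling, Euclidean distances are dominated by the second variable
sd_raw <- apply(xdata, 2, sd)

# After scaling, each variable has unit standard deviation
xscaled <- scale(xdata)
sd_scaled <- apply(xscaled, 2, sd)
```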
character string indicating the type of linkage used in hierarchical clustering to define the stable clusters. Possible values include "complete", "single" and "average" (see argument "method" in hclust for a full list). Only used if implementation=HierarchicalClustering.
logical indicating if rows (if row=TRUE) or columns (if row=FALSE) contain the items to cluster.
logical indicating if the input datasets xdata and ydata should be included in the output.
logical indicating if a loading bar and messages should be printed.
additional parameters passed to the functions provided in implementation or resampling.