cca: K-Median Cluster Component Analysis

Description

K-Median Cluster Component Analysis, a distribution-free soft-clustering method for preference rankings.

Usage

cca(X, k, control = ccacontrol(...), ...)

Arguments

X

An n by m data matrix containing preference rankings, with n judges and m objects to be judged. Each row is one judge's ranking of the objects, which are represented by the columns.

k

The number of cluster components.

control

A list of options, created by the function ccacontrol, that controls the details of the cca algorithm. The options set the maximum number of cca iterations (default: itercca=1), the algorithm used to compute the median ranking (default: "quick"), and other options of the consrank algorithm, which is called by cca.

…

Control options supplied directly as arguments, bypassing ccacontrol.
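As a minimal sketch of the expected input, the base R snippet below builds a small ranking matrix with 5 judges in rows and 4 objects in columns. The data, the object names, and the convention that rank 1 denotes the most preferred object are assumptions made for illustration. The commented-out line indicates how such a matrix might be passed to cca with a control list, assuming the ConsRank package is installed.

```r
# Hypothetical toy data: 5 judges (rows) rank 4 objects (columns).
# Entry [i, j] is the rank judge i gives to object j
# (assumed convention: 1 = most preferred).
X <- matrix(c(1, 2, 3, 4,
              2, 1, 3, 4,
              1, 2, 4, 3,
              4, 3, 2, 1,
              4, 3, 1, 2),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("judge", 1:5), c("A", "B", "C", "D")))

n <- nrow(X)  # number of judges
m <- ncol(X)  # number of objects

# Check that every row is a full ranking, i.e. a permutation of 1..m.
is_full_ranking <- apply(X, 1, function(r) all(sort(r) == seq_len(m)))

# With ConsRank installed, a call along these lines would fit two components:
# res <- ConsRank::cca(X, 2, control = ConsRank::ccacontrol(itercca = 10))
```

The check only covers full rankings; rows containing ties would need a different validation.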

Value

An object of the class "cca". It contains:

pk		the membership probability matrix
clc		cluster centers
oclc		cluster centers in terms of orderings
idc		crisp partition: id of the cluster component associated with the highest membership probability
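Hcca		global homogeneity measure (tau_X rank correlation coefficient)
hk		homogeneity within each cluster
props		estimated proportion of cases within each cluster
Us		per-individual uncertainty measure based on the geometric mean of the membership probabilities (see Details)
Ucca		global uncertainty measure: the average of Us
Uprods		per-individual uncertainty measure based on the product of the membership probabilities (see Details)
Uprodscca		global uncertainty measure: the average of Uprods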
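The relation between the membership probability matrix and the crisp partition can be sketched in base R as follows. The matrix below is hypothetical and is not the output of cca; it only illustrates how an assignment such as idc follows from picking, for each judge, the component with the highest membership probability.

```r
# Hypothetical membership probabilities for 4 judges and 3 cluster components.
# Each row sums to 1; in a fitted object this matrix corresponds to pk.
pk <- matrix(c(0.80, 0.15, 0.05,
               0.10, 0.70, 0.20,
               0.30, 0.30, 0.40,
               0.05, 0.90, 0.05),
             nrow = 4, byrow = TRUE)

# Crisp partition: index of the component with the highest probability per judge.
idc <- apply(pk, 1, which.max)
idc
```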

Details

Any algorithm implemented in the consrank function of the ConsRank package can be used. All of them accept the option full=TRUE, which restricts the search for the median ranking(s) to the space of permutations (full rankings) instead of the unconstrained universe of rankings of m items, which includes all possible ties.

There are two classification uncertainty measures, Us and Uprods. Us is the geometric mean of each individual's membership probabilities, normalized so that Us=1 in the case of maximum uncertainty. Ucca is the average of all the Us values. Uprods is the product of each individual's membership probabilities, normalized so that Uprods=1 in the case of maximum uncertainty. Uprodscca is the average of all the Uprods values.
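The two uncertainty measures can be illustrated with a short base R sketch. The normalization constants are not stated here, so the code assumes that maximum uncertainty means equal membership probabilities 1/K across the K components, which leads to the factors K (geometric mean) and K^K (product). This is one reading consistent with the description above, not necessarily the computation used inside cca, and the matrix pk is hypothetical.

```r
# Hypothetical membership probabilities: 3 judges, K = 3 components.
pk <- matrix(c(1/3, 1/3, 1/3,    # maximally uncertain judge
               0.80, 0.15, 0.05,
               0.98, 0.01, 0.01), # almost certain judge
             nrow = 3, byrow = TRUE)
K <- ncol(pk)

# Geometric mean of each row, scaled by K (assumed normalization)
# so that equal probabilities give Us = 1.
Us <- K * exp(rowMeans(log(pk)))

# Product of each row, scaled by K^K (assumed normalization)
# so that equal probabilities give Uprods = 1.
Uprods <- K^K * apply(pk, 1, prod)

# Global measures: averages over individuals.
Ucca <- mean(Us)
Uprodscca <- mean(Uprods)
```

Under this assumed normalization both measures lie between 0 and 1, and Uprods equals Us raised to the power K, so it drops off faster as the assignment becomes more certain.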

References

D'Ambrosio, A. and Heiser, W.J. (2019). A Distribution-free Soft Clustering Method for Preference Rankings. Behaviormetrika, vol. 46(2), pp. 333-351. DOI: 10.1007/s41237-018-0069-5

Heiser, W.J. and D'Ambrosio, A. (2013). Clustering and Prediction of Rankings within a Kemeny Distance Framework. In Berthold, L., Van den Poel, D., Ultsch, A. (eds). Algorithms from and for Nature and Life, pp. 19-31. Springer International. DOI: 10.1007/978-3-319-00035-0_2

Ben-Israel, A. and Iyigun, C. (2008). Probabilistic d-clustering. Journal of Classification, 25(1), pp. 5-26. DOI: 10.1007/s00357-008-9002-z

Examples

library(ConsRank)
data(Irish)
set.seed(135) # for reproducibility
# CCA with four cluster components; itercca=10 is passed directly, bypassing ccacontrol
ccares <- cca(Irish$rankings, 4, itercca=10)
summary(ccares)