K-Median Cluster Component Analysis, a distribution-free soft-clustering method for preference rankings.
cca(X, k, control = ccacontrol(...), ...)
X | An n by m data matrix of preference rankings, with n judges and m objects to be judged. Each row is one judge's ranking of the objects, which are represented by the columns.
k | The number of cluster components.
control | A list of options, built by the function ccacontrol, that controls details of the cca algorithm. The options set the maximum number of iterations of cca (itercca=1 is the default), the algorithm used to compute the median ranking (default "quick"), and other options related to the consrank algorithm, which is called by cca.
... | Arguments passed directly to cca, bypassing ccacontrol.
An object of the class "cca". It contains:
pk | Membership probability matrix
clc | Cluster centers
oclc | Cluster centers in terms of orderings
idc | Crisp partition: id of the cluster component associated with the highest membership probability
Hcca | Global homogeneity measure (tau_X rank correlation coefficient)
hk | Homogeneity within each cluster
props | Estimated proportion of cases within each cluster
Us | Per-individual uncertainty measure based on the geometric mean of the membership probabilities (see details)
Ucca | Global uncertainty measure: average of Us
Uprods | Per-individual uncertainty measure based on the product of the membership probabilities (see details)
Uprodscca | Global uncertainty measure: average of Uprods
The user can choose any algorithm implemented in the consrank function of the ConsRank package. All algorithms accept the option 'full=TRUE', which restricts the search for the median ranking(s) to the space of permutations (full rankings without ties) instead of the unconstrained universe of rankings of the m objects, which includes all possible ties.
There are two classification uncertainty measures: Us and Uprods. "Us" is the geometric
mean of the membership probabilities of each individual, normalized in such a way that
in the case of maximum uncertainty Us=1. "Ucca" is the average of all the "Us".
"Uprods" is the product of the membership probabilities of each individual, normalized in such a way that
in the case of maximum uncertainty Uprods=1. "Uprodscca" is the average of all the "Uprods".
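As a worked reading of these definitions, write p_{i1}, ..., p_{ik} for the membership probabilities of individual i over the k cluster components (this notation is introduced here for illustration and is not taken from the package). Assuming the normalization is a simple rescaling by the value attained at maximum uncertainty, that is p_{ij} = 1/k for every j, the four measures can be sketched as:

```latex
U^{s}_{i} = k \Bigl(\prod_{j=1}^{k} p_{ij}\Bigr)^{1/k}, \qquad
U^{prods}_{i} = k^{k} \prod_{j=1}^{k} p_{ij},
\\
U_{cca} = \frac{1}{n} \sum_{i=1}^{n} U^{s}_{i}, \qquad
U_{prodscca} = \frac{1}{n} \sum_{i=1}^{n} U^{prods}_{i}.
```

With equal memberships the geometric mean is 1/k and the product is k^{-k}, so both per-individual measures equal 1. Both approach 0 as one membership probability approaches 1. For example, with k = 2 and memberships (0.9, 0.1), this reading gives Us = 2 * sqrt(0.09) = 0.6 and Uprods = 4 * 0.09 = 0.36.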
D'Ambrosio, A., and Heiser, W.J. (2019). A Distribution-free Soft Clustering Method for Preference Rankings. Behaviormetrika, 46(2), pp. 333–351. DOI: 10.1007/s41237-018-0069-5
Heiser, W.J., and D'Ambrosio, A. (2013). Clustering and Prediction of Rankings within a Kemeny Distance Framework. In Lausen, B., Van den Poel, D., Ultsch, A. (eds.), Algorithms from and for Nature and Life, pp. 19–31. Springer International. DOI: 10.1007/978-3-319-00035-0_2
Ben-Israel, A., and Iyigun, C. (2008). Probabilistic d-clustering. Journal of Classification, 25(1), pp. 5–26. DOI: 10.1007/s00357-008-9002-z
ccacontrol
ranktree
# NOT RUN {
library(ConsRank) # provides cca and the Irish data
data(Irish)
set.seed(135) # for reproducibility
# CCA with four components and at most 10 iterations of cca
ccares <- cca(Irish$rankings, 4, itercca = 10)
summary(ccares)
# }