Leveraged Affinity Propagation
Runs leveraged affinity propagation clustering
# S4 method for matrix,missing
apclusterL(s, x, sel, p=NA, q=NA, maxits=1000, convits=100,
           lam=0.9, includeSim=FALSE, nonoise=FALSE, seed=NA)

# S4 method for character,ANY
apclusterL(s, x, frac, sweeps, p=NA, q=NA, maxits=1000, convits=100,
           lam=0.9, includeSim=TRUE, nonoise=FALSE, seed=NA, ...)

# S4 method for function,ANY
apclusterL(s, x, frac, sweeps, p=NA, q=NA, maxits=1000, convits=100,
           lam=0.9, includeSim=TRUE, nonoise=FALSE, seed=NA, ...)
s: an \(l \times length(sel)\) similarity matrix, or a similarity function, given either as a character string naming a package-provided similarity function or as a user-provided function object. If s is supplied as a similarity matrix, its columns must correspond to the same sub-selection of samples as specified in the sel argument and must be in the same increasing order. For a package-provided or user-defined similarity function, additional parameters can be specified as appropriate for the chosen method; they are passed on to the similarity function via the ... argument (see below). See the package vignette for a non-trivial example of supplying a user-defined similarity measure.
x: input data to be clustered; if x is a matrix or data frame, rows are interpreted as samples and columns as features; apart from matrices or data frames, x may be any other structured data type that contains multiple data items, provided that an appropriate length function is available that returns the number of items.
frac: fraction of samples that should be used for leveraged clustering. The similarity matrix is generated for all samples against a random fraction of the samples, as specified by this parameter.
sweeps: number of sweeps of leveraged clustering, each performed with a new randomly selected subset of samples.
sel: selected sample indices; a vector containing, in increasing order, the indices of the sample subset used for leveraged AP clustering.
p: input preference; can be a vector that specifies individual preferences for each data point. If scalar, the same value is used for all data points. If NA, exemplar preferences are initialized according to the distribution of non-Inf values in s. How this is done is controlled by the parameter q.
q: if p=NA, exemplar preferences are initialized according to the distribution of non-Inf values in s. If q=NA, exemplar preferences are set to the median of non-Inf values in s. If q is a value between 0 and 1, the sample quantile with threshold q is used, whereas q=0.5 again results in the median.
maxits: maximal number of iterations that should be executed.
convits: the algorithm terminates if the exemplars have not changed for convits iterations.
lam: damping factor; should be a value in the range [0.5, 1); higher values correspond to heavier damping, which may be needed if oscillations occur.
includeSim: if TRUE, the similarity matrix (either computed internally or passed via the s argument) is stored in the slot sim of the returned APResult object. The default is FALSE if apclusterL has been called for a similarity matrix; otherwise the default is TRUE.
nonoise: apcluster adds a small amount of noise to s to prevent degenerate cases; if TRUE, this is disabled.
seed: for reproducibility, the seed of the random number generator can be set to a fixed value before noise is added (see above); if NA, the seed remains unchanged.
...: all other arguments are passed to the selected similarity function as they are. Note that name conflicts between arguments of apcluster and arguments of the similarity function may occur; we therefore recommend writing user-defined similarity functions without additional parameters or using closures to fix parameters (as in the example below).
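As a rough illustration of the default described for p and q, the following base R sketch derives a preference from the finite entries of a similarity matrix. The helper name init_preference and the toy matrix are made up for this illustration; the package's own initialization may differ in detail, for example in which entries of s it takes into account.

```r
# Hypothetical helper (not part of the package): derive a default preference
# from the non-Inf entries of a similarity matrix s.
init_preference <- function(s, q = NA) {
  finite_vals <- s[is.finite(s)]   # drop -Inf / Inf entries
  if (is.na(q)) {
    median(finite_vals)            # q = NA: use the median
  } else {
    unname(quantile(finite_vals, probs = q))  # otherwise: sample quantile at q
  }
}

# toy 3 x 2 similarity matrix with one -Inf entry
s <- matrix(c(-1, -4, -Inf, -9, -2, -16), nrow = 3)

p_median <- init_preference(s)           # median of the finite values
p_q50    <- init_preference(s, q = 0.5)  # same value as the median
p_q10    <- init_preference(s, q = 0.1)  # a lower (more negative) preference
```

In this sketch a smaller q yields a lower preference than the median.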
Affinity Propagation clusters data using a set of real-valued pairwise similarities as input. Each cluster is represented by a cluster center (the so-called exemplar). The method is iterative and searches for clusters that maximize an objective function called net similarity.
Leveraged Affinity Propagation reduces dynamic and static load for large datasets. Only a subset of the samples is considered in the clustering process, on the assumption that this subset already provides enough information about the cluster structure.
When called with input data and either the name of a package-provided similarity function or a user-provided similarity function, the function selects a random sample subset according to the frac parameter, calculates a rectangular similarity matrix of all samples against this subset, and repeats this procedure sweeps times. A new sample subset is used for each repetition. The clustering result of the sweep with the highest net similarity is returned. Any parameters specific to the chosen method of similarity calculation can be passed to apclusterL in addition to the parameters described above. The similarity matrix for the best sweep is also returned in the result object when requested by the user (argument includeSim).
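Instead of passing extra parameters through ..., a parameter of a user-defined similarity function can be fixed with a closure. The sketch below is a minimal base R illustration of that idea. The factory name make_neg_dist and the argument layout (x, sel) of the returned function are assumptions made for this example; the package vignette describes the interface that user-defined similarity functions actually need.

```r
# Hypothetical factory (not part of the package): returns a similarity
# function in which the exponent r is fixed by the enclosing environment.
make_neg_dist <- function(r = 1) {
  function(x, sel = NA) {
    x <- as.matrix(x)
    # sel = NA: use all samples as columns; otherwise only the selected ones
    cols <- if (length(sel) == 1 && is.na(sel)) seq_len(nrow(x)) else sel
    d <- as.matrix(dist(x))[, cols, drop = FALSE]  # Euclidean distances
    -d^r                                           # negative distance to the power r
  }
}

sim <- make_neg_dist(r = 2)   # r is now fixed; no extra argument is needed later

x <- rbind(c(0, 0), c(3, 4), c(6, 8))
s_full <- sim(x)                  # 3 x 3 similarities of all samples
s_rect <- sim(x, sel = c(1, 3))   # 3 x 2: all samples against samples 1 and 3
```

Because r lives in the closure, it cannot collide with any argument name of the clustering function.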
When called with a rectangular similarity matrix (which represents a column subset of the full similarity matrix), the function performs AP clustering on this similarity matrix. The information about the selected samples is passed to the clustering via the parameter sel. This variant is only needed when the user requires full control over distance calculation or sample subset selection.
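The inputs for this variant can be prepared by hand. The following base R sketch chooses a sample subset in increasing order and builds the matching rectangular similarity matrix. The toy data, the rounding rule used to turn frac into a subset size, and the choice of negative squared Euclidean distance are assumptions made for illustration only.

```r
set.seed(42)   # only to make the sketch repeatable

# toy data: 10 samples (rows) with 2 features (columns)
x <- matrix(rnorm(20), nrow = 10, ncol = 2)

# choose 30% of the samples; sel must be in increasing order
frac <- 0.3
n_sel <- max(1, round(frac * nrow(x)))   # assumed rounding rule
sel <- sort(sample(nrow(x), size = n_sel))

# rectangular similarity matrix: all samples (rows) against the subset
# (columns), here as negative squared Euclidean distance
d <- as.matrix(dist(x))
s <- -(d[, sel, drop = FALSE]^2)
```

The resulting s has one row per sample and one column per entry of sel, with the columns in the same increasing order as sel, which is the layout the s and sel arguments are described as requiring.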
Apart from minor adaptations and optimizations, the implementation of the function apclusterL is largely analogous to Frey's and Dueck's Matlab code.
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: 10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: 10.1093/bioinformatics/btr406.
## create two Gaussian clouds
cl1 <- cbind(rnorm(150, 0.2, 0.05), rnorm(150, 0.8, 0.06))
cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05))
x <- rbind(cl1, cl2)

## leveraged apcluster
apres <- apclusterL(negDistMat(r=2), x, frac=0.2, sweeps=3, p=-0.2)

## show details of leveraged clustering results
show(apres)

## plot leveraged clustering result
plot(apres, x)

## plot heatmap of clustering result
heatmap(apres)

## show net similarities of single sweeps
apres@netsimLev

## show samples on which best sweep was based
apres@sel