apclusterL: Leveraged Affinity Propagation

Description

Runs leveraged affinity propagation clustering

Usage

## S4 method for signature 'matrix,missing':
apclusterL(s, x,
          sel, p=NA, q=NA, maxits=1000, convits=100, lam=0.9,
          includeSim=FALSE, nonoise=FALSE, seed=NA)
## S4 method for signature 'character,ANY':
apclusterL(s, x,
          frac, sweeps, p=NA, q=NA, maxits=1000, convits=100, lam=0.9,
          includeSim=TRUE, nonoise=FALSE, seed=NA, ...)
## S4 method for signature 'function,ANY':
apclusterL(s, x,
          frac, sweeps, p=NA, q=NA, maxits=1000, convits=100, lam=0.9,
          includeSim=TRUE, nonoise=FALSE, seed=NA, ...)

Arguments

s

an l × length(sel) similarity matrix (l being the number of samples), or a similarity function, specified either as a character string naming a similarity function provided by the package or as a user-provided function object for similarity calculation. For a package or …

x

input data to be clustered; if x is a matrix or data frame, rows are interpreted as samples and columns are interpreted as features; apart from matrices or data frames, x may be any other structured data type that …

frac

fraction of samples that should be used for leveraged clustering. The similarity matrix will be generated for all samples against a random fraction of the samples as specified by this parameter.

sweeps

number of sweeps of leveraged clustering, each performed with a newly drawn random subset of samples.

sel

selected sample indices; a vector containing the indices of the sample subset used for leveraged AP clustering, in the same order as the corresponding columns of the similarity matrix s.

p

input preference; can be a vector that specifies individual preferences for each data point. If scalar, the same value is used for all data points. If NA, exemplar preferences are initialized according to the distribution of non-Inf values in s (see q).

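q

if p=NA, exemplar preferences are initialized according to the distribution of non-Inf values in s. If q=NA, exemplar preferences are set to the median of non-Inf values in s. If q is a value between 0 and 1, the sample quantile with threshold q is used, whereas q=0.5 again results in the median.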

maxits

maximal number of iterations that should be executed

convits

the algorithm terminates if the exemplars have not changed for convits iterations

lam

damping factor; should be a value in the range [0.5, 1); higher values correspond to heavier damping, which may be needed if oscillations occur

includeSim

if TRUE, the similarity matrix (either computed internally or passed via the s argument) is stored in the slot sim of the returned APResult object. The default is FALSE if s is a similarity matrix and TRUE if s is a similarity function (see Usage).

nonoise

apclusterL adds a small amount of noise to s to prevent degenerate cases; if TRUE, this is disabled

seed

for reproducibility, the seed of the random number generator can be set to a fixed value before adding noise (see above); if NA, the seed remains unchanged

...

all other arguments are passed to the selected similarity function as they are; note that name conflicts between arguments of apclusterL and arguments of the similarity function may occur; therefore, we recommend writing …
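When p=NA, the preference is derived from the similarities themselves. The following base-R sketch shows one way to do this under stated assumptions: the median of the non-Inf values when q=NA, and otherwise the sample quantile at level q. The helper name initPreference is hypothetical and not part of the package, and which entries of s the package actually includes (for example, whether self-similarities are excluded) is not modelled here.

```r
# Hypothetical helper (not part of apcluster): derive a scalar input
# preference from the non-Inf entries of a similarity matrix s.
initPreference <- function(s, q = NA) {
  vals <- s[is.finite(s)]          # drop -Inf/Inf (and NA) entries
  if (is.na(q)) {
    median(vals)                   # q = NA: median of non-Inf values
  } else {
    unname(quantile(vals, q))      # 0 <= q <= 1: sample quantile at level q
  }
}

s <- matrix(c(-1, -Inf, -3, -4), nrow = 2)
initPreference(s)          # -3, the median of c(-1, -3, -4)
initPreference(s, q = 0)   # -4, the smallest non-Inf similarity
```

Lower preferences generally lead to fewer exemplars, so a small q tends to produce fewer clusters than q=NA.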

Value

Upon successful completion, all methods return an APResult object.

Details

Affinity Propagation clusters data using a set of real-valued pairwise similarities as input. Each cluster is represented by a representative cluster center (the so-called exemplar). The method is iterative and searches for clusters maximizing an objective function called net similarity.
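To make the objective concrete, here is a minimal base-R sketch that computes the net similarity of a given exemplar assignment on a square similarity matrix: the similarities of all non-exemplar points to their exemplars plus the preferences of the exemplars. The helper netSimilarity and the toy data are hypothetical; in the leveraged setting the columns of s would correspond to the selected subset rather than to all samples.

```r
# Hypothetical helper: net similarity of an exemplar assignment.
# s   : N x N similarity matrix
# idx : idx[i] is the index of the exemplar of data point i
# p   : vector of input preferences (length N)
netSimilarity <- function(s, idx, p) {
  exemplars <- unique(idx)
  members <- setdiff(seq_along(idx), exemplars)
  dpsim <- sum(s[cbind(members, idx[members])])  # points to their exemplars
  expref <- sum(p[exemplars])                    # preferences of exemplars
  dpsim + expref
}

s <- -as.matrix(dist(c(0, 1, 5, 6)))^2   # negative squared distances
idx <- c(1, 1, 3, 3)                     # exemplars: points 1 and 3
netSimilarity(s, idx, p = rep(-2, 4))    # (-1) + (-1) + (-2) + (-2) = -6
```

Declaring every point an exemplar would give a net similarity of -8 here, which is why AP prefers the two-cluster solution for this preference value.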

Leveraged Affinity Propagation reduces dynamic and static load for large datasets. Only a subset of the samples is considered in the clustering process, under the assumption that this subset already provides enough information about the cluster structure.
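A back-of-the-envelope calculation illustrates the storage side of this saving. The sketch assumes N = 10,000 samples, frac = 0.1, 8-byte doubles and a subset size of ceiling(frac * N); these are illustrative choices, not values taken from the package. AP implementations typically also keep message matrices of the same shape as the similarity matrix, so actual memory use is a multiple of these figures.

```r
# Rough storage estimate for one N x M double-precision matrix (8 bytes per
# entry). Assumption: the subset size is ceiling(frac * N).
N <- 10000
frac <- 0.1
M <- ceiling(frac * N)
fullMB <- N * N * 8 / 1e6   # full N x N similarity matrix
levMB  <- N * M * 8 / 1e6   # rectangular N x M matrix used by leveraged AP
c(full = fullMB, leveraged = levMB)   # 800 MB vs. 80 MB
```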

When called with input data and the name of a package-provided similarity function or a user-provided similarity function, apclusterL selects a random sample subset according to the frac parameter, calculates a rectangular similarity matrix of all samples against this subset, and runs affinity propagation on it. This is repeated sweeps times, with a new sample subset drawn for each repetition. The clustering result of the sweep with the highest net similarity is returned. Any parameters specific to the chosen method of similarity calculation can be passed to apclusterL in addition to the parameters described above. The similarity matrix of the best sweep is also returned in the result object if requested by the user (argument includeSim).
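The sweep procedure can be sketched in base R as follows. This is not the package's code: leveragedSweeps, simFun and clusterFun are hypothetical names, clusterFun stands in for the AP step and only needs to return a list with a numeric element netsim, and the rounding of the subset size is an assumption.

```r
# Sketch of the leveraged sweep loop (not the package's code).
# x          : numeric matrix, rows are samples
# simFun     : function(x, sel) returning an nrow(x) x length(sel) matrix
# clusterFun : hypothetical AP step, function(s, sel) returning a list
#              with a numeric element netsim
leveragedSweeps <- function(x, frac, sweeps, simFun, clusterFun) {
  N <- nrow(x)
  m <- max(1, ceiling(frac * N))   # subset size (rounding is an assumption)
  netsimLev <- numeric(sweeps)
  best <- NULL
  for (i in seq_len(sweeps)) {
    sel <- sort(sample.int(N, m))  # new random subset in every sweep
    s <- simFun(x, sel)            # all samples against the subset
    res <- clusterFun(s, sel)
    netsimLev[i] <- res$netsim
    if (is.null(best) || res$netsim > best$netsim) {
      best <- c(res, list(sel = sel))   # keep the best sweep so far
    }
  }
  best$netsimLev <- netsimLev      # net similarity of every sweep
  best
}
```

With the package loaded, clusterFun could, for instance, wrap the matrix method as function(s, sel) { r <- apclusterL(s, sel = sel, p = -0.2); list(netsim = r@netsim) }; this wrapper is likewise only illustrative.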

When called with a rectangular similarity matrix (which represents a column subset of the full similarity matrix), the function performs AP clustering on this similarity matrix. The indices of the selected samples are passed to the clustering via the parameter sel. This variant is only needed when the user wants full control over the similarity calculation or the sample subset selection. Apart from minor adaptations and optimizations, the implementation of apclusterL is largely analogous to Frey and Dueck's Matlab code (see http://www.psi.toronto.edu/affinitypropagation/).
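For the matrix method, s must hold the similarities of all samples (rows) to the selected samples (columns), with the columns in the order given by sel. A minimal base-R sketch of building such a matrix from negative squared Euclidean distances follows; rectNegSqDist is a hypothetical helper, not a package function.

```r
# Hypothetical helper: negative squared Euclidean distances of all samples
# (rows of x) against the selected samples sel. Entry [i, j] is the
# similarity of sample i to sample sel[j], so column j's self-similarity
# sits in row sel[j].
rectNegSqDist <- function(x, sel) {
  x <- as.matrix(x)
  s <- matrix(0, nrow = nrow(x), ncol = length(sel))
  for (j in seq_along(sel)) {
    d <- sweep(x, 2, x[sel[j], ])   # subtract sample sel[j] from every row
    s[, j] <- -rowSums(d^2)
  }
  s
}

x <- cbind(c(0, 1, 3), c(0, 0, 0))   # three samples on a line
sel <- c(3, 1)
s <- rectNegSqDist(x, sel)            # 3 x 2 matrix
```

Such a matrix, together with the same sel vector, is the kind of input the matrix method expects (apclusterL(s, sel = sel, ...)); the package's negDistMat(r=2), as used in the Examples, computes the same type of negative squared distances.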

References

http://www.bioinf.jku.at/software/apcluster

Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: 10.1126/science.1136800.

Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: 10.1093/bioinformatics/btr406.

Examples

## load the package
library(apcluster)

## define fraction of samples to be used for clustering
frac <- 0.2
sweeps <- 3

## create two Gaussian clouds
cl1 <- cbind(rnorm(150,0.2,0.05),rnorm(150,0.8,0.06))
cl2 <- cbind(rnorm(100,0.7,0.08),rnorm(100,0.3,0.05))
x <- rbind(cl1,cl2)

## leveraged apcluster
apres <- apclusterL(negDistMat(r=2), x, frac, sweeps, p=-0.2)

## show details of leveraged clustering results
show(apres)

## plot leveraged clustering result
plot(apres, x)

## plot heatmap of clustering result
heatmap(apres)

## show net similarities of single sweeps
apres@netsimLev

## show samples on which best sweep was based
apres@sel
