progenyClust: Progeny Clustering

Description

Select the optimal number of clusters using Progeny Clustering.

Usage

progenyClust(data, FUNclust = kmeans, method = "gap", score.invert = F, ncluster = 2:10, 
size = 10, iteration = 100, repeats = 1, nrandom = 10, ...)
## S3 method for class 'progenyClust':
summary(object,...)

Arguments

data

data matrix or data frame for clustering: each row corresponds to a sample or observation, whereas each column corresponds to a feature or variable.

FUNclust

clustering function: accepts data as its first argument and the number of clusters as its second argument; returns a list containing a component called 'cluster', which is a vector of integers recording the cluster assignment for all samples. The def
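
The interface described above can be illustrated with a small wrapper. This is a minimal sketch, not part of the package: the name hclustFUN and the toy data are hypothetical, and the wrapper simply exposes base R hierarchical clustering through the (data, number of clusters) signature and the 'cluster' component that FUNclust is described as requiring.

```r
# Hypothetical wrapper (not part of progenyClust): hierarchical clustering
# exposed through the interface described for FUNclust.
hclustFUN <- function(data, k, ...) {
  # Build the dendrogram from Euclidean distances between rows (samples);
  # extra arguments such as 'method' are forwarded to hclust().
  tree <- stats::hclust(stats::dist(data), ...)
  # Return a list whose 'cluster' component is an integer membership vector.
  list(cluster = stats::cutree(tree, k = k))
}

# Toy data (assumed for illustration): two groups of 20 points in 2 dimensions.
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))

res <- hclustFUN(toy, 2)
table(res$cluster)
```

Under these assumptions, such a function could presumably be supplied as FUNclust = hclustFUN, with any further arguments passed through '...'.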

method

character string indicating the criterion used to pick the optimal cluster number. 'gap': the default value, selecting the cluster number that has the biggest or smallest (when score.invert=TRUE) gap from its neighboring numbers. The optimal cluster number

score.invert

logical flag: specifies whether the score should be inverted. The default score is the ratio of true classification probabilities over false classification probabilities. The inverted score is the ratio of false classification over true classification probi

ncluster

sequence of integers specifying candidate cluster numbers for evaluation: ncluster needs to be continuous if the method 'gap' is chosen.
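
A quick way to check this requirement before calling the function is sketched below. It assumes that "continuous" means consecutive integers in increasing order; the helper name isContinuous is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of progenyClust): TRUE when the candidate
# cluster numbers are consecutive integers in increasing order.
# Assumption: this is what "continuous" means for the 'gap' method.
isContinuous <- function(ncluster) {
  all(diff(ncluster) == 1)
}

isContinuous(2:10)        # TRUE
isContinuous(c(2, 4, 6))  # FALSE
```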

size

integer specifying the number of progenies generated from each cluster. Default value is 10.

iteration

integer specifying the number of times the algorithm samples progenies and evaluates similarity among progenies. Default value is 100.

repeats

integer specifying the number of times the algorithm should be run: needs to be greater than 0. Values greater than 1 output standard deviations of the scores, which are plotted as error bars by the plot(...,errorbar=TRUE,...) function. Default value is 1.

nrandom

integer specifying the number of random datasets used to generate reference scores when using method 'score'. Default value is 10.

object

the S3 object of class "progenyClust".

...

additional arguments for FUNclust in progenyClust(...).

Value

progenyClust returns an object of class "progenyClust" which has a plot and summary method. It is a list with the following components:
cluster

matrix of clustering memberships for all samples under given cluster numbers: each row corresponds to a sample; each column corresponds to a given cluster number.

score

matrix of stability scores from clustering the input data under given cluster numbers: each column corresponds to a given cluster number; each row corresponds to a repeat, the number of which is defined by 'repeats' in the input argument.

random.score

matrix of stability scores from clustering random datasets under given cluster numbers: each column corresponds to a given cluster number; each row corresponds to a random dataset, the number of which is defined by 'nrandom' in the input argument.

mean.gap

vector of mean stability scores based on the 'gap' criterion when the input argument 'method' is set to be 'gap' or 'both'.

mean.score

vector of mean stability scores based on the 'score' criterion when the input argument 'method' is set to be 'score' or 'both'.

sd.gap

vector of standard deviations of stability scores for each given cluster number based on the 'gap' criterion, when the input argument 'method' is set to be 'gap' or 'both'.

sd.score

vector of standard deviations of stability scores for each given cluster number based on the 'score' criterion, when the input argument 'method' is set to be 'score' or 'both'.

call

the call with arguments specified.

ncluster

the specified value of input argument 'ncluster'.

method

the specified value of input argument 'method'.

score.invert

the specified value of input argument 'score.invert'.
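
Because the returned object is a list, its components can be read with the usual $ operator. The following is a sketch only: it assumes the progenyClust package is installed and that its bundled 'test' dataset loads an object named test, and it uses the component names cluster, mean.gap, ncluster and method as listed above. The guard skips the example when the package is unavailable.

```r
# Sketch only: runs when the progenyClust package is available.
if (requireNamespace("progenyClust", quietly = TRUE)) {
  library(progenyClust)
  data("test")

  # Default 'gap' criterion over candidate cluster numbers 2 to 5.
  pc <- progenyClust(test, ncluster = 2:5)

  dim(pc$cluster)   # samples by candidate cluster numbers
  pc$mean.gap       # mean scores under the 'gap' criterion
  pc$ncluster       # the candidate cluster numbers that were supplied
  pc$method         # the criterion that was used
}
```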

References

Hu, C.W., et al. "Progeny Clustering: A Method to Identify Biological Phenotypes." Scientific Reports 5 (2015). http://www.nature.com/articles/srep12894

Examples

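library(progenyClust)

# a 3-cluster, 2-dimensional example dataset
data('test')

# default progeny clustering over candidate cluster numbers 2 to 5
pc <- progenyClust(test, ncluster = 2:5)

# report and plot the scores for each candidate cluster number
summary(pc)
plot(pc)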
