dynamicTreeCut (version 1.63-1)

cutreeHybrid: Hybrid Adaptive Tree Cut for Hierarchical Clustering Dendrograms

Description

Detects clusters in a dendrogram produced by the function hclust.

Usage

cutreeHybrid(
      # Input data: basic tree cutting
      dendro, distM,

      # Branch cut criteria and options
      cutHeight = NULL, minClusterSize = 20, deepSplit = 1,

      # Advanced options
      maxCoreScatter = NULL, minGap = NULL,
      maxAbsCoreScatter = NULL, minAbsGap = NULL,

      minSplitHeight = NULL, minAbsSplitHeight = NULL,

      # External (user-supplied) measure of branch split
      externalBranchSplitFnc = NULL, minExternalSplit = NULL,
      externalSplitOptions = list(),
      externalSplitFncNeedsDistance = NULL,
      assumeSimpleExternalSpecification = TRUE,

      # PAM stage options
      pamStage = TRUE, pamRespectsDendro = TRUE,
      useMedoids = FALSE, maxPamDist = cutHeight,
      respectSmallClusters = TRUE,

      # Various options
      verbose = 2, indent = 0)

Arguments

dendro
A hierarchical clustering dendrogram such as one returned by hclust.
distM
Distance matrix that was used as input to hclust.
cutHeight
Maximum joining height that will be considered. It defaults to 99% of the range between the 5th percentile and the maximum of the joining heights on the dendrogram.
minClusterSize
Minimum cluster size.
deepSplit
Either logical or integer in the range 0 to 4. Provides a rough control over sensitivity to cluster splitting. The higher the value, the more and smaller clusters will be produced. A finer control can be achieved via maxBranchCor, min
maxCoreScatter
Maximum scatter of the core for a branch to be a cluster, given as the fraction of cutHeight relative to the 5th percentile of joining heights. See Details.
minGap
Minimum cluster gap given as the fraction of the difference between cutHeight and the 5th percentile of joining heights.
maxAbsCoreScatter
Maximum scatter of the core for a branch to be a cluster given as absolute heights. If given, overrides maxCoreScatter.
minAbsGap
Minimum cluster gap given as absolute height difference. If given, overrides minGap.
minSplitHeight
Minimum split height given as the fraction of the difference between cutHeight and the 5th percentile of joining heights. Branches merging below this height will automatically be merged. Defaults to zero but is used only if minAbsSplitH
minAbsSplitHeight
Minimum split height given as an absolute height. Branches merging below this height will automatically be merged. If not given (default), will be determined from minSplitHeight above.
externalBranchSplitFnc
Optional function to evaluate split (dissimilarity) between two branches. Either a single function or a list in which each component is a function (see assumeSimpleExternalSpecification below for how to specify a single function). Each functi
minExternalSplit
Thresholds to decide whether two branches should be merged. It should be a numeric vector of the same length as the number of functions in externalBranchSplitFnc above. Only used for method "hybrid".
externalSplitOptions
Further arguments to function externalBranchSplitFnc. If only one external function is specified in externalBranchSplitFnc above, externalSplitOptions can be a named list of arguments or a list with one component t
externalSplitFncNeedsDistance
Optional specification of whether the external branch split functions need the distance matrix as one of their arguments. Either NULL or a logical vector with one element per branch split function that specifies whether the corresponding bra
assumeSimpleExternalSpecification
Logical: when minExternalSplit above is a scalar (has length 1), should the function assume a simple specification of externalBranchSplitFnc and externalSplitOptions? If TRUE, externalBranchSplitFn
pamStage
Logical, only used for method "hybrid". If TRUE, the second (PAM-like) stage will be performed.
pamRespectsDendro
Logical, only used for method "hybrid". If TRUE, the PAM stage will respect the dendrogram in the sense that an object can be PAM-assigned only to clusters that lie below it on the branch that the object is merged into. See
useMedoids
If TRUE, the second stage will use object-to-medoid distance; if FALSE, it will use average object-to-cluster distance. The default (FALSE) is recommended.
maxPamDist
Maximum object distance to closest cluster that will result in the object assigned to that cluster. Defaults to cutHeight.
respectSmallClusters
If TRUE, branches that failed to be clusters in stage 1 only because of insufficient size will be assigned together in stage 2. If FALSE, all objects will be assigned individually.
verbose
Controls the verbosity of the output. 0 will make the function completely quiet, values up to 4 gradually increase verbosity.
indent
Controls indentation of printed messages (see verbose above). Each unit adds two spaces before printed messages; useful when several functions' output is to be nested.

Value

A list containing the following elements:

  • labels: Numerical labels of clusters, with 0 meaning unassigned, label 1 labeling the largest cluster, etc.
  • cores: Numerical labels indicating cores of found clusters.
  • smallLabels: Numerical labels for branches that failed to be recognized as clusters only because of an insufficient number of objects.
  • mergeDiagnostics: A data.frame with one row per merge in the input dendrogram. The columns give the values of the various merging criteria used by the algorithm. Missing data indicate that at least one of the "branches" merged was actually a singleton (single node) and hence the branch merging was automatic.
  • mergeCriteria: Values of the merging thresholds. Either a copy of the corresponding input thresholds or values determined by deepSplit.
  • branches: A list detailing the detected branch structure.
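
As a rough illustration of how the returned list might be used, the sketch below clusters a small simulated data set and inspects the result. The toy data, the choice of average linkage, and the values minClusterSize = 10 and verbose = 0 are assumptions made for this example only. The call is guarded so that the sketch still runs when the dynamicTreeCut package is not installed.

```r
# Hypothetical toy data: three well-separated 2-D groups of 30 points each.
set.seed(1)
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- do.call(rbind, lapply(1:3, function(i) {
  matrix(rnorm(60, sd = 0.5), ncol = 2) +
    matrix(centers[i, ], nrow = 30, ncol = 2, byrow = TRUE)
}))

# Distances and the dendrogram that cutreeHybrid takes as input.
d <- dist(x)
dendro <- hclust(d, method = "average")

# Run the cut only if the package is available.
if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  res <- dynamicTreeCut::cutreeHybrid(
    dendro = dendro,
    distM = as.matrix(d),   # full distance matrix used to build dendro
    minClusterSize = 10,    # assumed value, smaller than the default of 20
    deepSplit = 1,
    verbose = 0             # suppress progress messages
  )
  # Cluster sizes; label 0, if present, collects unassigned objects.
  print(table(res$labels))
  # Names of the elements of the returned list.
  print(names(res))
}
```

Tabulating labels is a quick way to check whether minClusterSize and deepSplit give a sensible number of clusters before examining mergeDiagnostics in detail.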

Details

The function detects clusters in a hierarchical dendrogram based on the shape of branches on the dendrogram. For details on the method, see http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting.

In order to make the shape parameters maxCoreScatter and minGap more universal, their values are interpreted relative to cutHeight and the 5th percentile of the merging heights (we arbitrarily chose the 5th percentile rather than the minimum for reasons of stability). Thus, the absolute maximum allowable core scatter is calculated as maxCoreScatter * (cutHeight - refHeight) + refHeight and the absolute minimum allowable gap as minGap * (cutHeight - refHeight), where refHeight is the 5th percentile of the merging heights.
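
The conversion from relative to absolute thresholds can be sketched in base R as follows. This is an illustration of the formulas above, not the package's internal code: the simulated data, the values chosen for maxCoreScatter and minGap, and the use of quantile() to obtain the 5th percentile are all assumptions, and the package may compute the reference height differently.

```r
# Hypothetical data and dendrogram, for illustration only.
set.seed(42)
x <- matrix(rnorm(200), ncol = 2)
dendro <- hclust(dist(x), method = "average")

heights <- dendro$height

# Reference height: 5th percentile of the merging heights
# (computed with quantile() here, which is an assumption).
refHeight <- unname(quantile(heights, 0.05))

# Default cutHeight as described for the cutHeight argument: 99% of the
# range between the 5th percentile and the maximum joining height.
cutHeight <- 0.99 * (max(heights) - refHeight) + refHeight

# Assumed relative shape parameters.
maxCoreScatter <- 0.8
minGap <- 0.2

# Absolute thresholds, following the formulas in the Details section.
absCoreScatter <- maxCoreScatter * (cutHeight - refHeight) + refHeight
absGap <- minGap * (cutHeight - refHeight)

print(c(refHeight = refHeight, cutHeight = cutHeight,
        absCoreScatter = absCoreScatter, absGap = absGap))
```

Values computed this way correspond to what one would pass as maxAbsCoreScatter and minAbsGap, which override the relative parameters when given.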

References

Langfelder P, Zhang B, Horvath S, 2007. http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting

See Also

hclust, as.dist