s.cluster.h.group: Group Variables with Hierarchical Clustering

Description

This function groups the columns of a numeric matrix using hierarchical clustering.

Usage

s.cluster.h.group(
  data,
  nGroups = 2,
  threshold = 0,
  distance = "correlation",
  linkage = "single",
  correlation = "pearson"
)

Value

A list with the following items:

groups: A list of integer vectors representing the indexes of variables in each group.
removed: An integer vector representing the indexes of removed variables.

Arguments

data: A numeric matrix with variables in the columns.
nGroups: Integer value specifying the required number of groups.
threshold: Numeric value specifying a threshold for omitting variables. If the distance between two variables in a group is less than this value, the second one will be omitted. Because the outcome depends on which variable comes second, changing the order of the columns might change the results.
distance: Character string specifying how distances are calculated. It can be "correlation", "absCorrelation", "euclidean", "manhattan", or "maximum". See the s.distance function.
linkage: Character string specifying how the distance between two clusters is calculated when a left node and a right node are merged. It can be "single", "complete", "uAverage", "wAverage", or "ward". See the s.cluster.h function.
correlation: Character string specifying the type of correlation when distance is "correlation". It can be "pearson" or "spearman". See the s.distance function.
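
The omission rule described for threshold can be illustrated with a small base R sketch. This is not the package's code: the helper name omit_close is hypothetical, each variable is compared only against the variables already kept in its group (one possible reading of "the second one will be omitted"), and the correlation distance is assumed to be 1 - r, which may differ from what s.distance actually computes.

```r
# Hypothetical helper (not part of the package): within one group, drop any
# variable whose distance to an already kept variable is below `threshold`.
omit_close <- function(d, group, threshold) {
  d <- as.matrix(d)
  keep <- integer(0)
  removed <- integer(0)
  for (j in group) {
    # Compare variable j against the variables kept so far in this group.
    if (length(keep) > 0 && any(d[j, keep] < threshold)) {
      removed <- c(removed, j)
    } else {
      keep <- c(keep, j)
    }
  }
  list(keep = keep, removed = removed)
}

set.seed(1)
x <- rnorm(50)
# Column b is almost a copy of column a; column c is unrelated noise.
data <- cbind(a = x, b = x + rnorm(50, sd = 0.01), c = rnorm(50))
d <- 1 - cor(data)   # ASSUMPTION: correlation distance taken as 1 - r
res <- omit_close(d, group = 1:3, threshold = 0.05)
```

In this sketch the near-duplicate column b is removed because it comes after a. If the columns were supplied in the order b, a, c, then a would be removed instead, which is why column order matters. With a threshold of 0 and non-negative distances, no distance is strictly below the threshold, so nothing is removed.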

Details

The results might differ from those of R's 'cutree' function, whose algorithm is not replicated here. This function iterates over the nodes of the tree and, whenever a split occurs, adds a group until the required number of groups is reached.
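
One possible reading of this node-by-node splitting is sketched below in base R. It is an illustration under stated assumptions, not the package's implementation: it builds the tree with stats::hclust instead of s.cluster.h, the helper name split_top_down is hypothetical, the order in which nodes are visited (most recent merge first) is a guess, and the correlation distance is again assumed to be 1 - r.

```r
# Hypothetical helper (not part of the package): start at the root of an
# hclust tree and keep splitting nodes until `k` groups exist.
split_top_down <- function(hc, k) {
  n <- nrow(hc$merge) + 1
  # Collect the leaf indexes below a node. In hc$merge a negative entry is
  # a single observation and a positive entry refers to an earlier merge row.
  leaves <- function(node) {
    if (node < 0) return(-node)
    c(leaves(hc$merge[node, 1]), leaves(hc$merge[node, 2]))
  }
  nodes <- n - 1                       # the root is the last merge row
  while (length(nodes) < k) {
    inner <- nodes[nodes > 0]
    if (length(inner) == 0) break      # only single variables remain
    top <- max(inner)                  # most recent merge among current groups
    # Replace that node by its two children, which adds one group.
    nodes <- c(setdiff(nodes, top), hc$merge[top, ])
  }
  lapply(nodes, function(node) sort(leaves(node)))
}

set.seed(1)
data <- matrix(rnorm(200), ncol = 5)
d <- as.dist(1 - cor(data))            # ASSUMPTION: distance taken as 1 - r
hc <- hclust(d, method = "single")
groups <- split_top_down(hc, 3)        # list of column indexes, one per group
```

To see whether a given splitting order agrees with 'cutree' on the same tree, the groups can be compared with split(seq_len(ncol(data)), cutree(hc, 3)).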