klaR (version 0.6-11)

corclust: Function to identify groups of highly correlated variables for removing correlated features from the data for further analysis.

Description

A hierarchical clustering of variables is performed with hclust, using 1 - the absolute correlation as the distance measure between two variables.
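
The idea described above can be sketched in base R. This is only an illustration of the distance measure and the clustering step, not klaR's internal code; the object names x, d and hc are chosen here for the example.

```r
# Sketch only: reproduces the described idea with base R, not klaR's source.
x <- iris[, 1:4]                       # numeric variables to be clustered

# Distance between two variables = 1 - |correlation|.
d <- as.dist(1 - abs(cor(x)))

# Complete linkage is the documented default of corclust.
hc <- hclust(d, method = "complete")

# Dendrogram of the variables; low merge heights mean high |correlation|.
plot(hc, main = "Variables clustered by 1 - |cor|")
```

Strongly correlated variables, whether positively or negatively, end up close together because the sign of the correlation is discarded.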

Usage

corclust(x, cl = NULL, mincor = NULL, prnt = FALSE, method = "complete")

Arguments

x
Either a data frame or a matrix consisting of numerical attributes.
cl
Optional vector of type factor indicating class levels, if class-specific correlations should be considered.
mincor
Optional vector of degrees of correlation within a cluster of variables that will be indicated in the plot by a line.
prnt
Logical indicating whether the matrix of distances should be printed.
method
Linkage to be used for clustering. Default is complete linkage.

Value

  • min.abs.cor: Matrix of distances used for clustering, containing 1 - the absolute correlation between any two variables.
  • clustering: Result object of the hierarchical clustering.

Details

The main output is the tree visualization of the clustered variables. Each cluster consists of a set of correlated variables according to the chosen clustering criterion. The default criterion is complete linkage. This choice is meaningful because the height at which a cluster is formed then corresponds to the minimum absolute correlation between all variables of that cluster.

A natural next step is to choose one variable from each cluster to obtain a subset of largely uncorrelated variables for further analysis.

If an additional class vector cl is given to the function, the minimum correlation over all classes is used for any two variables.
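
The two steps described above, class-specific correlations and picking one variable per cluster, might be sketched in base R as follows. This is one reading of the text, assuming that "minimum correlation over all classes" means the element-wise minimum of the absolute within-class correlations; it is not taken from klaR's source. The names abs_cors, min_abs_cor, groups, keep and reduced are hypothetical, and keeping the first variable of each cluster is an arbitrary choice made for the example.

```r
# Sketch only: base R, an interpretation of the Details text, not klaR's code.
x  <- iris[, 1:4]      # numeric variables
cl <- iris$Species     # class labels

# Absolute correlation matrix within each class ...
abs_cors <- lapply(split(x, cl), function(xi) abs(cor(xi)))
# ... and the element-wise minimum over all classes (assumption, see above).
min_abs_cor <- Reduce(pmin, abs_cors)

# Cluster the variables with complete linkage on 1 - min |cor|.
hc <- hclust(as.dist(1 - min_abs_cor), method = "complete")

# With complete linkage, cutting the tree at height 1 - mincor yields clusters
# in which every pair of variables has (minimum) |cor| >= mincor.
mincor <- 0.6                                  # illustrative threshold
groups <- cutree(hc, h = 1 - mincor)

# Keep the first variable of each cluster (arbitrary choice for illustration).
keep    <- names(groups)[!duplicated(groups)]
reduced <- x[, keep, drop = FALSE]
```

In practice the representative of each cluster would more likely be chosen on subject-matter grounds, for example the variable that is cheapest to measure or easiest to interpret.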

See Also

hclust, for details on the clustering algorithm.

Examples

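library(klaR)
data(iris)
classes <- iris$Species      # class labels (factor)
variables <- iris[, 1:4]     # the four numeric measurements
# Cluster the variables using class-specific correlations; mark 0.6 in the plot
corclust(variables, classes, mincor = 0.6)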
