isopam.2: Isopam (Hierarchical Clustering) for large matrices

Description

Slow variant of Isopam for large matrices. Performs Isopam, which consists of dimensionality reduction followed by partitioning of the resulting feature space. Cluster membership and the number of clusters are optimized for maximum performance of group indicators. Developed for matrices representing species abundances in plots.
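The two building blocks named here (dimensionality reduction, then partitioning) can be illustrated with a single fixed pass through isomap from package vegan and pam from package cluster. This is only a rough sketch under stated assumptions: it uses the BCI data shipped with vegan rather than andechs, and the values k = 3, ndim = 3 and three clusters are arbitrary choices for illustration. Isopam itself searches over these settings and scores each candidate partition by its indicators, which is not reproduced here.

```r
library(vegan)    # vegdist(), isomap(), BCI example data
library(cluster)  # pam()

data(BCI)                              # plots in rows, species in columns

d    <- vegdist(BCI, method = "bray")  # Bray-Curtis distance matrix
iso  <- isomap(d, ndim = 3, k = 3)     # Isomap with an arbitrary fixed k
part <- pam(iso$points, k = 3)         # PAM on the Isomap coordinates

table(part$clustering)                 # group sizes
rownames(BCI)[part$id.med]             # plots serving as medoids
```

A single pass like this has no notion of indicator quality; it only shows how the distance matrix, the Isomap coordinates and the PAM partition feed into one another.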

Usage

isopam.2(dat, c.num = FALSE, c.max = 10, filtered = TRUE, 
             distance = 'bray', g.min = 3.5, k.max = 100, 
             stopcrit = c(2,7), maxlev = FALSE,
             juice = FALSE)

Arguments

dat

data matrix: each row corresponds to an object (typically a plot), each column corresponds to a descriptor (typically a species). All variables must be numeric. Missing values (NAs) are not allowed. At least 4 ro

c.num

number of clusters to be computed. If FALSE (the default), cluster numbers are optimized in the range between 2 and c.max. If a number is given, non-hierarchical partitioning is performed (maxlev = 1

c.max

maximum number of clusters per partition.

distance

distance measure for the distance matrix used as a starting point for Isomap. All but the Bray-Curtis and the Jaccard measure are passed to the method argument in package proxy (see details).

filtered

logical. If TRUE, only descriptors (species) exceeding a standardized G-value of g.min are used in the search for the best partition. Their number is multiplied with their mean standardized G-value and

g.min

threshold for descriptors (species) to be considered as indicators during clustering (standardized G-value). Effective with (default) option filtered = TRUE.

k.max

maximum Isomap k.

stopcrit

vector with stopping rules for hierarchical clustering. Two values define if a partition should be retained: the first determines how many indicators must be present, the second defines the standardized G-value that

maxlev

maximum number of hierarchy levels. Defaults to FALSE (no maximum number).

juice

logical. If TRUE, input files for Juice are generated.
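The thresholds g.min and stopcrit refer to G-values of descriptors. As a rough orientation, the following base R sketch computes a plain, unstandardized G statistic for the presence of one species across groups of plots. The helper name g_statistic and the toy data are hypothetical, and the standardization that Isopam applies before comparing values against g.min is not reproduced here.

```r
# Plain (unstandardized) G statistic for one species against a grouping.
# present: logical vector, TRUE where the species occurs in a plot
# group:   factor giving the group of each plot
g_statistic <- function(present, group) {
  observed <- table(factor(present, levels = c(FALSE, TRUE)), group)
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  nonzero  <- observed > 0   # cells with zero counts contribute nothing
  2 * sum(observed[nonzero] * log(observed[nonzero] / expected[nonzero]))
}

# Hypothetical data: eight plots in two groups of four
group   <- factor(rep(c("A", "B"), each = 4))
perfect <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
neutral <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)

g_perfect <- g_statistic(perfect, group)  # species confined to group A
g_neutral <- g_statistic(neutral, group)  # species evenly spread
```

A species confined to one group yields a large value, while a species spread evenly over the groups yields zero, which is the behavior the thresholds rely on.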

Value

call

generating call

distance

distance measure used by Isomap

flat

observations (plots) with group affiliation. Running group numbers for each level of the hierarchy.

hier

observations (plots) with group affiliation. Group identifiers reflect the cluster hierarchy. Not present with only one level of partitioning.

medoids

observations (plots) representing the medoids of the resulting groups.

analytics

table summarizing parameter settings for the final partitioning steps. These are the name of the parent cluster (0 in case of the first partition), the number of subgroups, Isomap dimensions, Isomap k used, and the number of indicators reaching or exceeding g.min.

dendro

an object of class hclust representing the clustering. Not present with only one level of partitioning.

dat

data used

Details

This is isopam for large matrices. It should only be used when isopam stops due to lack of memory. The function is extremely slow. Apart from speed and the capacity to handle large amounts of data, there is no difference between the two functions.

Isopam is described in Schmidtlein et al. (2010). It consists of dimensionality reduction (Isomap: Tenenbaum et al. 2000; isomap in package vegan) and partitioning of the resulting ordination space (PAM: Kaufman & Rousseeuw 1990; pam in package cluster). Compared to other hierarchical clustering methods, it has the following features: (a) it optimizes partitions for the performance of group indicators (typically species); (b) in this process it selects the number of clusters per division; (c) the shapes of groups in feature space are not limited to spherical or other geometric shapes (thanks to the underlying Isomap algorithm); and (d) the distance measure used for the initial distance matrix can be freely defined.

To examine frequencies of descriptors (species) for an Isopam clustering solution, use isotab.

The included dimensionality reduction is not suitable for very small data sets. Therefore, tables or groups with few rows (plots) or columns (species) are not partitioned. For division, a group must consist of at least three members and the partition must be supported by at least stopcrit[1] descriptors (species) reaching a standardized G-value of stopcrit[2].

There are plot and identify methods for class isopam. In case of hierarchical partitioning, they link to the hclust object $dendro resulting from isopam. The methods work just like plot.hclust and identify.hclust.

The preset distance measure is Bray-Curtis (Odum 1950). Bray-Curtis ('bray') and Jaccard ('jaccard') distances are passed to vegdist in vegan. All other measures are passed to the method argument in package proxy, which contains a growing number of relevant measures. Measures registered in proxy can be listed with summary(pr_DB) once proxy is loaded. New measures can be defined and registered as described in ?pr_DB.
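Registering a custom measure in proxy might look like the sketch below. The measure meanabsdiff and the small matrix m are hypothetical and exist only for illustration; the registry calls follow the pattern documented in ?pr_DB. Whether a given measure is ecologically sensible for Isopam is a separate question that this sketch does not address.

```r
library(proxy)

summary(pr_DB)  # overview of the measures currently registered

# Hypothetical measure: mean absolute difference between two plots
mean_abs_diff <- function(x, y) mean(abs(x - y))

# Register it under a new name unless that name is already taken
if (!pr_DB$entry_exists("meanabsdiff")) {
  pr_DB$set_entry(FUN = mean_abs_diff, names = c("meanabsdiff"))
}

# Toy matrix: three plots (rows) by three species (columns)
m <- rbind(p1 = c(0, 2, 4), p2 = c(1, 2, 6), p3 = c(0, 0, 0))
d <- proxy::dist(m, method = "meanabsdiff")
d

# The registered name could then be supplied through the distance argument,
# e.g. isopam.2(dat, distance = "meanabsdiff")
```

The entry can be removed again with pr_DB$delete_entry("meanabsdiff").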

References

Odum, E.P. (1950): Bird populations in the Highlands (North Carolina) plateau in relation to plant succession and avian invasion. Ecology 31: 587-605.

Kaufman, L., Rousseeuw, P.J. (1990): Finding groups in data. Wiley.

Schmidtlein, S., Tichý, L., Feilhauer, H., Faude, U. (2010): A brute force approach to vegetation classification. In review.

Tenenbaum, J.B., de Silva, V., Langford, J.C. (2000): A global geometric framework for nonlinear dimensionality reduction. Science 290: 2319-2323.

Examples

## load the package and the example data
library(isopam)
data(andechs)

## call isopam.2 with the standard options
ip <- isopam.2(andechs)

## examine cluster hierarchy
plot(ip)

## examine frequency table (second hierarchy level)
isotab(ip, 2)
