hipamAnthropom: HIPAM algorithm for anthropometric data

Description

The HIerarchical Partitioning Around Medoids clustering method (HIPAM) was originally developed for gene clustering (Wit et al. (2004)). The HIPAM algorithm is a divisive hierarchical clustering method based on the PAM algorithm.

This function implements a HIPAM algorithm adapted to anthropometric data. To that end, it uses a different dissimilarity function: the one described in McCulloch et al. (1998) and implemented in GetDistMatrix. We call it $d_{MO}$. It also uses a different method to obtain the classification tree.

Two HIPAM algorithms are proposed. The first one, called $HIPAM_{MO}$, is a HIPAM that uses $d_{MO}$. The second one, $HIPAM_{IMO}$, is a HIPAM algorithm that uses $d_{MO}$ and the INCA (Index Number Clusters Atypical) statistic criterion (Irigoien et al. (2008)) to decide the number of child clusters and as a stopping rule.

See Vinue et al. (2013) for more details.

Usage

hipamAnthropom(x,asw.tol=0,maxsplit=5,local.const=NULL,orness=0.7,type,
               ahVect=c(23,28,20,25,25),...)

Arguments

x

Data frame. In our approach, this is each of the subframes obtained by segmenting the whole Spanish anthropometric survey into twelve bust segments, according to the European standard on sizing systems. Size designation of clothes. Part 3: Measuremen

asw.tol

Tolerance (asw.tol > 0) or penalty (asw.tol < 0) introduced in the branch splitting procedure. The default value (0) is kept, so neither is applied. See page 154 of Wit et al. (2004) for more details.

maxsplit

The maximum number of clusters that any cluster can be divided into when searching for the best clustering.
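
The search bounded by maxsplit can be illustrated with a minimal sketch. It assumes that candidate partitions are compared by their average silhouette width (asw), and it uses pam from the cluster package with Euclidean distances instead of $d_{MO}$. The helper best_pam_split and the toy data are hypothetical and are not the package's internal getBestPamsamMO or getBestPamsamIMO functions.

```r
library(cluster)  # pam() and its silhouette information

# Hypothetical helper (not part of the package): try k = 2, ..., maxsplit and
# keep the PAM partition with the largest average silhouette width (asw).
best_pam_split <- function(d, maxsplit = 5) {
  n <- attr(d, "Size")           # number of observations in the dist object
  stopifnot(n >= 3)              # pam() needs k < n
  ks <- 2:min(maxsplit, n - 1)
  asw <- vapply(
    ks,
    function(k) pam(d, k = k, diss = TRUE)$silinfo$avg.width,
    numeric(1)
  )
  best <- which.max(asw)
  list(k = ks[best], asw = asw[best], fit = pam(d, k = ks[best], diss = TRUE))
}

set.seed(1)
# Toy data (an assumption for illustration): three well-separated groups.
x <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)

split <- best_pam_split(dist(x), maxsplit = 5)
split$k                      # number of child clusters chosen
table(split$fit$clustering)  # sizes of the child clusters
```

A larger maxsplit widens the range of k that is tried at each node, at the cost of one extra PAM fit per additional value.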

local.const

If this value is given (meaningful values lie between -1 and 1), a proposed partition is accepted only if its associated asw is greater than this constant. The default (NULL) is kept, that is to say, this value is ignored. See pa

orness

Quantity to measure the degree to which the aggregation is like a min or max operation. See WeightsMixtureUB and GetDistMatrix.
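
For intuition, the sketch below assumes the standard definition of orness for ordered weighted averaging (OWA) operators, in which the first weight multiplies the largest value. The helpers owa_orness and owa are hypothetical illustrations; whether WeightsMixtureUB follows exactly this convention should be checked in its own documentation.

```r
# Orness of an OWA weight vector w, where w[1] multiplies the largest value:
# orness(w) = sum_{i=1}^{n} (n - i) * w[i] / (n - 1)
owa_orness <- function(w) {
  n <- length(w)
  sum((n - seq_len(n)) * w) / (n - 1)
}

# OWA aggregation: sort the values in decreasing order, then take the
# weighted sum with w.
owa <- function(values, w) sum(sort(values, decreasing = TRUE) * w)

w_max  <- c(1, 0, 0, 0, 0)   # all weight on the largest value
w_min  <- c(0, 0, 0, 0, 1)   # all weight on the smallest value
w_mean <- rep(1 / 5, 5)      # equal weights (arithmetic mean)

owa_orness(w_max)   # 1: behaves like max
owa_orness(w_min)   # 0: behaves like min
owa_orness(w_mean)  # 0.5: halfway between min and max

values <- c(3, 9, 1, 7, 5)   # toy values, for illustration only
owa(values, w_max)           # 9
owa(values, w_min)           # 1
```

Under this definition, an orness of 0.7 corresponds to weights that lean towards the larger values without reducing the aggregation to a pure maximum.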

type

Type of HIPAM algorithm to be used. The possible options are 'MO' (for $HIPAM_{MO}$) and 'IMO' (for $HIPAM_{IMO}$).

ahVect

Constants that define the ah slopes of the distance function in GetDistMatrix. Given the five variables considered, this vector is c(23,28,20,25,25). This vector would be different accord

...

Other arguments that may be supplied to the internal functions of the HIPAM algorithms.

Value

A list with the following elements:
clustering: Final clustering that corresponds to the last level of the tree.
asw: The average silhouette width (asw) of the final clustering.
n.levels: Number of levels in the tree.
medoids: Medoids of all of the clusters in the tree.
active: Activity status of each cluster (FALSE for every cluster of the final partition).
development: Matrix that indicates the ancestors of the final clusters.
num.of.clusters: Number of clusters in the final clustering.
metric: Dissimilarity used (called 'McCulloch' because the dissimilarity function used is that explained in McCulloch et al. (1998)).

Details

The $HIPAM_{MO}$ algorithm uses the getBestPamsamMO and checkBranchLocalMO functions, while the $HIPAM_{IMO}$ algorithm uses the getBestPamsamIMO and checkBranchLocalIMO functions.

For more details of HIPAM, see van der Laan et al. (2003), Wit et al. (2004) and the manual of the smida R package.

References

Vinue, G., Leon, T., Alemany, S., and Ayala, G., (2013). Looking for representative fit models for apparel sizing, Decision Support Systems 57, 22--33.

Wit, E., and McClure, J., (2004). Statistics for Microarrays: Design, Analysis and Inference. John Wiley & Sons, Ltd.

Wit, E., and McClure, J., (2006). Statistics for Microarrays: Inference, Design and Analysis. R package version 0.1. http://www.math.rug.nl/~ernst/book/smida.html.

van der Laan, M. J., and Pollard, K. S., (2003). A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap, Journal of Statistical Planning and Inference 117, 275--303.

Pollard, K. S., and van der Laan, M. J., (2002). A method to identify significant clusters in gene expression data. Vol. II of SCI2002 Proceedings, 318--325.

Irigoien, I., and Arenas, C., (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units, Statistics in Medicine 27, 2948--2973.

Irigoien, I., Sierra, B., and Arenas, C., (2012). ICGE: an R package for detecting relevant clusters and atypical units in gene expression, BMC Bioinformatics 13, 1--29.

McCulloch, C., Paal, B., and Ashdown, S., (1998). An optimization approach to apparel sizing, Journal of the Operational Research Society 49, 492--499.

European Committee for Standardization. Size designation of clothes. Part 3: Measurements and intervals. (2005).

Alemany, S., Gonzalez, J. C., Nacher, B., Soriano, C., Arnaiz, C., and Heras, H., (2010). Anthropometric survey of the Spanish female population aimed at the apparel industry. Proceedings of the 2010 Intl. Conference on 3D Body scanning Technologies, 307--315.

Examples


library(Anthropometry)  # assumed to provide hipamAnthropom and dataDemo

dataDef <- dataDemo
bust <- dataDef$bust

# Bust breakpoints: steps of 4 from 74 to 102, then steps of 6 from 107 to 131.
bustCirc_4 <- seq(74, 102, 4)
bustCirc_6 <- seq(107, 131, 6)
bustCirc <- c(bustCirc_4, bustCirc_6)
nsizes <- length(bustCirc)

maxsplit <- 5
orness <- 0.7
type <- "IMO"  # use type <- "MO" for $HIPAM_{MO}$

ahVect <- c(23, 28, 20, 25, 25)

# Run the HIPAM algorithm on each bust segment.
hip <- list()
for (i in 1:(nsizes - 1)) {
  data <- dataDef[(bust >= bustCirc[i]) & (bust < bustCirc[i + 1]), ]
  d <- as.matrix(data)
  hip[[i]] <- hipamAnthropom(d, maxsplit = maxsplit, orness = orness,
                             type = type, ahVect = ahVect)
}
str(hip)

# Cluster sizes of the final clustering in each bust segment.
ress <- list()
for (i in seq_along(hip)) {
  ress[[i]] <- table(hip[[i]]$clustering)
}
ress
