TDDclust: Trimmed clustering based on L1 data depth

Description

This is the trimmed version of the clustering algorithm based on the L1 depth proposed by Rebecka Jornsten (2004). The algorithm partitions the observations into clusters and assigns to each point z in the data space its L1 depth value with respect to its cluster. A trimming procedure is incorporated to remove the most extreme individuals of each cluster (those with the lowest depth values), in line with trimowa.
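To make the notions of depth and trimming concrete, here is a minimal sketch in base R. It assumes the L1 depth in the usual Vardi and Zhang form (one minus the norm of the average unit vector pointing from z to the observations, corrected for observations that coincide with z). The names l1_depth and trim_lowest_depth, and the use of ceiling() to turn a proportion into a count, are illustrative assumptions and not the package's internals.

```r
# L1 depth of a point z relative to the rows of X.
# Hypothetical helper for illustration; not part of the package.
l1_depth <- function(z, X) {
  X <- as.matrix(X)
  diffs <- sweep(X, 2, z)                 # rows are x_i - z
  norms <- sqrt(rowSums(diffs^2))
  at_z <- norms == 0                      # observations that coincide with z
  f_z <- mean(at_z)                       # fraction of the sample located at z
  if (all(at_z)) return(1)
  units <- diffs[!at_z, , drop = FALSE] / norms[!at_z]
  e_bar <- colSums(units) / nrow(X)       # average unit vector
  1 - max(0, sqrt(sum(e_bar^2)) - f_z)
}

# Indices of the observations with the lowest depth inside one cluster.
# Assumption: the number trimmed is ceiling(trimm * n).
trim_lowest_depth <- function(X, trimm) {
  X <- as.matrix(X)
  depths <- apply(X, 1, l1_depth, X = X)
  n_trim <- ceiling(trimm * nrow(X))
  order(depths)[seq_len(n_trim)]
}

# Four corners of a square, its centre, and one far-away point.
X <- rbind(c(1, 1), c(-1, 1), c(1, -1), c(-1, -1), c(0, 0), c(10, 10))
depths <- apply(X, 1, l1_depth, X = X)
trimmed <- trim_lowest_depth(X, trimm = 0.1)
```

In this toy sample the far-away point has the lowest depth, so it is the one flagged for trimming, while the centre of the square is the deepest point.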

Usage

TDDclust(x,K,lambda,Th,A,T0,alpha,Trimm,data1)

Arguments

x

Data frame. Each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric.

K

Number of clusters.

lambda

Tuning parameter that controls how much influence the data depth has on the clustering; see Jornsten (2004).

Th

Threshold for observations to be relocated, usually set to 0.

A

Number of iterations.

T0

Simulated annealing parameter. It is the current temperature in the simulated annealing procedure.

alpha

Simulated annealing parameter. It is the decay rate, default 0.9.

Trimm

Proportion of the sample that is not accommodated, i.e. the proportion of observations to be trimmed.

data1

The same data frame as x, used to incorporate the trimmed observations into the rest of them for the next iteration.
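The roles of T0 and alpha can be pictured with a generic simulated annealing sketch. It assumes a geometric cooling schedule (the temperature is multiplied by alpha at each iteration) and a Metropolis-style acceptance rule. Both are common textbook choices, and the function names temperature and accept_move are hypothetical; the package's actual schedule and acceptance rule may differ.

```r
# Temperature after k iterations under geometric cooling (assumed schedule).
temperature <- function(T0, alpha, k) T0 * alpha^k

# Decide whether to accept a relocation. 'worsening' is the amount by which
# the move degrades the objective (zero or negative means no degradation).
# 'u' is a uniform random draw, exposed as an argument for reproducibility.
accept_move <- function(worsening, temp, u = runif(1)) {
  if (worsening <= 0) return(TRUE)    # never reject a non-worsening move
  if (temp <= 0) return(FALSE)        # zero temperature: no worsening moves
  u < exp(-worsening / temp)
}

# Temperatures over A = 5 iterations with T0 = 1 and alpha = 0.9.
temps <- temperature(T0 = 1, alpha = 0.9, k = 0:4)
```

Under these assumptions, setting T0 to 0 keeps the temperature at zero throughout, so only moves that do not worsen the objective would be accepted.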

Value

A list with the following elements:
NN: Cluster assignment, NN[1,] is the final partition.
Y: The multivariate median cluster representatives.
DD: Depth values of the observations (only if there are trimmed observations).
Cost: Final value of the cost function for the optimal partition.
indivTrimmed: Trimmed observations.
klBest: Iteration in which the optimal partition was found.
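As an illustration of what a multivariate median representative such as those in Y can look like, the sketch below computes the spatial (L1) median of a set of points with Weiszfeld's iteration. This is one standard definition of a multivariate median, assumed here for illustration; the function name spatial_median and its defaults are hypothetical, and the package may compute its representatives differently.

```r
# Spatial (L1) median of the rows of X via Weiszfeld's iteration.
# Hypothetical helper for illustration; not part of the package.
spatial_median <- function(X, max_iter = 100, tol = 1e-8) {
  X <- as.matrix(X)
  m <- colMeans(X)                          # start from the centroid
  for (i in seq_len(max_iter)) {
    d <- sqrt(rowSums(sweep(X, 2, m)^2))    # distances to the current estimate
    d <- pmax(d, tol)                       # guard against division by zero
    w <- 1 / d
    m_new <- colSums(X * w) / sum(w)        # distance-weighted mean
    if (sqrt(sum((m_new - m)^2)) < tol) return(m_new)
    m <- m_new
  }
  m
}

# Collinear points: the spatial median is pulled far less than the mean
# by the distant observation at (10, 0).
pts <- rbind(c(0, 0), c(1, 0), c(10, 0))
med <- spatial_median(pts)
```

Unlike the mean, this representative is robust to a few distant observations, which is in the same spirit as trimming the lowest-depth individuals.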

References

Jornsten, R. (2004). Clustering and classification based on the L1 data depth. Journal of Multivariate Analysis 90, 67--89.

Vinue, G., and Ibanez, M. V. (2014). Data depth and Biclustering applied to anthropometric data. Exploring their utility in apparel design. In progress.

Examples


dataDef <- dataDemo[1 : 25, c(2, 3, 5)] # Small subset, to keep the computation simple.
data1 <- dataDemo[1 : 25, c(2, 3, 5)]

K <- 3 ; percTrimm <- 0.01 ; lambda <- 0.5 ; A <- 5
Th <- 0 ; T0 <- 0 ; alpha <- 0.9

set.seed(2014)
Dout <- TDDclust(x = dataDef, K = K, lambda = lambda, Th = Th, A = A,
                 T0 = T0, alpha = alpha, Trimm = percTrimm, data1 = data1)

table(Dout$NN[1,])
Dout$Cost
Dout$klBest
Dout$indivTrimmed
