kma: Clustering and alignment of functional data

Description

kma jointly performs clustering and alignment of a functional dataset (multidimensional or unidimensional functions). To run kma with different numbers of clusters and/or different alignment methods, see kma.compare.

Usage

kma(x, y0 = NULL, y1 = NULL, n.clust = 1, warping.method = "affine",
    similarity.method = "d1.pearson", center.method = "k-means", seeds = NULL,
    optim.method = "L-BFGS-B", span = 0.15, t.max = 0.1, m.max = 0.1, n.out = NULL,
    tol = 0.01, fence = TRUE, iter.max = 100, show.iter = 0, nstart = 1,
    return.all = FALSE)

Arguments

x

matrix n.func X grid.size or vector grid.size: the abscissa values where each function is evaluated. n.func: number of functions in the dataset. grid.size: maximal number of abscissa values where each functi

y0

matrix n.func X grid.size or array n.func X grid.size X d: evaluations of the set of original functions on the abscissa grid x. n.func: number of functions in the dataset. grid.size

y1

matrix n.func X grid.size or array n.func X grid.size X d: evaluations of the first derivatives of the set of original functions on the abscissa grid x. Default value of y1 is NULL

n.clust

scalar: required number of clusters. Default value is 1. Note that if n.clust=1 kma performs only alignment without clustering.

warping.method

character: type of alignment required. If warping.method='NOalignment' kma performs only k-mean clustering (without alignment). If warping.method='affine' kma performs alignment (and possibly clustering) of functions using linear
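
To make the idea of an affine warping of the abscissa concrete, here is a minimal sketch in base R. It is not the package's internal code: the helper name affine_warp and the numeric values are hypothetical, and the convention (aligned abscissa = dilation * x + shift) is an assumption based on how dilation and shift are described in the Value section.

```r
# Hypothetical helper: apply an affine warping h(x) = dilation * x + shift
# to an abscissa grid. Assumed convention, not taken from the kma source.
affine_warp <- function(x, dilation = 1, shift = 0) {
  dilation * x + shift
}

x <- seq(0, 1, length.out = 5)
x_warped <- affine_warp(x, dilation = 1.1, shift = -0.05)

# The warped grid keeps its ordering as long as dilation > 0
stopifnot(all(diff(x_warped) > 0))
print(x_warped)
```

The function values are left untouched; only the abscissa at which each value is placed changes.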

similarity.method

character: required similarity measure. Possible choices are: 'd0.pearson', 'd1.pearson', 'd0.L2', 'd1.L2', 'd0.L2.centered', 'd1.L2.centered'. Default value is 'd1.pearso
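
As a rough illustration of a Pearson-type similarity between first derivatives, the sketch below computes the normalized inner product of two derivative curves on a common grid. The helper names trapz and pearson_similarity are hypothetical, and treating this as the idea behind 'd1.pearson' is an assumption based on the cited paper; the package's exact definition may differ.

```r
# Trapezoidal approximation of the integral of y over the grid x
trapz <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Hypothetical helper: normalized inner product ("cosine") of two sets of
# derivative values d1 and d2 observed on a common grid x.
# Assumption: this mirrors the idea behind 'd1.pearson' only.
pearson_similarity <- function(x, d1, d2) {
  trapz(x, d1 * d2) / sqrt(trapz(x, d1^2) * trapz(x, d2^2))
}

x  <- seq(0, 2 * pi, length.out = 201)
f1 <- cos(x)       # derivative of sin(x)
g1 <- 3 * cos(x)   # derivative of 3 * sin(x) + 2

# Same shape up to vertical scaling and translation of the functions
sim_same_shape <- pearson_similarity(x, f1, g1)
# Derivatives that are orthogonal over a full period
sim_other <- pearson_similarity(x, f1, sin(x))
```

A measure of this kind does not change when a function is rescaled or translated vertically, which is why two curves with the same shape but different amplitude score as maximally similar here.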

center.method

character: type of clustering method to be used. Possible choices are: 'k-means' and 'k-medoids'. Default value is 'k-means'.

seeds

vector max(n.clust) or matrix nstart X n.clust: indexes of the functions to be used as initial centers. If it is a matrix, each row contains the indexes of the initial centers of one of the nstart initializations; i
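
One possible way to build a seeds matrix with one row per initialization is sketched below. The sizes n.func, n.clust and nstart are hypothetical values chosen for illustration, and random sampling is only one of many ways to pick initial centers.

```r
set.seed(1)      # for reproducibility of this sketch only
n.func  <- 30    # hypothetical number of functions in the dataset
n.clust <- 2
nstart  <- 3

# Each row holds n.clust distinct function indexes for one initialization
seeds <- do.call(
  rbind,
  lapply(seq_len(nstart), function(i) sample(n.func, n.clust))
)
dim(seeds)  # nstart rows, n.clust columns
```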

optim.method

character: optimization method chosen to find the best warping functions at each iteration. Possible choices are: 'L-BFGS-B' and 'SANN'. See optim function for details. Default method is 'L
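
The following sketch shows how optim with method = "L-BFGS-B" can search for a shift and a dilation under box constraints. It is an illustration under stated assumptions rather than the package's procedure: the objective misalignment is a hypothetical mean squared distance (kma instead works with the chosen similarity measure), and the bounds are arbitrary illustrative values.

```r
# Center curve and a shifted copy of it, both sampled on the same grid
x      <- seq(0, 2 * pi, length.out = 200)
center <- sin(x)
curve  <- sin(x + 0.3)   # same shape as the center, displaced by 0.3

# Hypothetical objective: mean squared distance between the center and the
# curve after the affine warping x -> dilation * x + shift, evaluated by
# linear interpolation on the overlapping part of the two domains.
misalignment <- function(par) {   # par = c(shift, dilation)
  x_warped <- par[2] * x + par[1]
  lo <- max(min(x), min(x_warped))
  hi <- min(max(x), max(x_warped))
  grid <- seq(lo, hi, length.out = 100)
  mean((approx(x, center, grid, rule = 2)$y -
        approx(x_warped, curve, grid, rule = 2)$y)^2)
}

start <- c(0, 1)   # no shift, no dilation
fit <- optim(start, misalignment, method = "L-BFGS-B",
             lower = c(-0.6, 0.9), upper = c(0.6, 1.1))  # illustrative bounds
fit$par    # estimated c(shift, dilation)
fit$value  # remaining misalignment
```

"L-BFGS-B" is the natural choice when the warping parameters must stay inside a box, since it is the optim method that accepts lower and upper bounds.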

span

scalar: the span to be used for the loess procedure in the center estimation step when center.method='k-means'. Default value is 0.15. If center.method='k-medoids' value of span is i
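
To show what a loess-based center estimate with a given span can look like, the sketch below pools the observations of a few noisy curves and smooths them with stats::loess. The simulated data and the pooling strategy are assumptions made for illustration; they are not taken from the kma source.

```r
set.seed(42)
x <- seq(0, 1, length.out = 50)
# Three noisy copies of the same hypothetical curve, one per row
y0 <- t(replicate(3, sin(2 * pi * x) + rnorm(length(x), sd = 0.05)))

# Pool all observations and smooth them with loess
pooled <- data.frame(x = rep(x, times = nrow(y0)),
                     y = as.vector(t(y0)))
fit <- loess(y ~ x, data = pooled, span = 0.15)

# Evaluate the smoothed center on an output grid of length round(1.1 * 50)
x.out  <- seq(0, 1, length.out = 55)
center <- predict(fit, newdata = data.frame(x = x.out))
```

A smaller span follows the data more closely, while a larger span gives a smoother center.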

t.max

scalar: t.max controls the maximal allowed shift, at each iteration, in the alignment procedure with respect to the range of curve domains. t.max must be such that 0 (e.g., t.max=0.1 means that shift

m.max

scalar: m.max controls the maximal allowed dilation, at each iteration, in the alignment procedure. m.max must be such that 0 (e.g., m.max=0.1 means that dilation is bounded, at each iteration, betwe

n.out

scalar: the desired length of the abscissa for computation of the similarity indexes and the centers. Default value is round(1.1*grid.size).

tol

scalar: the algorithm stops when the increment of similarity of each function with respect to the corresponding center is lower than tol. Default value is 0.01.
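
The stopping rule described above can be sketched as a simple comparison between two consecutive iterations. The similarity values below are hypothetical, and the check is a reading of the description, not the package's code.

```r
# Hypothetical similarities of four functions to their centers
sim_previous <- c(0.90, 0.85, 0.95, 0.80)
sim_current  <- c(0.905, 0.853, 0.951, 0.808)
tol <- 0.01

# Stop only when every function improved by less than tol
converged <- all(sim_current - sim_previous < tol)
converged
```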

fence

boolean: if fence=TRUE a control is activated at the end of each iteration. The aim of the control is to avoid shift/dilation outliers with respect to their computed distributions. If fence=TRUE the running time can increase co
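
One common way to flag outliers with respect to an empirical distribution is the box-plot fence rule, sketched below on hypothetical shift values. Whether kma uses exactly this rule is an assumption; the snippet only illustrates the kind of control described above.

```r
# Hypothetical shifts estimated for six functions in one iteration
shifts <- c(-0.02, -0.01, 0, 0.01, 0.02, 0.5)

# Box-plot style fences: 1.5 interquartile ranges beyond the quartiles
q     <- quantile(shifts, c(0.25, 0.75), names = FALSE)
lower <- q[1] - 1.5 * diff(q)
upper <- q[2] + 1.5 * diff(q)

# Indexes of the functions whose shift falls outside the fences
outliers <- which(shifts < lower | shifts > upper)
outliers
```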

iter.max

scalar: maximum number of iterations in the k-mean alignment cycle. Default value is 100.

show.iter

boolean: if show.iter=TRUE kma shows the current iteration of the algorithm. Default value is FALSE.

nstart

scalar: number of initializations with different seeds. Default value is 1.

return.all

boolean: if return.all=TRUE the results of all the nstart initializations are returned; the output is a list of length nstart. If return.all=FALSE only the best result is provided (the one with higher mean

Value

The function output is a list containing the following elements:

iterations

scalar: total number of iterations performed by kma.

x

as input.

y0

as input.

y1

as input.

n.clust

as input.

warping.method

as input.

similarity.method

as input.

center.method

as input.

x.center.orig

vector n.out: abscissa of the original center.

y0.center.orig

matrix 1 X n.out: the single row contains the evaluations of the original function center. If center.method='k-means' there are two scenarios: if similarity.method is 'd0.pearson', 'd0.L2' or 'd0.L2.centered', the original function center is computed via a loess procedure applied to the original data; if similarity.method is 'd1.pearson', 'd1.L2' or 'd1.L2.centered', it is computed by integrating the first derivatives center y1.center.orig (the integration constant is computed by minimizing the sum of the weighted L2 distances between the center and the original functions). If center.method='k-medoids' the original function center is the medoid of the original functions.

y1.center.orig

matrix 1 X n.out: the single row contains the evaluations of the original function first derivatives center. If center.method='k-means' the original center is computed via a loess procedure applied to the original function first derivatives. If center.method='k-medoids' the original center is the medoid of the original functions.

similarity.orig

vector: original similarities between the original functions and the original center.

x.final

matrix n.func X grid.size: aligned abscissas.

n.clust.final

scalar: final number of clusters. Note that, when center.method='k-means', n.clust.final may differ from the initial number of clusters (i.e., from n.clust) if some clusters are found to be empty. In this case a warning message is issued.

x.centers.final

vector n.out: abscissas of the final function centers and/or of the final function first derivatives centers.

y0.centers.final

matrix n.clust.final X n.out: rows contain the evaluations of the final function centers. y0.centers.final is NULL if y0 is not given as input.

y1.centers.final

matrix n.clust.final X n.out: rows contain the evaluations of the final derivatives centers. y1.centers.final is NULL if the chosen similarity measure does not concern function first derivatives.

labels

vector: cluster assignments.

similarity.final

vector: similarities between each function and the center of the cluster the function is assigned to.

dilation.list

list: dilations obtained at each iteration of kma.

shift.list

list: shifts obtained at each iteration of kma.

dilation

vector: dilation applied to the original abscissas x to obtain the aligned abscissas x.final.

shift

vector: shift applied to the original abscissas x to obtain the aligned abscissas x.final.

References

Sangalli, L.M., Secchi, P., Vantini, S., Vitelli, V., 2010. "K-mean alignment for curve clustering". Computational Statistics and Data Analysis, 54, 1219-1233.

Examples

library(fdakma) # package providing kma, kma.show.results and kma.data
data(kma.data)

x <- kma.data$x # abscissas
y0 <- kma.data$y0 # evaluations of original functions
y1 <- kma.data$y1 # evaluations of original function first derivatives

# Plot of original functions
matplot(t(x), t(y0), type = 'l', xlab = 'x', ylab = 'orig.func')
title('Original functions')

# Plot of original function first derivatives
matplot(t(x), t(y1), type = 'l', xlab = 'x', ylab = 'orig.deriv')
title('Original function first derivatives')

# Example: result of kma function with 2 clusters,
# allowing affine transformation for the abscissas
# and considering 'd1.pearson' as similarity.method.
kma_example <- kma(
  x = x, y0 = y0, y1 = y1, n.clust = 2,
  warping.method = 'affine',
  similarity.method = 'd1.pearson',
  center.method = 'k-means',
  seeds = c(1, 21)
)

kma.show.results(kma_example)

names(kma_example)

# Labels assigned to each function
kma_example$labels

# Total shifts and dilations applied to the original 
# abscissa to obtain the aligned abscissa
kma_example$shift
kma_example$dilation
