
fdasrvf (version 2.0.0)

kmeans_align: K-Means Clustering and Alignment

Description

This function jointly clusters and aligns functions using the elastic square-root slope function (SRSF) framework.
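For reference, the SRSF of an absolutely continuous function \(f\) is \(q(t) = \operatorname{sign}(f'(t))\sqrt{|f'(t)|}\). Below is a minimal base-R sketch of this transform for one sampled curve. It assumes a simple finite-difference derivative. The helper name `srsf_sketch` is hypothetical and not part of fdasrvf, whose own derivative scheme may differ.

```r
# Hypothetical helper (not fdasrvf's internal code): SRSF of one curve
# `f` sampled on the grid `time`, using finite differences for f'.
srsf_sketch <- function(f, time) {
  m <- length(f)
  df <- numeric(m)
  # One-sided differences at the two endpoints
  df[1] <- (f[2] - f[1]) / (time[2] - time[1])
  df[m] <- (f[m] - f[m - 1]) / (time[m] - time[m - 1])
  # Central differences at interior grid points
  if (m > 2) {
    idx <- 2:(m - 1)
    df[idx] <- (f[idx + 1] - f[idx - 1]) / (time[idx + 1] - time[idx - 1])
  }
  # q(t) = sign(f'(t)) * sqrt(|f'(t)|)
  sign(df) * sqrt(abs(df))
}

time <- seq(0, 1, length.out = 101)
f <- time^2                 # f'(t) = 2t, so q(t) = sqrt(2t)
q <- srsf_sketch(f, time)
q[51]                       # value at t = 0.5, approximately 1
```

Because central differences are exact for quadratics, the interior values of `q` match \(\sqrt{2t}\) up to rounding; only the endpoints carry the one-sided approximation error.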

Usage

kmeans_align(
  f,
  time,
  K = 1L,
  seeds = NULL,
  nonempty = 0L,
  lambda = 0,
  showplot = FALSE,
  smooth_data = FALSE,
  sparam = 25L,
  parallel = FALSE,
  alignment = TRUE,
  omethod = c("DP", "DP2", "RBFGS"),
  max_iter = 50L,
  thresh = 0.01
)

Value

An object of class fdakma, which is a list containing:

  • f0: the original functions;

  • q0: the original SRSFs;

  • fn: the aligned functions, as matrices or 3D arrays of the same shape as f0, stored in a list with one element per cluster;

  • qn: the aligned SRSFs, as matrices or 3D arrays of the same shape as f0, stored in a list with one element per cluster;

  • labels: the cluster memberships as an integer vector;

  • templates: the centroids in the original functional space;

  • templates.q: the centroids in SRSF space;

  • gam: the warping functions, as matrices or 3D arrays of the same shape as f0, stored in a list with one element per cluster;

  • qun: the cost function value.

Arguments

f

Either a numeric matrix or a numeric 3D array specifying the functions that need to be jointly clustered and aligned.

  • If a matrix, it must be of shape \(M \times N\). In this case, it is interpreted as a sample of \(N\) curves observed on a grid of size \(M\).

  • If a 3D array, it must be of shape \(L \times M \times N\) and it is interpreted as a sample of \(N\) \(L\)-dimensional curves observed on a grid of size \(M\).
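To make the two accepted layouts concrete, the sketch below builds simulated inputs of each shape in base R. The data are hypothetical (shifted Gaussian bumps and phase-shifted circles), not one of the package's datasets.

```r
M <- 50L   # grid size
N <- 10L   # number of curves
L <- 2L    # dimension of each curve (3D-array case only)
time <- seq(0, 1, length.out = M)
shifts <- seq(-0.1, 0.1, length.out = N)

# Matrix input (M x N): column j holds curve j evaluated on `time`
f_mat <- sapply(shifts, function(s) dnorm(time, mean = 0.5 + s, sd = 0.1))

# 3D-array input (L x M x N): f_arr[, , j] is the L x M matrix of curve j
f_arr <- array(0, dim = c(L, M, N))
for (j in seq_len(N)) {
  f_arr[1, , j] <- cos(2 * pi * (time + shifts[j]))
  f_arr[2, , j] <- sin(2 * pi * (time + shifts[j]))
}

dim(f_mat)  # [1] 50 10
dim(f_arr)  # [1]  2 50 10
```

Either object would then be passed together with the same `time` vector of length \(M\), for example `kmeans_align(f_mat, time, K = 2)`.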

time

A numeric vector of length \(M\) specifying the grid on which the curves are evaluated.

K

An integer value specifying the number of clusters. Defaults to 1L.

seeds

An integer vector of length K specifying the indices of the curves in f to be used as initial centroids. Defaults to NULL, in which case the indices are chosen at random.
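The documented behaviour of `seeds` can be mirrored by a small helper. The function `pick_seeds` below is hypothetical. The sketch also assumes that the random choice uses R's RNG, in which case calling `set.seed()` beforehand makes the initialization reproducible.

```r
# Hypothetical helper mirroring the documented `seeds` behaviour:
# use the supplied curve indices, or draw K distinct indices at random.
pick_seeds <- function(N, K, seeds = NULL) {
  if (is.null(seeds)) {
    seeds <- sample.int(N, K)  # K distinct indices in 1..N
  }
  # Basic validity checks on the indices
  stopifnot(length(seeds) == K,
            all(seeds >= 1L & seeds <= N),
            !anyDuplicated(seeds))
  as.integer(seeds)
}

set.seed(1)              # fix the RNG so the random draw is reproducible
s1 <- pick_seeds(N = 20L, K = 3L)
set.seed(1)
s2 <- pick_seeds(N = 20L, K = 3L)
identical(s1, s2)        # TRUE
pick_seeds(N = 20L, K = 2L, seeds = c(4L, 17L))  # [1]  4 17
```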

nonempty

An integer value specifying the minimum number of curves per cluster during the assignment step. Set it to a positive value to avoid the problem of empty clusters. Defaults to 0L.
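With nonempty = 0L, a plain nearest-template assignment can leave a cluster with no members. The simplified sketch below shows how that happens. It ignores warping and does not implement the nonempty constraint. The function name and the toy SRSFs are hypothetical.

```r
# Simplified assignment step (no warping): assign each SRSF column of `q`
# (M x N) to the column of `templates_q` (M x K) with the smallest L2
# distance, approximated by the trapezoidal rule on `time`.
assign_clusters <- function(q, templates_q, time) {
  l2 <- function(a, b) {
    d2 <- (a - b)^2
    sqrt(sum(diff(time) * (d2[-length(d2)] + d2[-1]) / 2))
  }
  # N x K matrix of distances from every curve to every template
  dists <- sapply(seq_len(ncol(templates_q)), function(k)
    apply(q, 2, l2, b = templates_q[, k]))
  # Index of the smallest distance in each row
  max.col(-dists, ties.method = "first")
}

time <- seq(0, 1, length.out = 21)
q <- cbind(rep(1, 21), rep(1.1, 21), rep(-1, 21))
templates_q <- cbind(rep(1, 21), rep(-1, 21), rep(5, 21))
labels <- assign_clusters(q, templates_q, time)
labels                                        # [1] 1 1 2
tabulate(labels, nbins = ncol(templates_q))   # [1] 2 1 0  (cluster 3 is empty)
```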

lambda

A numeric value controlling the elasticity of the alignment; larger values penalize warping more heavily. Defaults to 0.0.

showplot

A boolean specifying whether to show plots. Defaults to FALSE.

smooth_data

A boolean specifying whether to smooth data using a box filter. Defaults to FALSE.

sparam

An integer value specifying the number of times the box filter is applied. Defaults to 25L.
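Repeated box filtering can be sketched in base R as follows. The sketch assumes a 3-point moving average that leaves the two endpoints unchanged. The helper `box_smooth` is hypothetical, and fdasrvf's own smoother may use different weights or endpoint handling.

```r
# Hypothetical sketch: apply a 3-point box (moving-average) filter `sparam`
# times. Endpoints are left unchanged; requires length(f) >= 3.
box_smooth <- function(f, sparam = 25L) {
  m <- length(f)
  for (pass in seq_len(sparam)) {
    # The right-hand side is evaluated in full before assignment,
    # so every interior point is updated from the previous pass.
    f[2:(m - 1)] <- (f[1:(m - 2)] + f[2:(m - 1)] + f[3:m]) / 3
  }
  f
}

set.seed(42)
time <- seq(0, 1, length.out = 101)
noisy <- sin(2 * pi * time) + rnorm(101, sd = 0.2)
smoothed <- box_smooth(noisy, sparam = 25L)
# Point-to-point variation drops sharply after smoothing
c(sd(diff(noisy)), sd(diff(smoothed)))
```

Larger `sparam` values smooth more aggressively. A linear sequence such as `c(1, 2, 3, 4)` is left unchanged, because each interior point already equals the mean of its neighbours.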

parallel

A boolean specifying whether parallel mode (using foreach::foreach() and the doParallel package) should be activated. Defaults to FALSE.

alignment

A boolean specifying whether to perform alignment. Defaults to TRUE.

omethod

A string specifying which method should be used to solve the optimization problem that provides estimated warping functions. Choices are "DP", "DP2" or "RBFGS". Defaults to "DP".
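The Usage section gives the full vector of choices as the default value of `omethod`. In R, match.arg() commonly resolves such a default to its first element, here "DP". The sketch below assumes kmeans_align() follows that idiom, as the documented default suggests. The function `pick_method` is hypothetical.

```r
# Hypothetical function reproducing the argument pattern in Usage:
# the default is the vector of choices, and match.arg() resolves it.
pick_method <- function(omethod = c("DP", "DP2", "RBFGS")) {
  match.arg(omethod)
}

pick_method()           # [1] "DP"     (first choice is the default)
pick_method("RBFGS")    # [1] "RBFGS"
pick_method("DP2")      # [1] "DP2"
# An unknown method is rejected with an error
inherits(try(pick_method("BFGS"), silent = TRUE), "try-error")  # [1] TRUE
```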

max_iter

An integer value specifying the maximum number of iterations. Defaults to 50L.

thresh

A numeric value specifying a threshold on the cost function below which convergence is assumed. Defaults to 0.01.
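The interaction of max_iter and thresh can be illustrated with ordinary k-means on the columns of an SRSF-like matrix. This sketch omits the warping step, so it is not the elastic algorithm. The documentation also does not say whether thresh is compared with the cost itself or with its change between iterations. The sketch assumes the relative decrease in cost. All names are hypothetical.

```r
# Plain k-means on the columns of `q` (M x N). The alignment step of
# kmeans_align() is deliberately omitted; the stopping rule is an assumption.
kmeans_columns <- function(q, K, seeds, max_iter = 50L, thresh = 0.01) {
  templates <- q[, seeds, drop = FALSE]
  cost_old <- Inf
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance to each template (N x K)
    d2 <- sapply(seq_len(K), function(k) colSums((q - templates[, k])^2))
    labels <- max.col(-d2, ties.method = "first")
    cost <- sum(d2[cbind(seq_along(labels), labels)])
    # Update step: each non-empty cluster's template becomes its mean
    for (k in seq_len(K)) {
      if (any(labels == k)) {
        templates[, k] <- rowMeans(q[, labels == k, drop = FALSE])
      }
    }
    # Stop once the relative decrease in cost falls below `thresh`
    if (is.finite(cost_old) &&
        abs(cost_old - cost) / max(cost_old, .Machine$double.eps) < thresh) {
      break
    }
    cost_old <- cost
  }
  list(labels = labels, templates = templates, cost = cost, iterations = iter)
}

set.seed(7)
M <- 30L
q <- cbind(replicate(5, rnorm(M, mean = 0, sd = 0.1)),
           replicate(5, rnorm(M, mean = 3, sd = 0.1)))
fit <- kmeans_columns(q, K = 2L, seeds = c(1L, 6L))
fit$labels   # [1] 1 1 1 1 1 2 2 2 2 2
```

The loop runs at most max_iter times. It usually stops much earlier, once an iteration no longer lowers the cost appreciably.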

References

Srivastava, A., Wu, W., Kurtek, S., Klassen, E., Marron, J. S. (2011). Registration of functional data using Fisher-Rao metric. arXiv:1103.3817v2.

Tucker, J. D., Wu, W., Srivastava, A. (2012). Generative models for functional data using phase and amplitude separation. Computational Statistics and Data Analysis. doi:10.1016/j.csda.2012.12.001.

Sangalli, L. M., et al. (2010). k-mean alignment for curve clustering. Computational Statistics & Data Analysis 54(5): 1219-1233.

Examples

if (FALSE) {
  library(fdasrvf)
  # Jointly cluster and align the growth velocity curves into two groups
  out <- kmeans_align(growth_vel$f, growth_vel$time, K = 2)
}
