clus.torus: Clustering on the torus by conformal prediction

Description

clus.torus returns clustering results for data on the torus, based on inductive conformal prediction sets.

Usage

clus.torus(
  data,
  split.id = NULL,
  model = c("kmeans", "mixture"),
  mixturefitmethod = c("axis-aligned", "circular", "general"),
  kmeansfitmethod = c("general", "homogeneous-circular", "heterogeneous-circular",
    "ellipsoids"),
  J = NULL,
  level = NULL,
  option = NULL,
  verbose = TRUE,
  ...
)
# S3 method for clus.torus
plot(
  x,
  panel = 1,
  assignment = "outlier",
  data = NULL,
  ellipse = TRUE,
  type = NULL,
  overlay = FALSE,
  out = FALSE,
  ...
)

Arguments

data

n x d matrix of toroidal data on $[0, 2\pi)^d$ or $[-\pi, \pi)^d$. Default is NULL.
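Angles recorded in degrees, or lying outside the stated range, can be brought into $[0, 2\pi)$ before they are passed as data. The following is a minimal base-R sketch; the matrix angles.deg is a made-up example and not part of the package.

```r
# Hypothetical input: a 4 x 2 matrix of angles in degrees,
# some of them outside [0, 360)
angles.deg <- matrix(c(-90, 45, 370, 180,
                        10, -180, 725, 359), ncol = 2)

# Convert to radians, then wrap into [0, 2*pi) with the modulo operator
angles.rad <- (angles.deg * pi / 180) %% (2 * pi)

# Inspect the range of the wrapped angles
range(angles.rad)
```

For a positive divisor, R's %% operator returns a non-negative remainder, so negative angles are mapped to their equivalent in $[0, 2\pi)$ without a separate branch.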

split.id

A vector of length n consisting of the values 1 (estimation) and 2 (evaluation).
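A split vector of this form can be built in base R. The sketch below assumes a hypothetical sample size n and an even split between the two sets; the even split is an illustrative choice, not a requirement stated here.

```r
set.seed(1)  # make the random split reproducible
n <- 100     # hypothetical number of observations (rows of data)

# Label half of the rows 1 (estimation) and half 2 (evaluation),
# then shuffle the labels
split.id <- sample(rep(1:2, length.out = n))

table(split.id)
```

The resulting vector has one entry per row of data and could then be supplied as split.id = split.id.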

model

A string. One of "mixture" and "kmeans", which determines the model or estimation method. If "mixture", the model is based on the von Mises mixture, fitted with an EM algorithm. It supports conformity scores based on the von Mises mixture and its variants. If "kmeans", the model is also based on the von Mises mixture, but the parameters are estimated with the elliptical k-means algorithm. It supports the log-max-mixture based conformity score only. If the dimension of the data space is greater than 2, only "kmeans" is supported. Default is model = "kmeans".

mixturefitmethod

A string. One of "circular", "axis-aligned", and "general" which determines the constraint of the EM fitting. Default is "axis-aligned". This argument only works for model = "mixture".

kmeansfitmethod

A string. One of "general", "ellipsoids", "heterogeneous-circular", or "homogeneous-circular". If "general", the elliptical k-means algorithm with no constraint is used. If "ellipsoids", only one iteration of the algorithm is used. If "heterogeneous-circular", the same as above, but with the constraint that the ellipsoids must be spheres. If "homogeneous-circular", the same as above, but the radii of the spheres are identical. Default is "general". This argument only works for model = "kmeans".

J

The number of components for mixture model fitting. If J is a vector, then hyperparam.torus is used to choose the optimal J. If J == NULL, then J = 4:30 is used.

level

A scalar in $[0,1]$. The level of the conformal prediction set used for clustering. If level == NULL, then hyperparam.alpha is used to choose the optimal level.

option

A string. One of "elbow", "risk", "AIC", or "BIC", which determines the criterion for model selection. "risk" is based on the negative log-likelihood, "AIC" on the Akaike Information Criterion, and "BIC" on the Bayesian Information Criterion. "elbow" is based on minimizing the criterion used in Jung et al. (2021). This argument is only used if J is a vector or NULL.

verbose

A boolean which indicates whether to display additional details about what the algorithm is doing or how many loops have been completed. Default is TRUE.

...

Further arguments that will be passed to icp.torus and hyperparam.torus.

x

clus.torus object

panel

One of 1, 2, or 3, which determines the type of plot. If panel = 1, x$cluster.obj will be plotted. If panel = 2, x$icp.torus will be plotted. If panel = 3, x$hyperparam.select will be plotted. Default is panel = 1.

assignment

A string. One of "outlier", "log.density", "posterior", "mahalanobis". Default is "outlier".

ellipse

A boolean which determines whether to plot ellipse intersections. Default is TRUE. Only available for panel = 2.

type

A string. One of "mix", "max" or "e". This argument is only available if icp.torus object is fitted with model = "mixture". Default is NULL. If type != NULL, argument ellipse automatically becomes FALSE. If "mix", it plots based on von Mises mixture. If "max", it plots based on von Mises max-mixture. If "e", it plots based on ellipse-approximation.

overlay

A boolean which determines whether to plot ellipse intersections on clustering plots. Default is FALSE. Only available for panel = 1.

out

An option for returning the ggplot object. Default is FALSE.
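The plot arguments above can be combined as in the following sketch. It assumes the package that provides clus.torus is installed under the name ClusTorus and ships the toydata2 data set used in the Examples; the range J = 5:15 is only chosen to keep the run short. The calls are guarded so the snippet does nothing when the package is unavailable.

```r
# Panels described in the argument list:
# 1 = cluster.obj, 2 = icp.torus, 3 = hyperparam.select
panels <- 1:3

# Assumption: the package is installed under the name "ClusTorus"
if (requireNamespace("ClusTorus", quietly = TRUE)) {
  data <- ClusTorus::toydata2[, 1:2]
  fit <- ClusTorus::clus.torus(data = data, model = "kmeans",
                               J = 5:15, option = "risk", verbose = FALSE)

  # Draw each panel in turn with its default settings
  for (p in panels) {
    plot(fit, panel = p)
  }

  # Clustering plot with ellipse intersections overlaid (panel = 1 only)
  plot(fit, panel = 1, assignment = "outlier", overlay = TRUE)

  # Ask for the plot object to be returned instead of only drawn
  g <- plot(fit, panel = 1, out = TRUE)
}
```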

Value

clus.torus returns a clus.torus object, which consists of the following three S3 objects:

cluster.obj: cluster.obj object; clustering assignment results for several methods. For details, see cluster.assign.torus.
icp.torus: icp.torus object; contains model parameters and conformity scores. For details, see icp.torus.
hyperparam.select: hyperparam.torus object (if J is NULL or a sequence of numbers, and level is NULL or a sequence of numbers), hyperparam.J object (if level is a scalar), or hyperparam.alpha object (if J is a scalar); contains information on the optimally chosen model (number of components J) and level (alpha) based on the prespecified criterion. For details, see hyperparam.torus, hyperparam.J, and hyperparam.alpha.
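The three components can be inspected by name on the returned object. The sketch below again assumes the package is installed as ClusTorus with the toydata2 data set available, and only prints the class of each component rather than relying on any internal field names.

```r
# Component names as listed in the Value section
component.names <- c("cluster.obj", "icp.torus", "hyperparam.select")

# Assumption: the package is installed under the name "ClusTorus"
if (requireNamespace("ClusTorus", quietly = TRUE)) {
  data <- ClusTorus::toydata2[, 1:2]
  fit <- ClusTorus::clus.torus(data = data, model = "kmeans",
                               J = 5:15, option = "risk", verbose = FALSE)

  # Report the S3 class of each component of the fitted object
  for (nm in component.names) {
    cat(nm, ":", class(fit[[nm]]), "\n")
  }
}
```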

Details

clus.torus is a user-friendly all-in-one function that automatically carries out the following procedures: (1) compute conformity scores for the given model and fitting method, (2) choose the optimal model and level based on the prespecified criterion, and (3) form clusters based on the chosen model and level. Procedures 1-3 can also be run independently with icp.torus, hyperparam.torus, hyperparam.J, hyperparam.alpha, and cluster.assign.torus; see the documentation of each function for details.
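Run step by step, the three procedures might look like the sketch below. The function names come from this page, but the way their arguments are passed here (the data as the first argument of icp.torus, the fitted objects as the first argument of hyperparam.torus, and the selection result as the first argument of cluster.assign.torus) is an assumption that should be checked against each function's own help page. The package name ClusTorus and the range of J are likewise assumptions.

```r
# Candidate numbers of components (illustrative range)
Jvec <- 5:15

# Assumption: the package is installed under the name "ClusTorus"
if (requireNamespace("ClusTorus", quietly = TRUE)) {
  data <- ClusTorus::toydata2[, 1:2]

  # 1. Conformity scores for each candidate J
  #    (argument layout assumed; see ?icp.torus)
  icp.objects <- ClusTorus::icp.torus(data, model = "kmeans",
                                      kmeansfitmethod = "general",
                                      J = Jvec, verbose = FALSE)

  # 2. Choose J and the level by the selected criterion
  #    (argument layout assumed; see ?hyperparam.torus)
  selected <- ClusTorus::hyperparam.torus(icp.objects, option = "risk")

  # 3. Assign clusters from the chosen model and level
  #    (argument layout assumed; see ?cluster.assign.torus)
  clusters <- ClusTorus::cluster.assign.torus(selected)
}
```

Splitting the steps this way is mainly useful when the fitted models should be reused, for example to compare several selection criteria without refitting.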

References

Jung, S., Park, K., & Kim, B. (2021). Clustering on the torus by conformal prediction. The Annals of Applied Statistics, 15(4), 1583-1603.

Mardia, K. V., Kent, J. T., Zhang, Z., Taylor, C. C., & Hamelryck, T. (2012). Mixtures of concentrated multivariate sine distributions with applications to bioinformatics. Journal of Applied Statistics, 39(11), 2475-2492.

Shin, J., Rinaldo, A., & Wasserman, L. (2019). Predictive clustering. arXiv preprint arXiv:1903.08125.

Examples

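# NOT RUN {
library(ClusTorus)
# Keep the first two columns of the bundled toy data
data <- toydata2[, 1:2]
# Fit with elliptical k-means for J = 5, ..., 30 and select J by the "risk" criterion
clus.torus(data = data, model = "kmeans", kmeansfitmethod = "general", J = 5:30, option = "risk")
# }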