icp.torus: Conformity score for inductive prediction sets

Description

icp.torus prepares all values for computing the conformity score for specified methods.

plot.icp.torus plots icp.torus object with some options.

Usage

icp.torus(
  data,
  split.id = NULL,
  model = c("kmeans", "kde", "mixture"),
  mixturefitmethod = c("axis-aligned", "circular", "general"),
  kmeansfitmethod = c("general", "homogeneous-circular", "heterogeneous-circular",
    "ellipsoids"),
  init = c("hierarchical", "kmeans"),
  d = NULL,
  additional.condition = TRUE,
  J = 4,
  concentration = 25,
  kmax = 500,
  THRESHOLD = 1e-10,
  maxiter = 200,
  verbose = TRUE,
  ...
)
# S3 method for icp.torus
logLik(object, ...)
# S3 method for icp.torus
predict(object, newdata, ...)
# S3 method for icp.torus
plot(
  x,
  data = NULL,
  level = 0.1,
  ellipse = TRUE,
  out = FALSE,
  type = NULL,
  ...
)

Arguments

data

n x d matrix of toroidal data on \([0, 2\pi)^d\) or \([-\pi, \pi)^d\). Required for icp.torus; for the plot method, the default is NULL.
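
As an illustration of the expected input, the following sketch converts angles recorded in degrees to radians and wraps them onto \([0, 2\pi)\). The variable names and sample values are hypothetical, and only base R is used.

```r
# Hypothetical dihedral angles in degrees, one row per observation (n = 4, d = 2).
angles_deg <- matrix(c(-170,  35,
                         90, -60,
                        180, 120,
                        -45, 179),
                     ncol = 2, byrow = TRUE)

# Convert to radians, then wrap onto [0, 2*pi) with the modulo operator.
angles_rad <- (angles_deg * pi / 180) %% (2 * pi)

# Every entry should now lie in the half-open interval [0, 2*pi).
range(angles_rad)
```

A matrix prepared this way has the n x d shape that the data argument describes.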

split.id

An n-dimensional vector consisting of the values 1 (estimation) and 2 (evaluation).
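
A minimal sketch of building such a vector by hand, assuming a roughly even random split is wanted. The sample size n and the seed are hypothetical choices, not package defaults.

```r
set.seed(1)   # for a reproducible split
n <- 100      # hypothetical number of observations

# Label half of the rows 1 (estimation) and the rest 2 (evaluation),
# then shuffle so that the assignment is random.
split.id <- sample(rep(c(1, 2), length.out = n))

table(split.id)
```

The resulting vector has one entry per row of data and could be supplied as split.id.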

model

A string. One of "kde", "mixture", and "kmeans", which determines the model or estimation method. If "kde", the model is based on kernel density estimates and supports the kde-based conformity score only. If "mixture", the model is based on the von Mises mixture, fitted with an EM algorithm, and supports conformity scores based on the von Mises mixture and its variants. If "kmeans", the model is also based on the von Mises mixture, but the parameters are estimated with the elliptical k-means algorithm illustrated in Appendix; it supports the log-max-mixture based conformity score only. If the dimension of the data space is greater than 2, only "kmeans" is supported. Default is model = "kmeans".

mixturefitmethod

A string. One of "circular", "axis-aligned", and "general" which determines the constraint of the EM fitting. Default is "axis-aligned". This argument only works for model = "mixture".

kmeansfitmethod

A string. One of "general", "ellipsoids", "heterogeneous-circular" or "homogeneous-circular". If "general", the elliptical k-means algorithm with no constraint is used. If "ellipsoids", only one iteration of the algorithm is used. If "heterogeneous-circular", the same as above, but with the constraint that the ellipsoids must be spheres. If "homogeneous-circular", the same as above, but the radii of the spheres are identical. Default is "general". This argument only works for model = "kmeans".

init

Method for choosing the initial values of the "kmeans" fitting. Must be "hierarchical" or "kmeans". If "hierarchical", the initial parameters are obtained by hierarchical clustering. If "kmeans", the initial parameters are obtained by the extrinsic k-means method. Additional arguments for k-means clustering and hierarchical clustering can be passed via the argument .... If no options are given, nstart = 1 is used for init = "kmeans" and method = "complete" for init = "hierarchical". Default is "hierarchical".

d

A pairwise distance matrix (dist object) used in hierarchical clustering when init = "hierarchical". If init = "hierarchical" and d = NULL, d is automatically filled with ang.pdist(data).
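
To make the role of the distance matrix concrete, here is a rough base-R sketch of building a pairwise angular distance object and passing it to hierarchical clustering. It assumes the distance between two points is the Euclidean norm of their coordinate-wise wrapped differences; the package's ang.pdist may differ in detail. The helper name ang_pdist_sketch and the toy data are hypothetical.

```r
# Hypothetical helper: pairwise angular distances for an n x d matrix of angles.
ang_pdist_sketch <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  out <- matrix(0, n, n)
  for (k in seq_len(ncol(x))) {
    # Absolute difference in coordinate k, wrapped into [0, pi].
    delta <- abs(outer(x[, k], x[, k], "-")) %% (2 * pi)
    delta <- pmin(delta, 2 * pi - delta)
    out <- out + delta^2
  }
  stats::as.dist(sqrt(out))
}

set.seed(1)
toy <- matrix(runif(40, 0, 2 * pi), ncol = 2)   # 20 random points on the 2-torus
d <- ang_pdist_sketch(toy)

# The resulting dist object can be used for complete-linkage clustering.
hc <- stats::hclust(d, method = "complete")
groups <- stats::cutree(hc, k = 4)
```

Wrapping each coordinate difference keeps points near 0 and near \(2\pi\) close to each other, which a plain Euclidean distance would not do.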

additional.condition

A logical value. If TRUE, a singular matrix is replaced by a scaled identity matrix.

J

A scalar or numeric vector for the number(s) of components for model = c("mixture", "kmeans"). Default is J = 4.

concentration

A scalar or numeric vector for the concentration parameter(s) for model = "kde". Default is concentration = 25.

kmax

The maximum value of kappa. If an estimated kappa is larger than kmax, it is set to kmax.

THRESHOLD

A number giving the threshold for the difference between the parameters before and after an update. Default is 1e-10.

maxiter

The maximum number of iterations. Default is 200.

verbose

A logical value indicating whether to display additional details about what the algorithm is doing and how many loops have been completed. In addition, if additional.condition is TRUE, the warning message is reported.

...

additional parameters. For plotting icp.torus, these parameters are for ggplot2::ggplot().

object

icp.torus object

newdata

n x d matrix of toroidal data on \([0, 2\pi)^d\). The dimension d must match that of the data used to fit the icp.torus object.

x

An icp.torus object.

level

either a numeric scalar or a vector in \([0,1]\). Default value is 0.1.
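
To show how a level enters an inductive conformal prediction set, the sketch below follows the standard recipe: sort the conformity scores of the evaluation split and accept every point whose score is at least the floor((n2 + 1) * level)-th smallest one. The scores are simulated and the function name icp_cutoff_sketch is hypothetical; this is not taken from the package's internal code.

```r
# Hypothetical helper: conformity-score cutoff for a given level.
icp_cutoff_sketch <- function(scores, level) {
  n2 <- length(scores)
  i <- floor((n2 + 1) * level)
  # If i is 0, no evaluation score is excluded, so every point is accepted.
  if (i < 1) return(-Inf)
  sort(scores)[min(i, n2)]
}

set.seed(1)
eval_scores <- rnorm(99)   # simulated conformity scores on the evaluation split
cutoff <- icp_cutoff_sketch(eval_scores, level = 0.1)

# A new point belongs to the level-0.1 prediction set when its score
# reaches the cutoff.
new_scores <- c(-3, 0, 2)
in_set <- new_scores >= cutoff
```

Smaller levels give lower cutoffs and therefore larger prediction sets.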

ellipse

A logical value which determines whether to plot ellipses from mixture models. Default is TRUE. (This option is used only when the icp.torus object x is fitted with model = "kmeans" or model = "mixture".)

out

An option for returning the ggplot object. Default is FALSE.

type

A string. One of "mix", "max" or "e". This argument is only available if icp.torus object is fitted with model = "mixture". Default is NULL. If type != NULL, argument ellipse automatically becomes FALSE. If "mix", it plots based on von Mises mixture. If "max", it plots based on von Mises max-mixture. If "e", it plots based on ellipse-approximation.

Value

icp.torus returns an icp.torus object containing all values needed to compute the conformity score (if J or concentration is a single value). If J or concentration is a vector containing multiple values, icp.torus returns a list of icp.torus objects.
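
Because the return type depends on whether J or concentration has one value or several, downstream code may want a uniform shape. The sketch below wraps a single fit in a list and leaves a list unchanged. It assumes a single fit carries the S3 class "icp.torus" and that the multi-value result is a plain list without that class; the helper name and the mock objects are hypothetical stand-ins for real fits.

```r
# Hypothetical helper: always return a list of icp.torus objects.
as_icp_list_sketch <- function(fit) {
  if (inherits(fit, "icp.torus")) list(fit) else fit
}

# Mock objects standing in for real fits
# (assumption: a single fit has class "icp.torus").
single <- structure(list(), class = "icp.torus")
several <- list(single, single, single)

n_single <- length(as_icp_list_sketch(single))
n_several <- length(as_icp_list_sketch(several))
```

With this normalisation, the same loop can process the result whether one or many values were supplied.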

References

Jung, S., Park, K., & Kim, B. (2021). Clustering on the torus by conformal prediction. The Annals of Applied Statistics, 15(4), 1583-1603.

Mardia, K. V., Kent, J. T., Zhang, Z., Taylor, C. C., & Hamelryck, T. (2012). Mixtures of concentrated multivariate sine distributions with applications to bioinformatics. Journal of Applied Statistics, 39(11), 2475-2492.

Di Marzio, M., Panzera, A., & Taylor, C. C. (2011). Kernel density estimation on the torus. Journal of Statistical Planning and Inference, 141(6), 2156-2173.

Shin, J., Rinaldo, A., & Wasserman, L. (2019). Predictive clustering. arXiv preprint arXiv:1903.08125.

Examples

# NOT RUN {
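library(ClusTorus)

# Use the first two columns (the angles) of the toy data set.
data <- toydata1[, 1:2]

# Fit with the elliptical k-means method; the result is stored under a
# name that does not mask the icp.torus function.
icp.torus.kmeans <- icp.torus(data, model = "kmeans",
                              kmeansfitmethod = "general",
                              J = 4, concentration = 25)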
# }
