twonn: `TWO-NN` estimator

Description

The function fits the two-nearest-neighbor (TWO-NN) estimator of the intrinsic dimension within either the maximum likelihood or the Bayesian framework. Alternatively, the estimate can be obtained via least squares. The approach is selected through the argument method. The model was originally presented in Facco et al. (2017); see also Denti et al. (2022) for more details.
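As a rough illustration of the idea behind the maximum likelihood variant, the sketch below computes the ratio of the second to the first nearest-neighbor distance for each observation and plugs the ratios into the closed-form estimate that follows from the Pareto model of Facco et al. (2017). The helpers nn_ratios() and twonn_mle_sketch() are hypothetical names used only here; this is base R written under those assumptions, not the package's internal code.

```r
# Hypothetical helper (not part of the package): ratio of the second to the
# first nearest-neighbor distance for each row of a numeric matrix.
nn_ratios <- function(X) {
  dm <- as.matrix(dist(X))   # Euclidean distances between all pairs of rows
  diag(dm) <- Inf            # exclude each point's distance to itself
  apply(dm, 1, function(r) {
    two_smallest <- sort(r)[1:2]
    two_smallest[2] / two_smallest[1]
  })
}

# Hypothetical helper: closed-form estimate n / sum(log(mu)); the unbiased
# version uses n - 1 in the numerator instead of n.
twonn_mle_sketch <- function(mus, unbiased = TRUE) {
  n <- length(mus)
  (n - as.integer(unbiased)) / sum(log(mus))
}

set.seed(1)
X <- replicate(2, rnorm(1000))   # 1000 observations, intrinsic dimension 2
mus <- nn_ratios(X)
d_hat <- twonn_mle_sketch(mus)
d_hat
```

The estimate should land close to 2 for this dataset; the exact value depends on the simulated sample.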

Usage

twonn(
  X = NULL,
  dist_mat = NULL,
  mus = NULL,
  method = c("mle", "linfit", "bayes"),
  alpha = 0.95,
  c_trimmed = 0.01,
  unbiased = TRUE,
  a_d = 0.001,
  b_d = 0.001,
  ...
)
# S3 method for twonn_bayes
print(x, ...)
# S3 method for twonn_bayes
summary(object, ...)
# S3 method for summary.twonn_bayes
print(x, ...)
# S3 method for twonn_bayes
plot(x, plot_low = 0.001, plot_upp = NULL, by = 0.05, ...)
# S3 method for twonn_linfit
print(x, ...)
# S3 method for twonn_linfit
summary(object, ...)
# S3 method for summary.twonn_linfit
print(x, ...)
# S3 method for twonn_linfit
plot(x, ...)
# S3 method for twonn_mle
print(x, ...)
# S3 method for twonn_mle
summary(object, ...)
# S3 method for summary.twonn_mle
print(x, ...)
# S3 method for twonn_mle
plot(x, ...)

Value

a list whose class depends on the chosen method. Regardless of the method, the output list always contains the object est, which provides the estimated intrinsic dimension along with uncertainty quantification. The remaining objects vary with the estimation method. In particular, if

method = "mle": the output reports the MLE and the corresponding confidence interval;
method = "linfit": the output includes the lm() object used for the computation;
method = "bayes": the output contains the (1 + alpha) / 2 and (1 - alpha) / 2 quantiles, mean, mode, and median of the posterior distribution of d.
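To make the Bayesian summaries concrete: if the ratios are modeled as Pareto(1, d) and d is given a Gamma(a_d, b_d) prior, the posterior of d is again a Gamma distribution, with shape a_d + n and rate b_d + sum(log(mus)). The sketch below computes the quantities listed above from that conjugate posterior. The function posterior_summary_sketch() is a hypothetical name, and the code is an illustration under these modeling assumptions rather than the package's implementation.

```r
# Hypothetical helper: posterior summaries of d under a Gamma(a_d, b_d) prior
# and a Pareto(1, d) likelihood for the ratios.
posterior_summary_sketch <- function(mus, alpha = 0.95,
                                     a_d = 0.001, b_d = 0.001) {
  shape <- a_d + length(mus)
  rate  <- b_d + sum(log(mus))
  c(lower  = qgamma((1 - alpha) / 2, shape = shape, rate = rate),
    mean   = shape / rate,
    median = qgamma(0.5, shape = shape, rate = rate),
    mode   = (shape - 1) / rate,   # valid when shape > 1
    upper  = qgamma((1 + alpha) / 2, shape = shape, rate = rate))
}

set.seed(1)
# Simulated ratios from a Pareto(1, d) law with d = 3 (inverse transform)
mus <- runif(500)^(-1 / 3)
post <- posterior_summary_sketch(mus)
post
```

With a nearly flat prior such as the default a_d = b_d = 0.001, the posterior mean is essentially the (biased) MLE n / sum(log(mus)).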

Arguments

X

data matrix with n observations and D variables.

dist_mat

distance matrix computed between the n observations.

mus

vector of second to first NN distance ratios.

method

chosen estimation method. It can be

"mle": for maximum likelihood estimation;
"linfit": for estimation via the least squares approach;
"bayes": for estimation with the Bayesian approach.

alpha

the confidence level (for mle and least squares fit) or posterior probability in the credible interval (bayes).

c_trimmed

the proportion of trimmed observations.
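The role of trimming in the least squares approach can be sketched as follows. Following Facco et al. (2017), the ratios are sorted, the empirical distribution function F is evaluated at each one, and -log(1 - F) is regressed on log(mu) with a line through the origin whose slope estimates d. The sketch assumes that trimming discards the largest ratios, which are the most sensitive to outliers; linfit_sketch() is a hypothetical name, and the package may differ in its details.

```r
# Hypothetical helper: least squares fit through the origin on trimmed ratios.
# Requires c_trimmed > 0, since the largest ratio would give -log(0) = Inf.
linfit_sketch <- function(mus, c_trimmed = 0.01) {
  n <- length(mus)
  mus_sorted <- sort(mus)
  keep <- seq_len(floor(n * (1 - c_trimmed)))  # drop the largest ratios
  x <- log(mus_sorted[keep])
  y <- -log(1 - keep / n)                      # empirical CDF F(mu_(i)) = i / n
  lm(y ~ x - 1)                                # no intercept: line through origin
}

set.seed(1)
# Simulated ratios from a Pareto(1, d) law with d = 2 (inverse transform)
mus <- runif(1000)^(-1 / 2)
fit <- linfit_sketch(mus)
coef(fit)                    # slope, i.e. the estimated intrinsic dimension
confint(fit, level = 0.95)   # interval for the slope from the lm() fit
```

Returning the lm() object keeps the usual tools (summary(), confint(), plot()) available for inspecting the fit.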

unbiased

logical, applicable when method = "mle". If TRUE, the MLE is corrected to ensure unbiasedness.

a_d

shape parameter of the Gamma prior on the parameter d, applicable when method = "bayes".

b_d

rate parameter of the Gamma prior on the parameter d, applicable when method = "bayes".

...

ignored.

x

object of class twonn_mle, the output of the twonn function when method = "mle".

object

object of class twonn_mle, obtained from the function twonn_mle().

plot_low

lower bound of the interval on which the posterior density is plotted.

plot_upp

upper bound of the interval on which the posterior density is plotted.

by

step-size at which the sequence spanning the interval is incremented.

References

Facco E, D'Errico M, Rodriguez A, Laio A (2017). "Estimating the intrinsic dimension of datasets by a minimal neighborhood information." Scientific Reports, 7(1). ISSN 20452322, doi:10.1038/s41598-017-11873-y.

Denti F, Doimo D, Laio A, Mira A (2022). "The generalized ratios intrinsic dimension estimator." Scientific Reports, 12(20005). ISSN 20452322, doi:10.1038/s41598-022-20991-1.

Examples

# twonn() is provided by the intRinsic package
library(intRinsic)

# dataset with 1000 observations and id = 2
X <- replicate(2, rnorm(1000))
twonn(X)

# dataset with 1000 observations and id = 3
Y <- replicate(3, runif(1000))
# Bayesian and least squares estimates from a distance matrix
dm <- as.matrix(dist(Y, method = "manhattan"))
twonn(dist_mat = dm, method = "bayes")
twonn(dist_mat = dm, method = "linfit")
