TWO-NN estimator
The function fits the two-nearest-neighbor (TWO-NN) estimator of the intrinsic
dimension within either the maximum likelihood or the Bayesian framework.
Estimates can also be obtained by least squares, depending on the value of the
argument method. The model was originally presented in Facco et al., 2017.
See also Denti et al., 2022 for more details.
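The quantity behind the estimator can be sketched in base R. The snippet below is only an illustration under the standard TWO-NN assumption that the ratio mu of the second to the first nearest-neighbor distance follows a Pareto(1, d) law; it is not the package's internal code, and all variable names are hypothetical.

```r
# Base-R sketch of the TWO-NN idea (illustrative, not the package internals).
set.seed(1)
X <- replicate(2, rnorm(1000))   # 1000 points, true intrinsic dimension 2

# Pairwise Euclidean distances; mask the zero self-distances on the diagonal.
dm <- as.matrix(dist(X))
diag(dm) <- Inf

# For each point, keep the two smallest distances and form mu = r2 / r1.
two_nn <- t(apply(dm, 1, function(r) sort(r)[1:2]))
mus <- two_nn[, 2] / two_nn[, 1]

# If mu ~ Pareto(1, d), then log(mu) is exponential with rate d, so the MLE
# is n / sum(log(mu)) and (n - 1) / sum(log(mu)) removes its bias.
n <- length(mus)
d_mle <- n / sum(log(mus))
d_unbiased <- (n - 1) / sum(log(mus))
```

Both estimates should land close to 2 for this simulated dataset; the unbiased version is always slightly smaller than the plain MLE.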
twonn(
X = NULL,
dist_mat = NULL,
mus = NULL,
method = c("mle", "linfit", "bayes"),
alpha = 0.95,
c_trimmed = 0.01,
unbiased = TRUE,
a_d = 0.001,
b_d = 0.001,
...
)
# S3 method for twonn_bayes
print(x, ...)
# S3 method for twonn_bayes
summary(object, ...)
# S3 method for summary.twonn_bayes
print(x, ...)
# S3 method for twonn_bayes
plot(x, plot_low = 0.001, plot_upp = NULL, by = 0.05, ...)
# S3 method for twonn_linfit
print(x, ...)
# S3 method for twonn_linfit
summary(object, ...)
# S3 method for summary.twonn_linfit
print(x, ...)
# S3 method for twonn_linfit
plot(x, ...)
# S3 method for twonn_mle
print(x, ...)
# S3 method for twonn_mle
summary(object, ...)
# S3 method for summary.twonn_mle
print(x, ...)
# S3 method for twonn_mle
plot(x, ...)
A list whose class depends on the chosen method. Regardless of the method,
the output list always contains the object est, which provides the estimated
intrinsic dimension along with uncertainty quantification. The remaining
objects vary with the estimation method. In particular, if
  method = "mle": the output reports the MLE and the corresponding confidence interval;
  method = "linfit": the output includes the lm() object used for the computation;
  method = "bayes": the output contains the (1 + alpha) / 2 and (1 - alpha) / 2 quantiles, mean, mode, and median of the posterior distribution of d.
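The Bayesian summaries listed above can be illustrated with a conjugate update. This is a hedged sketch that assumes the ratios follow a Pareto(1, d) law and that d has a Gamma(a_d, b_d) prior, in which case the posterior of d is Gamma(a_d + n, b_d + sum(log(mus))). The package may apply extra steps (such as trimming) that are not reproduced here, and the ratios are simulated rather than computed from data.

```r
# Conjugate Gamma posterior for d under a Pareto(1, d) likelihood (sketch).
set.seed(1)
d_true <- 3
n <- 500
mus <- runif(n)^(-1 / d_true)    # simulated ratios via the inverse CDF

a_d <- 0.001                     # prior shape
b_d <- 0.001                     # prior rate
alpha <- 0.95                    # posterior probability of the interval

shape_post <- a_d + n
rate_post <- b_d + sum(log(mus))

post <- c(
  lower  = qgamma((1 - alpha) / 2, shape_post, rate = rate_post),
  mean   = shape_post / rate_post,
  median = qgamma(0.5, shape_post, rate = rate_post),
  mode   = (shape_post - 1) / rate_post,
  upper  = qgamma((1 + alpha) / 2, shape_post, rate = rate_post)
)
```

With a nearly flat prior such as a_d = b_d = 0.001, the posterior mean is almost identical to the plain MLE n / sum(log(mus)).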
X: data matrix with n observations and D variables.
dist_mat: distance matrix computed between the n observations.
mus: vector of second to first NN distance ratios.
method: chosen estimation method. It can be
  "mle" for the maximum likelihood estimator;
  "linfit" for estimation via the least squares approach;
  "bayes" for estimation with the Bayesian approach.
alpha: the confidence level (for mle and least squares fit) or
posterior probability in the credible interval (bayes).
c_trimmed: the proportion of trimmed observations.
unbiased: logical, applicable when method = "mle".
If TRUE, the MLE is corrected to ensure unbiasedness.
a_d: shape parameter of the Gamma prior on the parameter d,
applicable when method = "bayes".
b_d: rate parameter of the Gamma prior on the parameter d,
applicable when method = "bayes".
...: ignored.
x: object of class twonn_mle, the output of the
twonn function when method = "mle".
object: object of class twonn_mle, obtained from the function
twonn_mle().
plot_low: lower bound of the interval on which the posterior density is plotted.
plot_upp: upper bound of the interval on which the posterior density is plotted.
by: step-size at which the sequence spanning the interval is incremented.
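To make the roles of the least squares approach and the trimming proportion concrete, here is a minimal base-R sketch of a linear fit in the spirit of Facco et al., 2017: regress -log(1 - F(mu)) on log(mu) through the origin after discarding the largest ratios. The empirical CDF convention (i / n) and the trimming rule are assumptions made for illustration, and the package's own choices may differ.

```r
# Least squares sketch: the slope of -log(1 - F) against log(mu) estimates d.
set.seed(1)
d_true <- 3
n <- 1000
mus <- runif(n)^(-1 / d_true)    # simulated ratios, Pareto(1, d_true)

c_trimmed <- 0.01                # proportion of largest ratios to drop
mu_sorted <- sort(mus)
F_emp <- seq_len(n) / n          # assumed empirical CDF convention
keep <- seq_len(floor(n * (1 - c_trimmed)))

x <- log(mu_sorted[keep])
y <- -log(1 - F_emp[keep])

fit <- lm(y ~ x - 1)             # regression through the origin
d_linfit <- unname(coef(fit))
ci <- confint(fit, level = 0.95) # interval for the slope
```

Trimming also keeps the response finite: without it, the largest ratio has F = 1 and -log(1 - F) is infinite.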
Facco E, D'Errico M, Rodriguez A, Laio A (2017). "Estimating the intrinsic dimension of datasets by a minimal neighborhood information." Scientific Reports, 7(1). ISSN 20452322, doi:10.1038/s41598-017-11873-y.
Denti F, Doimo D, Laio A, Mira A (2022). "The generalized ratios intrinsic dimension estimator." Scientific Reports, 12(20005). ISSN 20452322, doi:10.1038/s41598-022-20991-1.
# dataset with 1000 observations and id = 2
X <- replicate(2,rnorm(1000))
twonn(X)
# dataset with 1000 observations and id = 3
Y <- replicate(3,runif(1000))
# Bayesian and least squares estimate from distance matrix
dm <- as.matrix(dist(Y,method = "manhattan"))
twonn(dist_mat = dm,method = "bayes")
twonn(dist_mat = dm,method = "linfit")