
Estimates the intrinsic dimension of a data set using models of translated Poisson distributions.
maxLikGlobalDimEst(data, k, dnoise = NULL, sigma = 0, n = NULL,
                   integral.approximation = 'Haro', unbiased = FALSE,
                   neighborhood.based = TRUE,
                   neighborhood.aggregation = 'maximum.likelihood',
                   iterations = 5, K = 5)

maxLikPointwiseDimEst(data, k, dnoise = NULL, sigma = 0, n = NULL,
                      indices = NULL, integral.approximation = 'Haro',
                      unbiased = FALSE, iterations = 5)

maxLikLocalDimEst(data, dnoise = NULL, sigma = 0, n = NULL,
                  integral.approximation = 'Haro', unbiased = FALSE,
                  iterations = 5)
data: data set with each row describing a data point.

k: the number of distances that should be used for each dimension estimate.

dnoise: a function, or the name of a function, giving the translation density. If NULL, no noise is modeled, and the estimator reduces to the Hill estimator (see References). The translation densities dnoiseNcChi and dnoiseGaussH are provided in the package; dnoiseGaussH is an approximation of dnoiseNcChi, but faster.

sigma: (estimated) standard deviation of the (isotropic) noise.

n: dimension of the noise.

indices: the indices of the data points for which local dimension estimation should be made.

integral.approximation: how to approximate the integrals in eq. (5) in Haro et al. (2008). Possible values: 'Haro', 'guaranteed.convergence', 'iteration'. See Details.

unbiased: if TRUE, a factor k-2 is used instead of the factor k-1 that was used in Haro et al. (2008). This makes the estimator unbiased in the case of data without noise or boundary.

neighborhood.based: if TRUE, dimension estimation is first made for neighborhoods around each data point and the final value is aggregated from these. Otherwise dimension estimation is made once, based on distances in the entire data set.

neighborhood.aggregation: if neighborhood.based, how dimension estimates from different neighborhoods should be combined. Possible values: 'maximum.likelihood' follows Haro et al. (2008) in maximizing the likelihood by using the harmonic mean, 'mean' follows Levina and Bickel (2005) and takes the mean, and 'robust' takes the median, to remove the influence of possible outliers.

iterations: for integral.approximation = 'iteration', how many iterations should be made.

K: for neighborhood.based = FALSE, how many distances for each data point should be considered when looking for the k shortest distances in the entire data set.
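The three neighborhood.aggregation choices above amount to three ways of collapsing a vector of per-neighborhood estimates into one number. A minimal sketch in Python (the package itself is R; the helper name aggregate is hypothetical):

```python
import statistics

def aggregate(local_estimates, method="maximum.likelihood"):
    """Combine per-neighborhood dimension estimates into a single estimate."""
    if method == "maximum.likelihood":
        # Harmonic mean: maximizes the joint likelihood (Haro et al. 2008).
        return len(local_estimates) / sum(1.0 / m for m in local_estimates)
    if method == "mean":
        # Arithmetic mean, as in Levina and Bickel (2005).
        return statistics.mean(local_estimates)
    if method == "robust":
        # Median, to damp the influence of outlying neighborhoods.
        return statistics.median(local_estimates)
    raise ValueError(f"unknown aggregation method: {method}")
```

For instance, for local estimates [2, 8] the harmonic mean gives 3.2 while the mean gives 5, illustrating that 'maximum.likelihood' weighs low-dimensional neighborhoods more heavily than 'mean' does.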
For maxLikGlobalDimEst and maxLikLocalDimEst, a DimEst object with one slot holding the dimension estimate. For maxLikPointwiseDimEst, the dimension estimate for each data point: row i has the local dimension estimate at point data[indices[i], ].
The estimators are based on the referenced paper by Haro et al. (2008), using the assumption that there is a single manifold. The estimator in the paper is obtained using the default parameters and dnoise = dnoiseGaussH.

With integral.approximation = 'Haro', the Taylor expansion approximation of r^(m-1) that Haro et al. (2008) used is employed. With integral.approximation = 'guaranteed.convergence', r is factored out and kept, while r^(m-2) is approximated with the corresponding Taylor expansion. This guarantees convergence of the integrals. Divergence might be an issue when the noise is not sufficiently small in comparison to the smallest distances. With integral.approximation = 'iteration', the number of iterations given by the iterations argument is used to determine m.

maxLikLocalDimEst assumes that the data set is local, i.e. a piece of a data set cut out by a sphere with a radius such that the data set is well approximated by a hyperplane (meaning that the curvature should be low in the local data set). See localIntrinsicDimension.
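In the noise-free case (dnoise = NULL) the estimator reduces to the Hill / Levina-Bickel form: for sorted nearest-neighbor distances T_1 <= ... <= T_k, the local estimate is (k-1) / sum_{j=1}^{k-1} log(T_k / T_j), with k-2 replacing k-1 when unbiased. A minimal Python sketch of that formula (the function name max_lik_dim is hypothetical; the package's actual implementation is in R and also handles the noise model):

```python
import math

def max_lik_dim(dists, unbiased=False):
    """Noise-free ML intrinsic dimension estimate from sorted kNN distances.

    dists: distances T_1 <= ... <= T_k from a point to its k nearest neighbors.
    """
    k = len(dists)
    # Factor k-1 as in Haro et al. (2008); k-2 removes the bias for
    # noise- and boundary-free data.
    numer = (k - 2) if unbiased else (k - 1)
    # Sum of log(T_k / T_j) over the k-1 inner distances.
    log_ratio_sum = sum(math.log(dists[-1] / t) for t in dists[:-1])
    return numer / log_ratio_sum
```

The estimate diverges as the inner distances approach T_k, which is why modeling the noise (dnoise, sigma) matters when distances are comparable to the noise level.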
Haro, G., Randall, G. and Sapiro, G. (2008) Translated Poisson Mixture Model for Stratification Learning. Int. J. Comput. Vis., 80, 358-374.
Hill, B. M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Stat., 3(5), 1163-1174.
Levina, E. and Bickel, P. J. (2005) Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems 17, 777-784. MIT Press.
# NOT RUN {
data <- hyperBall(100, d = 7, n = 13, sd = 0.01)
maxLikGlobalDimEst(data, 10, dnoiseNcChi, 0.01, 13)
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13)
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13, neighborhood.aggregation = 'robust')
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13,
integral.approximation = 'guaranteed.convergence',
neighborhood.aggregation = 'robust')
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13,
integral.approximation = 'iteration', unbiased = TRUE)
data <- hyperBall(1000, d = 7, n = 13, sd = 0.01)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
neighborhood.based = FALSE)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
integral.approximation = 'guaranteed.convergence',
neighborhood.based = FALSE)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
integral.approximation = 'iteration',
neighborhood.based = FALSE)
data <- hyperBall(100, d = 7, n = 13, sd = 0.01)
maxLikPointwiseDimEst(data, 10, dnoiseNcChi, 0.01, 13, indices=1:10)
data <- cutHyperPlane(50, d = 7, n = 13, sd = 0.01)
maxLikLocalDimEst(data, dnoiseNcChi, 0.1, 3)
maxLikLocalDimEst(data, dnoiseGaussH, 0.1, 3)
maxLikLocalDimEst(data, dnoiseNcChi, 0.1, 3,
integral.approximation = 'guaranteed.convergence')
# }