
Estimates the intrinsic dimension of a data set using models of translated Poisson distributions.
maxLikGlobalDimEst(data, k, dnoise = NULL, sigma = 0, n = NULL,
                   integral.approximation = 'Haro', unbiased = FALSE,
                   neighborhood.based = TRUE,
                   neighborhood.aggregation = 'maximum.likelihood',
                   iterations = 5, K = 5)

maxLikPointwiseDimEst(data, k, dnoise = NULL, sigma = 0, n = NULL,
                      indices = NULL, integral.approximation = 'Haro',
                      unbiased = FALSE, iterations = 5)

maxLikLocalDimEst(data, dnoise = NULL, sigma = 0, n = NULL,
                  integral.approximation = 'Haro', unbiased = FALSE,
                  iterations = 5)
data: data set with each row describing a data point.

k: the number of distances that should be used for each dimension estimate.

dnoise: a function, or the name of a function, giving the translation density. If NULL, no noise is modeled, and the estimator reduces to the Hill estimator (see References). The translation densities dnoiseNcChi and dnoiseGaussH are provided in the package; dnoiseGaussH is an approximation of dnoiseNcChi, but faster.

sigma: (estimated) standard deviation of the (isotropic) noise.

n: dimension of the noise.

indices: the indices of the data points for which local dimension estimation should be made.

integral.approximation: how to approximate the integrals in eq. (5) in Haro et al. (2008). Possible values: 'Haro', 'guaranteed.convergence', 'iteration'. See Details.

unbiased: if TRUE, a factor k-2 is used instead of the factor k-1 that was used in Haro et al. (2008). This makes the estimator unbiased in the case of data without noise or boundary.

neighborhood.based: if TRUE, dimension estimation is first made for neighborhoods around each data point and the final value is aggregated from these. Otherwise dimension estimation is made once, based on distances in the entire data set.

neighborhood.aggregation: if neighborhood.based, how dimension estimates from different neighborhoods should be combined. Possible values: 'maximum.likelihood' follows Haro et al. (2008) in maximizing the likelihood by using the harmonic mean, 'mean' follows Levina and Bickel (2005) and takes the mean, and 'robust' takes the median, to remove the influence of possible outliers.

iterations: for integral.approximation = 'iteration', how many iterations should be made.

K: for neighborhood.based = FALSE, how many distances for each data point should be considered when looking for the k shortest distances in the entire data set.
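The three neighborhood.aggregation choices above amount to three ways of collapsing a vector of per-neighborhood estimates into one number. A minimal sketch in Python (the package itself is R; the helper name aggregate is hypothetical):

```python
import statistics

def aggregate(local_estimates, method="maximum.likelihood"):
    """Combine per-neighborhood dimension estimates into a single estimate."""
    if method == "maximum.likelihood":
        # Harmonic mean: maximizes the joint likelihood (Haro et al. 2008).
        return len(local_estimates) / sum(1.0 / m for m in local_estimates)
    if method == "mean":
        # Arithmetic mean, as in Levina and Bickel (2005).
        return statistics.mean(local_estimates)
    if method == "robust":
        # Median, to damp the influence of outlying neighborhoods.
        return statistics.median(local_estimates)
    raise ValueError(f"unknown aggregation method: {method}")
```

For instance, for local estimates [2, 8] the harmonic mean gives 3.2 while the mean gives 5, illustrating that 'maximum.likelihood' weighs low-dimensional neighborhoods more heavily than 'mean' does.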
For maxLikGlobalDimEst and maxLikLocalDimEst, a DimEst object with one slot holding the dimension estimate. For maxLikPointwiseDimEst, the dimension estimate for each data point: row i has the local dimension estimate at point data[indices[i], ].
The estimators are based on the referenced paper by Haro et al. (2008), using the assumption that there is a single manifold. The estimator in the paper is obtained using the default parameters and dnoise = dnoiseGaussH.

With integral.approximation = 'Haro', the Taylor expansion approximation of r^(m-1) that Haro et al. (2008) used is employed. With integral.approximation = 'guaranteed.convergence', r is factored out and kept, while r^(m-2) is approximated with the corresponding Taylor expansion. This guarantees convergence of the integrals. Divergence might be an issue when the noise is not sufficiently small in comparison to the smallest distances. With integral.approximation = 'iteration', the number of iterations given by the iterations argument is used to determine m.

maxLikLocalDimEst assumes that the data set is local, i.e. a piece of a data set cut out by a sphere with a radius such that the data set is well approximated by a hyperplane (meaning that the curvature should be low in the local data set). See localIntrinsicDimension.
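In the noise-free case (dnoise = NULL) the estimator reduces to the Hill / Levina-Bickel form: for sorted nearest-neighbor distances T_1 <= ... <= T_k, the local estimate is (k-1) / sum_{j=1}^{k-1} log(T_k / T_j), with k-2 replacing k-1 when unbiased. A minimal Python sketch of that formula (the function name max_lik_dim is hypothetical; the package's actual implementation is in R and also handles the noise model):

```python
import math

def max_lik_dim(dists, unbiased=False):
    """Noise-free ML intrinsic dimension estimate from sorted kNN distances.

    dists: distances T_1 <= ... <= T_k from a point to its k nearest neighbors.
    """
    k = len(dists)
    # Factor k-1 as in Haro et al. (2008); k-2 removes the bias for
    # noise- and boundary-free data.
    numer = (k - 2) if unbiased else (k - 1)
    # Sum of log(T_k / T_j) over the k-1 inner distances.
    log_ratio_sum = sum(math.log(dists[-1] / t) for t in dists[:-1])
    return numer / log_ratio_sum
```

The estimate diverges as the inner distances approach T_k, which is why modeling the noise (dnoise, sigma) matters when distances are comparable to the noise level.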
Haro, G., Randall, G. and Sapiro, G. (2008) Translated Poisson Mixture Model for Stratification Learning. Int. J. Comput. Vis., 80, 358-374.
Hill, B. M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Stat., 3(5), 1163-1174.
Levina, E. and Bickel, P. J. (2005) Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems 17, 777-784. MIT Press.
# NOT RUN {
data <- hyperBall(100, d = 7, n = 13, sd = 0.01)
maxLikGlobalDimEst(data, 10, dnoiseNcChi, 0.01, 13)
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13)
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13, neighborhood.aggregation = 'robust')
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13,
integral.approximation = 'guaranteed.convergence',
neighborhood.aggregation = 'robust')
maxLikGlobalDimEst(data, 10, dnoiseGaussH, 0.01, 13,
integral.approximation = 'iteration', unbiased = TRUE)
data <- hyperBall(1000, d = 7, n = 13, sd = 0.01)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
neighborhood.based = FALSE)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
integral.approximation = 'guaranteed.convergence',
neighborhood.based = FALSE)
maxLikGlobalDimEst(data, 500, dnoiseGaussH, 0.01, 13,
integral.approximation = 'iteration',
neighborhood.based = FALSE)
data <- hyperBall(100, d = 7, n = 13, sd = 0.01)
maxLikPointwiseDimEst(data, 10, dnoiseNcChi, 0.01, 13, indices=1:10)
data <- cutHyperPlane(50, d = 7, n = 13, sd = 0.01)
maxLikLocalDimEst(data, dnoiseNcChi, 0.1, 3)
maxLikLocalDimEst(data, dnoiseGaussH, 0.1, 3)
maxLikLocalDimEst(data, dnoiseNcChi, 0.1, 3,
integral.approximation = 'guaranteed.convergence')
# }