mrfDepth (version 1.0.12)

bagdistance: Bagdistance of points relative to a dataset

Description

Computes the bagdistance of \(p\)-dimensional points z relative to a \(p\)-dimensional dataset x. To compute the bagdistance of a point \(z_i\), the bag of x is determined first; it is defined as the depth region containing the 50% of observations with largest depth. Next, the ray from the halfspace median \(\theta\) through \(z_i\) is considered and \(c_z\) is defined as the intersection of this ray and the boundary of the bag. The bagdistance of \(z_i\) to x is then given by the ratio between the Euclidean distance of \(z_i\) to the halfspace median and the Euclidean distance of \(c_z\) to the halfspace median.
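
As a minimal sketch of the ratio described above, the computation can be worked out by hand for a toy bag. The circular bag, its radius, and the names theta, bag_radius and toy_bagdistance are assumptions made purely for illustration; they are not part of mrfDepth, which estimates the bag from the data.

```r
# Toy illustration of the bagdistance ratio (not the mrfDepth implementation).
# Assumption: the bag is a disc of radius bag_radius centred at theta.
theta <- c(0, 0)      # hypothetical halfspace median
bag_radius <- 2       # hypothetical radius of the bag boundary

toy_bagdistance <- function(z, theta, bag_radius) {
  dist_z <- sqrt(sum((z - theta)^2))   # Euclidean distance of z to theta
  # For a disc, the ray from theta through z meets the boundary at
  # distance bag_radius, so the distance of c_z to theta is bag_radius.
  dist_z / bag_radius
}

toy_bagdistance(c(3, 4), theta, bag_radius)   # 5 / 2 = 2.5
toy_bagdistance(c(1, 0), theta, bag_radius)   # 1 / 2 = 0.5
```

Under this toy setup, points inside the bag get a value below 1 and points outside it a value above 1.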

Usage

bagdistance(x, z = NULL, options = list())

Arguments

x

An \(n\) by \(p\) data matrix.

z

An optional \(m\) by \(p\) matrix containing rowwise the points \(z_i\) for which to compute the bagdistance. If z is not specified, it is set equal to x.

options

A list of available options:

  • approx In two dimensions one may choose between an approximate algorithm and the exact algorithm to find the bag. Defaults to TRUE.

  • max.iter The maximum number of steps in the bisection algorithm to find the intersection point \(c_z\) (see Details). Defaults to \(100\).

  • Any option accepted by the hdepth function may also be specified; see hdepth for details. Note that the option approx is set to TRUE by default to save computation time.

Value

A list with components:

bagdistance

The bagdistance of the points of z with respect to the data matrix x.

cutoff

Points of z whose bagdistance exceeds this cutoff can be considered as outliers with respect to x.

flag

Points of z whose bagdistance exceeds the cutoff receive a flag equal to FALSE; all other points receive a flag equal to TRUE.

converged

Vector of length m indicating for each point of z whether the bisection algorithm converged within the maximum number of steps specified by max.iter in the options list.

dimension

When the data x lie in a lower dimensional subspace, the dimension of this subspace.

hyperplane

When the data x lie in a lower dimensional subspace, a direction orthogonal to this subspace.
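
The components listed above can be inspected as in the following sketch. It assumes the mrfDepth package is installed and is skipped otherwise; the simulated data and the variable names are illustrative only.

```r
# Sketch of inspecting the returned list; requires the mrfDepth package.
set.seed(1)
X <- matrix(rnorm(200 * 2), ncol = 2)   # simulated bivariate data

if (requireNamespace("mrfDepth", quietly = TRUE)) {
  res <- mrfDepth::bagdistance(x = X, options = list(max.iter = 100))

  print(res$cutoff)              # threshold used for flagging
  print(head(res$bagdistance))   # bagdistances of the first few points

  # flag is FALSE for points whose bagdistance exceeds the cutoff.
  print(which(!res$flag))

  # Points for which the bisection did not converge within max.iter steps.
  print(which(!res$converged))
}
```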

Details

The bagdistance was introduced in Hubert et al. (2015). It does not assume symmetry and is affine invariant. Note that when the halfspace depth is not computed in an affine invariant way, the bagdistance cannot be affine invariant either.

The function first computes the halfspace depth and the halfspace median of x. Additional options may be passed to the hdepth routine by specifying them in the options list argument.

It is first checked whether the data lie in a subspace of dimension smaller than \(p\). If so, a warning is given, as well as the dimension of the subspace and a direction which is orthogonal to it.
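
A check of this kind can be sketched in base R with a singular value decomposition of the centred data. This is an illustration under stated assumptions, not the check used inside mrfDepth; the function name subspace_check and the tolerance tol are hypothetical.

```r
# Sketch: detect whether the rows of x lie in a lower dimensional affine
# subspace, using the SVD of the centred data (not the mrfDepth code).
subspace_check <- function(x, tol = 1e-8) {
  xc <- sweep(x, 2, colMeans(x))          # centre each column
  sv <- svd(xc)
  rank <- sum(sv$d > tol * max(sv$d))     # numerical rank of the centred data
  list(
    dimension = rank,
    # Right singular vectors with (near-)zero singular value are
    # orthogonal to the subspace spanned by the centred data.
    orthogonal = if (rank < ncol(x)) {
      sv$v[, (rank + 1):ncol(x), drop = FALSE]
    } else {
      NULL
    }
  )
}

set.seed(1)
u <- rnorm(50)
x_flat <- cbind(u, 2 * u + 1, rnorm(50))  # column 2 is an affine function of column 1
chk <- subspace_check(x_flat)
chk$dimension                              # 2, although p = 3
chk$orthogonal                             # a direction orthogonal to the subspace
```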

Depending on the dimension, different algorithms are used. For \(p=1\) the bagdistance is computed exactly. For \(p=2\) the default setting (options$approx=TRUE) uses an approximate algorithm. Exact computation, based on the exact algorithm to compute the contours of the bag (see the depthContour function), is obtained by setting options$approx to FALSE. Note that this may lead to an increase in computation time.

For the approximate algorithm, the intersection point \(c_z\) is approximated by searching along each ray for the point whose depth is equal to the median of the depth values of x. As the halfspace depth is monotone decreasing along the ray, a bisection algorithm is used. Starting limits are obtained by projecting the data on the direction and considering the data point with univariate depth corresponding to the median of the halfspace depths of x. By definition the multivariate depth of this point has to be lower than or equal to its univariate depth. A second limit is obtained by considering the deepest location estimate. The maximum number of iterations bisecting the current search interval can be specified through the option max.iter.
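
The bisection idea can be sketched on a toy depth function. The example below is not the mrfDepth implementation: it assumes the population halfspace depth of a standard bivariate normal distribution, pnorm(-||x||), and the names toy_depth, bisect_ray, upper and tol are hypothetical. For that distribution the bag is a disc whose squared radius is the chi-squared median with 2 degrees of freedom, so the result can be verified.

```r
# Sketch of the bisection along one ray (not the mrfDepth code).
# Assumption: toy depth that decreases monotonically along rays from the origin.
toy_depth <- function(x) pnorm(-sqrt(sum(x^2)))

bisect_ray <- function(theta, direction, target, upper,
                       max.iter = 100, tol = 1e-10) {
  u <- direction / sqrt(sum(direction^2))   # unit vector along the ray
  lo <- 0                                   # at theta the depth is maximal
  hi <- upper                               # far out the depth is below target
  converged <- FALSE
  for (i in seq_len(max.iter)) {
    mid <- (lo + hi) / 2
    if (toy_depth(theta + mid * u) >= target) lo <- mid else hi <- mid
    if (hi - lo < tol) {
      converged <- TRUE
      break
    }
  }
  list(t = (lo + hi) / 2, converged = converged)
}

r_bag <- sqrt(qchisq(0.5, df = 2))   # radius of the population bag
target <- toy_depth(c(r_bag, 0))     # depth level on the bag boundary
fit <- bisect_ray(theta = c(0, 0), direction = c(1, 1),
                  target = target, upper = 10)
c(fit$t, r_bag)                      # the two values should agree closely
```

The distance fit$t plays the role of the distance of \(c_z\) to the halfspace median, so a toy bagdistance of a point z on this ray would be sqrt(sum(z^2)) / fit$t.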

An observation from z is flagged as an outlier if its bagdistance exceeds a cutoff value. This cutoff is equal to the square root of the 0.99 quantile of the chi-squared distribution with \(p\) degrees of freedom.
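
The cutoff described above can be reproduced in base R. The bagdistance values in toy_bd are hypothetical and serve only to illustrate the flag convention from the Value section.

```r
# Cutoff: square root of the 0.99 chi-squared quantile with p degrees of freedom.
p <- 2
cutoff <- sqrt(qchisq(0.99, df = p))
cutoff                           # about 3.035 for p = 2

# Flag convention: TRUE = regular point, FALSE = bagdistance exceeds the cutoff.
toy_bd <- c(0.4, 1.2, 3.5)       # hypothetical bagdistances
flag <- toy_bd <= cutoff
flag                             # TRUE TRUE FALSE
```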

References

Hubert M., Rousseeuw P.J., Segaert P. (2015). Multivariate functional outlier detection. Statistical Methods & Applications, 24, 177--202.

Hubert M., Rousseeuw P.J., Segaert P. (2017). Multivariate and functional classification using depth and distance. Advances in Data Analysis and Classification, 11, 445--466.

See Also

depthContour, hdepth, bagplot

Examples

# NOT RUN {
# Generate some bivariate data
nObs <- 500
N <- matrix(rnorm(nObs * 2), nrow = nObs, ncol = 2)
A <- matrix(c(1,1,.5,.1), ncol = 2, nrow = 2)
X <- N %*% A
# In two dimensions we may either use the approximate
# or exact algorithm to compute the bag.
respons.exact <- bagdistance(x = X, options = list(approx = FALSE))
respons.approx <- bagdistance(x = X, options = list(approx = TRUE))
# The approximate algorithm leads to a good approximation.
plot(respons.exact$bagdistance, respons.approx$bagdistance)
abline(a = 0, b = 1)

# In Hubert et al. (2015) it was shown that for elliptical
# distributions the bagdistance^2 relates to the Mahalanobis
# distances. This may easily be illustrated.
mahDist <- mahalanobis(x = X, colMeans(X), cov(X))
plot(respons.exact$bagdistance^2, mahDist)

# Computation of the bagdistance relies on the calculation
# of halfspace depth using the hdepth function. Options for
# the hdepth routine can be passed down using the options
# argument. Note that the affine invariance of the bagdistance
# depends on the affine invariant calculation of the halfspace
# depth. Choosing a different type for hdepth may lead to
# unsatisfactory results.
options <- list(type = "Rotation",
               ndir = 375,
               approx = TRUE,
               seed = 78341)
respons.rotation <- bagdistance(x = X, options = options)


# }
