ojaMedian: Oja Median

Description

Computes the Oja median. Three algorithms are available: exact, evolutionary and grid.

Usage

ojaMedian(X, alg = "evolutionary", sp = 1, na.action = na.fail, 
          control = ojaMedianControl(...), ...)
          
ojaMedianEvo(X, control = ojaMedianControl(...), ...)
ojaMedianGrid(X, control = ojaMedianControl(...), ...)
ojaMedianEx(X, control = ojaMedianControl(...), ...)

Arguments

X

numeric data.frame or matrix.

alg

character string denoting the algorithm to be used for computing the Oja median. Options are "exact", "evolutionary" and "grid". Default is "evolutionary". See Details.

sp

number of runs to average over.

na.action

a function which indicates what should happen when the data contain 'NA's. Default is to fail.

control

a list specifying the control parameters of the different algorithms; use the function ojaMedianControl and see its help page.

...

can be used to specify control parameters directly instead of via control.

Value

a numeric vector containing the Oja median.

Details

Three algorithms are available for computing the Oja median.

The exact algorithm uses a gradient method: it follows intersection lines of hyperplanes until it reaches the minimum of the objective function. It is computationally very intensive. In the bivariate case it computes the Oja median in acceptable time for at least 1200 data points; for a 7-dimensional dataset it is feasible for 24 data points.

The evolutionary algorithm computes an approximate solution. It starts from a random point and repeatedly mutates the current best solution in search of a better one. Several options control the mutation process. For a fast calculation at the cost of a higher error rate, set sigmaAdaption to 1. Alternatively, limit the number of subsets used to a small number; using all subsets means $\binom{n}{k}$ subsets in total, where $n$ is the number of data points and $k$ the dimension. For a precise solution, the following options have proved useful: initialSigma: 0.5, sigmaAdaptation: 20, adaptationFactor: 0.5, sigmaLog20Decrease: 10. These values were tested in the bivariate case but should work in every dimension. In the bivariate case the Oja median can be calculated for more than $22 \times 10^6$ data points, and in the 10-dimensional case the algorithm can still compute an approximate solution for $10^6$ data points. Before the algorithm itself starts, the data are transformed with ICS to make the procedure more stable with respect to the location of the data and to improve the quality of the approximation. The transformation also makes the approximation affine invariant.

The grid algorithm calculates the Oja median by means of a grid whose points are candidate approximations of the Oja median. Every grid point is tested as a candidate. If the test results are not unique, the algorithm takes a larger sample of subsets into account and tests again. Compared with the evolutionary algorithm it is slower and less precise, and it may be useful only in special data situations. It is an earlier heuristic solution to the Oja median problem and is included mainly for historical reasons.

The exact algorithm and the grid algorithm are also described in Ronkainen et al. (2002).

Much of the computation time in ojaMedian may be spent checking and transforming the input. For time-critical calculations, e.g. in loops, you may want to call the variants ojaMedianEx, ojaMedianEvo or ojaMedianGrid directly. Use them only if you know what you are doing: they perform no checks and only issue the .Call to the algorithm itself.

If the dimension of the data is too large or there are too many observations, the exact algorithm may crash R. On a common PC with a 32-bit operating system the following dimension:amount combinations work fine: 2:1200, 3:300, 4:100, 5:63, 6:38, 7:24. Bigger datasets may be possible, depending on your system.

A demo illustrates the Oja median graphically in simple bivariate data situations. To view it, run demo(ojaMedianDemo).
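
To make the objective function and the mutation idea more concrete, the following base R sketch evaluates the Oja objective (the sum of the volumes of all simplices spanned by a candidate point and $k$ data points) and runs a toy mutate-and-keep-improvements search on simulated data. It is only an illustration and not the package's implementation: the function name oja_objective, the step size, the decay factor 0.98 and the iteration count are assumptions made for this example, and no ICS transformation is applied.

```r
# Oja objective: sum of simplex volumes spanned by a candidate point `theta`
# and every subset of k data points (k = number of columns of X).
# `oja_objective` is a hypothetical helper, not part of the package.
oja_objective <- function(theta, X) {
  X <- as.matrix(X)
  k <- ncol(X)
  subsets <- combn(nrow(X), k)              # all choose(n, k) index subsets
  vols <- apply(subsets, 2, function(idx) {
    # (k+1) x (k+1) matrix: a row of ones above the k data points and theta
    M <- rbind(1, cbind(t(X[idx, , drop = FALSE]), theta))
    abs(det(M)) / factorial(k)              # volume of the simplex
  })
  sum(vols)
}

set.seed(1)
X <- matrix(rnorm(40), ncol = 2)            # 20 simulated bivariate points

best <- X[sample(nrow(X), 1), ]             # start from a random data point
best_val <- oja_objective(best, X)
start_val <- best_val
sigma <- 0.5                                # assumed initial mutation step size

for (i in 1:200) {
  cand <- best + rnorm(length(best), sd = sigma)   # mutate the current best
  cand_val <- oja_objective(cand, X)
  if (cand_val < best_val) {                # keep only improvements
    best <- cand
    best_val <- cand_val
  } else {
    sigma <- sigma * 0.98                   # assumed step-size decay
  }
}

best                                        # approximate minimiser
```

Because every evaluation visits all choose(n, k) subsets, this brute-force version is practical only for small data, which is why limiting the number of subsets matters for larger problems.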

References

Oja, H. (1983), Descriptive statistics for multivariate distributions, Statistics and Probability Letters, 1, 327-332.

Ronkainen, T., Oja, H. and Orponen, P. (2002), Computation of the multivariate Oja median, in Dutter, R., Filzmoser, P., Gather, U. and Rousseeuw, P. J.: Developments in Robust Statistics, Heidelberg: Springer, 344-359.

Fischer, D. (2008), Diplomarbeit, Statistische Eigenschaften des Oja-Medians mit einer algorithmischen Betrachtung, Dortmund: Technische Universität Dortmund. In German.

Examples

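library(OjaNP)                    # package providing ojaMedian() and biochem
data(biochem)
X <- as.matrix(biochem[, 1:2])    # first two variables: bivariate case
ojaMedian(X)                      # default algorithm: "evolutionary"
ojaMedian(X, alg = "grid")
ojaMedian(X, alg = "exact")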
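
The dimension:amount combinations quoted in Details can be put in perspective by counting the $k$-point subsets for each pair, i.e. the $n$ choose $k$ figure mentioned there. This is a base R sketch; the data frame and its column names are made up for illustration, and the source does not state that this count is what actually bounds the exact algorithm.

```r
# dimension:amount pairs quoted in Details for the exact algorithm
limits <- data.frame(k = 2:7, n = c(1200, 300, 100, 63, 38, 24))

# number of k-point subsets of n observations for each pair
limits$subsets <- choose(limits$n, limits$k)
limits
```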
