Computes global and pairwise Mahalanobis distances for visualizing global and local multivariate outliers. The size of the neighborhood (number of neighbors) varies, while the fraction of neighbors (propneighb) is held fixed.
locoutNeighbor(dat, X, Y, propneighb = 0.1, variant = c("dist", "knn"), usemax = 1/3,
npoints = 50, chisqqu = 0.975, indices = NULL, xlab = NULL, ylab = NULL,
colall = gray(0.7), colsel = 1, ...)
indices of the (selected) observations that are regular observations
indices of the (selected) observations that are global outliers
dat: multivariate data set (without coordinates)
X: X coordinates of the data points
Y: Y coordinates of the data points
propneighb: proportion of neighbors to be included in the tolerance ellipse
variant: search for neighbors either by Euclidean distance ("dist") or by k nearest neighbors ("knn")
usemax: for either variant, the maximum fraction of points (maximum distance) used for the plot
npoints: computations are carried out at no more than npoints points
chisqqu: quantile of the chi-square distribution used for splitting the plot
indices: if not NULL, indices of the observations to be highlighted
xlab: x-axis label for the plot
ylab: y-axis label for the plot
colall: color for lines if indices is NULL
colsel: color for lines of the selected indices
...: additional parameters for plotting
Peter Filzmoser <P.Filzmoser@tuwien.ac.at> http://cstat.tuwien.ac.at/filz/
For this diagnostic tool, the number of neighbors is varied up to a fraction usemax of the observations. The proportion propneighb (called beta) is held fixed, and for each observation the degree of isolation from a fraction 1-beta of its neighbors is computed. The neighborhood can be defined either via the Euclidean distance or via k nearest neighbors. For computational reasons, the computations are carried out at no more than npoints points. The critical value for outliers is the quantile chisqqu of the chi-square distribution. Indices of observations can also be supplied through the argument indices; the corresponding lines in the plots are then highlighted.
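The ingredients described above can be sketched in base R. This is a minimal illustration on simulated data, not the function's internal code: the classical mean and covariance stand in for whatever estimator the function actually uses, and the names coords, vals, k, r and iso are hypothetical. The final isolation summary is a simplifying assumption made only for this sketch.

```r
# Illustrative sketch with simulated data; not the package's internal code.
set.seed(1)
n <- 200                                      # number of observations (hypothetical)
p <- 3                                        # number of variables (hypothetical)
coords <- cbind(X = runif(n), Y = runif(n))   # simulated spatial coordinates
vals <- matrix(rnorm(n * p), n, p)            # simulated multivariate data

# Global squared Mahalanobis distances; classical mean and covariance
# are used here as a stand-in for the estimator used by the function.
S <- cov(vals)
md2 <- mahalanobis(vals, colMeans(vals), S)
cutoff <- qchisq(0.975, df = p)               # corresponds to chisqqu = 0.975
global_out <- which(md2 > cutoff)

# Spatial Euclidean distances from observation i to all observations.
i <- 1
d_spatial <- sqrt(rowSums(sweep(coords, 2, coords[i, ])^2))

# Neighborhood via kNN: the k spatially closest observations, excluding i.
k <- 20
nb_knn <- order(d_spatial)[2:(k + 1)]

# Neighborhood via distance: all observations within radius r of i.
r <- 0.15
nb_dist <- setdiff(which(d_spatial <= r), i)

# Pairwise squared Mahalanobis distances from i to its kNN neighbors.
pair_md2 <- mahalanobis(vals[nb_knn, , drop = FALSE], vals[i, ], S)

# Distance to the ceiling(k * beta)-th closest neighbor in variable space,
# used here as a simple isolation summary (an assumption of this sketch).
beta <- 0.1
iso <- sort(pair_md2)[ceiling(k * beta)]
```

In this sketch an observation with a small md2 but a large iso would be a candidate local outlier: unremarkable globally, yet far from most of its spatial neighbors in variable space.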
P. Filzmoser, A. Ruiz-Gazen, and C. Thomas-Agnan: Identification of local multivariate outliers. Submitted for publication, 2012.
locoutPercent, locoutSort
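# use data from the illustrative example in the paper
# (locoutNeighbor and the data sets X, Y, dat are provided by the mvoutlier package)
library(mvoutlier)
data(X)
data(Y)
data(dat)
res <- locoutNeighbor(dat, X, Y, variant = "knn", usemax = 1, chisqqu = 0.975,
                      indices = c(1, 11, 24, 36), propneighb = 0.1, npoints = 100)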