extremevalues (version 1.0)

getOutliers: Detect outliers

Description

Detects outliers in one-dimensional data, based on the assumption that the bulk of (the right side of) the observed data distribution can be adequately described by a model distribution.

Usage

getOutliers(y, rho=0.1, pval=c(0.5,0.9), method="lognormal")

Arguments

y
Vector of one-dimensional nonnegative data
rho
A value $y_i$ is an outlier if it is above the limit where fewer than rho observations are expected. Must be >=0.
pval
c(pmin,pmax) quantile limits indicating which data should be used to fit the model distribution. Must obey 0 < pmin < pmax < 1.
method
Model distribution used to estimate the limit. Choose from "lognormal" (default), "exponential", "pareto", "weibull" or "normal".

Value

  • iOut: Index vector indicating where y > limit
  • nOut: Number of outliers. The largest nOut values of y are outliers
  • limit: Outlier limit. Elements of y larger than or equal to limit are considered outliers
  • Npop: Length of y
  • method: method
  • rho: The rho-value
  • pmin: pval[1]
  • pmax: pval[2]
  • Nfit: Number of values used in the fit
  • R2: R-squared value for the fit
  • lambda: (exponential distribution) Estimated location (and spread) parameter for $f(y)=\lambda\exp(-\lambda y)$
  • mu: (lognormal distribution) Estimated $E(\ln(y))$ for lognormal distribution
  • sigma: (lognormal distribution) Estimated $Var(\ln(y))$ for lognormal distribution
  • ym: (pareto distribution) Estimated location parameter (mode) for pareto distribution
  • alpha: (pareto distribution) Estimated spread parameter for pareto distribution
  • k: (weibull distribution) Estimated shape parameter $k$ for weibull distribution
  • lambda: (weibull distribution) Estimated scale parameter $\lambda$ for weibull distribution
  • mu: (normal distribution) Estimated $E(y)$ for normal distribution
  • sigma: (normal distribution) Estimated $Var(y)$ for normal distribution

Details

The function sorts the values of y and uses (log)linear regression to fit the values between the pmin and pmax quantiles to the cdf of a model distribution. Given a model cdf $F$, the outlier limit $l$ is the value above which fewer than $\rho$ values are expected, conditional on the total number of observations $N$ in $y$: $l=F^{-1}(1-\rho/N|\hat{\theta})$. Here, $\hat{\theta}$ denotes the estimated parameters of the cdf.
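The idea can be illustrated with a minimal base-R sketch for the lognormal case. This is not the package's implementation: the helper name `sketch_limit`, the plotting positions `(i - 0.5) / N`, and the quantile-quantile form of the regression are assumptions made for illustration only.

```r
# Hypothetical helper, NOT part of extremevalues: a base-R sketch of the
# "fit the bulk, then extrapolate a limit" idea for a lognormal model.
sketch_limit <- function(y, rho = 0.1, pval = c(0.5, 0.9)) {
  N  <- length(y)
  ys <- sort(y)

  # Empirical cdf values for the sorted data (assumed plotting positions).
  p <- (seq_len(N) - 0.5) / N

  # Use only the observations between the pmin and pmax quantiles.
  keep <- p >= pval[1] & p <= pval[2]
  ly   <- log(ys[keep])
  z    <- qnorm(p[keep])

  # Under a lognormal model, ln(y) = meanlog + sdlog * qnorm(F(y)),
  # so a linear regression of ln(y) on qnorm(p) estimates both parameters.
  fit     <- lm(ly ~ z)
  meanlog <- unname(coef(fit)[1])
  sdlog   <- unname(coef(fit)[2])

  # Outlier limit: l = F^{-1}(1 - rho / N | estimated parameters).
  limit <- qlnorm(1 - rho / N, meanlog = meanlog, sdlog = sdlog)

  list(meanlog = meanlog, sdlog = sdlog, limit = limit,
       iOut = which(y > limit), Nfit = sum(keep))
}

set.seed(1)
y   <- c(10^rnorm(50), 500)
res <- sketch_limit(y)
res$limit   # estimated outlier limit
res$iOut    # indices of values above the limit
```

Because the limit depends on both `rho` and the number of observations, a larger `rho` lowers the limit and flags more values for the same data.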

References

An outlier detection method for economic data, M.P.J. van der Loo, submitted to The Journal of Official Statistics (November 2009).

The file /R-/library/extremevalues/extremevalues.pdf contains a worked example. It can also be downloaded from my website.

Examples

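# Simulate 50 positive, right-skewed values and append one extreme value
y <- c(10^rnorm(50), 500)
# Fit the default (lognormal) model and compute the outlier limit
L <- getOutliers(y, rho = 0.5)
# Plot the result
outlierPlot(y, L)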