MAINT.Data (version 1.1.1)

getIdtOutl: Get Interval Data Outliers

Description

Identifies outliers in a data set of interval-valued variables.

Usage

getIdtOutl(Idt, IdtE=NULL, muE=NULL, SigE=NULL,
  eta=0.025, Rewind=IdtE$RewghtdSet, m=length(Rewind),
  RefDist=c("ChiSq","HardRockeAdjF","HardRockeAsF","CerioliBetaF"),
  multiCmpCor=c("never","always","iterstep"), 
  outlin=c("MidPandLogR","MidP","LogR"))

Arguments

Idt

An IData object representing interval-valued entities.

IdtE

An object of class ‘IdtSngNDRE’ or ‘IdtSngNDE’ containing mean and covariance estimates.

muE

Vector with the mean estimates used to find Mahalanobis distances.

SigE

Matrix with the covariance estimates used to find Mahalanobis distances.

eta

Nominal size of the test of the null hypothesis that a given observation is not an outlier.

Rewind

A vector with the subset of entities used to compute trimmed mean and covariance estimates. Only used when the ‘RefDist’ argument is set to “CerioliBetaF”.

m

Number of entities used to compute trimmed mean and covariance estimates. Not used when the ‘RefDist’ argument is set to “ChiSq”.

multiCmpCor

Whether a multiple-comparison correction of the nominal size (eta) of the outlier tests should be performed. Alternatives are: ‘never’ -- ignoring the multiple comparisons and testing all entities at ‘eta’; ‘always’ -- testing all n entities at 1 - (1 - ‘eta’)^(1/n); and ‘iterstep’ -- as suggested by Cerioli (2010), making an initial set of tests at the nominal size 1 - (1 - ‘eta’)^(1/n) and stopping if no outliers are detected; otherwise, making a second step that tests for outliers at ‘eta’.
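The three alternatives can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name flag_outliers, its arguments, and the use of a chi-squared cutoff on already computed squared Mahalanobis distances are assumptions made for illustration only.

```r
# Hypothetical helper (not part of MAINT.Data): flags outliers from a vector
# of squared Mahalanobis distances d2, assuming a chi-squared reference
# distribution with p degrees of freedom.
flag_outliers <- function(d2, p, eta = 0.025,
                          multiCmpCor = c("never", "always", "iterstep")) {
  multiCmpCor <- match.arg(multiCmpCor)
  n <- length(d2)

  # Corrected per-test size, so that n independent tests have overall size eta.
  corrected <- 1 - (1 - eta)^(1 / n)

  # Indices whose distance exceeds the chi-squared cutoff at a given size.
  cut_at <- function(level) which(d2 > qchisq(1 - level, df = p))

  switch(multiCmpCor,
    never  = cut_at(eta),
    always = cut_at(corrected),
    iterstep = {
      first <- cut_at(corrected)           # step 1: corrected size
      if (length(first) == 0) first else cut_at(eta)  # step 2 only if needed
    }
  )
}

d2 <- c(1, 2, 3, 12)                       # made-up distances, p = 4
flag_outliers(d2, p = 4, multiCmpCor = "never")
flag_outliers(d2, p = 4, multiCmpCor = "always")
flag_outliers(d2, p = 4, multiCmpCor = "iterstep")
```

With these made-up distances, the uncorrected test flags the last entity, while the corrected test flags nothing, so the ‘iterstep’ rule stops after its first step.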

RefDist

The assumed reference distributions used to find the cutoffs that define which observations are treated as outliers. Alternatives are “ChiSq”, “HardRockeAsF”, “HardRockeAdjF” and “CerioliBetaF”, respectively for the usual Chi-squared, the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005), and the Beta and F distributions proposed by Cerioli (2010).
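As a rough illustration of the “ChiSq” alternative, the following base R sketch compares squared Mahalanobis distances with a chi-squared quantile. The simulated data, the planted outlier, and the use of plain sample estimates as stand-ins for ‘muE’ and ‘SigE’ are assumptions; the scaled F and Beta cutoffs are not shown, since their parameters come from the cited papers.

```r
set.seed(1)
p <- 4
X <- matrix(rnorm(200 * p), ncol = p)   # simulated clean observations
X[1, ] <- 10                            # one planted gross outlier

# Stand-ins for robust estimates: sample mean and covariance without row 1.
muE  <- colMeans(X[-1, ])
SigE <- cov(X[-1, ])

# Squared Mahalanobis distances of all observations to (muE, SigE).
d2 <- mahalanobis(X, center = muE, cov = SigE)

# Chi-squared cutoff at nominal size eta = 0.025 with p degrees of freedom.
cutoff  <- qchisq(1 - 0.025, df = p)
flagged <- which(d2 > cutoff)
flagged
```

The planted observation is flagged; a few clean observations may also exceed the cutoff, as expected when each of 200 tests is run at size 0.025 without correction.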

outlin

The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPoints and LogRanges, “MidP” if outliers are only present in MidPoints, or “LogR” if outliers are only present in LogRanges.
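For readers unfamiliar with this parametrization, the sketch below computes MidPoints and LogRanges from interval bounds in base R. The bounds are made-up values, and the variable names are illustrative only.

```r
# Made-up lower and upper bounds of one interval-valued variable.
lower <- c(1.0, 2.5, 0.0)
upper <- c(3.0, 2.7, 4.0)

# MidPoint: centre of each interval.
midp <- (lower + upper) / 2

# LogRange: logarithm of each interval's width (requires upper > lower).
logr <- log(upper - lower)

data.frame(lower, upper, midp, logr)
```

An entity can be unusual in its location (MidPoint), in its width (LogRange), or in both, which is the distinction the ‘outlin’ argument refers to.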

Value

A vector with the indices of the entities identified as outliers.
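A typical call might look like the sketch below. It assumes that MAINT.Data is installed, that it ships the ChinaTemp data set with lower and upper bounds in its first eight columns, and that fasttle() returns an estimates object accepted as ‘IdtE’; these details should be checked against the installed version of the package.

```r
# Assumption: MAINT.Data is installed and provides IData(), fasttle() and
# the ChinaTemp data set; the block is skipped otherwise.
if (requireNamespace("MAINT.Data", quietly = TRUE)) {
  library(MAINT.Data)

  # Interval data object built from the quarterly temperature intervals.
  ChinaT <- IData(ChinaTemp[1:8], VarNames = c("T1", "T2", "T3", "T4"))

  # Robust mean and covariance estimates by trimmed maximum likelihood.
  ChinaE <- fasttle(ChinaT)

  # Outliers under the defaults: eta = 0.025 and chi-squared cutoffs.
  outl <- getIdtOutl(ChinaT, ChinaE)
  print(outl)
}
```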

References

Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators. Journal of the American Statistical Association 105 (489), 147--156.

Hardin, J. and Rocke, D. M. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910--927.

See Also

fasttle, fulltle