getIdtOutl: Get Interval Data Outliers

Description

Identifies outliers in a data set of Interval-valued variables

Usage

getIdtOutl(Idt, IdtE=NULL, muE=NULL, SigE=NULL,
  eta=0.025, Rewind=NULL, m=length(Rewind),
  RefDist=c("ChiSq","HardRockeAdjF","HardRockeAsF","CerioliBetaF"),
  multiCmpCor=c("never","always","iterstep"), 
  outlin=c("MidPandLogR","MidP","LogR"))

Arguments

Idt

An IData object representing interval-valued entities.

IdtE

Ao object of class '>IdtSngNDRE or '>IdtSngNDE containing mean and covariance estimates.

muE

Vector with the mean estimates used to find Mahalanobis distances. When specified, it overrides the mean estimate supplied in “IdtE”.

SigE

Matrix with the covariance estimates used to find Mahalanobis distances. When specified, it overrides the covariance estimate supplied in “IdtE”.

eta

Nominal size of the null hypothesis that a given observation is not an outlier.

Rewind

A vector with the subset of entities used to compute trimmed mean and covariance estimates when using a reweighted MCD. Only used when the ‘RefDist’ argument is set to “CerioliBetaF.”

Number of entities used to compute trimmed mean and covariance estimates when using a reweighted MCD. Not used when the ‘RefDist’ argument is set to “ChiSq.”

multiCmpCor

Whether a multicomparison correction of the nominal size (eta) for the outliers tests should be performed. Alternatives are: ‘never’ -- ignoring the multicomparisons and testing all entities at the ‘eta’ nominal level. ‘always’ -- testing all n entitites at 1.- (1.-‘eta’^(1/n)); and ‘iterstep’ -- use the iterated rule proposed by Cerioli (2010), i.e., make an initial set of tests using the nominal size 1.- (1-‘eta’^(1/n)), and if no outliers are detected stop. Otherwise, make a second step testing for outliers at the ‘eta’ nominal level.

RefDist

The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq”,“HardRockeAsF”, “HardRockeAdjF” and “CerioliBetaF”, respectivelly for the usual Chi-squared, the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005), and the Beta and F distributions proposed by Cerioli (2010).

outlin

The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPpoints and LogRanges, “MidP” if outliers are only present in MidPpoints, or “LogR” if outliers are only present in LogRanges.

Value

A vector with the indices of the entities identified as outliers.

References

Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators. Journal of the American Statistical Association 105 (489), 147--156.

Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1--38.

Hardin, J. and Rocke, A. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910--927.

Description

Usage

Arguments

Value

References

See Also