Identifies outliers in a data set of Interval-valued variables
getIdtOutl(Idt, IdtE=NULL, muE=NULL, SigE=NULL,
eta=0.025, Rewind=NULL, m=length(Rewind),
RefDist=c("ChiSq","HardRockeAdjF","HardRockeAsF","CerioliBetaF"),
multiCmpCor=c("never","always","iterstep"),
outlin=c("MidPandLogR","MidP","LogR"))
An IData object representing interval-valued entities.
Vector with the mean estimates used to find Mahalanobis distances. When specified, it overrides the mean estimate supplied in “IdtE”.
Matrix with the covariance estimates used to find Mahalanobis distances. When specified, it overrides the covariance estimate supplied in “IdtE”.
Nominal size of the null hypothesis that a given observation is not an outlier.
A vector with the subset of entities used to compute trimmed mean and covariance estimates when using a reweighted MCD. Only used when the ‘RefDist’ argument is set to “CerioliBetaF.”
Number of entities used to compute trimmed mean and covariance estimates when using a reweighted MCD. Not used when the ‘RefDist’ argument is set to “ChiSq.”
Whether a multicomparison correction of the nominal size (eta) for the outliers tests should be performed. Alternatives are: ‘never’ -- ignoring the multicomparisons and testing all entities at the ‘eta’ nominal level. ‘always’ -- testing all n entitites at 1.- (1.-‘eta’^(1/n)); and ‘iterstep’ -- use the iterated rule proposed by Cerioli (2010), i.e., make an initial set of tests using the nominal size 1.- (1-‘eta’^(1/n)), and if no outliers are detected stop. Otherwise, make a second step testing for outliers at the ‘eta’ nominal level.
The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq”,“HardRockeAsF”, “HardRockeAdjF” and “CerioliBetaF”, respectivelly for the usual Chi-squared, the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005), and the Beta and F distributions proposed by Cerioli (2010).
The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPpoints and LogRanges, “MidP” if outliers are only present in MidPpoints, or “LogR” if outliers are only present in LogRanges.
A vector with the indices of the entities identified as outliers.
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators. Journal of the American Statistical Association 105 (489), 147--156.
Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1--38.
Hardin, J. and Rocke, A. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910--927.