Identifies outliers in a data set of Interval-valued variables
getIdtOutl(Idt, IdtE=NULL, muE=NULL, SigE=NULL,
eta=0.025, Rewind=IdtE$RewghtdSet, m=length(Rewind),
RefDist=c("ChiSq","HardRockeAdjF","HardRockeAsF","CerioliBetaF"),
multiCmpCor=c("never","always","iterstep"),
outlin=c("MidPandLogR","MidP","LogR"))
An IData object representing interval-valued entities.
An object of class ‘IdtSngNDRE’ or ‘IdtSngNDE’ containing mean and covariance estimates.
Vector with the mean estimates used to find Mahalanobis distances.
Matrix with the covariance estimates used to find Mahalanobis distances.
Nominal size of the test of the null hypothesis that a given observation is not an outlier.
A vector with the subset of entities used to compute trimmed mean and covariance estimates. Only used when the ‘RefDist’ argument is set to “CerioliBetaF”.
Number of entities used to compute trimmed mean and covariance estimates. Not used when the ‘RefDist’ argument is set to “ChiSq”.
Whether a multiple-comparison correction of the nominal size (‘eta’) for the outlier tests should be performed. Alternatives are: ‘never’ -- ignoring the multiple comparisons and testing all entities at ‘eta’; ‘always’ -- testing all n entities at 1 - (1 - ‘eta’)^(1/n); and ‘iterstep’ -- as suggested by Cerioli (2010), making an initial set of tests at the nominal size 1 - (1 - ‘eta’)^(1/n) and stopping if no outliers are detected; otherwise, making a second step that tests for outliers at ‘eta’.
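The correction rules can be illustrated with a minimal base R sketch. The helper names ‘corrected_size’ and ‘flag_iterstep’ are hypothetical and not part of the package, and a Chi-squared reference distribution with p degrees of freedom is assumed purely for illustration; this is not the function's internal code.

```r
# Corrected nominal size for n simultaneous tests: 1 - (1 - eta)^(1/n)
corrected_size <- function(eta, n) 1 - (1 - eta)^(1 / n)

# Hypothetical helper mimicking the 'iterstep' rule.
# d2: squared Mahalanobis distances; p: number of variables (assumed df).
flag_iterstep <- function(d2, p, eta = 0.025) {
  n <- length(d2)
  # Step 1: test every entity at the corrected size
  step1 <- which(d2 > qchisq(1 - corrected_size(eta, n), df = p))
  if (length(step1) == 0L) return(integer(0))  # nothing detected: stop
  # Step 2: outliers were detected, so test again at the nominal size eta
  which(d2 > qchisq(1 - eta, df = p))
}

clean <- flag_iterstep(c(1, 2, 3), p = 2)        # no entity passes step 1
mixed <- flag_iterstep(c(1, 2, 100, 8), p = 2)   # step 1 fires, step 2 applies
```

In the second call, the distance 100 triggers the first step, so the distance 8 is then also flagged at the less strict cutoff for ‘eta’.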
The assumed reference distribution used to find the cutoffs defining the observations assumed to be outliers. Alternatives are “ChiSq”, “HardRockeAsF”, “HardRockeAdjF” and “CerioliBetaF”, respectively for the usual Chi-squared distribution, the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005), and the Beta and F distributions proposed by Cerioli (2010).
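As a rough illustration of the “ChiSq” alternative, the following base R sketch flags observations whose squared Mahalanobis distance exceeds a Chi-squared quantile. The data are simulated, and the names ‘X’, ‘d2’, ‘cutoff’ and ‘outl’ are assumptions made for this example; the estimates here are plain sample moments of the clean rows, standing in for the robust estimates the function would normally receive.

```r
set.seed(1)
X <- matrix(rnorm(200 * 4), ncol = 4)  # 200 simulated entities, 4 variables
X[1, ] <- 10                           # plant one obvious outlier in row 1

# Stand-ins for the mean and covariance estimates (computed without row 1)
muE  <- colMeans(X[-1, ])
SigE <- cov(X[-1, ])

# Squared Mahalanobis distances of all entities to the estimated centre
d2 <- mahalanobis(X, center = muE, cov = SigE)

# Chi-squared cutoff at nominal size eta, with df = number of variables
eta    <- 0.025
cutoff <- qchisq(1 - eta, df = ncol(X))

outl <- which(d2 > cutoff)  # indices of the entities flagged as outliers
```

With ‘eta’ = 0.025 and no multiple-comparison correction, a few clean entities are also expected to exceed the cutoff by chance.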
The type of outliers to be considered. “MidPandLogR” if outliers may be present in both MidPoints and LogRanges, “MidP” if outliers are only present in MidPoints, or “LogR” if outliers are only present in LogRanges.
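For readers unfamiliar with the two components, the sketch below computes them from interval bounds in base R, assuming that a MidPoint is the centre of an interval and a LogRange is the natural logarithm of its width. The vectors ‘lower’ and ‘upper’ are made-up illustrative data.

```r
# Made-up lower and upper bounds of three intervals for one variable
lower <- c(1.0, 2.0, 0.5)
upper <- c(3.0, 2.5, 4.5)

# MidPoints: centres of the intervals
midp <- (lower + upper) / 2

# LogRanges: natural logarithms of the interval widths (requires upper > lower)
logr <- log(upper - lower)
```

An entity can therefore be unusual in its location (MidPoints), in its width (LogRanges), or in both, which is what the three alternatives distinguish.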
A vector with the indices of the entities identified as outliers.
Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators. Journal of the American Statistical Association 105 (489), 147--156.
Hardin, J. and Rocke, D. M. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910--927.