The diagnostic plots are based on the robust distances of the observations. In a multivariate sample \(X_n=\{\mathbf{x}_1,...,\mathbf{x}_n\}\),
the robust distance \(d_i\) of observation \(i\) is given by
\(d_i^2=(\mathbf{x}_i-\hat{\mu})'\hat{\Sigma}^{-1}(\mathbf{x}_i-\hat{\mu})\),
where \(\hat{\mu}\) and \(\hat{\Sigma}\) are robust estimates of location and covariance.
Observations with a large robust distance are considered outlying.
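As an illustration of the definition above, the following sketch computes robust distances in R. It assumes the MCD estimator from `MASS::cov.rob` as a stand-in for \(\hat{\mu}\) and \(\hat{\Sigma}\); this is not necessarily the estimator used by the package, and the simulated data and planted outliers are hypothetical.

```r
library(MASS)  # cov.rob() is used here only as a stand-in robust estimator

set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), ncol = 3)
X[1:5, ] <- X[1:5, ] + 8            # plant five obvious outliers (hypothetical data)

est <- cov.rob(X, method = "mcd")   # robust location (center) and covariance (cov)

# mahalanobis() returns squared distances d_i^2, so take the square root
d <- sqrt(mahalanobis(X, center = est$center, cov = est$cov))

# the planted outliers should show up with the largest robust distances
largest <- order(d, decreasing = TRUE)[1:5]
```

Because the estimates are robust, the planted observations barely affect \(\hat{\mu}\) and \(\hat{\Sigma}\), and so they stand out clearly in `d`.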
The default diagnostic plot in the multivariate regression setting (i.e. for objects of type FRBmultireg and Xdist=TRUE)
shows the residual distances (i.e. the robust distances of the multivariate residuals) based on the estimates in x,
versus the distances within the space of the explanatory variables. The latter are based on robust estimates of location and scatter for the
data matrix x$X (without intercept). Computing these robust estimates may take an appreciable amount of time. The estimator used
corresponds to the one which was used in obtaining Xmultireg (with the same breakdown point, for example, and the same control parameters).
On the vertical axis a cutoff line is drawn at the square root of the .975 quantile of the chi-squared distribution with degrees of
freedom equal to the number of response variables. On the horizontal axis the same quantile is drawn but now with degrees of freedom
equal to the number of covariates (not including intercept).
Points to the right of the vertical cutoff line can be viewed as high-leverage points. These can be classified into so-called
'bad' or 'good' leverage points depending on whether they lie above or below the horizontal cutoff line. Points above the horizontal
cutoff line but to the left of the vertical cutoff line are sometimes called vertical outliers.
See also Van Aelst and Willems (2005) for example.
To avoid the additional computation time, one can choose Xdist=FALSE, in which case the residual distances are simply plotted
versus the index of the observation.
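The cutoffs and the resulting classification of points in the regression diagnostic plot can be sketched as follows. The function name `classify_points`, its arguments and the example distances are hypothetical and are not part of the package; only `qchisq` from base R is used.

```r
# Classify observations from their residual distance and their distance
# in the space of the explanatory variables.
#   q: number of response variables
#   p: number of covariates (not including the intercept)
classify_points <- function(resid_dist, x_dist, q, p, level = 0.975) {
  cut_resid <- sqrt(qchisq(level, df = q))  # horizontal line (vertical axis)
  cut_x     <- sqrt(qchisq(level, df = p))  # vertical line (horizontal axis)

  outlying <- resid_dist > cut_resid
  leverage <- x_dist > cut_x

  label <- ifelse(leverage & outlying, "bad leverage",
           ifelse(leverage,            "good leverage",
           ifelse(outlying,            "vertical outlier", "regular")))

  list(label = label, cut_resid = cut_resid, cut_x = cut_x)
}

# hypothetical distances for four observations, with q = 3 and p = 5
res <- classify_points(resid_dist = c(1.0, 5.0, 1.2, 6.0),
                       x_dist     = c(1.5, 1.0, 4.5, 5.0),
                       q = 3, p = 5)
res$label
# "regular" "vertical outlier" "good leverage" "bad leverage"
```

With q = 3 and p = 5 the two cutoffs are approximately 3.06 and 3.58.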
The default plot in the context of PCA (i.e. for objects of type FRBpca and EIF=TRUE)
is a plot proposed by Pison and Van Aelst (2004). It shows the robust distance versus a measure of the overall empirical influence
of the observation on the (classical) principal components. The empirical influences are obtained by using the influence function of
the eigenvectors of the empirical or classical shape estimator at the normal model, and by
substituting therein the robust estimates for the population parameters.
The overall influence value is then defined by averaging the squared influence
over all coefficients in the eigenvectors.
The vertical line on the plot is an indicative cutoff value, obtained through simulation. This last part takes
a few moments of computation time.
Again, to avoid the additional computation time, one can choose EIF=FALSE, in which case the robust distances are simply plotted
versus the index of the observation.
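The overall empirical influence described above can be sketched as follows. The sketch assumes the standard expression for the influence function of eigenvector \(\mathbf{v}_j\) of the classical covariance matrix at the normal model, \(\sum_{k\neq j} \frac{z_j z_k}{\lambda_j-\lambda_k}\mathbf{v}_k\) with \(z_k=\mathbf{v}_k'(\mathbf{x}-\mu)\), and plugs in robust estimates (here the MCD from `MASS::cov.rob`, used as a stand-in). The function name `eif_overall` is hypothetical, the package's own computation may differ in details, and the simulated cutoff is not reproduced.

```r
library(MASS)  # cov.rob() is a stand-in for the robust estimates

# Average squared empirical influence on all eigenvector coefficients.
eif_overall <- function(X, center, cov) {
  eig <- eigen(cov, symmetric = TRUE)
  V   <- eig$vectors
  lam <- eig$values
  p   <- ncol(X)
  Z   <- sweep(X, 2, center) %*% V   # scores z_ik = v_k'(x_i - center)

  overall <- numeric(nrow(X))
  for (j in seq_len(p)) {
    for (k in seq_len(p)[-j]) {
      # z_ij * z_ik / (lam_j - lam_k) is the coefficient of v_k in the
      # influence on v_j; V is orthonormal, so the sum of squared entries
      # of the influence vector equals the sum of these squared coefficients
      overall <- overall + (Z[, j] * Z[, k] / (lam[j] - lam[k]))^2
    }
  }
  overall / p^2                      # average over the p * p coefficients
}

set.seed(3)
X <- matrix(rnorm(60 * 3), ncol = 3) %*% diag(c(3, 2, 1))  # hypothetical data
est  <- cov.rob(X, method = "mcd")
infl <- eif_overall(X, est$center, est$cov)
d    <- sqrt(mahalanobis(X, center = est$center, cov = est$cov))

plot(infl, d, xlab = "Overall empirical influence", ylab = "Robust distance")
```

The eigenvectors of a shape matrix coincide with those of the corresponding covariance matrix, which is why the sketch works directly with the covariance estimate.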
For the result of the robust Hotelling test (i.e. for objects of type FRBhot), the method plots the robust
distance versus the index. In case of a two-sample test, the indices are within-sample and a vertical line separates
the two groups. In the two-sample case, the distances are computed with a separate location estimate \(\hat{\mu}\) for each group and a common
covariance estimate \(\hat{\Sigma}\).
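For the two-sample case, a rough sketch of such distances is given below. It assumes MCD estimates from `MASS::cov.rob` as stand-ins: each group is centered by its own robust location, and a single robust covariance is then estimated from the pooled centered data. This is an illustrative simplification, not the estimator used by the package, and the simulated groups are hypothetical.

```r
library(MASS)  # cov.rob() is a stand-in robust estimator

set.seed(2)
X1 <- matrix(rnorm(40 * 2), ncol = 2)            # first sample (hypothetical)
X2 <- matrix(rnorm(30 * 2, mean = 3), ncol = 2)  # second sample (hypothetical)

# separate robust location estimate for each group
mu1 <- cov.rob(X1, method = "mcd")$center
mu2 <- cov.rob(X2, method = "mcd")$center

# common covariance estimate from the pooled, group-centered data
centered <- rbind(sweep(X1, 2, mu1), sweep(X2, 2, mu2))
Sigma <- cov.rob(centered, method = "mcd")$cov

d <- sqrt(mahalanobis(centered, center = rep(0, 2), cov = Sigma))

# within-sample indices: numbering restarts at 1 for the second group
idx <- c(seq_len(nrow(X1)), seq_len(nrow(X2)))

plot(seq_along(d), d, xlab = "Observation (grouped by sample)",
     ylab = "Robust distance")
abline(v = nrow(X1) + 0.5, lty = 2)              # separates the two groups
```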