diagplot: Plot Method for Objects of class 'FRBmultireg'

Description

Diagnostic plots for objects of class FRBmultireg, FRBpca and FRBhot. They show robust distances and allow the detection of multivariate outliers.

Usage

# S3 method for FRBmultireg
diagplot(x, Xdist = TRUE, ...)
# S3 method for FRBpca
diagplot(x, EIF = TRUE, ...)
# S3 method for FRBhot
diagplot(x, ...)

Value

Invisibly returns the first argument.

Arguments

x: an R object of class FRBmultireg (typically created by FRBmultiregS, FRBmultiregMM or FRBmultiregGS or by Sest_multireg, MMest_multireg or GSest_multireg) or an R object of class FRBpca (typically created by FRBpcaS or FRBpcaMM) or an R object of class FRBhot (typically created by FRBhotellingS or FRBhotellingMM)
Xdist: logical: if TRUE, the plot shows the robust distance versus the distance in the space of the explanatory variables; if FALSE, it plots the robust distance versus the index of the observation
EIF: logical: if TRUE, the plot shows the robust distance versus an influence measure for each point; if FALSE, it plots the robust distance versus the index of the observation
...: potentially more arguments to be passed

Author

Gert Willems and Ella Roelant

Details

The diagnostic plots are based on the robust distances of the observations. In a multivariate sample $X_n=\{\mathbf{x}_1,\ldots,\mathbf{x}_n\}$, the robust distance $d_i$ of observation $i$ is given by $d_i^2=(\mathbf{x}_i-\hat{\mu})'\hat{\Sigma}^{-1}(\mathbf{x}_i-\hat{\mu})$, where $\hat{\mu}$ and $\hat{\Sigma}$ are robust estimates of location and covariance. Observations with a large robust distance are considered outlying.
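As an illustration of the formula above, here is a minimal R sketch that computes $d_i$ from given estimates $\hat{\mu}$ and $\hat{\Sigma}$. The helper name `robust_dist` is hypothetical (not an FRB function), and the estimates are fixed illustrative values; in practice they would be the robust estimates of a fitted model. Base R's `mahalanobis()` returns the squared distance $d_i^2$, so the square root is taken.

```r
# Hypothetical helper (not part of FRB): robust distances d_i given a
# location estimate mu and a covariance estimate Sigma.
robust_dist <- function(X, mu, Sigma) {
  # mahalanobis() returns (x_i - mu)' Sigma^{-1} (x_i - mu), i.e. d_i^2
  sqrt(mahalanobis(X, center = mu, cov = Sigma))
}

# Illustrative values only; in practice mu and Sigma are robust estimates.
X <- rbind(c(2, 0),
           c(0, 3),
           c(1, 1))
mu <- c(0, 0)
Sigma <- diag(c(4, 1))
d <- robust_dist(X, mu, Sigma)
print(d)
```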

The default diagnostic plot in the multivariate regression setting (i.e. for objects of class FRBmultireg with Xdist=TRUE) shows the residual distances (i.e. the robust distances of the multivariate residuals) based on the estimates in x, versus the distances within the space of the explanatory variables. The latter are based on robust estimates of location and scatter for the data matrix x$X (without intercept). Computing these robust estimates may take an appreciable amount of time. The estimator used corresponds to the one that was used to obtain x (with, for example, the same breakdown point and the same control parameters).

A horizontal cutoff line is drawn at the square root of the .975 quantile of the chi-squared distribution with degrees of freedom equal to the number of response variables. A vertical cutoff line is drawn at the square root of the same quantile, but now with degrees of freedom equal to the number of covariates (not including the intercept). Points to the right of the vertical cutoff can be viewed as high-leverage points; these are classified as 'bad' or 'good' leverage points depending on whether they lie above or below the horizontal cutoff. Points above the horizontal cutoff but to the left of the vertical cutoff are sometimes called vertical outliers. See, for example, Van Aelst and Willems (2005).
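The cutoff rule just described can be sketched in R as a classification of each point into one of the four regions of the plot. This is a sketch only: `classify_points` is a hypothetical helper, not an FRB function. It assumes the residual distances and the X-space distances have already been computed, with `q` the number of responses and `p` the number of covariates (excluding the intercept).

```r
# Hypothetical helper (not part of FRB): classify points by the two cutoffs.
classify_points <- function(res_dist, x_dist, q, p, level = 0.975) {
  cut_res <- sqrt(qchisq(level, df = q))  # horizontal line: q = number of responses
  cut_x   <- sqrt(qchisq(level, df = p))  # vertical line: p = covariates, no intercept
  high_res <- res_dist > cut_res          # above the horizontal cutoff
  high_lev <- x_dist > cut_x              # right of the vertical cutoff
  ifelse(high_lev,
         ifelse(high_res, "bad leverage", "good leverage"),
         ifelse(high_res, "vertical outlier", "regular"))
}

classify_points(res_dist = c(1, 5, 1, 5), x_dist = c(1, 1, 5, 5), q = 3, p = 2)
# "regular" "vertical outlier" "good leverage" "bad leverage"
```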

To avoid the additional computation time, one can choose Xdist=FALSE, in which case the residual distances are simply plotted versus the index of the observation.
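With Xdist=FALSE, the plot reduces to residual distances against the observation index. Below is a base-graphics sketch of such an index plot. The name `index_plot` is hypothetical, and drawing a horizontal cutoff at the square root of the .975 chi-squared quantile (df = number of responses `q`) is an assumption carried over from the Xdist=TRUE plot; it is not a description of FRB's own drawing code.

```r
# Hypothetical sketch (not FRB's implementation): index plot of robust
# distances with an assumed chi-squared cutoff; returns flagged indices.
index_plot <- function(d, q, level = 0.975) {
  cutoff <- sqrt(qchisq(level, df = q))
  plot(seq_along(d), d, xlab = "Index", ylab = "Robust distance")
  abline(h = cutoff, lty = 2)          # dashed horizontal cutoff line
  invisible(which(d > cutoff))         # indices of points above the cutoff
}
```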

The default plot in the context of PCA (i.e. for objects of class FRBpca with EIF=TRUE) is a plot proposed by Pison and Van Aelst (2004). It shows the robust distance versus a measure of the overall empirical influence of each observation on the (classical) principal components. The empirical influences are obtained from the influence function of the eigenvectors of the classical (sample) shape estimator at the normal model, with the robust estimates substituted for the population parameters. The overall influence value of an observation is then defined as the average of its squared influence over all coefficients of the eigenvectors. The vertical line on the plot is an indicative cutoff value, obtained through simulation; this simulation takes a few moments of computation time.
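The averaging step can be sketched in R as follows. The sketch assumes the standard influence function of eigenvector $v_j$ of a covariance matrix at the normal model, $\mathrm{IF}(\mathbf{x}; v_j)=\sum_{k\neq j}\frac{(v_k'\mathbf{z})(v_j'\mathbf{z})}{\lambda_j-\lambda_k}v_k$ with $\mathbf{z}=\mathbf{x}-\hat{\mu}$; since rescaling a covariance matrix to a shape matrix does not change its eigenvectors, the same expression applies to the shape estimator. The helper name `overall_eif` is hypothetical, distinct eigenvalues are assumed, and FRB's internal computation (for instance, which components are included) may differ.

```r
# Hypothetical sketch (not FRB's code): overall empirical influence of each
# observation on the eigenvectors, evaluated at estimates mu and Sigma.
# ASSUMPTION: normal-model IF of covariance eigenvectors; distinct eigenvalues.
overall_eif <- function(X, mu, Sigma) {
  eig <- eigen(Sigma, symmetric = TRUE)
  V <- eig$vectors                 # p x p, column k is eigenvector v_k
  lambda <- eig$values
  p <- ncol(X)
  Z <- sweep(X, 2, mu)             # centred observations z_i = x_i - mu
  S <- Z %*% V                     # scores: S[i, k] = v_k' z_i
  apply(S, 1, function(s) {
    IF <- matrix(0, p, p)          # column j holds IF(x; v_j)
    for (j in seq_len(p)) {
      for (k in seq_len(p)[-j]) {
        IF[, j] <- IF[, j] + s[k] * s[j] / (lambda[j] - lambda[k]) * V[, k]
      }
    }
    mean(IF^2)                     # average squared influence over all coefficients
  })
}
```

An observation at the location estimate, or lying exactly along one eigenvector, has zero influence on the eigenvectors under this formula.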

Again, to avoid the additional computation time, one can choose EIF=FALSE, in which case the robust distances are simply plotted versus the index of the observation.

For the result of the robust Hotelling test (i.e. for objects of class FRBhot), the method plots the robust distance versus the index. In the case of a two-sample test, the indices are counted within each sample and a vertical line separates the two groups. In the two-sample case, each group has its own location estimate $\hat{\mu}$, while the covariance estimate $\hat{\Sigma}$ is common to both groups.
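For the two-sample case, the distances can be sketched in R using a group-specific location and a common covariance, together with within-sample indices. The helper `two_sample_dist` and its arguments `mu1`, `mu2` and `Sigma` are hypothetical; the estimates are assumed to be supplied by a robust fit, and the first factor level of `group` is assumed to correspond to `mu1`.

```r
# Hypothetical helper (not part of FRB): robust distances in a two-sample
# setting with group-specific locations and a common covariance matrix.
two_sample_dist <- function(X, group, mu1, mu2, Sigma) {
  g <- factor(group)
  stopifnot(nlevels(g) == 2)
  in1 <- g == levels(g)[1]                   # first sample uses mu1
  d <- numeric(nrow(X))
  d[in1]  <- sqrt(mahalanobis(X[in1, , drop = FALSE], mu1, Sigma))
  d[!in1] <- sqrt(mahalanobis(X[!in1, , drop = FALSE], mu2, Sigma))
  data.frame(group = g,
             index = ave(seq_along(d), g, FUN = seq_along),  # within-sample index
             dist = d)
}

X <- rbind(c(0, 0), c(3, 4), c(10, 10), c(10, 13))
res <- two_sample_dist(X, c("a", "a", "b", "b"),
                       mu1 = c(0, 0), mu2 = c(10, 10), Sigma = diag(2))
```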

References

G. Pison and S. Van Aelst (2004). Diagnostic Plots for Robust Multivariate Methods. Journal of Computational and Graphical Statistics, 13, 310--329.
S. Van Aelst and G. Willems (2005). Multivariate Regression S-Estimators for Robust Estimation and Inference. Statistica Sinica, 15, 981--1001.
S. Van Aelst and G. Willems (2013). Fast and Robust Bootstrap for Multivariate Inference: The R Package FRB. Journal of Statistical Software, 53(3), 1--32. doi:10.18637/jss.v053.i03.

Examples

    library(FRB)

    ## for multivariate regression:
    data(schooldata)
    MMres <- MMest_multireg(cbind(reading, mathematics, selfesteem) ~ ., data = schooldata)
    diagplot(MMres)
    ## a large 'bad leverage' outlier should be noticeable (observation 59)

    ## for PCA:
    data(ForgedBankNotes)
    MMres <- FRBpcaMM(ForgedBankNotes)
    diagplot(MMres)
    ## a group of 15 fairly strong outliers can be seen which apparently would have
    ## a large general influence on a classical PCA

    ## for Hotelling tests (two-sample):
    data(hemophilia, package = "rrcov")
    MMres <- FRBhotellingMM(cbind(AHFactivity, AHFantigen) ~ gr, data = hemophilia)
    diagplot(MMres)
    ## the data seem practically outlier-free
