# robpca

##### ROBust PCA algorithm

ROBPCA algorithm of Hubert et al. (2005), including reweighting (Engelen et al., 2005) and an optional extension to skewed data (Hubert et al., 2009).

- Keywords: multivariate, robust

##### Usage

```
robpca(x, k = 0, kmax = 10, alpha = 0.75, h = NULL, mcd = FALSE,
       ndir = "all", skew = FALSE, ...)
```

##### Arguments

- x
An \(n\) by \(p\) matrix or data matrix with observations in the rows and variables in the columns.

- k
Number of principal components that will be used. When `k=0` (default), the number of components is selected using the criterion in Hubert et al. (2005).

- kmax
Maximal number of principal components that will be computed; the default is 10.

- alpha
Robustness parameter, default is 0.75.

- h
The number of outliers the algorithm should resist is given by \(n-h\). Any value for `h` between \(n/2\) and \(n\) may be specified. The default is `NULL`, which uses `h=ceiling(alpha*n)+1`. Do not specify `alpha` and `h` at the same time.

- mcd
Logical indicating if the MCD adaptation of ROBPCA may be applied when the number of variables is sufficiently small (see Details). If `mcd=FALSE` (default), the full ROBPCA algorithm is always applied.

- ndir
Number of directions used when computing the outlyingness (or the adjusted outlyingness when `skew=TRUE`); see `outlyingness` and `adjOutl` for more details.

- skew
Logical indicating if the version for skewed data (Hubert et al., 2009) is applied; the default is `FALSE`.

- ...
Other arguments to pass to methods.
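
Two of the defaults above are data-driven. The base-R sketch below shows how they can be computed; `default_h()` and `select_k()` are hypothetical helpers, not rospca functions. `default_h()` follows the formula documented for `h`, while `select_k()` assumes the rule "smallest number of components whose eigenvalues explain at least 80% of the total" from Hubert et al. (2005); the criterion rospca actually applies may include further conditions.

```r
# Hypothetical helper (not part of rospca): default h when h = NULL,
# following the documented formula h = ceiling(alpha*n) + 1
default_h <- function(n, alpha = 0.75) {
  ceiling(alpha * n) + 1
}

# Hypothetical helper (not part of rospca): choose k when k = 0, ASSUMING the
# rule "smallest k whose eigenvalues explain at least 80% of the total"
select_k <- function(eigenvalues, kmax = 10, prop = 0.8) {
  l <- sort(eigenvalues, decreasing = TRUE)
  k <- which(cumsum(l) / sum(l) >= prop)[1]
  min(k, kmax)  # never return more than kmax components
}

n <- 100
h <- default_h(n)                  # 76
n - h                              # number of outliers to resist: 24
select_k(c(5, 4, 0.5, 0.3, 0.2))   # 2, since (5 + 4) / 10 = 0.9 >= 0.8
```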

##### Details

This function is based extensively on `PcaHubert` from rrcov, with two main differences:

- The outlyingness measure used for non-skewed data (`skew=FALSE`) is the Stahel-Donoho measure as described in Hubert et al. (2005), which is also used in `PcaHubert`. However, the implementation in mrfDepth (which is used here) is much faster than the one in `PcaHubert`, so more, or even all, directions can be considered when computing the outlyingness measure.
- The extension for skewed data of Hubert et al. (2009) (`skew=TRUE`) is also implemented here; it is not included in `PcaHubert`.
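
To illustrate the Stahel-Donoho outlyingness referred to above, the following base-R sketch projects the data on the directions through all pairs of observations and, for each observation, keeps the largest robustly standardised distance of its projection to the median. It is a slow, simplified stand-in written for clarity, not the mrfDepth implementation: `sd_outlyingness()` is a hypothetical name, and the median and MAD are assumed as the robust location and scale of the projections.

```r
# Hypothetical sketch (not mrfDepth's implementation) of the Stahel-Donoho
# outlyingness using all directions through two data points
sd_outlyingness <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  pairs <- combn(n, 2)              # each column holds one pair of observations
  outl <- rep(0, n)
  for (j in seq_len(ncol(pairs))) {
    v <- X[pairs[1, j], ] - X[pairs[2, j], ]
    len <- sqrt(sum(v^2))
    if (len < 1e-12) next           # skip degenerate directions
    proj <- drop(X %*% (v / len))   # projections on the unit direction
    s <- mad(proj)
    if (s < 1e-12) next             # skip directions without robust spread
    # keep, per observation, the largest standardised distance seen so far
    outl <- pmax(outl, abs(proj - median(proj)) / s)
  }
  outl
}

set.seed(1)
X <- matrix(rnorm(60), ncol = 3)    # 20 regular observations in 3 dimensions
X[20, ] <- c(10, 10, 10)            # one clear outlier
outl <- sd_outlyingness(X)
which.max(outl)                     # expected: the planted outlier, observation 20
```

With `ndir = "all"` the documentation suggests all candidate directions are used; a faster variant would sample only a subset of the pairs.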

For an extensive description of the ROBPCA algorithm, we refer to Hubert et al. (2005) and to `PcaHubert`.

When `mcd=TRUE` and \(n<5 \times p\), the full ROBPCA algorithm is not applied. Instead, the loadings and eigenvalues are computed as the eigenvectors and eigenvalues of the MCD scatter estimate of the data set obtained after the SVD step.
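
The MCD variant just described can be sketched in base R plus the recommended package MASS. This is an illustration under assumptions, not rospca's code: `mcd_pca()` is a hypothetical helper, the SVD step is assumed to be mean-centring followed by projection onto the span of the centred data, and `MASS::cov.rob(method = "mcd")` stands in for the MCD estimator that rospca uses.

```r
library(MASS)  # cov.rob(); ASSUMED stand-in for the MCD estimator used by rospca

# Hypothetical helper (not part of rospca): PCA from the MCD estimate of the
# data after an SVD step, as in the mcd = TRUE branch
mcd_pca <- function(X, k, tol = 1e-8) {
  X <- as.matrix(X)
  # SVD step (assumed): centre by the mean and keep directions that carry variance
  m <- colMeans(X)
  Xc <- sweep(X, 2, m)
  s <- svd(Xc)
  r <- sum(s$d > tol * s$d[1])
  V <- s$v[, seq_len(r), drop = FALSE]
  Z <- Xc %*% V                          # n x r reduced data
  # MCD location and scatter of the reduced data
  fit <- cov.rob(Z, method = "mcd")
  e <- eigen(fit$cov, symmetric = TRUE)  # eigenvalues in decreasing order
  # map the first k eigenvectors and the MCD centre back to the original space
  loadings <- V %*% e$vectors[, seq_len(k), drop = FALSE]
  center <- m + drop(V %*% fit$center)
  list(loadings = loadings,
       eigenvalues = e$values[seq_len(k)],
       center = center,
       scores = sweep(X, 2, center) %*% loadings)
}

set.seed(2)
X <- matrix(rnorm(200), ncol = 4)   # n = 50, p = 4
res <- mcd_pca(X, k = 2)
round(crossprod(res$loadings), 10)  # 2 x 2 identity: orthonormal loadings
```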

##### Value

A list with components:

Loadings matrix containing the robust loadings (eigenvectors), a numeric matrix of size \(p\) by \(k\).

Numeric vector of length \(k\) containing the robust eigenvalues.

Scores matrix (computed as \((X-\text{center}) \cdot \text{loadings}\)), a numeric matrix of size \(n\) by \(k\).

Numeric vector of length \(p\) containing the centre of the data.

Number of (chosen) principal components.

Logical vector of size \(n\) indicating if an observation is in the initial h-subset.

Logical vector of size \(n\) indicating if an observation is kept in the reweighting step.

The robustness parameter \(\alpha\) used throughout the algorithm.

The \(h\)-parameter used throughout the algorithm.

Numeric vector of size \(n\) containing the robust score distances within the robust PCA subspace.

Numeric vector of size \(n\) containing the orthogonal distances to the robust PCA subspace.

Cut-off value for the robust score distances.

Cut-off value for the orthogonal distances.

Numeric vector of size \(n\) containing the SD-flags of the observations. Observations whose score distance is larger than `cutoff.sd` receive an SD-flag equal to 0; the other observations receive an SD-flag equal to 1.

Numeric vector of size \(n\) containing the OD-flags of the observations. Observations whose orthogonal distance is larger than `cutoff.od` receive an OD-flag equal to 0; the other observations receive an OD-flag equal to 1.

Numeric vector of size \(n\) containing the flags of the observations. Observations whose score distance is larger than `cutoff.sd` or whose orthogonal distance is larger than `cutoff.od` can be considered as outliers and receive a flag equal to 0. The regular observations receive a flag equal to 1.
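
The distances, cut-offs and flags described above can be sketched in base R as follows. `pca_flags()` and its output names other than `cutoff.sd` and `cutoff.od` are hypothetical. The score-distance cut-off \(\sqrt{\chi^2_{k,0.975}}\) and the orthogonal-distance cut-off (which treats \(OD^{2/3}\) as approximately normal) follow Hubert et al. (2005) for symmetric data, with the median and MAD assumed in place of the univariate MCD used there; rospca's own cut-offs may differ, in particular when `skew=TRUE`. Classical PCA (`prcomp`) only supplies example inputs here; a robust fit would supply the centre, loadings and eigenvalues instead.

```r
# Hypothetical helper (not part of rospca): distances, cut-offs and 0/1 flags
# for a PCA fit given by its center, loadings (p x k) and eigenvalues (length k)
pca_flags <- function(X, center, loadings, eigenvalues) {
  X <- as.matrix(X)
  k <- ncol(loadings)
  Xc <- sweep(X, 2, center)                 # X - center
  scores <- Xc %*% loadings                 # (X - center) %*% loadings
  # score distance: Mahalanobis distance within the PCA subspace
  score_dist <- sqrt(rowSums(sweep(scores^2, 2, eigenvalues, "/")))
  # orthogonal distance: Euclidean distance to the PCA subspace
  orth_dist <- sqrt(rowSums((Xc - scores %*% t(loadings))^2))
  cutoff.sd <- sqrt(qchisq(0.975, df = k))
  # OD^(2/3) treated as approximately normal (Wilson-Hilferty), median/MAD assumed
  od23 <- orth_dist^(2/3)
  cutoff.od <- (median(od23) + mad(od23) * qnorm(0.975))^(3/2)
  flag_sd <- as.numeric(score_dist <= cutoff.sd)  # 0 = score distance too large
  flag_od <- as.numeric(orth_dist <= cutoff.od)   # 0 = orthogonal distance too large
  list(scores = scores, sd = score_dist, od = orth_dist,
       cutoff.sd = cutoff.sd, cutoff.od = cutoff.od,
       flag_sd = flag_sd, flag_od = flag_od,
       flag = flag_sd * flag_od)            # 0 if either flag is 0
}

set.seed(3)
X <- matrix(rnorm(300), ncol = 3)
X[1, ] <- c(8, 8, 8)                 # planted outlier
pc <- prcomp(X)                      # classical PCA, used only to supply inputs
res <- pca_flags(X, pc$center, pc$rotation[, 1:2], pc$sdev[1:2]^2)
res$flag[1]                          # 0: the planted outlier is flagged
```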

##### References

Hubert, M., Rousseeuw, P. J., and Vanden Branden, K. (2005), "ROBPCA: A New Approach to Robust Principal Component Analysis," *Technometrics*, 47, 64–79.

Engelen, S., Hubert, M., and Vanden Branden, K. (2005), "A Comparison of Three Procedures for Robust PCA in High Dimensions," *Austrian Journal of Statistics*, 34, 117–126.

Hubert, M., Rousseeuw, P. J., and Verdonck, T. (2009), "Robust PCA for Skewed Data and Its Outlier Map," *Computational Statistics & Data Analysis*, 53, 2264–2274.


##### Examples

```
library(rospca)

# one simulated data set (m = 1) with n = 100 observations and p = 10 variables
X <- dataGen(m=1, n=100, p=10, eps=0.2, bLength=4)$data[[1]]
resR <- robpca(X, k=2)
diagPlot(resR)
```

*Documentation reproduced from package rospca, version 1.0.4, License: GPL (>= 2)*