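corHuber(x, y,
    type = c("bivariate", "adjusted", "univariate"),
    standardized = FALSE, centerFun = median,
    scaleFun = mad, const = 2, prob = 0.95,
    tol = .Machine$double.eps^0.5, ...)

type: a character string specifying the type of winsorization: "univariate"
for univariate winsorization, "adjusted" for adjusted univariate winsorization,
or "bivariate" for bivariate winsorization.
centerFun: a function to compute a robust estimate of the center (defaults to
median). Ignored if standardized is TRUE.
scaleFun: a function to compute a robust estimate of the scale (defaults to
mad). Ignored if standardized is TRUE.
...: additional arguments to be passed to robStandardize.

In univariate winsorization, the borders are given by plus/minus const, thus a symmetric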
distribution is assumed. In adjusted univariate
winsorization, the borders for the two diagonally
opposing quadrants containing the minority of the data
are shrunken by a factor that depends on the ratio
between the number of observations in the major and minor
quadrants. It is thus possible to better account for the
bivariate structure of the data while maintaining fast
computation. In bivariate winsorization, a bivariate
normal distribution is assumed and the data are shrunken
towards the boundary of a tolerance ellipse with coverage
probability prob. The boundary of this ellipse is
given by all points whose squared Mahalanobis distance
equals the prob quantile of the $\chi^{2}$ distribution with
two degrees of freedom. Furthermore, the initial correlation matrix
required for the Mahalanobis distances is computed based
on adjusted univariate winsorization.

See also: winsorize
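
To make the univariate and bivariate variants described above more concrete, here is a minimal base R sketch. It is not the robustHD implementation: the helper names (winsorize_univariate, cor_winsorized_univariate, winsorize_bivariate) are hypothetical, and the initial correlation for the bivariate step is taken from plain univariate winsorization as a stand-in for the adjusted univariate winsorization the package uses.

```r
## Sketch only: base R illustration of univariate and bivariate winsorization.
## This is NOT the robustHD implementation; all helper names are hypothetical.

# Robustly standardize with median and MAD, then clamp to [-const, const]
winsorize_univariate <- function(x, const = 2) {
  z <- (x - median(x)) / mad(x)
  pmin(pmax(z, -const), const)
}

# Correlation of the univariately winsorized data
cor_winsorized_univariate <- function(x, y, const = 2) {
  cor(winsorize_univariate(x, const), winsorize_univariate(y, const))
}

# Shrink robustly standardized points that lie outside the tolerance ellipse
# onto its boundary, given an initial correlation estimate r0
winsorize_bivariate <- function(x, y, r0, prob = 0.95) {
  z <- cbind((x - median(x)) / mad(x), (y - median(y)) / mad(y))
  R <- matrix(c(1, r0, r0, 1), 2, 2)               # initial correlation matrix
  d2 <- mahalanobis(z, center = c(0, 0), cov = R)  # squared Mahalanobis distances
  cutoff <- qchisq(prob, df = 2)                   # chi-squared quantile, 2 df
  shrink <- pmin(sqrt(cutoff / d2), 1)             # 1 for points inside the ellipse
  z * shrink                                       # row i is scaled by shrink[i]
}

# Toy data with one outlier (base R only, so no extra packages are needed)
set.seed(1234)
n <- 100
x <- rnorm(n)
y <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(n)
x[1] <- 10 * x[1]
y[1] <- -5 * y[1]

r_uni <- cor_winsorized_univariate(x, y)
# Assumption: the univariate estimate stands in for the adjusted one here
zw <- winsorize_bivariate(x, y, r0 = r_uni)
r_biv <- cor(zw[, 1], zw[, 2])
```

The sketch keeps the direction of each outlying point and only shortens its distance to the center, which is why the shrunken points end up exactly on the ellipse boundary. The numbers it produces will generally differ from those returned by corHuber.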
## generate data
library("mvtnorm")   # provides rmvnorm()
library("robustHD")  # provides corHuber()
set.seed(1234) # for reproducibility
Sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
xy <- rmvnorm(100, sigma=Sigma)
x <- xy[, 1]
y <- xy[, 2]
## introduce outlier
x[1] <- x[1] * 10
y[1] <- y[1] * (-5)
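## compare the classical and the robust correlation
cor(x, y)       # Pearson correlation, sensitive to the outlier
corHuber(x, y)  # Huber correlation, bivariate winsorization by default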