
modi (version 0.1.2)

Winsimp: Winsorization followed by imputation

Description

Winsorization of outliers according to the Mahalanobis distance followed by an imputation under the multivariate normal model. Only the outliers are winsorized. The Mahalanobis distance MDmiss allows for missing values.

Usage

Winsimp(data, center, scatter, outind, seed = 1000003)

Value

Winsimp returns a list whose first component output is a sublist with the following components:

cutpoint

Cutpoint for outliers

proc.time

Processing time

n.missing.before

Number of missing values before imputation

n.missing.after

Number of missing values after imputation

The further component returned by Winsimp is:

imputed.data

Imputed data set

Arguments

data

a data frame with the data.

center

(robust) estimate of the center (location) of the observations.

scatter

(robust) estimate of the scatter (covariance-matrix) of the observations.

outind

logical vector indicating outliers, with 1 or TRUE marking an outlier.

seed

seed for random number generator.

Author

Beat Hulliger

Details

It is assumed that center, scatter and outind stem from a multivariate outlier detection algorithm which produces robust estimates and which declares observations with a large Mahalanobis distance to be outliers. The cutpoint \(c\) is calculated as the least (unsquared) Mahalanobis distance among the outliers. The winsorization reduces the weight of the outliers by shrinking them towards the robust center: $$\hat{y}_i = \mu_R + (y_i - \mu_R) \cdot c/d_i$$ where \(\mu_R\) is the robust center, \(c\) is the cutpoint and \(d_i\) is the (unsquared) Mahalanobis distance of observation i.
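
The winsorization step can be illustrated with a minimal base R sketch. This is not the modi source code: the helper name winsorize_md and the toy data are hypothetical, the data are assumed to be complete (stats::mahalanobis is used in place of MDmiss, so missing values are not handled), and the imputation step is left out.

```r
# Illustrative sketch only; winsorize_md is a hypothetical helper, not part of modi.
# Assumes complete data (no NA) and at least one flagged outlier.
winsorize_md <- function(y, center, scatter, outind) {
  y <- as.matrix(y)
  outind <- as.logical(outind)
  # stats::mahalanobis returns squared distances, so take the square root
  d <- sqrt(mahalanobis(y, center, scatter))
  # Cutpoint: least (unsquared) distance among the flagged outliers
  cutpoint <- min(d[outind])
  # Shrink factor c / d_i for outliers, 1 for all other observations
  shrink <- ifelse(outind, cutpoint / d, 1)
  # y_hat_i = center + (y_i - center) * c / d_i, applied row by row
  centered <- sweep(y, 2, center, "-")
  y_wins <- sweep(centered * shrink, 2, center, "+")
  list(cutpoint = cutpoint, data = y_wins)
}

# Toy data: 20 regular points plus two gross outliers (made up for illustration)
set.seed(1)
y <- rbind(matrix(rnorm(40), ncol = 2), c(6, 6), c(-8, 7))
center <- c(0, 0)
scatter <- diag(2)
outind <- c(rep(FALSE, 20), TRUE, TRUE)

res <- winsorize_md(y, center, scatter, outind)
# Distances after winsorization, using the same center and scatter
d_after <- sqrt(mahalanobis(res$data, center, scatter))
```

In this sketch the flagged observations end up at a distance equal to the cutpoint, while observations not flagged in outind are left unchanged.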

References

Hulliger, B. (2007), Multivariate Outlier Detection and Treatment in Business Surveys, Proceedings of the III International Conference on Establishment Surveys, Montréal.

See Also

MDmiss. Uses imp.norm.

Examples

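library(modi)
# Load the bushfire data with missing values and the matching weights
data(bushfirem, bushfire.weights)
# Detect outliers with TRC to obtain robust center, scatter and outlier flags
det.res <- TRC(bushfirem, weight = bushfire.weights)
# Winsorize the flagged outliers and impute the missing values
imp.res <- Winsimp(bushfirem, det.res$center, det.res$scatter, det.res$outind)
# Number of missing values remaining after imputation
print(imp.res$n.missing.after)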
