The Detection-Imputation algorithm computes cellwise robust estimates of the center and covariance matrix of a data set X. The algorithm alternates between the detection of cellwise outliers and their imputation combined with re-estimation of the center and covariance matrix. By default, it starts by calling checkDataSet to clean the data.
DI(X, initEst = "DDCWcov", crit = 0.01, maxits = 10, quant = 0.99,
   maxCol = 0.25, checkPars = list())
X
    The input data. Must be an \(n\) by \(d\) matrix or a data frame.
initEst
    An initial estimator for the center and covariance matrix. Should be one of "DDCWcov" or "TSGS", where the latter refers to the function GSE::TSGS. The default option "DDCWcov" uses the proposal of Raymaekers and Rousseeuw (2020), which is much faster as the dimension increases.
crit
    The algorithm converges when successive estimates of the center and covariance matrix differ by no more than crit in squared Euclidean norm.
maxits
    Maximum number of DI-iterations.
quant
    The cutoff used to detect cellwise outliers.
maxCol
    The maximum fraction of cells in a column that may be flagged as cellwise outliers.
checkPars
    Optional list of parameters used in the call to checkDataSet. The options are:
coreOnly 
      If TRUE, skip the execution of checkDataSet. Defaults to FALSE.
numDiscrete
  A column that takes on numDiscrete or fewer values
  will be considered discrete and not retained in the cleaned data.
  Defaults to \(5\).
fracNA
  Only retain columns and rows with fewer NAs than this fraction.
  Defaults to \(0.15\).
precScale 
  Only consider columns whose scale is larger than precScale.
  Here scale is measured by the median absolute deviation.
  Defaults to \(1e-12\).
silent
  Whether the function's progress messages should be suppressed.
  Defaults to FALSE.
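The stopping rule controlled by crit and maxits can be illustrated with a small base R sketch. It is not taken from the package source: the helper names sq_change and has_converged are hypothetical, and adding the squared differences of the center and of the covariance entries into a single number is an assumption about how the two estimates are combined.

```r
# Sketch of a stopping rule in the spirit of `crit`.
# Assumption: the change is the sum of squared differences of the center
# plus the sum of squared differences of the covariance entries.
# DI's internal formula may differ.
sq_change <- function(center_old, cov_old, center_new, cov_new) {
  sum((center_new - center_old)^2) + sum((cov_new - cov_old)^2)
}

has_converged <- function(center_old, cov_old, center_new, cov_new,
                          crit = 0.01) {
  sq_change(center_old, cov_old, center_new, cov_new) <= crit
}

# Toy estimates from two consecutive (hypothetical) iterations.
center_old <- c(0, 0)
cov_old    <- diag(2)
center_new <- c(0.05, -0.05)
cov_new    <- diag(2) * 1.02

change    <- sq_change(center_old, cov_old, center_new, cov_new)
converged <- has_converged(center_old, cov_old, center_new, cov_new)
stricter  <- has_converged(center_old, cov_old, center_new, cov_new,
                           crit = 0.001)
```

With these toy values the change falls below the default crit but not below the stricter value, which is the situation in which lowering crit would lead to additional iterations, up to maxits.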
A list with components:
center 
    The final estimate of the center of the data.
cov 
    The final estimate of the covariance matrix.
nits 
    Number of DI-iterations executed to reach convergence.
Ximp 
    The imputed data.
indcells 
    Indices of the cells which were flagged in the analysis.
indNAs 
    Indices of the NAs in the data.
Zres 
    Matrix with standardized cellwise residuals of the flagged cells. Contains zeroes in the unflagged cells.
Zres_denom 
    Denominator of the standardized cellwise residuals.
cellPaths 
    Matrix with the same dimensions as X, in which each row contains the path of least angle regression through the cells of that row, i.e. the order of the coordinates in the path (1 = first, 2 = second, ...).
checkDataSet_out 
    Output of the call to checkDataSet which is used to clean the data.
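The index components can be turned into row and column positions with base R. The sketch below uses a made-up stand-in for a DI result rather than real output, and it assumes that indcells and indNAs are linear (column-major) indices into a matrix with the dimensions of Ximp, which is how R indexes matrices with a single vector.

```r
# Toy stand-in for part of a DI() result; all values are made up.
toy_out <- list(
  Ximp     = matrix(c(1, 2, 3, 4,
                      5, 6, 7, 8,
                      9, 10, 11, 12), nrow = 4),  # 4 rows, 3 columns
  indcells = 6L,   # assumed linear index of a flagged cell
  indNAs   = 12L   # assumed linear index of a missing cell
)

# Convert linear indices to (row, column) pairs.
flagged <- arrayInd(toy_out$indcells, .dim = dim(toy_out$Ximp))
colnames(flagged) <- c("row", "col")

missing_cells <- arrayInd(toy_out$indNAs, .dim = dim(toy_out$Ximp))
colnames(missing_cells) <- c("row", "col")

# Count flagged cells per column.
flagged_per_col <- tabulate(flagged[, "col"], nbins = ncol(toy_out$Ximp))
```

A per-column count like flagged_per_col gives a quick view of which variables contain most of the flagged cells.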
J. Raymaekers and P.J. Rousseeuw (2020). Handling cellwise outliers by sparse regression and robust covariance. arXiv: 1912.12446.
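library(cellWise)
set.seed(1)  # make the simulated data reproducible
mu <- rep(0, 3)
Sigma <- diag(3) * 0.1 + 0.9  # unit variances, correlations of 0.9
X <- MASS::mvrnorm(100, mu, Sigma)
DI.out <- DI(X)
DI.out$cov  # final estimate of the covariance matrix
# For more examples, we refer to the vignette:
vignette("DI_examples")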
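A call with non-default settings might look like the following sketch. The argument names and checkPars options are the ones documented above; the chosen values are purely illustrative, and the data are simulated in base R through a Cholesky factor so that the block only needs cellWise for the final call.

```r
set.seed(1)
Sigma <- diag(3) * 0.1 + 0.9
# Rows of Z %*% chol(Sigma) have covariance matrix Sigma.
X <- matrix(rnorm(300), nrow = 100, ncol = 3) %*% chol(Sigma)

# Options forwarded to checkDataSet (illustrative values).
pars <- list(fracNA = 0.2, silent = TRUE)

# Only run the estimator when the cellWise package is installed.
if (requireNamespace("cellWise", quietly = TRUE)) {
  DI.out <- cellWise::DI(X, initEst = "DDCWcov", crit = 0.001,
                         maxits = 20, checkPars = pars)
  DI.out$nits  # number of DI-iterations that were executed
}
```

A smaller crit together with a larger maxits asks for a tighter convergence tolerance while allowing more iterations to reach it.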