mrct.sparse: Sparse minimum regularized covariance trace estimator

Description

Robust outlier detection for sparse functional data as a generalization of the minimum regularized covariance trace (MRCT) estimator (oguamalam2023minimummrct). First, the observations are smoothed with a B-spline basis; then the MRCT algorithm is applied to the matrix of basis coefficients.
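The smoothing step can be illustrated with a minimal base-R sketch. It assumes the curves lie on an equidistant grid over [0, 1] and uses a least-squares projection onto a cubic (order 4) B-spline basis built with splines::bs(). The helper smooth_bspline() is hypothetical; mrct.sparse() may construct its basis (range, knot placement) differently. The coefficient matrix it returns plays the role of the input to the MRCT step, and the smoothed curves correspond to data.smooth.

```r
library(splines)

# Hypothetical helper (not the mrct package's implementation): smooth each row
# of `data` by least-squares projection onto a cubic (order 4) B-spline basis
# with `nbasis` functions, then evaluate the fits on `new.p` equidistant points.
smooth_bspline <- function(data, nbasis, new.p,
                           grid = seq(0, 1, length.out = ncol(data))) {
  # An order-4 basis needs at least 4 functions, and least squares needs at
  # least as many grid points as basis functions.
  stopifnot(nbasis >= 4, ncol(data) >= nbasis)
  # Basis evaluated on the observed grid: ncol(data) x nbasis matrix.
  B <- bs(grid, df = nbasis, degree = 3, intercept = TRUE)
  # Least-squares coefficients for all curves at once: nbasis x nrow(data).
  coefs <- qr.solve(B, t(data))
  # Same basis evaluated on an equidistant grid of length new.p.
  new.grid <- seq(min(grid), max(grid), length.out = new.p)
  B.new <- predict(B, new.grid)
  list(coefficients = t(coefs),                 # input for the MRCT step
       data.smooth = t(B.new %*% coefs))        # nrow(data) x new.p
}

# Usage: 5 noisy sine curves observed at 10 time points.
set.seed(1)
grid <- seq(0, 1, length.out = 10)
y <- t(replicate(5, sin(2 * pi * grid) + rnorm(10, sd = 0.1)))
out <- smooth_bspline(y, nbasis = 6, new.p = 50)
dim(out$coefficients)   # 5 x 6
dim(out$data.smooth)    # 5 x 50
```

Because a cubic B-spline basis with an intercept spans all cubic polynomials, any cubic curve is reproduced exactly by this projection. That property makes it a convenient sanity check for the sketch.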

Usage

mrct.sparse(
  data,
  nbasis = dim(data)[2],
  new.p = dim(data)[2],
  h = 0.75,
  alpha = 0.01,
  initializations = 5,
  seed = 123,
  scaling.iterations = 10,
  scaling.tolerance = 10^(-4),
  criterion = "sum",
  sum.percentage = 0.75
)

Value

A list with two entries:

mrct.output: List. The same output as mrct(); see its documentation for details.
data.smooth: Numeric matrix of the smoothed curves, with dim(data)[1] rows and new.p columns. Each row corresponds to one observation.

Arguments

data: Numeric matrix of a functional data set for which the estimator is to be calculated. Each row contains one observation. All observations are assumed to be observed on the same (possibly sparse) regular grid. The number of grid points must be at least nbasis.
nbasis: Integer. Number of B-spline basis functions used for smoothing. The basis is of order \(4\) and therefore cannot contain fewer than \(4\) functions. The default is dim(data)[2], i.e. the number of time points, with a maximum of \(15\).
new.p: Integer. Length of the grid of the smoothed curves. The resulting grid is an equidistant partition of [rangeval[1],rangeval[length(rangeval)]]. Default is dim(data)[2].
h: Numeric value between \(0.5\) and \(1\). Fraction of the observations on which the estimator is based. Default is \(0.75\), i.e. \(75\%\) of the data are used for the estimator.
alpha: Numeric (default is \(0.01\)). Tikhonov regularization parameter \(\alpha\).
initializations: Integer (default is \(5\)). Number of random initial subsets.
seed: Integer (default is \(123\)). Random seed for reproducibility.
scaling.iterations: Integer (default is \(10\)). The maximum number of times \(k_1\) is re-scaled if the error between subsequent scaling parameters does not fall below scaling.tolerance.
scaling.tolerance: Numeric (default is \(10^{-4}\)). The error tolerance for re-scaling. If the error falls below this value, the re-scaling procedure stops.
criterion: Character. Criterion by which the optimal subset is chosen among the final subsets. Possible options are "cluster" and "sum" (default).
sum.percentage: Numeric value between \(0.5\) and \(1\), used by the "sum" criterion. Determines the fraction of observations up to which the sum over the sorted (in ascending order) functional Mahalanobis distances is calculated. Default is \(0.75\), i.e. the sum of the smallest \(75\%\) of Mahalanobis distances is calculated. If outliers are present, this value should not be too high, so that no outlying curves are included in the sum.
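The following simplified sketch shows how h, alpha and sum.percentage enter the estimator. It makes three assumptions. Distances are a Tikhonov-regularized Mahalanobis-type distance \(d(x) = \sqrt{(x-\mu)^\top (C + \alpha I)^{-1} (x-\mu)}\), with \(\mu\) and \(C\) estimated from an h-subset. Subset sizes are rounded down. The "sum" criterion adds the smallest sum.percentage fraction of the sorted distances. The exact \(\alpha\)-Mahalanobis distance of the MRCT paper and the package's rounding rules may differ, and all function names are hypothetical.

```r
# Hypothetical helpers illustrating the roles of h, alpha and sum.percentage.
# They are NOT the mrct package internals.

# Size of an h-subset of N observations (rounding down is an assumption).
subset_size <- function(N, h = 0.75) floor(h * N)

# Regularized Mahalanobis-type distances of all rows of X, with mean and
# covariance estimated from the rows in H and the Tikhonov term alpha * I added.
reg_distance <- function(X, H, alpha = 0.01) {
  mu <- colMeans(X[H, , drop = FALSE])
  C <- cov(X[H, , drop = FALSE])
  sqrt(mahalanobis(X, center = mu, cov = C + alpha * diag(ncol(X))))
}

# "sum" criterion: sum of the smallest sum.percentage fraction of distances
# (rounding down is an assumption).
sum_criterion <- function(d, sum.percentage = 0.75) {
  k <- floor(sum.percentage * length(d))
  sum(sort(d)[seq_len(k)])
}

# Toy data: 40 observations in 5 dimensions, the first 4 shifted as outliers.
set.seed(1)
X <- matrix(rnorm(40 * 5), nrow = 40, ncol = 5)
X[1:4, ] <- X[1:4, ] + 10

H <- 5:(4 + subset_size(nrow(X)))     # an h-subset of 30 regular rows
d <- reg_distance(X, H, alpha = 0.01)
order(d, decreasing = TRUE)[1:4]      # the shifted rows have the largest distances
sum_criterion(d, sum.percentage = 0.75)
```

Summing only the smallest distances keeps the criterion from being dominated by the few curves with very large distances. This is also why sum.percentage should stay below the expected share of regular observations.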

References

oguamalam2023minimummrct.

Examples


# Fix seed for reproducibility
set.seed(123)

# Sample outlying indices
cont.ind <- sample(1:50,size=10)

# Generate 50 sparse curves on the interval [0,1] at 10 time points with 20% outliers
y <- mrct.rgauss(x.grid=seq(0,1,length.out=10), N=50, model=1,
                 outliers=cont.ind, method="linear")

# Visualize curves (regular curves grey, outliers black)
colormap <- rep("grey",50); colormap[cont.ind] <- "black"
matplot(x = seq(0,1,length.out=10), y = t(y), type="l", lty="solid",
        col=colormap, xlab="t",ylab="")

# Run sparse MRCT
sparse.mrct.y <- mrct.sparse(data = y, nbasis = 10, h = 0.75, new.p = 50,
                             alpha = 0.1, initializations = 10, criterion = "sum" )

# Visualize smoothed functions
matplot(x=seq(0,1,length.out=50), y=t(sparse.mrct.y$data.smooth),
        type="l", lty="solid", col=colormap, xlab="t", ylab="")

# Visualize alpha-Mahalanobis distance with cutoff (horizontal black line)
# Colors correspond to simulated outliers, shapes to estimated (sparse MRCT) ones
# (circle regular and triangle irregular curves)
shapemap <- rep(1,50); shapemap[sparse.mrct.y$mrct.output$theoretical.w] <- 2
plot(x = 1:50, y = sparse.mrct.y$mrct.output$aMHD.w, col=colormap, pch = shapemap,
     xlab = "Index", ylab = expression(alpha*"-MHD"))
abline(h = sparse.mrct.y$mrct.output$quant.w)

# If you don't have any information on possible outliers,
# you can alternatively use mrct.sparse.plot()
mrct.sparse.plot(mrct.sparse.object = sparse.mrct.y)
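Because the example simulates the outlier indices, the detection can also be checked numerically rather than only visually. The helper outlier_confusion() below is a hypothetical base-R sketch, not part of mrct. It cross-tabulates simulated and flagged indices.

```r
# Hypothetical helper: cross-tabulate simulated outlier indices against the
# indices flagged by an outlier detector, out of N observations.
outlier_confusion <- function(true.ind, flagged.ind, N) {
  truth <- factor(seq_len(N) %in% true.ind, levels = c(FALSE, TRUE),
                  labels = c("regular", "outlier"))
  flagged <- factor(seq_len(N) %in% flagged.ind, levels = c(FALSE, TRUE),
                    labels = c("regular", "flagged"))
  table(simulated = truth, estimated = flagged)
}

# Toy usage: 10 observations, rows 1-3 simulated as outliers, rows 2-4 flagged.
tab <- outlier_confusion(true.ind = c(1, 2, 3), flagged.ind = c(2, 3, 4), N = 10)
tab
```

With the objects from the example above, outlier_confusion(cont.ind, sparse.mrct.y$mrct.output$theoretical.w, N = 50) would tabulate how many simulated outliers were flagged. This assumes that theoretical.w holds the indices of the curves flagged as outliers, as its use in shapemap suggests.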
