
SpatialBSS (version 0.16-0)

gen_glob_outl: Contamination with Global Outliers

Description

Generates synthetic global outliers and uses them to contaminate a given p-variate random field.

Usage

gen_glob_outl(x, alpha = 0.05, h = 10, random_sign = FALSE)

Value

gen_glob_outl returns a data.frame whose first \(p\) columns contain the contaminated fields. Column \(p + 1\) contains a logical indicator of whether each observation is an outlier.

Arguments

x

a numeric matrix of dimension c(n, p) where the p columns correspond to the entries of the random field and the n rows are the observations.

alpha

a numerical value between 0 and 1 giving the proportion of observations to contaminate.

h

a numeric constant that determines the size of the outliers; see Details.

random_sign

logical. If TRUE, the sign of each component of an outlier is selected at random. Default is FALSE. See Details.

Details

gen_glob_outl generates outliers for a given field by randomly selecting round(alpha * n) observations \(x_i\) to be the outliers and contaminating each of them componentwise, setting \(x^{out}_{ij} = c^i_j x_{ij}\) for \(j = 1, \ldots, p\), where the elements \(c^i_j\) of the vector \(c^i\) are determined by the parameter random_sign. If random_sign = TRUE, each \(c^i_j\) is either \(h\) or \(-h\) with \(P(c^i_j = h) = P(c^i_j = -h) = 0.5\). If random_sign = FALSE, \(c^i_j = h\) for all \(j = 1, \ldots, p\) and \(i = 1, \ldots, n\). The parameter alpha determines the contamination rate \(\alpha\), and the parameter h determines the size of the outliers.
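To make the contamination scheme above concrete, here is a minimal base-R sketch of it. The function name glob_outl_sketch and the indicator column name outlier are hypothetical, and the componentwise product is an assumption read off the description above. This is not the SpatialBSS source code, so use gen_glob_outl itself in practice.

```r
# Hypothetical sketch of the global-outlier scheme; NOT the SpatialBSS implementation.
# Assumes contamination is the componentwise product x_out[i, j] = c[i, j] * x[i, j].
glob_outl_sketch <- function(x, alpha = 0.05, h = 10, random_sign = FALSE) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  n_out <- round(alpha * n)          # number of observations to contaminate
  idx <- sample.int(n, n_out)        # outlier rows, drawn without replacement
  if (random_sign) {
    # each c^i_j is +h or -h with probability 0.5, independently
    c_mat <- matrix(sample(c(-h, h), n_out * p, replace = TRUE),
                    nrow = n_out, ncol = p)
  } else {
    # c^i_j = h for every component of every outlier
    c_mat <- matrix(h, nrow = n_out, ncol = p)
  }
  x[idx, ] <- c_mat * x[idx, , drop = FALSE]   # componentwise product
  # same layout as described under Value: p field columns, then a logical indicator
  data.frame(x, outlier = seq_len(n) %in% idx)
}

# Usage on i.i.d. normal data (stand-in for a simulated random field)
set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
res <- glob_outl_sketch(x, alpha = 0.05, h = 10)
sum(res$outlier)   # 10, i.e. round(0.05 * 200)
res2 <- glob_outl_sketch(x, alpha = 0.05, h = 10, random_sign = TRUE)
```

Because the rows are drawn without replacement, exactly round(alpha * n) observations are contaminated on every call; only their positions, and the signs when random_sign = TRUE, are random.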

See Also

gen_loc_outl

Examples

# simulate coordinates
coords <- runif(1000 * 2) * 20
dim(coords) <- c(1000, 2)
coords_df <- as.data.frame(coords)
names(coords_df) <- c("x", "y")
# simulate random field
if (!requireNamespace('gstat', quietly = TRUE)) {
  message('Please install the package gstat to run the example code.')
} else {
  library(gstat)
  library(SpatialBSS)
  model_1 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, model = 'Exp'), nmax = 20)
  model_2 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, kappa = 2, model = 'Mat'), 
                   nmax = 20)
  model_3 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, model = 'Gau'), nmax = 20)
  field_1 <- predict(model_1, newdata = coords_df, nsim = 1)$sim1
  field_2 <- predict(model_2, newdata = coords_df, nsim = 1)$sim1
  field_3 <- predict(model_3, newdata = coords_df, nsim = 1)$sim1
  field <- cbind(field_1, field_2, field_3)
  # Contaminate 10% of the observations with global outliers of size h = 15.
  field_cont <- gen_glob_outl(field, alpha = 0.1, h = 15)
  
  # Contaminate 5% of the observations with global outliers of size h = 10 and random signs.
  field_cont2 <- gen_glob_outl(field, alpha = 0.05, h = 10, random_sign = TRUE)
}
