simulation_model2: Convenience function for generating functional data

Description

This model generates non-persistent magnitude outliers, i.e., curves that are magnitude outliers over only a portion of the domain of the functional data. The main model is of the form: $$X_i(t) = \mu t + e_i(t),$$ with a contamination model of the form: $$X_i(t) = \mu t + qk_iI_{T_i \le t\le T_i+l } + e_i(t),$$ where $t\in [0,1]$; $e_i(t)$ is a Gaussian process with zero mean and covariance function $$\gamma(s,t) = \alpha\exp(-\beta|t-s|^\nu);$$ $k_i \in \{-1, 1\}$ with $P(k_i = -1) = P(k_i=1) = 0.5$; $q$ is a constant controlling how far the outliers are from the mass of the data; $I$ is an indicator function; $T_i$ is a uniform random variable on an interval $[a, b] \subset [0,1]$; and $l$ is a constant specifying the length of the part of the domain over which the outliers are shifted away from the mean function. For more details, see the simulation models vignette: vignette("simulation_models", package = "fdaoutlier").
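
The two model equations can be sketched directly in base R. This is only an illustration of the formulas, not the package's source code: the variable names (tt, Sigma, n_out, Ti, and so on) are assumptions, and the number of contaminated curves is fixed by hand.

```r
# Illustrative sketch of the model equations; NOT the fdaoutlier implementation.
set.seed(1)
n <- 100; p <- 50                  # number of curves and grid points
mu <- 4; q <- 8; l <- 0.05         # mean coefficient, shift size, shift length
a <- 0.1; b <- 0.9                 # support of T_i
alpha <- 1; beta <- 1; nu <- 1     # covariance parameters

tt <- seq(0, 1, length.out = p)    # evaluation grid on [0, 1]

# Covariance matrix: gamma(s, t) = alpha * exp(-beta * |t - s|^nu)
Sigma <- alpha * exp(-beta * abs(outer(tt, tt, "-"))^nu)

# n zero-mean Gaussian-process paths: rows of Z %*% chol(Sigma) have covariance Sigma
e <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(Sigma)

# Main model: X_i(t) = mu * t + e_i(t)
X <- sweep(e, 2, mu * tt, "+")

# Contamination model applied to the first n_out curves (n_out chosen by hand)
n_out <- 5
k  <- sample(c(-1, 1), n_out, replace = TRUE)   # k_i with P(-1) = P(1) = 0.5
Ti <- runif(n_out, min = a, max = b)            # T_i uniform on [a, b]
for (i in seq_len(n_out)) {
  shifted <- tt >= Ti[i] & tt <= Ti[i] + l      # indicator I_{T_i <= t <= T_i + l}
  X[i, shifted] <- X[i, shifted] + q * k[i]     # add q * k_i on that sub-interval
}
```

Here each contaminated curve differs from a regular one only on the short window of length l starting at its own T_i, which is what makes the outliers non-persistent.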

Usage

simulation_model2(
  n = 100,
  p = 50,
  outlier_rate = 0.05,
  mu = 4,
  q = 8,
  kprob = 0.5,
  a = 0.1,
  b = 0.9,
  l = 0.05,
  cov_alpha = 1,
  cov_beta = 1,
  cov_nu = 1,
  deterministic = TRUE,
  seed = NULL,
  plot = FALSE,
  plot_title = "Simulation Model 2",
  title_cex = 1.5,
  show_legend = TRUE,
  ylabel = "",
  xlabel = "gridpoints"
)

Value

A list containing:

data: a matrix of size n by p containing the simulated data set
true_outliers: a vector of integers indicating the row index of the outliers in the generated data.
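
As a rough illustration of how the two components fit together, the sketch below splits a data matrix into outlying and regular curves. The object res is a hand-built stand-in with the documented structure (its values are made up), so the snippet runs without the package.

```r
# Stand-in for the returned list: same structure as documented, made-up values
set.seed(1)
res <- list(
  data = matrix(rnorm(6 * 4), nrow = 6, ncol = 4),  # 6 curves, 4 grid points
  true_outliers = c(2L, 5L)                          # row indices of outliers
)

# Rows flagged as outliers
outlier_curves <- res$data[res$true_outliers, , drop = FALSE]

# Remaining rows; setdiff() is used instead of negative indexing because
# res$data[-integer(0), ] would return zero rows when there are no outliers
regular_rows   <- setdiff(seq_len(nrow(res$data)), res$true_outliers)
regular_curves <- res$data[regular_rows, , drop = FALSE]
```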

Arguments

n: The number of curves to generate. Set to $100$ by default.
p: The number of evaluation points of the curves. Curves are usually generated over the interval $[0, 1]$. Set to $50$ by default.
outlier_rate: A value in $[0, 1]$ indicating the proportion of outliers. For example, a value of $0.06$ means that about $6\%$ of the observations will be outliers; whether the count is fixed or random depends on whether the parameter deterministic is TRUE or FALSE. Set to $0.05$ by default.
mu: The coefficient $\mu$ of the mean function $\mu t$ in the main model. Set to $4$ by default.
q: A value indicating the shift of the outliers from the mean function. Used to control how far the outliers are from the mean function. Set to 8 by default.
kprob: A value between $0$ and $1$ indicating the probability that an outlier will be above or below the mean function. Can be used to control the proportion of outliers above or below the mean. Set to $0.5$ by default.
a, b: Values specifying the interval $[a,b]$ for the uniform distribution from which $T_i$ is drawn in the contamination model. Set to $0.1$ and $0.9$ by default.
l: The value of $l$ in the contamination model, i.e., the length of the portion of the domain over which an outlier is shifted. Set to $0.05$ by default.
cov_alpha: A value indicating the coefficient of the exponential function of the covariance matrix, i.e., the $\alpha$ in the covariance function. Set to $1$ by default.
cov_beta: A value indicating the coefficient of the terms inside the exponential function of the covariance matrix, i.e., the $\beta$ in the covariance function. Set to $1$ by default.
cov_nu: A value indicating the power to which to raise the terms inside the exponential function of the covariance matrix, i.e., the $\nu$ in the covariance function. Set to $1$ by default.
deterministic: A logical value. If TRUE, the function will always return round(n*outlier_rate) outliers, so the number of outliers is constant. If FALSE, the number of outliers is determined using n Bernoulli trials with probability outlier_rate, so the number of outliers returned is random. TRUE by default.
seed: A seed to set for reproducibility. NULL by default, in which case a seed is not set.
plot: A logical value indicating whether to plot the data. FALSE by default.
plot_title: Title of the plot if plot = TRUE. Set to "Simulation Model 2" by default.
title_cex: Numerical value indicating the size of the plot title relative to the device default. Set to 1.5 by default. Ignored if plot = FALSE.
show_legend: A logical indicating whether to add a legend to the plot if plot = TRUE. TRUE by default.
ylabel: The label of the y-axis. Set to "" by default.
xlabel: The label of the x-axis if plot = TRUE. Set to "gridpoints" by default.
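
The difference between the two settings of deterministic can be sketched in base R as follows. This mirrors the description above (a fixed count of round(n*outlier_rate) versus n Bernoulli trials) and is not taken from the package source; the names n_fixed, is_outlier, and n_random are assumptions.

```r
n <- 100
outlier_rate <- 0.05

# deterministic = TRUE: the number of outliers is fixed
n_fixed <- round(n * outlier_rate)   # 5

# deterministic = FALSE: one Bernoulli trial per curve, so the count
# changes from run to run (and can be zero)
set.seed(42)
is_outlier <- rbinom(n, size = 1, prob = outlier_rate) == 1
n_random <- sum(is_outlier)
```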

Examples

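# Simulate with the default settings and plot the curves
dtt <- simulation_model2(plot = TRUE)
dtt$true_outliers  # row indices of the outlying curves
dim(dtt$data)      # n by p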
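
A further sketch with non-default settings is given below. It uses only the arguments documented above; the particular values are arbitrary, and the requireNamespace() guard is there only so the snippet is skipped when fdaoutlier is not installed.

```r
if (requireNamespace("fdaoutlier", quietly = TRUE)) {
  # Longer, smaller shifts and a random number of outliers
  sim <- fdaoutlier::simulation_model2(
    n = 200, p = 100,
    outlier_rate = 0.1,
    q = 6, l = 0.1,
    deterministic = FALSE,
    seed = 123
  )
  length(sim$true_outliers)  # random count, roughly n * outlier_rate on average
  dim(sim$data)              # n by p

  # Repeating the call with the same seed is expected to reproduce the result,
  # given the documented purpose of the seed argument
  sim2 <- fdaoutlier::simulation_model2(
    n = 200, p = 100, outlier_rate = 0.1, q = 6, l = 0.1,
    deterministic = FALSE, seed = 123
  )
  identical(sim$true_outliers, sim2$true_outliers)
}
```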
