This model generates shape outliers that have a different shape for a portion of the domain.
The main model is of the form: $$X_i(t) = \mu t + e_i(t),$$ with
contamination model of the form:
$$X_i(t) = \mu t + (-1)^u q + (-1)^{(1-u)}(\frac{1}{\sqrt{r\pi}})\exp(-z(t-v)^w) + e_i(t)$$
where: \(t\in [0,1]\), \(e_i(t)\) is a Gaussian process with zero mean
and covariance function of the form: $$\gamma(s,t) = \alpha\exp(-\beta|t-s|^\nu),$$
\(u\) follows a Bernoulli distribution with probability \(P(u = 1) = 0.5\);
\(\mu\), \(q\), \(r\), \(z\) and \(w\) are constants, and \(v\) follows
a Uniform distribution on the interval \([a, b]\).
Please see the simulation models vignette with
vignette("simulation_models", package = "fdaoutlier")
for more details.
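The main and contamination models above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `sketch_model6` and its short argument names are hypothetical, the grid is assumed to be `p` equally spaced points on \([0, 1]\), the Gaussian process is sampled through a Cholesky factor of the covariance matrix, and the outlying rows are assumed to be a fixed-size random subset.

```r
# Hypothetical helper: a base-R sketch of the main and contamination models.
# It is NOT the fdaoutlier implementation; names and sampling details are assumptions.
sketch_model6 <- function(n = 100, p = 50, outlier_rate = 0.1, mu = 4, q = 1.8,
                          kprob = 0.5, a = 0.25, b = 0.75,
                          alpha = 1, beta = 1, nu = 1,
                          r = 0.02, w = 2, z = 50) {
  tt <- seq(0, 1, length.out = p)  # assumed evaluation grid on [0, 1]

  # Covariance gamma(s, t) = alpha * exp(-beta * |t - s|^nu) on the grid
  gamma <- alpha * exp(-beta * abs(outer(tt, tt, "-"))^nu)

  # e_i(t): zero-mean Gaussian rows with covariance gamma (Cholesky sampling)
  e <- matrix(rnorm(n * p), nrow = n) %*% chol(gamma)

  # Main model: X_i(t) = mu * t + e_i(t)
  x <- matrix(mu * tt, nrow = n, ncol = p, byrow = TRUE) + e

  # Rows to contaminate (fixed count; an assumption of this sketch)
  idx <- sort(sample.int(n, round(n * outlier_rate)))

  for (i in idx) {
    u <- rbinom(1, 1, kprob)  # u ~ Bernoulli(kprob)
    v <- runif(1, a, b)       # v ~ Uniform(a, b)
    bump <- (1 / sqrt(r * pi)) * exp(-z * (tt - v)^w)
    # Contamination model: mu*t + (-1)^u q + (-1)^(1-u) * bump + e_i(t)
    x[i, ] <- mu * tt + (-1)^u * q + (-1)^(1 - u) * bump + e[i, ]
  }
  list(data = x, true_outliers = idx)
}

set.seed(1)
sim <- sketch_model6(n = 20, p = 30)
dim(sim$data)
sim$true_outliers
```

With the default constants, the bump term is narrow and centred at \(v\), so a contaminated curve departs from the line \(\mu t\) only over a short portion of the domain, while the \(\pm q\) shift moves it away from the main curves elsewhere.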
simulation_model6(
n = 100,
p = 50,
outlier_rate = 0.1,
mu = 4,
q = 1.8,
kprob = 0.5,
a = 0.25,
b = 0.75,
cov_alpha = 1,
cov_beta = 1,
cov_nu = 1,
pi_coeff = 0.02,
exp_pow = 2,
exp_coeff = 50,
deterministic = TRUE,
seed = NULL,
plot = F,
plot_title = "Simulation Model 6",
title_cex = 1.5,
show_legend = T,
ylabel = "",
xlabel = "gridpoints"
)
A list containing:
data: a matrix of size n by p containing the simulated data set.
true_outliers: a vector of integers indicating the row indices of the outliers in the generated data.
n: The number of curves to generate. Set to \(100\) by default.
p: The number of evaluation points of the curves. Curves are usually generated over the interval \([0, 1]\). Set to \(50\) by default.
outlier_rate: A value in \([0, 1]\) indicating the proportion of outliers. A value of \(0.06\) indicates that about \(6\%\) of the observations will be outliers; the exact number depends on whether the parameter deterministic is TRUE or not. Set to \(0.1\) by default.
mu: The mean value of the functions in the main and contamination model, i.e., the coefficient \(\mu\) of \(t\). Set to 4 by default.
q: The constant term \(q\) in the contamination model. Set to \(1.8\) by default.
kprob: The probability \(P(u = 1)\). Set to \(0.5\) by default.
a, b: Values specifying the interval from which \(v\) in the contamination model is drawn. Set to \(0.25\) and \(0.75\) respectively by default.
cov_alpha: A value indicating the coefficient of the exponential function of the covariance matrix, i.e., the \(\alpha\) in the covariance function. Set to \(1\) by default.
cov_beta: A value indicating the coefficient of the terms inside the exponential function of the covariance matrix, i.e., the \(\beta\) in the covariance function. Set to \(1\) by default.
cov_nu: A value indicating the power to which to raise the terms inside the exponential function of the covariance matrix, i.e., the \(\nu\) in the covariance function. Set to \(1\) by default.
pi_coeff: The constant \(r\) in the contamination model, i.e., the coefficient of \(\pi\). Set to \(0.02\) by default.
exp_pow: The constant \(w\) in the contamination model, i.e., the power of the term in the exponential function of the contamination model. Set to \(2\) by default.
exp_coeff: The constant \(z\) in the contamination model, i.e., the coefficient term in the exponential function of the contamination model. Set to \(50\) by default.
deterministic: A logical value. If TRUE, the function will always return round(n*outlier_rate) outliers and consequently the number of outliers is always constant. If FALSE, the number of outliers is determined using n Bernoulli trials with probability outlier_rate, and consequently the number of outliers returned is random. TRUE by default.
seed: A seed to set for reproducibility. NULL by default, in which case a seed is not set.
plot: A logical value indicating whether to plot the data. FALSE by default.
plot_title: Title of the plot if plot is TRUE. Set to "Simulation Model 6" by default.
title_cex: Numerical value indicating the size of the plot title relative to the device default. Set to 1.5 by default. Ignored if plot = FALSE.
show_legend: A logical indicating whether to add a legend to the plot if plot = TRUE.
ylabel: The label of the y-axis. Set to "" by default.
xlabel: The label of the x-axis if plot = TRUE. Set to "gridpoints" by default.
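The two behaviours documented for the deterministic argument can be illustrated with a short base-R sketch. The helper `sketch_outlier_index` is hypothetical and only mirrors the documented rule (a fixed count of round(n*outlier_rate) versus n Bernoulli trials with probability outlier_rate); how the package selects rows internally is an assumption here and may differ.

```r
# Hypothetical helper: choose outlier row indices in the two documented ways.
sketch_outlier_index <- function(n, outlier_rate, deterministic = TRUE) {
  if (deterministic) {
    # Fixed number of outliers: round(n * outlier_rate)
    sort(sample.int(n, round(n * outlier_rate)))
  } else {
    # n Bernoulli trials with success probability outlier_rate; the count is random
    which(rbinom(n, 1, outlier_rate) == 1)
  }
}

set.seed(42)
fixed_idx  <- sketch_outlier_index(100, 0.1, deterministic = TRUE)
random_idx <- sketch_outlier_index(100, 0.1, deterministic = FALSE)
length(fixed_idx)   # always round(100 * 0.1) = 10
length(random_idx)  # varies from run to run
```

A fixed count is convenient when comparing outlier detection methods across replications, since every simulated data set then contains the same number of true outliers.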
dt <- simulation_model6(n = 50, plot = TRUE)
dim(dt$data)
dt$true_outliers