Generate functional data whose basis coefficients follow a finite mixture of contaminated normal distributions. For the \(k\)th cluster, the coefficient vector \(\gamma_i\) follows the multivariate contaminated normal distribution with density $$ f(\gamma_i;\theta_k)=\alpha_k\phi(\gamma_i;\mu_k,\Sigma_k)+(1-\alpha_k)\phi(\gamma_i;\mu_k,\eta_k\Sigma_k),$$ where \(\theta_k=(\mu_k,\Sigma_k,\alpha_k,\eta_k)\), \(\alpha_k\in (0.5,1)\) is the proportion of uncontaminated data, \(\eta_k>1\) is the inflation coefficient due to outliers, and \(\phi(\gamma_i;\mu_k,\Sigma_k)\) is the density of the multivariate normal distribution \(N(\mu_k,\Sigma_k)\).
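The density above is a two-component mixture that shares a common mean, with the second component's covariance inflated by \(\eta_k\). The following base-R sketch evaluates it and assumes nothing about the package internals. `dmvn()` and `dcontam()` are hypothetical helpers written for illustration and are not functions exported by the package.

```r
# Hypothetical helpers (base R only), not part of the package.

# Multivariate normal density N(mu, Sigma) at a single vector x,
# computed through the Cholesky factor Sigma = t(R) %*% R.
dmvn <- function(x, mu, Sigma) {
  R <- chol(Sigma)
  z <- backsolve(R, x - mu, transpose = TRUE)  # solves t(R) %*% z = x - mu
  p <- length(mu)
  exp(-0.5 * sum(z^2) - sum(log(diag(R))) - 0.5 * p * log(2 * pi))
}

# Contaminated normal density: weight alpha on N(mu, Sigma) and
# weight 1 - alpha on the inflated N(mu, eta * Sigma).
dcontam <- function(x, mu, Sigma, alpha, eta) {
  stopifnot(alpha > 0.5, alpha < 1, eta > 1)
  alpha * dmvn(x, mu, Sigma) + (1 - alpha) * dmvn(x, mu, eta * Sigma)
}

# Bivariate example with alpha = 0.9 and eta = 10
dcontam(c(0.5, -1), mu = c(0, 0), Sigma = diag(2), alpha = 0.9, eta = 10)
```

In the univariate case with \(\Sigma=1\), `dcontam()` reduces to `0.9 * dnorm(x) + 0.1 * dnorm(x, sd = sqrt(10))`.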
genModelFD(ncurves=1000, nsplines=35, alpha=c(0.9,0.9,0.9),
eta=c(10, 5, 15))
fd: A functional data object representing the simulated curves.
groupd: The group classification of each curve.
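A functional data object pairs a basis with a coefficient vector for each curve. The Details state that the curves are smoothed with Fourier basis functions, and the sketch below shows that step as ordinary least squares in base R. `fourier_basis()` and `smooth_fourier()` are hypothetical helpers that use an unnormalised basis on \([0,1]\). Packages such as fda use normalised Fourier bases, so the coefficients there would differ by scale factors.

```r
# Hypothetical helper: unnormalised Fourier basis with nbasis columns,
# ordered 1, sin(w t), cos(w t), sin(2 w t), cos(2 w t), ...
# nbasis is assumed odd (a constant plus sine/cosine pairs).
fourier_basis <- function(tt, nbasis, period = 1) {
  stopifnot(nbasis %% 2 == 1)
  w <- 2 * pi / period
  B <- matrix(1, nrow = length(tt), ncol = nbasis)
  for (k in seq_len((nbasis - 1) / 2)) {
    B[, 2 * k]     <- sin(k * w * tt)
    B[, 2 * k + 1] <- cos(k * w * tt)
  }
  B
}

# Least-squares smoothing: coefficients c minimising ||y - B c||^2 (via QR).
smooth_fourier <- function(tt, y, nbasis, period = 1) {
  B <- fourier_basis(tt, nbasis, period)
  qr.coef(qr(B), y)
}

# A curve lying exactly in the span of 35 basis functions is recovered exactly
tt <- seq(0, 1, length.out = 200)
y <- 2 + sin(2 * pi * tt) - 0.5 * cos(6 * pi * tt)
coefs <- smooth_fourier(tt, y, nbasis = 35)
fitted <- fourier_basis(tt, 35) %*% coefs
max(abs(fitted - y))
```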
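ncurves: The total number of curves to simulate.
nsplines: The number of basis functions used to smooth the simulated curves.
alpha: A vector of length 3 giving the proportion of uncontaminated data in each group; each value should lie in \((0.5,1)\).
eta: A vector of length 3 giving, for each group, the inflation coefficient \(\eta_k>1\) that measures the increase in variability due to outliers.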
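The constraints in the description (\(K=3\) groups, \(\alpha_k\in(0.5,1)\), \(\eta_k>1\)) can be checked before calling the generator. `check_gen_args()` is a hypothetical validator written for illustration. It is not known whether `genModelFD` performs these checks itself.

```r
# Hypothetical validator mirroring the stated constraints:
# K = 3 groups, each alpha_k in (0.5, 1) and each eta_k > 1.
check_gen_args <- function(ncurves, nsplines, alpha, eta, K = 3) {
  stopifnot(
    length(ncurves) == 1, ncurves >= 1, ncurves == round(ncurves),
    length(nsplines) == 1, nsplines >= 1, nsplines == round(nsplines),
    length(alpha) == K, all(alpha > 0.5 & alpha < 1),
    length(eta) == K, all(eta > 1)
  )
  invisible(TRUE)
}

# Default arguments pass; an alpha below 0.5 would be rejected
check_gen_args(1000, 35, alpha = c(0.9, 0.9, 0.9), eta = c(10, 5, 15))
```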
Cristina Anton and Iain Smith
The data are generated from the model \(FCLM[a_k, b_k,{\bf{Q}}_k,d_k,\alpha_k,\eta_k]\). The number of clusters is fixed to \(K=3\), and the mixing proportions are equal, \(\pi_1=\pi_2=\pi_3=1/3\). The parameters take the following values:
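Each curve can be viewed as carrying two latent labels: a group \(z_i\) drawn with probabilities \(\pi_k\), and a "good curve" indicator drawn with probability \(\alpha_{z_i}\). Contaminated curves use covariance \(\eta_{z_i}\Sigma_{z_i}\) instead of \(\Sigma_{z_i}\). The sketch below shows this hierarchy in base R. `simulate_labels()` is hypothetical, and the package may assign group sizes deterministically rather than by random sampling.

```r
# Hypothetical sketch of the latent structure of the contaminated mixture.
simulate_labels <- function(ncurves, alpha, eta, prop = rep(1 / 3, 3)) {
  z <- sample(seq_along(prop), ncurves, replace = TRUE, prob = prop)  # group labels
  v <- rbinom(ncurves, size = 1, prob = alpha[z])  # 1 = uncontaminated, 0 = outlier
  # variance multiplier applied to Sigma_z: 1 for good curves, eta_z for outliers
  scale <- ifelse(v == 1, 1, eta[z])
  data.frame(group = z, good = v, scale = scale)
}

set.seed(1)
lab <- simulate_labels(1000, alpha = c(0.9, 0.9, 0.9), eta = c(10, 5, 15))
table(lab$group)                    # roughly 1000/3 curves per group
tapply(lab$good, lab$group, mean)   # roughly 0.9 in each group
```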
Group 1: \(d=5\), \(a=150\), \(b=5\), \(\mu=(1,0,50,100,0,\ldots,0)\)
Group 2: \(d=20\), \(a=15\), \(b=8\), \(\mu=(0,0,80,0,40,2,0,\ldots,0)\)
Group 3: \(d=10\), \(a=30\), \(b=10\), \(\mu=(0,\ldots,0,20,0,80,0,0,100)\),
where \(d\) is the intrinsic dimension of the subgroup, \(\mu\) is the mean vector of size 70, \(a\) is the value of the first \(d\) diagonal elements of the diagonal matrix \(\mathbf{D}\) in the decomposition \(\Sigma={\bf{Q}}\mathbf{D}{\bf{Q}}^\top\), and \(b\) is the value of the remaining \(70-d\) diagonal elements. The curves are smoothed using 35 Fourier basis functions.
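The group covariance implied by these parameters is \(\Sigma_k={\bf{Q}}_k\mathbf{D}_k{\bf{Q}}_k^\top\), with \(d_k\) eigenvalues equal to \(a_k\) and \(70-d_k\) equal to \(b_k\). The sketch below builds Group 1's covariance and draws coefficient vectors with and without inflation. It rests on assumptions: `make_sigma()` and `draw_coefs()` are hypothetical, and \({\bf{Q}}_k\) is taken as a random orthogonal matrix because the Details do not say how the package chooses it.

```r
# Hypothetical sketch: Sigma = Q D t(Q), D = diag(a repeated d times, b repeated p - d times).
# Q is a random orthogonal matrix (an assumption, not the package's choice).
make_sigma <- function(p, d, a, b) {
  Q <- qr.Q(qr(matrix(rnorm(p * p), p, p)))   # random orthogonal p x p matrix
  D <- diag(c(rep(a, d), rep(b, p - d)))
  S <- Q %*% D %*% t(Q)
  (S + t(S)) / 2                              # remove rounding asymmetry
}

# Draw n coefficient vectors from N(mu, scale * Sigma) via the Cholesky factor.
draw_coefs <- function(n, mu, Sigma, scale = 1) {
  R <- chol(scale * Sigma)                    # scale * Sigma = t(R) %*% R
  Z <- matrix(rnorm(n * length(mu)), n, length(mu))
  sweep(Z %*% R, 2, mu, "+")                  # each row ~ N(mu, scale * Sigma)
}

p <- 70
mu1 <- c(1, 0, 50, 100, rep(0, p - 4))        # Group 1 mean from the Details
set.seed(42)
Sigma1 <- make_sigma(p, d = 5, a = 150, b = 5)
good <- draw_coefs(5, mu1, Sigma1)                # uncontaminated curves
bad  <- draw_coefs(5, mu1, Sigma1, scale = 10)    # outliers, eta = 10
dim(good)
```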
- Amovin-Assagba M, Gannaz I, Jacques J (2022). Outlier detection in multivariate functional data through a contaminated mixture model. Computational Statistics & Data Analysis 174.
- Anton C, Smith I. Model-based clustering of functional data via mixtures of \(t\) distributions. Advances in Data Analysis and Classification (to appear).
# Univariate Contaminated Data
data <- genModelFD(ncurves=300, nsplines=35, alpha=c(0.9,0.9,0.9),
eta=c(10, 7, 17))
plot(data$fd, col = data$groupd)  # simulated curves coloured by true group
clm <- data$groupd                 # true cluster memberships