Simulates data for a binary classification task in which the two classes are drawn from multivariate normal distributions with different mean vectors and different covariance matrices. For label \(A\), the observations \(\mathbf{X}_{A}\) are sampled from \({MVN}\left(\mu_{A}\mathbf{1}_{p},\sigma_{A}^{2}\mathbf{I}_{p}\right)\), while for label \(B\) the observations \(\mathbf{X}_{B}\) are sampled from \({MVN}\left(\mu_{B}\mathbf{1}_{p},\sigma_{B}^{2}\mathbf{I}_{p}\right)\). For more details see Ara et al. (2021) and Breiman (1998).
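The generating process described above can be sketched in base R. This is only an illustrative sketch, not the package's implementation: the helper name `sketch_sim_class`, the column names `x1, ..., xp` and `y`, and the reading of `ratio` as the proportion of rows assigned to class A are all assumptions made for the example. Because the covariance is \(\sigma^{2}\mathbf{I}_{p}\), the \(p\) coordinates are independent, so `rnorm()` is enough and no multivariate sampler is needed.

```r
# Hypothetical helper, NOT the randomMachines implementation.
# Assumption: `ratio` is the proportion of the n rows assigned to class A.
sketch_sim_class <- function(n, p = 2, ratio = 0.5,
                             mu_a = 0, sigma_a = 1,
                             mu_b = 1, sigma_b = 1) {
  n_a <- round(n * ratio)  # rows for class A
  n_b <- n - n_a           # remaining rows for class B

  # With covariance sigma^2 * I_p the coordinates are independent,
  # so each class is drawn with rnorm() and reshaped into an n x p matrix.
  x_a <- matrix(rnorm(n_a * p, mean = mu_a, sd = sigma_a), nrow = n_a, ncol = p)
  x_b <- matrix(rnorm(n_b * p, mean = mu_b, sd = sigma_b), nrow = n_b, ncol = p)

  x <- rbind(x_a, x_b)
  colnames(x) <- paste0("x", seq_len(p))  # column names are an assumption

  y <- factor(rep(c("A", "B"), times = c(n_a, n_b)), levels = c("A", "B"))
  data.frame(x, y = y)
}

set.seed(42)
sketch <- sketch_sim_class(n = 100, p = 2)
table(sketch$y)  # class counts under the assumed meaning of `ratio`
```

Moving `mu_b` further from `mu_a` makes the two classes easier to separate, while unequal `sigma_a` and `sigma_b` give the classes different spreads.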
sim_class(
n,
p = 2,
ratio = 0.5,
mu_a = 0,
sigma_a = 1,
mu_b = 1,
sigma_b = 1
)
A simulated data.frame with p predictors (two by default) for a binary classification problem.
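n: Sample size.
p: Number of predictors.
ratio: Ratio between class A and class B.
mu_a: Mean of the predictors for class A (\(\mu_{A}\)).
sigma_a: Standard deviation of the predictors for class A (\(\sigma_{A}\)).
mu_b: Mean of the predictors for class B (\(\mu_{B}\)).
sigma_b: Standard deviation of the predictors for class B (\(\sigma_{B}\)).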
Mateus Maia: mateusmaia11@gmail.com, Anderson Ara: ara@ufpr.br
Ara, Anderson, et al. "Random machines: A bagged-weighted support vector model with free kernel choice." Journal of Data Science 19.3 (2021): 409-428.
Breiman, Leo. "Arcing classifier (with discussion and a rejoinder by the author)." The Annals of Statistics 26.3 (1998): 801-849.
library(randomMachines)
sim_data <- sim_class(n = 100)