randomMachines (version 0.1.1)

sim_class: Generate a binary classification data set from two normal distributions

Description

Simulates a binary classification task in which the two classes are drawn from multivariate normal distributions with different mean vectors and different covariance matrices. For label \(A\), the samples \(\mathbf{X}_{A}\) are drawn from \({MVN}\left(\mu_{A}\mathbf{1}_{p},\sigma_{A}^{2}\mathbf{I}_{p}\right)\), while for label \(B\) the samples \(\mathbf{X}_{B}\) are drawn from \({MVN}\left(\mu_{B}\mathbf{1}_{p},\sigma_{B}^{2}\mathbf{I}_{p}\right)\). For more details, see Ara et al. (2021) and Breiman (1998).
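
The sampling scheme described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name sim_class_sketch, the output column names, and the reading of ratio as the share of observations labelled A are all assumptions made for this example.

```r
# Hypothetical helper, NOT part of randomMachines: a base-R sketch of the
# two-class normal sampling scheme described above.
sim_class_sketch <- function(n, p = 2, ratio = 0.5,
                             mu_a = 0, sigma_a = 1,
                             mu_b = 1, sigma_b = 1) {
  # Assumption: `ratio` is the share of observations given label A.
  n_a <- round(n * ratio)
  n_b <- n - n_a

  # With covariance sigma^2 * I_p the p coordinates are independent, so each
  # class can be drawn with rnorm() and reshaped into an (n_k x p) matrix.
  x_a <- matrix(rnorm(n_a * p, mean = mu_a, sd = sigma_a), nrow = n_a, ncol = p)
  x_b <- matrix(rnorm(n_b * p, mean = mu_b, sd = sigma_b), nrow = n_b, ncol = p)

  x <- rbind(x_a, x_b)
  colnames(x) <- paste0("x", seq_len(p))  # assumed column names

  # Stack the predictors and attach the class label as a factor.
  data.frame(
    x,
    y = factor(rep(c("A", "B"), times = c(n_a, n_b)), levels = c("A", "B"))
  )
}

set.seed(42)
sketch_data <- sim_class_sketch(n = 100, p = 3)
table(sketch_data$y)  # class counts
str(sketch_data)      # p predictor columns plus the label column y
```

Increasing the gap between mu_a and mu_b, or increasing p, moves the two classes further apart and makes the task easier; larger sigma_a or sigma_b increases the overlap.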

Usage

sim_class(
  n,
  p = 2,
  ratio = 0.5,
  mu_a = 0,
  sigma_a = 1,
  mu_b = 1,
  sigma_b = 1
)

Value

A simulated data.frame with p predictors (two by default) for a binary classification problem

Arguments

n

Sample size

p

Number of predictors

ratio

Ratio between class A and class B

mu_a

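Mean of \(\mathbf{X}_{A}\), the predictors of class A (\(\mu_{A}\))

sigma_a

Standard deviation of \(\mathbf{X}_{A}\) (\(\sigma_{A}\))

mu_b

Mean of \(\mathbf{X}_{B}\), the predictors of class B (\(\mu_{B}\))

sigma_b

Standard deviation of \(\mathbf{X}_{B}\) (\(\sigma_{B}\))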

Author

Mateus Maia: mateusmaia11@gmail.com, Anderson Ara: ara@ufpr.br

References

Ara, A., et al. (2021). Random machines: A bagged-weighted support vector model with free kernel choice. Journal of Data Science, 19(3), 409-428.

Breiman, L. (1998). Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics, 26(3), 801-849.

Examples

library(randomMachines)
sim_data <- sim_class(n = 100)
