mixture_generator: Gaussian mixture dataset generator with regression between the covariates

Description

Generates a dataset whose covariates follow Gaussian mixtures, with some covariates defined by regressions on the others (subregressions).

Usage

mixture_generator(n = 130, p = 100, ratio = 0.4, max_compl = 1,
  valid = 1000, positive = 0.6, sigma_Y = 10, sigma_X = 0.25,
  meanvar = NULL, sigmavar = NULL, lambda = 3, Amax = 15,
  lambdapois = 5, gamma = F, gammashape = 1, gammascale = 0.5,
  tp1 = 1, tp2 = 1, tp3 = 1, pb = 0, nonlin = 0, pnonlin = 2,
  scale = TRUE)
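
A minimal call might look like the sketch below. It assumes the function is provided by an installed package named CorReg (an assumption, since this page does not name its package), so the call is guarded and skipped otherwise. Only arguments listed in the usage above are passed, and nothing is assumed about the structure of the returned object.

```r
# Hedged sketch: the package name "CorReg" is an assumption, not stated on this page.
sim <- NULL
if (requireNamespace("CorReg", quietly = TRUE)) {
  set.seed(42)  # make the simulated dataset reproducible
  sim <- CorReg::mixture_generator(n = 50, p = 10, ratio = 0.4,
                                   max_compl = 2, valid = 100)
  # Inspect the top-level structure rather than assuming component names.
  str(sim, max.level = 1)
}
```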

Arguments

n

the number of individuals in the learning dataset

p

the number of covariates (excluding the response)

ratio

the ratio of explained (dependent) covariates, i.e. covariates generated by a subregression

max_compl

the number of covariates in each subregression

valid

the size of the validation sample

positive

the ratio of positive coefficients in both the regression and the subregressions

sigma_Y

standard deviation for the noise of the regression

sigma_X

standard deviation of the noise in the subregressions (the same value for all of them). Ignored if gamma = TRUE.

gamma

boolean. If TRUE, sigma_X is generated as a gamma-distributed vector of length p.

gammashape

shape parameter of the gamma distribution (if needed)

gammascale

scale parameter of the gamma distribution (if needed)

meanvar

vector of means for the covariates.

sigmavar

standard deviation of the covariates.

lambda

parameter of the distribution that defines the number of components in the Gaussian mixture models

Amax

the maximum number of covariates with non-zero coefficients in the regression

tp1

the ratio of right-side covariates allowed to have a non-zero coefficient in the regression

tp2

the ratio of left-side covariates allowed to have a non-zero coefficient in the regression

tp3

the ratio of strictly independent covariates allowed to have a non-zero coefficient in the regression

lambdapois

parameter of the Poisson distribution used to generate the coefficients in the subregressions

pb

generates Y in a heuristic way that will give some issues with correlations

nonlin

whether to use a non-linear structure (half squared, half log). If non-zero, it is the probability of using the power pnonlin instead of the log.

pnonlin

the power used when a non-linear structure is requested

scale

boolean to scale X before computing Y
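
Examples

The base R sketch below illustrates what several of the arguments above describe: ratio, max_compl, positive, lambdapois, and the gamma = TRUE case for sigma_X. It is an illustration under stated assumptions, not the function's actual implementation. In particular, the independent covariates are drawn from a standard normal rather than a Gaussian mixture, the matrix name Z is hypothetical, and the "+ 1" shift on the Poisson draws is an assumption made only to avoid zero coefficients.

```r
set.seed(1)

# Values mirroring the documented arguments (small sizes for readability).
n <- 50; p <- 10; ratio <- 0.4; max_compl <- 2
positive <- 0.6; lambdapois <- 5
gammashape <- 1; gammascale <- 0.5

# Split the covariates: round(ratio * p) are explained by a subregression,
# the remaining ones are candidates to explain them.
n_dep <- round(ratio * p)
dep   <- seq_len(n_dep)
indep <- setdiff(seq_len(p), dep)

# gamma = TRUE case: one noise standard deviation per covariate.
sigma_X <- rgamma(p, shape = gammashape, scale = gammascale)

# Independent covariates: standard normal here for brevity (assumption).
X <- matrix(0, nrow = n, ncol = p)
X[, indep] <- rnorm(n * length(indep))

# Z[i, j] = 1 means covariate i enters the subregression explaining covariate j.
Z <- matrix(0, nrow = p, ncol = p)
for (j in dep) {
  rhs   <- sample(indep, max_compl)                     # explaining covariates
  signs <- ifelse(runif(max_compl) < positive, 1, -1)   # share of positive coefficients
  coefs <- signs * (rpois(max_compl, lambdapois) + 1)   # "+ 1" avoids zeros (assumption)
  Z[rhs, j] <- 1
  X[, j] <- X[, rhs, drop = FALSE] %*% coefs + rnorm(n, sd = sigma_X[j])
}

colSums(Z)  # number of explaining covariates per column
```

Each explained column of X is then a noisy linear combination of max_compl independent columns, which is the kind of correlation structure the ratio and max_compl arguments control.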