Usage
mixture_generator(n = 130, p = 100, ratio = 0.4, max_compl = 1,
valid = 1000, positive = 0.6, sigma_Y = 10, sigma_X = 0.25,
meanvar = NULL, sigmavar = NULL, lambda = 3, Amax = 15,
lambdapois = 5, gamma = F, gammashape = 1, gammascale = 0.5,
tp1 = 1, tp2 = 1, tp3 = 1, pb = 0, nonlin = 0, pnonlin = 2,
scale = TRUE)
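A call might look like the sketch below. It assumes mixture_generator is already available in the session (the package that provides it is not named here). The sizes are smaller than the defaults only to keep the run short, and nothing is assumed about the structure of the returned object.

```r
# Hypothetical settings: smaller than the defaults, chosen only for a quick run.
args <- list(n = 50, p = 20, ratio = 0.4, valid = 100,
             sigma_Y = 10, sigma_X = 0.25)

# Run only if the function exists in this session
# (assumption: the package providing it has been loaded).
sim <- if (exists("mixture_generator", mode = "function")) {
  do.call("mixture_generator", args)
} else {
  NULL
}

# The return value is not described in this section, so just inspect it.
if (!is.null(sim)) str(sim, max.level = 1)
```

Arguments left out of the list keep the default values shown in the signature.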
Arguments
n
the number of individuals in the learning
dataset
p
the number of covariates (without the response)
ratio
the proportion of covariates that are explained
by other covariates (dependent covariates)
max_compl
the number of covariates in each
subregression
valid
the size of the validation sample
positive
the ratio of positive coefficients in
both the regression and the subregressions
sigma_Y
standard deviation for the noise of the
regression
sigma_X
standard deviation of the noise in the
subregressions (same value for all). Ignored if gamma = TRUE
gamma
boolean; if TRUE, sigma_X is generated as a
gamma-distributed vector of length p
gammashape
shape parameter of the gamma
distribution (used only if gamma = TRUE)
gammascale
scale parameter of the gamma
distribution (used only if gamma = TRUE)
meanvar
vector of means for the covariates.
sigmavar
standard deviation of the covariates.
lambda
parameter of the distribution that defines the number
of components in the Gaussian mixture models
Amax
the maximum number of covariates with
non-zero coefficients in the regression
tp1
the ratio of right-side covariates allowed to
have a non-zero coefficient in the regression
tp2
the ratio of left-side covariates allowed to
have a non-zero coefficient in the regression
tp3
the ratio of strictly independent covariates
allowed to have a non-zero coefficient in the regression
lambdapois
parameter of the Poisson distribution used to
generate the coefficients in the subregressions
pb
generates Y in a heuristic way that introduces
some issues with correlations.
nonlin
to use a non-linear structure (half squared,
half log). If non-zero, it is the probability of using the
power pnonlin instead of the log
pnonlin
the power used when the structure is non-linear
scale
boolean to scale X before computing Y
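
The meaning of ratio, positive, gamma, gammashape and gammascale can be illustrated with the base-R sketch below. It only mirrors the argument descriptions and is not the function's actual implementation. The names n_dependent, sigma_X_vec and signs are made up for the example, and both the rounding of ratio * p and the sign-drawing mechanism are assumptions.

```r
set.seed(1)  # reproducible illustration

p <- 100; ratio <- 0.4; positive <- 0.6
sigma_X <- 0.25
gamma <- TRUE; gammashape <- 1; gammascale <- 0.5

# ratio: share of the p covariates that are explained by others
# (rounding to an integer count is an assumption).
n_dependent <- round(ratio * p)

# gamma = TRUE: one gamma-distributed noise standard deviation per covariate;
# otherwise the scalar sigma_X is used for all of them.
sigma_X_vec <- if (gamma) {
  rgamma(p, shape = gammashape, scale = gammascale)
} else {
  rep(sigma_X, p)
}

# positive: each coefficient sign is +1 with probability `positive`,
# else -1 (assumed mechanism, for illustration only).
signs <- ifelse(runif(n_dependent) < positive, 1, -1)
```

With gamma = FALSE the same code yields a constant vector, which matches the note that sigma_X is ignored only when gamma is TRUE.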