sam.gen.ncpen: Generate a simulated dataset
Description
Generate a synthetic dataset whose design matrix has a given correlation structure and whose responses follow a generalized linear model (or a Cox model).
Usage
sam.gen.ncpen(n = 100, p = 50, q = 10, k = 3, r = 0.3,
cf.min = 0.5, cf.max = 1, corr = 0.5, seed = NULL,
family = c("gaussian", "binomial", "multinomial", "cox", "poisson"))
Arguments
n
(numeric) the number of samples.
p
(numeric) the number of variables.
q
(numeric) the number of nonzero coefficients.
k
(numeric) the number of classes; used only when family = "multinomial".
r
(numeric) the censoring ratio (proportion of censored observations); used only when family = "cox".
cf.min
(numeric) the minimum value of the nonzero coefficients.
cf.max
(numeric) the maximum value of the nonzero coefficients.
corr
(numeric) strength of correlations in the correlation structure.
seed
(numeric) seed for random number generation. The default, NULL, sets no seed.
family
(character) model type: one of "gaussian", "binomial", "multinomial", "cox" or "poisson".
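In R, a character-vector default such as the one given for family is conventionally resolved with base R's match.arg(), which returns the first choice when the argument is not supplied. Assuming sam.gen.ncpen resolves family this way (this page does not say so), the default model would be "gaussian". A minimal sketch with a hypothetical helper resolve_family:

```r
# Hypothetical helper mimicking how a choice-vector default is usually resolved
resolve_family <- function(family = c("gaussian", "binomial", "multinomial", "cox", "poisson")) {
  # match.arg() returns the first choice when 'family' is left at its default,
  # allows unambiguous partial matches such as "binom",
  # and signals an error for values outside the choices
  match.arg(family)
}

resolve_family()          # "gaussian"
resolve_family("binom")   # "binomial"
```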
Value
A list containing
x.mat
design matrix.
y.vec
responses.
b.vec
true coefficients.
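The page does not state how q, cf.min and cf.max are combined into b.vec. As an illustration only, the sketch below assumes that the first q coefficients are nonzero and evenly spaced between cf.max and cf.min, and that the remaining p - q are zero. The function make_coef is hypothetical.

```r
# Hypothetical construction of a true coefficient vector.
# Assumption: the first q entries are nonzero and evenly spaced from
# cf.max down to cf.min; the remaining p - q entries are zero.
make_coef <- function(p = 50, q = 10, cf.min = 0.5, cf.max = 1) {
  stopifnot(q <= p, cf.min <= cf.max)
  c(seq(cf.max, cf.min, length.out = q), rep(0, p - q))
}

b.vec <- make_coef()
sum(b.vec != 0)   # 10
```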
Details
The design matrix is drawn from a multivariate normal distribution whose correlation structure is controlled by corr.
The responses are then generated from the model specified by family, using the true coefficients in b.vec (see References).
Note that for family = "cox", the censoring indicator is stored in the last column of x.mat.
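The exact correlation structure is not spelled out here. As an illustration only, the sketch below assumes an AR(1)-type structure in which the correlation between columns i and j is corr^|i - j|, and draws the rows through a Cholesky factor using base R. The function gen_design is hypothetical.

```r
# Hypothetical sketch: n x p design matrix with rows ~ N(0, Sigma),
# assuming Sigma[i, j] = corr^|i - j| (AR(1)-type structure).
gen_design <- function(n = 100, p = 50, corr = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Correlation matrix with entries corr^|i - j| (ones on the diagonal)
  sigma <- corr^abs(outer(seq_len(p), seq_len(p), "-"))
  # chol() returns upper-triangular R with t(R) %*% R = sigma, so
  # Z %*% R has rows with covariance sigma when Z has iid N(0, 1) entries
  z <- matrix(rnorm(n * p), n, p)
  z %*% chol(sigma)
}

x.mat <- gen_design(n = 2000, p = 5, corr = 0.5, seed = 1)
round(cor(x.mat)[1, 1:3], 2)  # approximately 1.00, 0.50, 0.25
```

Passing the same seed reproduces the same matrix, which mirrors the role of the seed argument.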
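For the gaussian, binomial and poisson families, a response can be generated from the linear predictor x %*% b. The sketch below is a minimal illustration, assuming no intercept, unit error variance for "gaussian", and the canonical logit and log links; it does not cover "multinomial" or "cox", and the function gen_response is hypothetical.

```r
# Hypothetical sketch: responses from a GLM given a design matrix x and
# coefficients b. Assumptions: no intercept, N(0, 1) errors for "gaussian",
# canonical links (logit for "binomial", log for "poisson").
gen_response <- function(x, b, family = c("gaussian", "binomial", "poisson")) {
  family <- match.arg(family)
  eta <- drop(x %*% b)                     # linear predictor
  n <- length(eta)
  switch(family,
    gaussian = eta + rnorm(n),             # y = x b + N(0, 1) noise
    binomial = rbinom(n, 1, plogis(eta)),  # P(y = 1) = 1 / (1 + exp(-eta))
    poisson  = rpois(n, exp(eta))          # E(y) = exp(eta)
  )
}

set.seed(1)
x <- matrix(rnorm(200 * 3), 200, 3)
b <- c(1, -0.5, 0)
table(gen_response(x, b, "binomial"))
```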
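The censoring scheme behind r is not described on this page. The sketch below shows one common way to hit a target censoring ratio, under stated assumptions: exponential event times with hazard exp(x %*% b) and independent exponential censoring times whose rate is solved with uniroot() so that the expected censored fraction equals r. The indicator is coded here as 1 = event observed, 0 = censored (the package's coding is not stated) and is appended as the last column, matching the layout described above. The function gen_cox is hypothetical.

```r
# Hypothetical sketch of Cox-model data with a target censoring ratio r.
# Assumptions: exponential event times with hazard exp(eta) and independent
# exponential censoring times; this may differ from the package's scheme.
gen_cox <- function(x, b, r = 0.3) {
  stopifnot(r > 0, r < 1)
  eta <- drop(x %*% b)
  n <- length(eta)
  # P(censored | eta) = lc / (lc + exp(eta)) = plogis(log(lc) - eta);
  # solve mean(P(censored | eta)) = r for log(lc)
  f <- function(l) mean(plogis(l - eta)) - r
  lc <- exp(uniroot(f, c(-50, 50))$root)
  t.event <- rexp(n, rate = exp(eta))
  t.cens  <- rexp(n, rate = lc)
  y.vec  <- pmin(t.event, t.cens)            # observed time
  status <- as.numeric(t.event <= t.cens)    # 1 = event, 0 = censored
  # Indicator stored as the last column of the design matrix
  list(x.mat = cbind(x, status), y.vec = y.vec)
}

set.seed(3)
d <- gen_cox(matrix(rnorm(1000 * 3), 1000, 3), c(0.5, -0.5, 0), r = 0.3)
mean(d$x.mat[, ncol(d$x.mat)] == 0)   # censored fraction, close to 0.3
```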
References
Kwon, S., Lee, S. and Kim, Y. (2016). Moderately clipped LASSO.
Computational Statistics and Data Analysis, 92C, 53-67.
Kwon, S. and Kim, Y. (2012). Large sample properties of the SCAD-penalized maximum likelihood estimation on high dimensions.
Statistica Sinica, 629-653.