Generates synthetic data for sparse linear regression problems. Returns training and test sets along with model parameters.
simulate_spareg_data(
n,
p,
ntest,
a = min(100, p/4),
snr = 10,
rho = 0.5,
mu = 1,
beta_vals = NULL,
seed = NULL
)
A list with the following components:
Training design matrix (n x p).
Training response vector (length n).
Test design matrix (ntest x p).
Test response vector (length ntest).
Intercept used in data generation.
True coefficient vector (length p).
Noise variance used in data generation. Equals beta' Sigma beta / snr.
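The noise-variance formula can be checked by hand on a small case. The sketch below uses illustrative values for p, rho, snr and beta (none of them come from the package). It builds the compound symmetry matrix Sigma with ones on the diagonal and rho elsewhere, then computes beta' Sigma beta / snr. For this structure the quadratic form also has the closed form (1 - rho) * sum(beta^2) + rho * sum(beta)^2.

```r
# Illustrative values only; they are assumptions, not package defaults
# (except rho = 0.5 and snr = 10, which match the documented defaults).
p    <- 6
rho  <- 0.5
snr  <- 10
beta <- c(3, -2, 1, 0, 0, 0)

# Compound symmetry correlation matrix: 1 on the diagonal, rho elsewhere.
Sigma <- matrix(rho, nrow = p, ncol = p)
diag(Sigma) <- 1

# Signal variance beta' Sigma beta and the implied noise variance.
signal_var <- drop(t(beta) %*% Sigma %*% beta)  # 9
sigma2     <- signal_var / snr                  # 0.9

# Closed form of the same quadratic form under compound symmetry.
closed_form <- (1 - rho) * sum(beta^2) + rho * sum(beta)^2  # 9
```

A larger snr therefore shrinks the noise variance proportionally, while strongly correlated predictors with same-signed coefficients inflate the signal variance.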
The function takes the following arguments:
n: Integer. Number of training samples.
p: Integer. Number of predictors (features).
ntest: Integer. Number of test samples.
a: Integer. Number of non-zero coefficients in the true beta vector. Default is min(100, p/4).
snr: Numeric. Signal-to-noise ratio. Default is 10.
rho: Numeric between 0 and 1. Pairwise correlation coefficient among predictors. Default is 0.5. A compound symmetry correlation matrix is used, and the variance of each predictor is fixed to 1.
mu: Numeric. Intercept term (mean of response). Default is 1.
beta_vals: Numeric vector. Possible values for the non-zero coefficients in the true beta vector. Defaults to NULL, in which case the values -3, -2, -1, 1, 2, 3 are used.
seed: Integer. Random seed for reproducibility. Default is NULL.
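To make the roles of these arguments concrete, here is a minimal sketch of one way such data could be generated. It is not the package's implementation. The helper name `sketch_sparse_data`, the Gaussian predictors and noise, the placement of the non-zero coefficients in the first `a` positions, and the names of the returned list elements are all assumptions made for illustration.

```r
# Hypothetical helper for illustration; NOT the package's own code.
sketch_sparse_data <- function(n, p, a = min(100, p / 4), snr = 10,
                               rho = 0.5, mu = 1,
                               beta_vals = c(-3, -2, -1, 1, 2, 3)) {
  a <- floor(a)

  # Equicorrelated predictors with unit variance: a shared factor gives every
  # pair of columns correlation rho (assumes Gaussian predictors).
  shared <- rnorm(n)
  x <- sqrt(rho) * shared + sqrt(1 - rho) * matrix(rnorm(n * p), n, p)

  # Sparse coefficient vector: the first a entries are non-zero (assumption),
  # drawn with replacement from beta_vals.
  beta <- numeric(p)
  beta[seq_len(a)] <- beta_vals[sample.int(length(beta_vals), a, replace = TRUE)]

  # Noise variance beta' Sigma beta / snr, using the compound symmetry
  # closed form for the quadratic form.
  sigma2 <- ((1 - rho) * sum(beta^2) + rho * sum(beta)^2) / snr

  # Response: intercept plus linear signal plus Gaussian noise (assumption).
  y <- mu + drop(x %*% beta) + rnorm(n, sd = sqrt(sigma2))

  list(x = x, y = y, beta = beta, sigma2 = sigma2)
}

# Usage: 50 observations, 20 predictors, so a defaults to min(100, 20 / 4) = 5.
set.seed(1)
sim <- sketch_sparse_data(n = 50, p = 20)
```

The shared-factor construction avoids forming and factorising a p x p correlation matrix, which matters when p is in the thousands.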
set.seed(123)
data <- simulate_spareg_data(n = 200, p = 2000, ntest = 100)
str(data)