sim.data.ppls: Simulate Data for Penalized Partial Least Squares (PPLS)
Description
Generates training and test datasets with non-linear relationships between the predictors and the response, as used in PPLS simulation studies.
Usage
sim.data.ppls(ntrain, ntest, stnr, p, a = NULL, b = NULL)
Value
A list with the following components:
Xtrain
ntrain x p matrix of training predictors (uniform in [-1, 1]).
ytrain
Numeric vector of training responses.
Xtest
ntest x p matrix of test predictors.
ytest
Numeric vector of test responses.
sigma
Standard deviation of the added noise.
a
Linear coefficients used in the simulation.
b
Nonlinear sine coefficients used in the simulation.
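A call and a look at the returned components might go as follows. This is only a usage sketch: it assumes the package that provides sim.data.ppls is installed and attached, and the argument values are arbitrary illustrations, not recommended settings.

```r
# Assumes sim.data.ppls() is available in the session; the guard simply
# skips the example when the function cannot be found.
if (exists("sim.data.ppls")) {
  set.seed(42)  # a and b are drawn at random when left as NULL
  dat <- sim.data.ppls(ntrain = 50, ntest = 200, stnr = 3, p = 10)

  dim(dat$Xtrain)     # training predictors: ntrain rows, p columns
  length(dat$ytest)   # one response per test observation
  dat$sigma           # standard deviation of the added noise
  dat$a               # linear coefficients that were used
  dat$b               # sine coefficients that were used
}
```

Returning a and b alongside the data makes it possible to rerun the simulation with the same coefficients by passing them back in as arguments.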
Arguments
ntrain
Integer. Number of training observations.
ntest
Integer. Number of test observations.
stnr
Numeric. Signal-to-noise ratio (higher means less noise).
p
Integer. Number of predictors (must be >= 5).
a
Optional numeric vector of length 5. Linear coefficients for the first 5 variables. If NULL, drawn uniformly from [-1, 1].
b
Optional numeric vector of length 5. Nonlinear sine coefficients. If NULL, drawn uniformly from [-1, 1].
Details
The function simulates a response variable y from the sum of linear and sinusoidal effects of the first 5 predictors:
$$f(x) = \sum_{j=1}^{5} \left( a_j x_j + \sin(6 b_j x_j) \right)$$
The response y is then generated by adding Gaussian noise to f(x). The noise standard deviation (returned as sigma) is scaled to match the specified signal-to-noise ratio (stnr).
The remaining p - 5 variables do not enter f(x) and act as pure noise variables, which makes the dataset suitable for evaluating variable selection or regularization methods.
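The data-generating process described above can be sketched in a few lines of base R. This is an illustrative re-implementation, not the package source: the name sim_sketch is hypothetical, the sine term follows the formula as written in this section, and the noise scaling assumes the convention stnr = var(f) / sigma^2, which the text does not spell out.

```r
# Hypothetical sketch of the simulation; not the package's own code.
sim_sketch <- function(ntrain, ntest, stnr, p, a = NULL, b = NULL) {
  stopifnot(p >= 5, stnr > 0)
  if (is.null(a)) a <- runif(5, -1, 1)
  if (is.null(b)) b <- runif(5, -1, 1)

  n <- ntrain + ntest
  # All p predictors are uniform on [-1, 1]; only the first 5 enter f.
  X  <- matrix(runif(n * p, -1, 1), nrow = n, ncol = p)
  X5 <- X[, 1:5, drop = FALSE]

  # f(x) = sum_j ( a_j * x_j + sin(6 * b_j * x_j) )
  f <- as.vector(X5 %*% a) + rowSums(sin(6 * sweep(X5, 2, b, `*`)))

  # Assumed convention: stnr = var(f) / sigma^2
  sigma <- sqrt(var(f) / stnr)
  y <- f + rnorm(n, mean = 0, sd = sigma)

  train <- seq_len(ntrain)
  test  <- ntrain + seq_len(ntest)
  list(Xtrain = X[train, , drop = FALSE], ytrain = y[train],
       Xtest  = X[test, , drop = FALSE],  ytest  = y[test],
       sigma = sigma, a = a, b = b)
}

set.seed(1)
d <- sim_sketch(ntrain = 50, ntest = 200, stnr = 4, p = 10)
dim(d$Xtrain)
```

Under the assumed convention, doubling stnr divides the noise variance by two, which is consistent with "higher means less noise" in the argument description.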