Simulates example data for pffr models built from a variety of term types.
Supports both a formula-based interface (recommended) and a legacy
scenario-based interface for backward compatibility.
pffr_simulate(
  formula = NULL,
  scenario = NULL,
  n = 100,
  nxgrid = 40,
  nygrid = 60,
  yind = NULL,
  xind = NULL,
  data = NULL,
  effects = list(),
  intercept = "beta",
  SNR = 10,
  family = gaussian(),
  propmissing = 0,
  limits = NULL,
  seed = NULL,
  wiggliness = 1,
  k_truth = list()
)

A data frame (or list if propmissing > 0) with simulated data and attributes:
Evaluation points for functional covariates
Evaluation points for the response
List containing eta (linear predictor), etaTerms (individual term contributions), beta (coefficient functions), and epsilon (noise); this is the truth attribute, retrieved with attr(dat, "truth")
formula
A formula specifying the model structure (e.g., Y ~ ff(X1) + xlin). If provided, the scenario argument is ignored.

scenario
Deprecated. Character string or vector specifying predefined scenarios. Use the formula argument instead.

n
Number of observations.

nxgrid
Number of evaluation points for functional covariates. Ignored if xind is provided.

nygrid
Number of evaluation points for the functional response. Ignored if yind is provided.

yind
Numeric vector of evaluation points for the response. Defaults to seq(0, 1, length.out = nygrid).

xind
Numeric vector of evaluation points for functional covariates. Defaults to seq(0, 1, length.out = nxgrid).

data
Optional data frame with pre-generated covariates.

effects
Named list mapping term labels to effect specifications. Each entry can be a preset name (e.g., "cosine"), a function, or a numeric value. See Details.

intercept
Intercept specification: preset name ("beta", "constant", "sine", "zero"), a function of t, or a numeric value.

SNR
Signal-to-noise ratio: var(eta) / var(epsilon).

family
A family object for the response distribution. Defaults to gaussian().

propmissing
Proportion of missing data in the response (0 to 1).

limits
A function defining integration limits for ff() terms, e.g., function(s, t) s < t.

seed
Optional random seed for reproducibility.

wiggliness
Controls smoothness for the "random" preset (default: 1). Higher values produce more wiggly truth functions. Typical range: 0.001 (very smooth) to 10 (very wiggly).

k_truth
Named list of basis dimensions for random truth generation. Defaults: list(ff_s = 8, ff_t = 8, smooth_z = 8, smooth_t = 8, linear = 8, intercept = 8, concurrent = 8).
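As a rough illustration of how nygrid, yind, and SNR relate, the base R sketch below builds the default response grid and scales Gaussian noise so that the empirical var(eta) / var(epsilon) equals the requested SNR. The linear predictor eta used here is a made-up example, and whether pffr_simulate rescales the noise empirically or only sets its theoretical variance is an assumption, not something stated above.

```r
set.seed(123)
n <- 100
nygrid <- 60
SNR <- 10

# Default response grid, as documented for yind
yind <- seq(0, 1, length.out = nygrid)

# Hypothetical linear predictor: one random scalar per curve times a sine shape
eta <- outer(rnorm(n), sin(2 * pi * yind))

# Draw noise, then rescale it so that var(eta) / var(eps) matches SNR
# (assumption: empirical rescaling; the real function may differ)
eps <- matrix(rnorm(n * nygrid), n, nygrid)
eps <- eps * sqrt(var(as.vector(eta)) / (SNR * var(as.vector(eps))))

Y <- eta + eps
snr_hat <- var(as.vector(eta)) / var(as.vector(eps))
```

With this scaling, snr_hat agrees with SNR up to floating-point error, which makes it easy to check what a given SNR value means for the noise level.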
For ff() terms: "cosine", "product", "gaussian", "separable", "historical", "random"
For s() terms: "beta", "dnorm", "sine", "cosine", "polynomial", "step", "random"
For c() terms: "constant", "gaussian_2d", "linear"
For intercept: "constant", "beta", "sine", "zero", "random"
For linear terms: "dnorm", "sine", "cosine", "constant", "linear", "random"
For concurrent terms: "dnorm", "sine", "cosine", "constant", "linear", "random"
The "random" preset generates reproducible random truth functions by
sampling from P-spline priors. Use wiggliness to control the
smoothness (curvature).
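To give a feel for what sampling from a P-spline prior can look like, here is a minimal sketch using the splines package that ships with R. It assumes a cubic B-spline basis of dimension 8 (the k_truth default) and a second-order random-walk prior in which wiggliness acts as the variance of the second differences of the coefficients. Both choices, and the helper name draw_truth, are assumptions for illustration; the internals of pffr_simulate may differ.

```r
library(splines)

t_grid <- seq(0, 1, length.out = 60)
k <- 8                                      # basis dimension (k_truth default)
B <- bs(t_grid, df = k, intercept = TRUE)   # 60 x 8 cubic B-spline design matrix

# Hypothetical helper: draw one random truth function.
# Second differences of the coefficients are N(0, wiggliness); summing them
# twice (starting from zero level and zero slope) recovers the coefficients.
draw_truth <- function(wiggliness, seed) {
  set.seed(seed)
  d2 <- rnorm(k - 2, sd = sqrt(wiggliness))
  coefs <- c(0, 0, cumsum(cumsum(d2)))
  list(coefs = coefs, d2 = d2, f = as.numeric(B %*% coefs))
}

smooth_truth <- draw_truth(0.001, seed = 1)
wiggly_truth <- draw_truth(10, seed = 1)

# Simple roughness measure: sum of squared second differences of the curve
roughness <- function(f) sum(diff(f, differences = 2)^2)
```

Fixing the seed makes each draw reproducible, and comparing roughness(smooth_truth$f) with roughness(wiggly_truth$f) shows the effect of the variance parameter on curvature.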
When specifying custom effects via the effects argument, keys are
matched to formula terms using a 6-level fallback chain:
1. Exact term label (as produced by terms.formula())
2. Whitespace-normalized term label
3. Exact deparsed call (e.g., "ff(X1, xind = s)")
4. Whitespace-normalized deparsed call
5. Variable name (first argument, e.g., "X1" for ff(X1))
6. Type-specific default preset
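The fallback chain can be sketched in a few lines of base R. The helper names normalize_ws and resolve_effect, the way whitespace is normalized (all whitespace removed), and the placeholder default are assumptions made for illustration; they are not functions exported by the package.

```r
# Hypothetical normalization: drop all whitespace before comparing keys
normalize_ws <- function(x) gsub("\\s+", "", x)

# Hypothetical resolver: returns the matching level (1 to 6) and the effect
resolve_effect <- function(term_label, call_text, var_name, effects, default) {
  keys <- names(effects)
  norm_keys <- normalize_ws(keys)
  candidates <- list(
    which(keys == term_label),                     # level 1
    which(norm_keys == normalize_ws(term_label)),  # level 2
    which(keys == call_text),                      # level 3
    which(norm_keys == normalize_ws(call_text)),   # level 4
    which(keys == var_name)                        # level 5
  )
  for (level in seq_along(candidates)) {
    hit <- candidates[[level]]
    if (length(hit) > 0) {
      return(list(level = level, effect = effects[[hit[1]]]))
    }
  }
  list(level = 6L, effect = default)               # level 6: fall back
}

effects <- list(X1 = "cosine", "ff(X2,xind=s)" = "product")

by_name  <- resolve_effect("ff(X1)", "ff(X1)", "X1", effects, "placeholder")
by_norm  <- resolve_effect("ff(X2, xind = s)", "ff(X2, xind = s)", "X2",
                           effects, "placeholder")
fallback <- resolve_effect("s(xsmoo)", "s(xsmoo)", "xsmoo", effects, "placeholder")
```

Here by_name resolves through the variable name, by_norm through the whitespace-normalized term label, and fallback through the default, since no key refers to xsmoo.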
This means you can specify effects at different levels of specificity:
# Match by variable name (most common, matches level 5):
pffr_simulate(Y ~ ff(X1) + xlin,
  effects = list(X1 = "cosine", xlin = "dnorm"))

# Match by exact term label (level 1):
pffr_simulate(Y ~ ff(X1, xind = s) + s(xsmoo),
  effects = list("ff(X1, xind = s)" = "product",
                 "s(xsmoo)" = "sine"))

# Match by variable name with custom function (level 5):
pffr_simulate(Y ~ xlin,
  effects = list(xlin = function(t) sin(2 * pi * t)))
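To make the function and numeric forms of an effect specification concrete, the sketch below evaluates a custom coefficient function on the response grid and forms the contribution of a scalar covariate as xlin_i * beta(t), matching the xlin term in the model equation given for the scenario interface. The helper as_coef_fun is hypothetical, and treating a numeric value as a constant coefficient function is an assumption about how such a value is interpreted.

```r
yind <- seq(0, 1, length.out = 60)   # default response grid
xlin <- c(-1, 0, 0.5, 2)             # made-up scalar covariate values

# Hypothetical helper: turn a function or a number into a function of t
as_coef_fun <- function(spec) {
  if (is.function(spec)) return(spec)
  if (is.numeric(spec)) return(function(t) rep(spec, length(t)))
  stop("preset names are resolved inside pffr_simulate and are not covered here")
}

beta_sine  <- as_coef_fun(function(t) sin(2 * pi * t))
beta_const <- as_coef_fun(2)

# One row per observation, one column per grid point
eta_sine  <- outer(xlin, beta_sine(yind))
eta_const <- outer(xlin, beta_const(yind))
```

Each row of eta_sine is the same sine curve scaled by that observation's covariate value, so the observation with xlin equal to 0 contributes a flat zero curve.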
**Formula Interface (Recommended):** Specify a pffr-style formula and optional effect specifications:
pffr_simulate(Y ~ ff(X1) + xlin + s(xsmoo), n = 100,
  effects = list(X1 = "cosine", xlin = "dnorm"))
**Scenario Interface (Deprecated):** Scenario "all" generates data from a complex multivariate model
$$Y_i(t) = \mu(t) + \int X_{1i}(s)\beta_1(s,t)\,ds + \mathrm{xlin}\,\beta_3(t) + f(\mathrm{xte1}, \mathrm{xte2}) + f(\mathrm{xsmoo}, t) + \beta_4\,\mathrm{xconst} + f(\mathrm{xfactor}, t) + \epsilon_i(t)$$
Scenarios "int", "ff", "lin", "te", "smoo", "const", "factor" generate data from simpler models containing only the respective terms.
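The integral term in the model equation can be approximated numerically on the covariate grid. The sketch below uses trapezoidal weights and an optional limits function such as function(s, t) s < t to restrict the integration region. The coefficient surface beta1, the helper name ff_term, and the choice of quadrature rule are assumptions for illustration rather than a description of what pffr_simulate does internally.

```r
set.seed(1)
n <- 5
s_grid <- seq(0, 1, length.out = 40)   # default covariate grid
t_grid <- seq(0, 1, length.out = 60)   # default response grid

# Made-up functional covariate and hypothetical coefficient surface
X1 <- matrix(rnorm(n * length(s_grid)), n, length(s_grid))
beta1 <- function(s, t) cos(pi * s) * t

# Hypothetical helper: approximate int X(s) beta(s, t) ds for every t
ff_term <- function(X, s, t, beta_fun, limits = NULL) {
  h <- diff(s)
  w <- c(h[1], h[-1] + h[-length(h)], h[length(h)]) / 2  # trapezoid weights
  beta <- outer(s, t, beta_fun)                          # length(s) x length(t)
  if (!is.null(limits)) beta <- beta * outer(s, t, limits)
  X %*% (beta * w)                                       # n x length(t)
}

eta_full <- ff_term(X1, s_grid, t_grid, beta1)
eta_hist <- ff_term(X1, s_grid, t_grid, beta1, limits = function(s, t) s < t)
```

With the limits function s < t, only covariate values at earlier time points enter the integral for each t, which is the usual reading of a historical effect.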
Sparse/irregular response trajectories can be generated by setting
propmissing > 0.
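As a sketch of what a sparse response can look like, the code below removes a proportion of the entries of a small response matrix at random and stores the remaining observations in long format with columns .obs, .index, and .value, the layout pffr accepts for irregular responses through its ydata argument. The toy data are made up, and both the random thinning scheme and the assumption that pffr_simulate returns this layout are illustrative guesses.

```r
set.seed(42)
n <- 5
yind <- seq(0, 1, length.out = 6)
Y <- matrix(rnorm(n * length(yind)), n, length(yind))   # toy response matrix

propmissing <- 0.3
n_drop <- round(propmissing * length(Y))

# Set a random subset of entries to NA (assumed thinning scheme)
Y_sparse <- Y
Y_sparse[sample(length(Y), n_drop)] <- NA

# Long format: one row per observed (curve, time point) pair
obs <- which(!is.na(Y_sparse), arr.ind = TRUE)
ydata <- data.frame(
  .obs   = obs[, "row"],
  .index = yind[obs[, "col"]],
  .value = Y_sparse[!is.na(Y_sparse)]
)
```

Each row of ydata identifies the curve, the evaluation point, and the observed value, so curves can have different numbers of observations.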
# Formula interface
dat <- pffr_simulate(Y ~ ff(X1) + xlin + s(xsmoo), n = 50,
  effects = list(X1 = "cosine", xlin = "dnorm", xsmoo = "sine"))
# Legacy scenario interface (deprecated)
dat_legacy <- suppressWarnings(pffr_simulate(scenario = "ff", n = 50))
# Access true coefficients
truth <- attr(dat, "truth")
str(truth$beta)