pffr_simulate: Simulate example data for pffr

Description

Simulates example data for pffr models built from a variety of term types. Supports both a formula-based interface (recommended) and a legacy scenario-based interface for backward compatibility.

Usage

pffr_simulate(
  formula = NULL,
  scenario = NULL,
  n = 100,
  nxgrid = 40,
  nygrid = 60,
  yind = NULL,
  xind = NULL,
  data = NULL,
  effects = list(),
  intercept = "beta",
  SNR = 10,
  family = gaussian(),
  propmissing = 0,
  limits = NULL,
  seed = NULL,
  wiggliness = 1,
  k_truth = list()
)

Value

A data frame (or list if propmissing > 0) with simulated data and attributes:

xindex: Evaluation points for functional covariates
yindex: Evaluation points for the response
truth: List containing eta (linear predictor), etaTerms (individual term contributions), beta (coefficient functions), and epsilon (noise)

Arguments

formula: A formula specifying the model structure (e.g., Y ~ ff(X1) + xlin). If provided, the scenario argument is ignored.
scenario: Deprecated. Character string or vector specifying predefined scenarios. Use the formula argument instead.
n: Number of observations.
nxgrid: Number of evaluation points for functional covariates. Ignored if xind is provided.
nygrid: Number of evaluation points for the functional response. Ignored if yind is provided.
yind: Numeric vector of evaluation points for the response. Defaults to seq(0, 1, length.out = nygrid).
xind: Numeric vector of evaluation points for functional covariates. Defaults to seq(0, 1, length.out = nxgrid).
data: Optional data frame with pre-generated covariates.
effects: Named list mapping term labels to effect specifications. Each entry can be a preset name (e.g., "cosine"), a function, or a numeric value. See Effect Presets and Effect key matching.
intercept: Intercept specification: preset name ("beta", "constant", "sine", "zero", "random"), a function of t, or a numeric value.
SNR: Signal-to-noise ratio: var(eta) / var(epsilon).
family: A family object for the response distribution. Defaults to gaussian().
propmissing: Proportion of missing data in the response (0 to 1).
limits: A function defining integration limits for ff() terms, e.g., function(s, t) s < t.
seed: Optional random seed for reproducibility.
wiggliness: Controls smoothness for the "random" preset (default: 1). Higher values produce more wiggly truth functions. Typical range: 0.001 (very smooth) to 10 (very wiggly).
k_truth: Named list of basis dimensions for random truth generation. Defaults: list(ff_s = 8, ff_t = 8, smooth_z = 8, smooth_t = 8, linear = 8, intercept = 8, concurrent = 8).
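
The SNR definition above, var(eta) / var(epsilon), can be checked by hand. The following base R sketch is not taken from pffr_simulate: it assumes both variances are pooled over all entries of the n x nygrid matrices, and the names eta, epsilon, and noise_sd are illustrative only.

```r
# Sketch: scale Gaussian noise so that var(eta) / var(epsilon) is close to SNR.
# Assumption: variances are pooled over all matrix entries.
set.seed(1)
n <- 100
nygrid <- 60
SNR <- 10

# Default response grid as documented for yind
yind <- seq(0, 1, length.out = nygrid)

# Toy linear predictor: a random amplitude times a sine curve per observation
eta <- outer(rnorm(n), sin(2 * pi * yind))

# Noise standard deviation implied by the target SNR
noise_sd <- sqrt(var(as.vector(eta)) / SNR)
epsilon <- matrix(rnorm(n * nygrid, sd = noise_sd), nrow = n, ncol = nygrid)
Y <- eta + epsilon

# Empirical ratio: close to SNR, but not exact, because epsilon is random
snr_hat <- var(as.vector(eta)) / var(as.vector(epsilon))
```

Larger SNR values shrink noise_sd, so the simulated curves sit closer to the true linear predictor.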

Effect Presets

For ff() terms: "cosine", "product", "gaussian", "separable", "historical", "random"

For s() terms: "beta", "dnorm", "sine", "cosine", "polynomial", "step", "random"

For c() terms: "constant", "gaussian_2d", "linear"

For intercept: "constant", "beta", "sine", "zero", "random"

For linear terms: "dnorm", "sine", "cosine", "constant", "linear", "random"

For concurrent terms: "dnorm", "sine", "cosine", "constant", "linear", "random"

The "random" preset generates reproducible random truth functions by sampling from P-spline priors. Use wiggliness to control the smoothness (curvature).
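
To make "sampling from a P-spline prior" concrete, here is a minimal sketch using the splines package that ships with R. It is not the package's implementation: the helper draw_pspline_truth is hypothetical, and treating wiggliness as the variance of the second differences of the B-spline coefficients is an assumption made for illustration.

```r
library(splines)

# Hypothetical helper: draw one random smooth function on grid t.
# Assumption: second differences of the coefficients are N(0, wiggliness).
draw_pspline_truth <- function(t, k = 8, wiggliness = 1) {
  # Cubic B-spline basis with k columns
  B <- bs(t, df = k, intercept = TRUE)
  # Innovations of a second-order random walk on the coefficients
  innovations <- rnorm(k - 2, sd = sqrt(wiggliness))
  # Double cumulative sum, so diff(theta, differences = 2) equals innovations
  theta <- cumsum(cumsum(c(0, 0, innovations)))
  list(values = as.vector(B %*% theta), theta = theta,
       innovations = innovations)
}

t_grid <- seq(0, 1, length.out = 60)

# Same seed for both draws, so only wiggliness differs
set.seed(123)
smooth_truth <- draw_pspline_truth(t_grid, wiggliness = 0.001)
set.seed(123)
wiggly_truth <- draw_pspline_truth(t_grid, wiggliness = 10)
```

Fixing the seed makes the draw reproducible, which mirrors the role of the seed argument. In this sketch wiggliness only rescales the curvature of the draw; any additional normalization the package applies is not covered here.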

Effect key matching

When specifying custom effects via the effects argument, keys are matched to formula terms using a 6-level fallback chain:

1. Exact term label (as produced by terms.formula())
2. Whitespace-normalized term label
3. Exact deparsed call (e.g., "ff(X1, xind = s)")
4. Whitespace-normalized deparsed call
5. Variable name (first argument, e.g., "X1" for ff(X1))
6. Type-specific default preset

This means you can specify effects at different levels of specificity:


# Match by variable name (most common, matches level 5):
pffr_simulate(Y ~ ff(X1) + xlin,
              effects = list(X1 = "cosine", xlin = "dnorm"))
# Match by exact term label (level 1):
pffr_simulate(Y ~ ff(X1, xind = s) + s(xsmoo),
              effects = list("ff(X1, xind = s)" = "product",
                             "s(xsmoo)" = "sine"))
# Match by variable name with custom function (level 5):
pffr_simulate(Y ~ xlin,
              effects = list(xlin = function(t) sin(2 * pi * t)))
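
The fallback chain can be pictured as a simple first-match lookup. The base R sketch below is hypothetical: resolve_effect and normalize_ws are not functions from the package, and "whitespace-normalized" is assumed here to mean that all whitespace is removed before comparing.

```r
# Assumption: normalization strips all whitespace
normalize_ws <- function(x) gsub("\\s+", "", x)

# Hypothetical helper: return the first matching level and its effect spec
resolve_effect <- function(term_label, call_string, var_name,
                           effects, default) {
  keys <- names(effects)
  matches <- list(
    keys == term_label,                               # level 1
    normalize_ws(keys) == normalize_ws(term_label),   # level 2
    keys == call_string,                              # level 3
    normalize_ws(keys) == normalize_ws(call_string),  # level 4
    keys == var_name                                  # level 5
  )
  for (level in seq_along(matches)) {
    hit <- which(matches[[level]])
    if (length(hit) > 0) {
      return(list(level = level, effect = effects[[hit[1]]]))
    }
  }
  # Level 6: nothing matched, fall back to the type-specific default
  list(level = 6L, effect = default)
}

by_name <- resolve_effect("ff(X1)", "ff(X1)", "X1",
                          effects = list(X1 = "cosine"), default = "cosine")
by_loose_label <- resolve_effect("ff(X1, xind = s)", "ff(X1, xind = s)", "X1",
                                 effects = list("ff(X1,xind=s)" = "product"),
                                 default = "cosine")
unmatched <- resolve_effect("s(xsmoo)", "s(xsmoo)", "xsmoo",
                            effects = list(), default = "beta")
```

Here by_name resolves at level 5, by_loose_label at level 2 because the key differs from the term label only in spacing, and unmatched falls through to the default at level 6.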

Details

**Formula Interface (Recommended):** Specify a pffr-style formula and optional effect specifications:


pffr_simulate(Y ~ ff(X1) + xlin + s(xsmoo), n = 100,
              effects = list(X1 = "cosine", xlin = "dnorm"))
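
Read as a model equation, the formula in this call corresponds to the following, written with the same symbols as the scenario "all" model. The mapping of each formula term to a model term follows the usual pffr conventions (ff() as a function-on-function effect, a plain covariate as a linear effect varying over t, s() as a smooth effect varying over t); which preset shapes each term is set through effects.

$$Y_i(t) = \mu(t) + \int X_{1i}(s)\,\beta_1(s,t)\,ds + \mathrm{xlin}\,\beta_3(t) + f(\mathrm{xsmoo}, t) + \epsilon_i(t)$$

Here $\mu(t)$ is the functional intercept (controlled by the intercept argument) and $\epsilon_i(t)$ is the noise term whose variance is determined by SNR.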

**Scenario Interface (Deprecated):** Scenario "all" generates data from a complex multivariate model:

$$Y_i(t) = \mu(t) + \int X_{1i}(s)\,\beta_1(s,t)\,ds + \mathrm{xlin}\,\beta_3(t) + f(\mathrm{xte1}, \mathrm{xte2}) + f(\mathrm{xsmoo}, t) + \beta_4\,\mathrm{xconst} + f(\mathrm{xfactor}, t) + \epsilon_i(t)$$

Scenarios "int", "ff", "lin", "te", "smoo", "const", "factor" generate data from simpler models containing only the respective terms.

Sparse/irregular response trajectories can be generated by setting propmissing > 0.
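
As an illustration of what a sparse response looks like, the base R sketch below masks a fixed proportion of entries in a toy response matrix and converts the remaining values to long format. This is an assumption-laden sketch, not the package's code: the masking scheme is hypothetical, and the column names .obs, .index, and .value follow the layout pffr expects for its ydata argument rather than a documented return format of pffr_simulate.

```r
set.seed(42)
n <- 5
nygrid <- 10
propmissing <- 0.3
yind <- seq(0, 1, length.out = nygrid)

# Toy response matrix: one row per observation, one column per grid point
Y <- matrix(rnorm(n * nygrid), nrow = n, ncol = nygrid)

# Assumption: missing entries are chosen uniformly at random over all cells
n_missing <- round(propmissing * length(Y))
Y_sparse <- Y
Y_sparse[sample(length(Y), n_missing)] <- NA

# Long format: one row per observed (curve, grid point) pair
observed <- which(!is.na(Y_sparse), arr.ind = TRUE)
ydata <- data.frame(
  .obs = observed[, "row"],
  .index = yind[observed[, "col"]],
  .value = Y_sparse[observed]
)
```

Each curve then contributes only its observed grid points, so the trajectories are irregular across observations.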

Examples

# Formula interface
dat <- pffr_simulate(Y ~ ff(X1) + xlin + s(xsmoo), n = 50,
                     effects = list(X1 = "cosine", xlin = "dnorm", xsmoo = "sine"))

# Legacy scenario interface (deprecated)
dat_legacy <- suppressWarnings(pffr_simulate(scenario = "ff", n = 50))

# Access true coefficients
truth <- attr(dat, "truth")
str(truth$beta)
