refund (version 0.1-40)

pffr_simulate: Simulate example data for pffr

Description

Simulates example data for pffr models built from a variety of term types. Supports both a formula-based interface (recommended) and a legacy scenario-based interface kept for backward compatibility.

Usage

pffr_simulate(
  formula = NULL,
  scenario = NULL,
  n = 100,
  nxgrid = 40,
  nygrid = 60,
  yind = NULL,
  xind = NULL,
  data = NULL,
  effects = list(),
  intercept = "beta",
  SNR = 10,
  family = gaussian(),
  propmissing = 0,
  limits = NULL,
  seed = NULL,
  wiggliness = 1,
  k_truth = list()
)

Value

A data frame (or list if propmissing > 0) with simulated data and attributes:

xindex

Evaluation points for functional covariates

yindex

Evaluation points for the response

truth

List containing eta (linear predictor), etaTerms (individual term contributions), beta (coefficient functions), and epsilon (noise)

Arguments

formula

A formula specifying the model structure (e.g., Y ~ ff(X1) + xlin). If provided, the scenario argument is ignored.

scenario

Deprecated. Character string or vector specifying predefined scenarios. Use the formula argument instead.

n

Number of observations.

nxgrid

Number of evaluation points for functional covariates. Ignored if xind is provided.

nygrid

Number of evaluation points for the functional response. Ignored if yind is provided.

yind

Numeric vector of evaluation points for the response. Defaults to seq(0, 1, length.out = nygrid).

xind

Numeric vector of evaluation points for functional covariates. Defaults to seq(0, 1, length.out = nxgrid).

data

Optional data frame with pre-generated covariates.

effects

Named list mapping term labels to effect specifications. Each entry can be a preset name (e.g., "cosine"), a function, or a numeric value. See Details.

intercept

Intercept specification: preset name ("beta", "constant", "sine", "zero"), a function of t, or a numeric value.

SNR

Signal-to-noise ratio: var(eta) / var(epsilon).
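
To make the SNR definition concrete, here is a minimal base-R sketch of how a noise level could be chosen to hit a target ratio. It assumes a Gaussian response and variances taken over all entries of the linear predictor; the names eta, sd_eps, and empirical_snr are illustrative, and the package's internal computation may differ.

```r
set.seed(1)

# Illustrative linear predictor: 50 curves on a grid of 60 points
t_grid <- seq(0, 1, length.out = 60)
eta <- outer(rnorm(50), sin(2 * pi * t_grid))

# Choose the noise standard deviation so that var(eta) / var(epsilon) is about snr
snr <- 10
sd_eps <- sqrt(var(as.vector(eta)) / snr)
epsilon <- matrix(rnorm(length(eta), sd = sd_eps), nrow = nrow(eta))

y <- eta + epsilon
empirical_snr <- var(as.vector(eta)) / var(as.vector(epsilon))
```

The empirical ratio only approximates the target because epsilon is a finite random sample.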

family

A family object for the response distribution. Defaults to gaussian().

propmissing

Proportion of missing data in the response (0 to 1).

limits

A function defining integration limits for ff() terms, e.g., function(s, t) s < t.
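
As a rough illustration of what such a limits function expresses, the sketch below evaluates it on a small (s, t) grid with outer(). The mask shows which covariate points s would enter the integral for each response point t; how pffr_simulate applies the function internally is an assumption here, and the grid sizes are arbitrary.

```r
xind <- seq(0, 1, length.out = 5)   # covariate grid (s)
yind <- seq(0, 1, length.out = 4)   # response grid (t)

# Example from the argument description: integrate only over s < t
limits <- function(s, t) s < t

# Rows index s, columns index t; TRUE marks (s, t) pairs inside the integration region
mask <- outer(xind, yind, limits)
mask
```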

seed

Optional random seed for reproducibility.

wiggliness

Controls smoothness for the "random" preset (default: 1). Higher values produce more wiggly truth functions. Typical range: 0.001 (very smooth) to 10 (very wiggly).

k_truth

Named list of basis dimensions for random truth generation. Defaults: list(ff_s = 8, ff_t = 8, smooth_z = 8, smooth_t = 8, linear = 8, intercept = 8, concurrent = 8).

Effect Presets

For ff() terms: "cosine", "product", "gaussian", "separable", "historical", "random"

For s() terms: "beta", "dnorm", "sine", "cosine", "polynomial", "step", "random"

For c() terms: "constant", "gaussian_2d", "linear"

For intercept: "constant", "beta", "sine", "zero", "random"

For linear terms: "dnorm", "sine", "cosine", "constant", "linear", "random"

For concurrent terms: "dnorm", "sine", "cosine", "constant", "linear", "random"

The "random" preset generates reproducible random truth functions by sampling from P-spline priors. Use wiggliness to control the smoothness (curvature).
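
One way to picture a draw from a P-spline prior is a B-spline basis whose coefficients follow a second-order random walk. The sketch below is an assumption-laden illustration, not the package's implementation: draw_random_truth is a hypothetical helper, and mapping wiggliness to the variance of the random-walk increments is a guess at one plausible parameterization.

```r
library(splines)

# Hypothetical helper: random smooth function on grid t from a B-spline basis
draw_random_truth <- function(t, k = 8, wiggliness = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  basis <- bs(t, df = k, intercept = TRUE)       # k cubic B-spline basis functions
  increments <- rnorm(k, sd = sqrt(wiggliness))  # assumed role of wiggliness
  coefs <- cumsum(cumsum(increments))            # second-order random walk
  as.vector(basis %*% coefs)
}

t_grid <- seq(0, 1, length.out = 60)
f1 <- draw_random_truth(t_grid, seed = 1)
f2 <- draw_random_truth(t_grid, seed = 1)  # same seed, same function
```

Fixing the seed reproduces the same curve, which mirrors the reproducibility the preset description promises.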

Effect key matching

When specifying custom effects via the effects argument, keys are matched to formula terms using a 6-level fallback chain:

  1. Exact term label (as produced by terms.formula())

  2. Whitespace-normalized term label

  3. Exact deparsed call (e.g., "ff(X1, xind = s)")

  4. Whitespace-normalized deparsed call

  5. Variable name (first argument, e.g., "X1" for ff(X1))

  6. Type-specific default preset

This means you can specify effects at different levels of specificity:


# Match by variable name (most common, matches level 5):
pffr_simulate(Y ~ ff(X1) + xlin,
              effects = list(X1 = "cosine", xlin = "dnorm"))

# Match by exact term label (level 1):
pffr_simulate(Y ~ ff(X1, xind = s) + s(xsmoo),
              effects = list("ff(X1, xind = s)" = "product",
                             "s(xsmoo)" = "sine"))

# Match by variable name with custom function (level 5):
pffr_simulate(Y ~ xlin,
              effects = list(xlin = function(t) sin(2 * pi * t)))
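
The lookup order described in this section can be sketched in base R. The helper resolve_effect_key below is hypothetical and only mimics the idea of the chain: it covers levels 1, 2, 5, and 6 and skips the deparsed-call levels 3 and 4, so it should not be read as the package's actual matching code.

```r
# Hypothetical helper: returns the name of the matching effects entry, or NA
resolve_effect_key <- function(term_label, effects) {
  squash <- function(x) gsub("\\s+", "", x)
  keys <- names(effects)
  # Level 1: exact term label
  if (term_label %in% keys) return(term_label)
  # Level 2: whitespace-normalized term label
  hit <- keys[squash(keys) == squash(term_label)]
  if (length(hit) > 0) return(hit[1])
  # Level 5: first argument of the call, or the bare variable itself
  expr <- str2lang(term_label)
  var_name <- if (is.call(expr)) deparse(expr[[2]]) else deparse(expr)
  if (var_name %in% keys) return(var_name)
  # Level 6: no key matched; a type-specific default preset would apply
  NA_character_
}

form <- Y ~ ff(X1, xind = s) + s(xsmoo) + xlin
labels <- attr(terms(form), "term.labels")
effects <- list(X1 = "cosine", "s( xsmoo )" = "sine")

matched <- sapply(labels, resolve_effect_key, effects = effects)
matched
```

Here ff(X1, xind = s) resolves through the variable name X1, s(xsmoo) through the whitespace-normalized label, and xlin finds no key, so it would fall through to a default.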

Details

**Formula Interface (Recommended):** Specify a pffr-style formula and optional effect specifications:


pffr_simulate(Y ~ ff(X1) + xlin + s(xsmoo), n = 100,
              effects = list(X1 = "cosine", xlin = "dnorm"))

**Scenario Interface (Deprecated):** Scenario "all" generates data from a complex multivariate model $$Y_i(t) = \mu(t) + \int X_{1i}(s)\beta_1(s,t)ds + xlin \beta_3(t) + f(xte1, xte2) + f(xsmoo, t) + \beta_4 xconst + f(xfactor, t) + \epsilon_i(t)$$

Scenarios "int", "ff", "lin", "te", "smoo", "const", "factor" generate data from simpler models containing only the respective terms.

Sparse/irregular response trajectories can be generated by setting propmissing > 0.
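
The effect of a missingness proportion on a gridded response can be sketched as follows. This assumes entries are dropped uniformly at random across the whole response matrix, which the documentation does not confirm; y, n_drop, and y_sparse are illustrative names.

```r
set.seed(42)

n <- 20
nygrid <- 30
y <- matrix(rnorm(n * nygrid), nrow = n, ncol = nygrid)

# Blank out a fixed share of the response entries
propmissing <- 0.25
n_drop <- round(propmissing * length(y))
y_sparse <- y
y_sparse[sample(length(y), n_drop)] <- NA

# Share of missing entries in the sparse response
mean(is.na(y_sparse))
```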

Examples

# Formula interface
dat <- pffr_simulate(Y ~ ff(X1) + xlin + s(xsmoo), n = 50,
                     effects = list(X1 = "cosine", xlin = "dnorm", xsmoo = "sine"))

# Legacy scenario interface (deprecated)
dat_legacy <- suppressWarnings(pffr_simulate(scenario = "ff", n = 50))

# Access true coefficients
truth <- attr(dat, "truth")
str(truth$beta)