gendata: Generate Synthetic Group Factor Model Data

Description

Generates synthetic time series data with a multi-group factor structure, along with associated covariates. Useful for Monte Carlo simulations involving the FACT and COR algorithms.

Usage

gendata(
  seed = 1,
  T = 100,
  N = c(100, 100, 100, 100),
  r0 = 2,
  r = c(2, 2, 2, 2),
  M = 4,
  sigma = 1,
  p = 10,
  mu = 3,
  type_F = "Independent",
  type_X = "Uniform",
  type_noise = "Gaussian"
)

Value

A list containing:

Y: A $T \times N$ numeric matrix of time series, where $N = \sum_{m=1}^{M} N_m$ is the total number of series across groups.
X: An $N \times p$ numeric matrix of covariates.
G: The $T \times r_0$ matrix of true global factors.
r0: Number of global factors.
r: Vector of local factor counts per group.
group: Integer vector of length $N$ indicating true group membership (values 1 through M).

Arguments

seed

Integer. Random seed for reproducibility. Default: 1.

T

Integer. Number of time periods (rows in Y). Default: 100.

N

Integer vector of length M. Number of time series per group, such that sum(N) equals the total number of series. Default: c(100, 100, 100, 100).

r0

Integer. Number of global factors shared across all groups. Default: 2.

r

Integer vector of length M. Number of local (group-specific) factors for each group. Default: c(2, 2, 2, 2).

M

Integer. Number of groups. Default: 4.

sigma

Numeric. Standard deviation of the idiosyncratic noise. Default: 1.

p

Integer. Number of covariates (columns in X). Default: 10.

mu

Numeric. Controls separation between group covariate distributions when type_X = "Gaussian". Larger values yield better-separated groups. Default: 3.

type_F

Character. Correlation structure for local factors:

"Independent": Local factors are independent across groups (default). Each follows an AR(1) process.

"Correlated": Local factors share a common correlation structure across groups.

type_X

Character. Distribution for generating covariates:

"Uniform": Groups differ by support on the real line (default).

"Gaussian": Groups differ by mean shifts.

type_noise

Character. Distribution for idiosyncratic errors:

"Gaussian": Normal errors (default).

"t3": Heavy-tailed errors from a t-distribution with 3 degrees of freedom, scaled to have the same variance as the Gaussian errors.
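The variance matching for "t3" errors can be illustrated with a short sketch. A t-distribution with 3 degrees of freedom has variance 3 / (3 - 2) = 3, so dividing the draws by sqrt(3) and multiplying by sigma gives errors with variance sigma^2. The code below is an illustration under that assumption, not the package's own implementation; n_draws is a hypothetical sample size.

```r
# Sketch: heavy-tailed errors with a target standard deviation.
# Var(t_3) = 3 / (3 - 2) = 3, so rt(., df = 3) / sqrt(3) has unit variance.
set.seed(1)
sigma   <- 1
n_draws <- 1e5  # hypothetical sample size for illustration

e_gauss <- rnorm(n_draws, mean = 0, sd = sigma)
e_t3    <- sigma * rt(n_draws, df = 3) / sqrt(3)

# The theoretical variances agree, but the sample variance of the t3 draws
# converges slowly because the fourth moment of t_3 is infinite.
c(gaussian = var(e_gauss), t3 = var(e_t3))
```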

Details

The data generating process follows a group factor model: $$Y_m = G \Lambda_m' + F_m \Gamma_m' + E_m, \quad m = 1, \ldots, M$$

where:

$G$: $T \times r_0$ matrix of global factors (shared across groups)
$\Lambda_m$: $N_m \times r_0$ global factor loadings for group $m$
$F_m$: $T \times r_m$ matrix of local factors for group $m$
$\Gamma_m$: $N_m \times r_m$ local factor loadings for group $m$
$E_m$: $T \times N_m$ idiosyncratic error matrix

Both global and local factors follow AR(1) processes with coefficient 0.5. Factor loadings are drawn from standard normal distributions.
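The model for a single group can be sketched in a few lines of base R. This is a minimal illustration, not the code used by gendata: sim_ar1 is a hypothetical helper, and the unit innovation variance, the standard normal starting value, and the absence of a burn-in period are all assumptions made for the sketch.

```r
# Sketch of one group's data under Y_m = G Lambda_m' + F_m Gamma_m' + E_m.
# sim_ar1 is a hypothetical helper, not a function from the package.
sim_ar1 <- function(T, k, rho = 0.5) {
  # Returns a T x k matrix whose columns are independent AR(1) series.
  f <- matrix(0, nrow = T, ncol = k)
  f[1, ] <- rnorm(k)                            # assumed starting value
  for (t in seq_len(T)[-1]) {
    f[t, ] <- rho * f[t - 1, ] + rnorm(k)       # assumed unit innovations
  }
  f
}

set.seed(1)
T <- 100; N_m <- 50; r0 <- 2; r_m <- 2; sigma <- 1

G        <- sim_ar1(T, r0)                              # global factors, T x r0
F_m      <- sim_ar1(T, r_m)                             # local factors, T x r_m
Lambda_m <- matrix(rnorm(N_m * r0),  N_m, r0)           # global loadings, N_m x r0
Gamma_m  <- matrix(rnorm(N_m * r_m), N_m, r_m)          # local loadings, N_m x r_m
E_m      <- matrix(rnorm(T * N_m, sd = sigma), T, N_m)  # Gaussian noise, T x N_m

Y_m <- G %*% t(Lambda_m) + F_m %*% t(Gamma_m) + E_m
dim(Y_m)
```

Under this reading, the full Y would be obtained by binding the M group blocks column-wise, with G shared and F_m, Lambda_m, Gamma_m, and E_m drawn separately for each group.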

Examples

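# Four groups of unequal size, one global factor, 2 to 3 local factors per group
data <- gendata(
  seed = 123,
  T = 200,
  N = c(100, 50, 50, 200),
  r0 = 1,
  r = c(2, 2, 2, 3),
  M = 4
)
Y <- data$Y  # 200 x 400 matrix of time series
X <- data$X  # 400 x 10 matrix of covariates (p defaults to 10)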
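Downstream code often needs the series and covariates of one group at a time. The sketch below uses a small hand-built list, toy (hypothetical), that mimics the documented Y, X, and group components so that it runs without the package. It assumes that column j of Y, row j of X, and element j of group all refer to the same series; if the list returned by gendata follows that layout, the same indexing applies to it.

```r
# Toy stand-in with the documented shapes: Y is T x N, X is N x p,
# group has length N. Replace `toy` with the output of gendata().
set.seed(1)
N_vec <- c(3, 2)  # two groups, five series in total
toy <- list(
  Y     = matrix(rnorm(10 * sum(N_vec)), nrow = 10),
  X     = matrix(runif(sum(N_vec) * 4), nrow = sum(N_vec)),
  group = rep(seq_along(N_vec), times = N_vec)
)

# Series indices for each group (assumes a shared ordering of Y, X, and group).
idx_by_group <- split(seq_along(toy$group), toy$group)

Y_by_group <- lapply(idx_by_group, function(idx) toy$Y[, idx, drop = FALSE])
X_by_group <- lapply(idx_by_group, function(idx) toy$X[idx, , drop = FALSE])

sapply(Y_by_group, ncol)  # number of series in each group
```

Using drop = FALSE keeps each block a matrix even when a group contains a single series.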