factree (version 0.1.0)

gendata: Generate Synthetic Group Factor Model Data

Description

Generates synthetic time series data with a multi-group factor structure, along with associated covariates. Useful for Monte Carlo simulations of the FACT and COR algorithms.

Usage

gendata(
  seed = 1,
  T = 100,
  N = c(100, 100, 100, 100),
  r0 = 2,
  r = c(2, 2, 2, 2),
  M = 4,
  sigma = 1,
  p = 10,
  mu = 3,
  type_F = "Independent",
  type_X = "Uniform",
  type_noise = "Gaussian"
)

Value

A list containing:

Y

A \(T \times N\) numeric matrix of time series, where \(N = \sum_{m=1}^{M} N_m\).

X

An \(N \times p\) numeric matrix of covariates.

G

The \(T \times r_0\) matrix of true global factors.

r0

Number of global factors.

r

Vector of local factor counts per group.

group

Integer vector of length \(N\) indicating true group membership (values 1 through M).

Arguments

seed

Integer. Random seed for reproducibility. Default: 1.

T

Integer. Number of time periods (rows in Y). Default: 100.

N

Integer vector of length M. Number of time series per group, such that sum(N) equals the total number of series. Default: c(100, 100, 100, 100).

r0

Integer. Number of global factors shared across all groups. Default: 2.

r

Integer vector of length M. Number of local (group-specific) factors for each group. Default: c(2, 2, 2, 2).

M

Integer. Number of groups. Default: 4.

sigma

Numeric. Standard deviation of the idiosyncratic noise. Default: 1.

p

Integer. Number of covariates (columns in X). Default: 10.

mu

Numeric. Controls separation between group covariate distributions when type_X = "Gaussian". Larger values yield better-separated groups. Default: 3.

type_F

Character. Correlation structure for local factors:

"Independent"

Local factors are independent across groups (default). Each follows an AR(1) process.

"Correlated"

Local factors share a common correlation structure across groups.

type_X

Character. Distribution for generating covariates:

"Uniform"

Groups differ by support on the real line (default).

"Gaussian"

Groups differ by mean shifts.

type_noise

Character. Distribution for idiosyncratic errors:

"Gaussian"

Normal errors (default).

"t3"

Heavy-tailed errors from a t-distribution with 3 degrees of freedom, scaled to have the same variance as the Gaussian errors.
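
One way to put t-distributed errors on the same scale as Gaussian ones uses the fact that a t-distribution with 3 degrees of freedom has variance 3. The base R sketch below is an assumption about how such scaling could be done, not a description of the package internals; `n`, `e_t3`, and `e_gauss` are hypothetical names.

```r
set.seed(1)
sigma <- 1
n <- 1e5

# Var(t_3) = 3 / (3 - 2) = 3, so dividing by sqrt(3) gives unit variance;
# multiplying by sigma then gives standard deviation sigma.
e_t3 <- sigma * rt(n, df = 3) / sqrt(3)
e_gauss <- rnorm(n, sd = sigma)

# Compare sample variances. The t3 estimate converges slowly because
# the fourth moment of the t3 distribution is infinite.
c(t3 = var(e_t3), gaussian = var(e_gauss))
```

Both vectors target variance `sigma^2`, but the t3 draws produce occasional extreme values, which is the property a heavy-tailed noise setting is meant to exercise.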

Details

The data generating process follows a group factor model: $$Y_m = G \Lambda_m' + F_m \Gamma_m' + E_m, \quad m = 1, \ldots, M$$

where:

  • \(G\): \(T \times r_0\) matrix of global factors (shared across groups)

  • \(\Lambda_m\): \(N_m \times r_0\) global factor loadings for group \(m\)

  • \(F_m\): \(T \times r_m\) matrix of local factors for group \(m\)

  • \(\Gamma_m\): \(N_m \times r_m\) local factor loadings for group \(m\)

  • \(E_m\): \(T \times N_m\) idiosyncratic error matrix

Both global and local factors follow AR(1) processes with coefficient 0.5. Factor loadings are drawn from standard normal distributions.
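
The model above can be sketched in base R for a single group. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `sim_ar1` and all variable names are hypothetical, the AR(1) innovations are assumed to be standard normal, and each series is started from its first innovation rather than from the stationary distribution.

```r
# Hypothetical helper: simulate k independent AR(1) series of length n_t
# with coefficient phi and standard normal innovations (an assumption).
sim_ar1 <- function(n_t, k, phi = 0.5) {
  out <- matrix(0, nrow = n_t, ncol = k)
  eps <- matrix(rnorm(n_t * k), nrow = n_t, ncol = k)
  out[1, ] <- eps[1, ]
  for (i in seq_len(n_t)[-1]) {
    out[i, ] <- phi * out[i - 1, ] + eps[i, ]
  }
  out
}

set.seed(1)
n_t <- 100; n_m <- 50; r0 <- 2; r_m <- 2; sigma <- 1

G        <- sim_ar1(n_t, r0)                      # T x r0 global factors
F_m      <- sim_ar1(n_t, r_m)                     # T x r_m local factors
Lambda_m <- matrix(rnorm(n_m * r0), n_m, r0)      # N_m x r0 global loadings
Gamma_m  <- matrix(rnorm(n_m * r_m), n_m, r_m)    # N_m x r_m local loadings
E_m      <- matrix(rnorm(n_t * n_m, sd = sigma), n_t, n_m)  # T x N_m noise

# Y_m = G Lambda_m' + F_m Gamma_m' + E_m
Y_m <- G %*% t(Lambda_m) + F_m %*% t(Gamma_m) + E_m
dim(Y_m)
```

Repeating the last block for each group with the same `G` but fresh `F_m`, loadings, and noise, then binding the resulting matrices column-wise, would give a \(T \times N\) panel of the kind described under Value.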

See Also

FACT for building factor-augmented clustering trees, COR for correlation-based clustering.

Examples

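library(factree)

data <- gendata(seed = 123, T = 200, N = c(100, 50, 50, 200),
                r0 = 1, r = c(2, 2, 2, 3), M = 4)
Y <- data$Y  # 200 x 400 matrix of time series (T x sum(N))
X <- data$X  # 400 x 10 matrix of covariates (sum(N) x p, default p = 10)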