Generates synthetic time series data with a multi-group factor structure,
along with associated covariates. Useful for Monte Carlo simulation studies
of the FACT and COR algorithms.
gendata(
  seed = 1,
  T = 100,
  N = c(100, 100, 100, 100),
  r0 = 2,
  r = c(2, 2, 2, 2),
  M = 4,
  sigma = 1,
  p = 10,
  mu = 3,
  type_F = "Independent",
  type_X = "Uniform",
  type_noise = "Gaussian"
)
The function returns a list containing:
Y: A \(T \times N\) numeric matrix of time series, where \(N = \sum_{m=1}^{M} N_m\).
X: A \(N \times p\) numeric matrix of covariates.
G: The \(T \times r_0\) matrix of true global factors.
r0: Number of global factors.
r: Vector of local factor counts per group.
group: Integer vector of length \(N\) indicating
true group membership (values 1 through M).
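The group vector is what links each column of Y back to its true group, for example when scoring a clustering result. The sketch below builds a vector of the same shape, assuming (this is not stated above) that the columns of Y are stored group by group; the variable names are illustrative only.

```r
# Hypothetical sketch: a membership vector shaped like gendata()'s `group`
# element, ASSUMING the columns of Y are stored group by group.
N <- c(100, 50, 50, 200)              # series per group (as in the Examples call)
group <- rep(seq_along(N), times = N) # 1 repeated N[1] times, then 2, and so on

length(group)                         # equals sum(N), the number of columns of Y
table(group)                          # counts per group recover N
split(seq_along(group), group)[[2]]   # column indices of group 2 under this ordering: 101:150
```

If the package orders columns differently, only the construction of `group` changes; indexing Y by `group == m` works either way.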
The arguments are as follows:
seed: Integer. Random seed for reproducibility. Default: 1.
T: Integer. Number of time periods (rows in Y). Default: 100.
N: Integer vector of length M. Number of time series in each group,
so that sum(N) equals the total number of series (columns in Y).
Default: c(100, 100, 100, 100).
r0: Integer. Number of global factors shared across all groups.
Default: 2.
r: Integer vector of length M. Number of local (group-specific)
factors for each group. Default: c(2, 2, 2, 2).
M: Integer. Number of groups. Default: 4.
sigma: Numeric. Standard deviation of the idiosyncratic noise.
Default: 1.
p: Integer. Number of covariates (columns in X). Default: 10.
mu: Numeric. Controls the separation between the groups' covariate distributions
when type_X = "Gaussian". Larger values yield better-separated groups.
Default: 3.
type_F: Character. Correlation structure for local factors:
"Independent": Local factors are independent across groups (default). Each follows an AR(1) process.
"Correlated": Local factors share a common correlation structure across groups.
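type_X: Character. Distribution used to generate covariates:
"Uniform": Covariates are uniformly distributed, with each group on a different support of the real line (default).
"Gaussian": Covariates are normally distributed, with groups differing by mean shifts (see mu).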
type_noise: Character. Distribution for idiosyncratic errors:
"Gaussian": Normal errors with standard deviation sigma (default).
"t3": Heavy-tailed errors from a t-distribution with 3 degrees of freedom, rescaled to have the same variance as the Gaussian errors.
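The covariate and noise options above can be illustrated with a short sketch. The uniform supports \(U(m-1, m)\) and Gaussian means \(\mu (m-1)\) are assumptions chosen for illustration, not gendata()'s documented values. The t3 rescaling, however, follows from a standard fact: a t-distribution with \(\nu\) degrees of freedom has variance \(\nu / (\nu - 2)\), which is 3 for \(\nu = 3\).

```r
# Hypothetical sketch of the type_X and type_noise options. The uniform
# supports U(m - 1, m) and Gaussian means mu * (m - 1) are ASSUMPTIONS chosen
# for illustration; gendata() may use different values.
set.seed(1)
N <- c(100, 100, 100, 100); p <- 10; mu <- 3; sigma <- 2
group <- rep(seq_along(N), times = N)   # assumes series are ordered by group

# type_X = "Uniform": group m has its own support on the real line
X_unif <- t(sapply(group, function(m) runif(p, min = m - 1, max = m)))

# type_X = "Gaussian": group m is shifted by mu * (m - 1); larger mu separates groups
X_gaus <- t(sapply(group, function(m) rnorm(p, mean = mu * (m - 1))))

# type_noise = "t3": Var(t_3) = 3 / (3 - 2) = 3, so dividing by sqrt(3) and
# multiplying by sigma gives the same variance as N(0, sigma^2)
scaled_t3 <- function(n, sigma) sigma * rt(n, df = 3) / sqrt(3)

e_t3    <- scaled_t3(1e5, sigma)
e_gauss <- rnorm(1e5, sd = sigma)

tapply(rowMeans(X_gaus), group, mean)     # roughly 0, 3, 6, 9
c(t3    = mean(abs(e_t3) > 4 * sigma),    # share of draws beyond 4 sigma:
  gauss = mean(abs(e_gauss) > 4 * sigma)) # much larger for the heavy-tailed t3
```

Matching variances isolates the effect of tail heaviness: both noise types have the same scale, but t3 produces far more extreme observations. Comparing tail shares is more informative here than comparing sample variances, which converge slowly for t3 because its fourth moment is infinite.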
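The data generating process follows a group factor model:
$$Y_m = G \Lambda_m' + F_m \Gamma_m' + E_m, \quad m = 1, \ldots, M,$$
where:
\(Y_m\): \(T \times N_m\) block of Y containing the series in group \(m\), so that \(Y = [Y_1, \ldots, Y_M]\)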
\(G\): \(T \times r_0\) matrix of global factors (shared across groups)
\(\Lambda_m\): \(N_m \times r_0\) global factor loadings for group \(m\)
\(F_m\): \(T \times r_m\) matrix of local factors for group \(m\)
\(\Gamma_m\): \(N_m \times r_m\) local factor loadings for group \(m\)
\(E_m\): \(T \times N_m\) idiosyncratic error matrix
Both global and local factors follow AR(1) processes with coefficient 0.5. Factor loadings are drawn from standard normal distributions.
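To make the model concrete, the sketch below simulates \(Y_m = G \Lambda_m' + F_m \Gamma_m' + E_m\) block by block using base R. It follows the stated ingredients (AR(1) factors with coefficient 0.5, standard normal loadings) but is not gendata()'s source. The function name simulate_group_factor, the zero starting value without burn-in, Gaussian noise only, and independent local factors are all assumptions.

```r
# Hypothetical sketch of the group factor model above (not gendata()'s source):
# AR(1) factors with coefficient 0.5, standard normal loadings, Gaussian noise.
# T_len is used instead of T because T is R's shorthand for TRUE.
simulate_group_factor <- function(T_len = 100, N = c(100, 100, 100, 100),
                                  r0 = 2, r = c(2, 2, 2, 2), sigma = 1,
                                  phi = 0.5, seed = 1) {
  set.seed(seed)
  M <- length(N)
  # T_len x k matrix whose columns are independent AR(1) paths
  # x_t = phi * x_{t-1} + u_t, started at zero with u_t ~ N(0, 1)
  ar1_mat <- function(k) {
    vapply(seq_len(k), function(j)
      as.numeric(stats::filter(rnorm(T_len), phi, method = "recursive")),
      numeric(T_len))
  }
  G <- ar1_mat(r0)                                      # T x r0 global factors
  blocks <- lapply(seq_len(M), function(m) {
    Lambda <- matrix(rnorm(N[m] * r0), N[m], r0)        # N_m x r0 global loadings
    F_m    <- ar1_mat(r[m])                             # T x r_m local factors
    Gamma  <- matrix(rnorm(N[m] * r[m]), N[m], r[m])    # N_m x r_m local loadings
    E_m    <- matrix(rnorm(T_len * N[m], sd = sigma), T_len, N[m])
    G %*% t(Lambda) + F_m %*% t(Gamma) + E_m            # T x N_m block Y_m
  })
  list(Y = do.call(cbind, blocks),                      # Y = [Y_1, ..., Y_M]
       G = G, r0 = r0, r = r,
       group = rep(seq_len(M), times = N))
}

sim <- simulate_group_factor(T_len = 200, N = c(100, 50, 50, 200),
                             r0 = 1, r = c(2, 2, 2, 3))
dim(sim$Y)   # 200 400
```

Each group block shares the same global factors G but gets its own loadings and local factors, which is what lets a clustering method separate groups by their local structure.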
See also FACT for building factor-augmented clustering trees,
and COR for correlation-based clustering.
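# Simulate 4 groups of unequal size (N sums to 400 series) over 200 periods,
# with 1 global factor and 2, 2, 2, and 3 local factors per group
data <- gendata(seed = 123, T = 200, N = c(100, 50, 50, 200), r0 = 1, r = c(2, 2, 2, 3), M = 4)
Y <- data$Y   # 200 x 400 matrix of time series
X <- data$X   # 400 x 10 matrix of covariates (default p = 10)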