FACT: Factor-Augmented Clustering Tree

Description

Builds a binary tree for clustering time series data based on covariates, using a group factor model framework. The splitting criterion evaluates whether child nodes exhibit distinct factor structures.

Usage

FACT(
  X,
  Y,
  r_a = 8,
  r_b = 4,
  method = c("threshold", "permutation"),
  control = list()
)

Value

An object of class "FACT" containing:

frame: A data frame describing the tree structure, with one row per node. Includes split variable, split value, test statistic, and p-value (if applicable). A smaller test statistic indicates stronger evidence of heterogeneous factor structures between child nodes.
membership: An integer vector of length \(N\) giving the terminal node assignment of each time series.
control: The control parameters used.
terms: Metadata including covariate names, data dimensions, and the values of r_a and r_b.
method: The splitting method used.

Arguments

X

A numeric matrix of covariates with dimension \(N \times p\), where \(N\) is the number of time series and \(p\) is the number of features. Each row corresponds to the covariates for one time series.

Y

A numeric matrix of time series data with dimension \(T \times N\), where \(T\) is the length of each series. Each column represents one time series.
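
The two inputs are oriented differently: X has one row per series, while Y has one column per series. The following is a minimal base-R sketch of the expected shapes, using random placeholder data; the sizes and the name T_len are illustrative assumptions, not package defaults.

```r
set.seed(1)
N     <- 200  # number of time series
p     <- 3    # number of covariates per series
T_len <- 100  # length of each series (named T_len to avoid masking T, i.e. TRUE)

# N x p covariate matrix: row i holds the covariates of series i
X <- matrix(rnorm(N * p), nrow = N, ncol = p)

# T x N data matrix: column i holds the observations of series i
Y <- matrix(rnorm(T_len * N), nrow = T_len, ncol = N)

# Rows of X and columns of Y must refer to the same series, in the same order
stopifnot(nrow(X) == ncol(Y))
dim(X)
dim(Y)
```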

r_a

A positive integer specifying the number of singular vectors extracted from each child node to construct the projection matrices. Default: 8.

r_b

A positive integer specifying the number of leading singular values to sum for the split statistic. Must satisfy r_b <= r_a. Default: 4.
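
To make the roles of r_a and r_b concrete, here is a rough base-R sketch of one way such an overlap statistic could be formed. It assumes the statistic is the sum of the r_b largest singular values of the cross-product of the two children's leading r_a left singular vectors; the helper overlap_stat and this exact formula are illustrative assumptions, not the package's internal code.

```r
# Hypothetical helper, not part of the package: overlap between the leading
# factor spaces of two groups of series (columns of Y1 and Y2).
overlap_stat <- function(Y1, Y2, r_a = 8, r_b = 4) {
  stopifnot(r_b <= r_a, r_a <= min(dim(Y1)), r_a <= min(dim(Y2)))
  U1 <- svd(Y1, nu = r_a, nv = 0)$u               # T x r_a basis for group 1
  U2 <- svd(Y2, nu = r_a, nv = 0)$u               # T x r_a basis for group 2
  s  <- svd(crossprod(U1, U2), nu = 0, nv = 0)$d  # cosines of principal angles
  sum(s[seq_len(r_b)])                            # sum of the r_b largest values
}

set.seed(42)
T_len <- 200; n <- 60; r <- 4
F1 <- matrix(rnorm(T_len * r), T_len, r)  # factors of the first regime
F2 <- matrix(rnorm(T_len * r), T_len, r)  # factors of a different regime

# Simulate n series driven by the given factors plus a little noise
make_group <- function(Fmat) {
  Fmat %*% matrix(rnorm(r * n), r, n) + 0.1 * matrix(rnorm(T_len * n), T_len, n)
}

stat_same <- overlap_stat(make_group(F1), make_group(F1))  # shared factors
stat_diff <- overlap_stat(make_group(F1), make_group(F2))  # distinct factors
c(same = stat_same, different = stat_diff)
```

Under this sketch each singular value lies in [0, 1], so the statistic is at most r_b, and groups with distinct factors give the smaller value.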

method

Character string specifying the splitting decision rule:

"threshold": Uses a data-adaptive threshold based on signal-to-noise ratio estimation. Faster but may be less accurate. Suitable for large datasets.

"permutation": Uses a permutation test for hypothesis testing. More rigorous but computationally intensive.

control

A list of control parameters for tree construction:

minsplit: Minimum number of observations required to attempt a split. Default: 90.

minbucket: Minimum number of observations in any terminal node. Default: 30.

alpha: Significance level for the permutation test (used only when method = "permutation"). Default: 0.01.

R: Number of permutations for the hypothesis test (used only when method = "permutation"). Default: 199.

sep: Controls the density of candidate split points. If "auto" (default), subsamples candidates when \(n > 800\). If numeric, evaluates every sep-th candidate point.

parallel: Logical; if TRUE, enables parallel computation. Default: FALSE.

n_cores: Number of cores for parallel processing. If NULL (default), uses detectCores() - 1.
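
As an illustration, the control list can be written out with the documented names and defaults. The helper thin_candidates below is hypothetical and only sketches one reading of a numeric sep (keep every sep-th sorted value while leaving at least minbucket observations on each side); the package may select candidates differently.

```r
# Control list using the documented parameter names and default values
ctrl <- list(
  minsplit  = 90,
  minbucket = 30,
  alpha     = 0.01,
  R         = 199,
  sep       = "auto",
  parallel  = FALSE,
  n_cores   = NULL
)
str(ctrl)

# Hypothetical helper, not part of the package: thin the candidate split
# points of one covariate by keeping every sep-th sorted value.
thin_candidates <- function(x, sep = 1, minbucket = 30) {
  xs <- sort(x)
  n  <- length(xs)
  if (n < 2 * minbucket) return(xs[0])  # too few observations to split
  xs[seq(minbucket, n - minbucket, by = sep)]
}

cands <- thin_candidates(1:100, sep = 10, minbucket = ctrl$minbucket)
cands
```

A list like ctrl would then be supplied through the control argument, for example FACT(X, Y, method = "permutation", control = ctrl).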

Details

The FACT algorithm clusters time series by recursively partitioning them based on their underlying factor structures. At each node, the method:

1. Searches for the optimal split across all covariates and candidate points.
2. Computes a test statistic based on the overlap of factor spaces between the two child nodes.
3. Decides whether to split using either a threshold rule or a permutation test.
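
The permutation decision in the last step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: overlap_stat and perm_split_test are hypothetical helpers, the statistic is assumed to be a sum of leading singular values of the cross-product of the children's singular vectors, and the p-value is assumed to count permuted statistics that are no larger than the observed one (since a smaller statistic signals more heterogeneity).

```r
# Hypothetical overlap statistic (assumed form, see lead-in)
overlap_stat <- function(Y1, Y2, r_a = 4, r_b = 2) {
  U1 <- svd(Y1, nu = r_a, nv = 0)$u
  U2 <- svd(Y2, nu = r_a, nv = 0)$u
  sum(svd(crossprod(U1, U2), nu = 0, nv = 0)$d[seq_len(r_b)])
}

# Hypothetical permutation test for one candidate split.
# `left` is a logical vector marking the series (columns of Y) sent left.
perm_split_test <- function(Y, left, stat_fun, R = 199, alpha = 0.01) {
  observed <- stat_fun(Y[, left, drop = FALSE], Y[, !left, drop = FALSE])
  permuted <- replicate(R, {
    shuffled <- sample(left)  # random relabelling with the same child sizes
    stat_fun(Y[, shuffled, drop = FALSE], Y[, !shuffled, drop = FALSE])
  })
  # Small overlap is evidence for a split, so use the lower tail
  p_value <- (1 + sum(permuted <= observed)) / (R + 1)
  list(statistic = observed, p_value = p_value, split = p_value <= alpha)
}

set.seed(7)
T_len <- 200; n <- 30; r <- 2

# Simulate n series driven by r group-specific factors plus noise
sim_group <- function() {
  Fmat <- matrix(rnorm(T_len * r), T_len, r)
  Lmat <- matrix(rnorm(r * n), r, n)
  Fmat %*% Lmat + 0.1 * matrix(rnorm(T_len * n), T_len, n)
}

Y    <- cbind(sim_group(), sim_group())  # two groups with distinct factors
left <- rep(c(TRUE, FALSE), each = n)    # candidate split: group 1 vs group 2
res  <- perm_split_test(Y, left, overlap_stat, R = 99, alpha = 0.05)
res$p_value
res$split
```

With R permutations the smallest attainable p-value is 1 / (R + 1), so R must be large enough for that value to fall below alpha; the documented defaults (R = 199, alpha = 0.01) satisfy this.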

References

Hu, J., Li, T., Luo, Z., & Wang, X. Factor-Augmented Clustering Tree for Time Series.

Examples


# \donttest{
data <- gendata(seed = 123, T = 200, N = c(50, 50, 50, 50))
tree1 <- FACT(data$X, data$Y, r_a = 8, r_b = 4, method = "threshold")
print(tree1)
# }
