Builds a binary tree for clustering time series data based on covariates, using a group factor model framework. The splitting criterion evaluates whether child nodes exhibit distinct factor structures.
FACT(
X,
Y,
r_a = 8,
r_b = 4,
method = c("threshold", "permutation"),
control = list()
)
An object of class "FACT" containing:
frame: A data frame describing the tree structure, with one row per node. Includes split variable, split value, test statistic, and p-value (if applicable). A smaller test statistic indicates stronger evidence of heterogeneous factor structures between child nodes.
membership: An integer vector of length \(N\) indicating the terminal node assignment for each observation.
control: The control parameters used.
terms: Metadata including covariate names, data dimensions, and the values of r_a and r_b.
method: The splitting method used.
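The membership component is usually the first thing to inspect after fitting. The sketch below shows one way to summarise it in base R. It uses a hypothetical stand-in list, fit_mock, rather than a real fitted object, and the node labels 2 and 3 are made up for illustration; only the component name membership comes from the description above.

```r
# Hypothetical stand-in for a fitted object (NOT produced by FACT());
# only the documented component name `membership` is relied on.
fit_mock <- list(membership = c(2L, 2L, 3L, 3L, 3L, 2L))

# Number of time series assigned to each terminal node
leaf_sizes <- table(fit_mock$membership)

# Indices of the series (columns of Y) grouped by terminal node
series_by_leaf <- split(seq_along(fit_mock$membership), fit_mock$membership)

print(leaf_sizes)
print(series_by_leaf)
```

With a real fit, the same two calls would be applied to the membership component of the returned object.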
X: A numeric matrix of covariates with dimension \(N \times p\), where \(N\) is the number of time series and \(p\) is the number of features. Each row corresponds to the covariates for one time series.
Y: A numeric matrix of time series data with dimension \(T \times N\), where \(T\) is the length of each series. Each column represents one time series.
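The two inputs are oriented differently: the covariate matrix stores one series per row, while the time series matrix stores one series per column. A minimal sketch of the expected shapes, using made-up dimensions and random data (the names N, T_len, and p are illustrative only):

```r
set.seed(1)
N     <- 6   # number of time series
T_len <- 20  # length of each series
p     <- 2   # number of covariates

# Covariates: one ROW per time series (N x p)
X <- matrix(runif(N * p), nrow = N, ncol = p)

# Time series: one COLUMN per time series (T x N)
Y <- matrix(rnorm(T_len * N), nrow = T_len, ncol = N)

# Row i of X must describe column i of Y, so the counts have to agree
stopifnot(nrow(X) == ncol(Y))
```

If the data arrive with one series per row, transposing with t() before the call gives the layout described here.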
r_a: A positive integer specifying the number of singular vectors to extract from each child node for constructing the projection matrices. Default: 8.
r_b: A positive integer specifying the number of leading singular values to sum for the split statistic. Must satisfy r_b <= r_a. Default: 4.
method: Character string specifying the splitting decision rule:
"threshold": Uses a data-adaptive threshold based on signal-to-noise ratio estimation. Faster but may be less accurate. Suitable for large datasets.
"permutation": Uses a permutation test for hypothesis testing. More rigorous but computationally intensive.
control: A list of control parameters for tree construction:
minsplit: Minimum number of observations required to attempt a split. Default: 90.
minbucket: Minimum number of observations in any terminal node. Default: 30.
alpha: Significance level for the permutation test (used only when method = "permutation"). Default: 0.01.
R: Number of permutations for the hypothesis test (used only when method = "permutation"). Default: 199.
sep: Controls the density of candidate split points. If "auto" (default), subsamples candidates when \(n > 800\). If numeric, evaluates every sep-th candidate point.
parallel: Logical; if TRUE, enables parallel computation. Default: FALSE.
n_cores: Number of cores for parallel processing. If NULL (default), uses detectCores() - 1.
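As an illustration, a control list for the permutation method might be assembled as follows. Every entry is written out with its documented default, since this page does not say whether a partial list is merged with the defaults. The call itself is guarded so the snippet still runs when FACT() and gendata() are not available in the session; the object names ctrl and tree2 are illustrative.

```r
# All control entries spelled out with the defaults documented above
ctrl <- list(
  minsplit  = 90,      # do not attempt a split below 90 observations
  minbucket = 30,      # every terminal node keeps at least 30 observations
  alpha     = 0.01,    # significance level (permutation method only)
  R         = 199,     # number of permutations (permutation method only)
  sep       = "auto",  # candidate split density
  parallel  = FALSE,   # no parallel computation
  n_cores   = NULL     # ignored here because parallel = FALSE
)

# Run only if the package providing FACT() and gendata() is attached
if (exists("FACT", mode = "function") && exists("gendata", mode = "function")) {
  data  <- gendata(seed = 123, T = 200, N = c(50, 50, 50, 50))
  tree2 <- FACT(data$X, data$Y, r_a = 8, r_b = 4,
                method = "permutation", control = ctrl)
  print(tree2)
}
```

Raising R gives a finer p-value resolution at a roughly proportional increase in run time.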
The FACT algorithm clusters time series by recursively partitioning them based on their underlying factor structures. At each node, the method:
1. Searches for the optimal split across all covariates and candidate points.
2. Computes a test statistic based on the overlap of factor spaces between the two child nodes.
3. Decides whether to split using either a threshold rule or permutation test.
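The following base R sketch illustrates one plausible reading of the statistic and of the permutation decision; it is not the package's implementation. It assumes the statistic is the sum of the r_b leading singular values of the product of two projection matrices, each built from the r_a leading left singular vectors of a child node's data, and that the p-value counts permuted statistics no larger than the observed one. The function names overlap_stat and perm_pvalue are hypothetical.

```r
# ASSUMED form of the statistic: overlap between the factor spaces of two
# child nodes, each given as a T x n block of series (columns).
overlap_stat <- function(Y_left, Y_right, r_a, r_b) {
  U_left  <- svd(Y_left,  nu = r_a, nv = 0)$u   # r_a leading left singular vectors
  U_right <- svd(Y_right, nu = r_a, nv = 0)$u
  P_left  <- U_left  %*% t(U_left)              # projection onto each factor space
  P_right <- U_right %*% t(U_right)
  d <- svd(P_left %*% P_right, nu = 0, nv = 0)$d
  sum(d[seq_len(r_b)])                          # small value = little overlap
}

# ASSUMED permutation scheme: shuffle which series fall in the left child,
# recompute the statistic, and count values at least as extreme (as small).
perm_pvalue <- function(Y, in_left, r_a, r_b, R = 199) {
  stat_for <- function(idx) {
    overlap_stat(Y[, idx, drop = FALSE], Y[, !idx, drop = FALSE], r_a, r_b)
  }
  observed <- stat_for(in_left)
  permuted <- replicate(R, stat_for(sample(in_left)))
  (1 + sum(permuted <= observed)) / (R + 1)
}

# Toy data: two groups of 20 series driven by independent 2-factor structures
set.seed(42)
T_len <- 50
n     <- 20
F1 <- matrix(rnorm(T_len * 2), T_len, 2)
F2 <- matrix(rnorm(T_len * 2), T_len, 2)
noise <- function() matrix(rnorm(T_len * n, sd = 0.1), T_len, n)
Y_a  <- F1 %*% matrix(rnorm(2 * n), 2, n) + noise()  # group A, factors F1
Y_a2 <- F1 %*% matrix(rnorm(2 * n), 2, n) + noise()  # more series on F1
Y_b  <- F2 %*% matrix(rnorm(2 * n), 2, n) + noise()  # group B, factors F2

stat_same <- overlap_stat(Y_a, Y_a2, r_a = 2, r_b = 1)  # shared factor space
stat_diff <- overlap_stat(Y_a, Y_b,  r_a = 2, r_b = 1)  # distinct factor spaces

# Permutation p-value for the split that separates group A from group B
in_left <- rep(c(TRUE, FALSE), each = n)
p_val   <- perm_pvalue(cbind(Y_a, Y_b), in_left, r_a = 2, r_b = 1, R = 199)
split_node <- p_val <= 0.01  # compare with alpha
```

Under these assumptions each singular value lies between 0 and 1, so the statistic ranges from 0 (orthogonal factor spaces) to r_b (identical factor spaces), which matches the convention that a smaller statistic indicates stronger heterogeneity.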
Hu, J., Li, T., Luo, Z., & Wang, X. Factor-Augmented Clustering Tree for Time Series.
See also: COR for correlation-based clustering, gendata for generating synthetic data, print.FACT and plot.FACT for visualization.
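# Generate synthetic covariates (data$X) and time series (data$Y)
data <- gendata(seed = 123, T = 200, N = c(50, 50, 50, 50))

# Fit the tree with the threshold rule, then print the result
tree1 <- FACT(data$X, data$Y, r_a = 8, r_b = 4, method = "threshold")
print(tree1)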