
factree (version 0.1.0)

COR: Correlation-Based Clustering Tree

Description

Builds a binary tree for clustering time series data based on covariates. The splitting criterion minimizes the average absolute Pearson correlation between time series across child nodes.

Usage

COR(X, Y, control = list())

Value

An object of class "FACT" containing:

frame

A data frame describing the tree structure, with one row per node giving the split variable, split value, test statistic, and p-value. Because splits are chosen to minimize cross-node correlation, a smaller test statistic suggests more heterogeneity between the child nodes.

membership

An integer vector of length \(N\) giving the terminal node assignment for each time series.

control

The control parameters used.

terms

Metadata including covariate names and data dimensions.

Arguments

X

A numeric matrix of covariates with dimension \(N \times p\), where \(N\) is the number of time series and \(p\) is the number of features. Each row corresponds to the covariates for one time series.

Y

A numeric matrix of time series data with dimension \(T \times N\), where \(T\) is the length of each series. Each column represents one time series.
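
Note that X and Y are oriented differently: series are rows of X but columns of Y. The following is a minimal sketch of a pre-flight shape check, using made-up data and base R only. The variable names are illustrative and are not part of factree.

```r
# Illustrative data: N series, p covariates, T_len time points (names are assumptions)
set.seed(1)
N <- 6; p <- 2; T_len <- 50
X <- matrix(rnorm(N * p), nrow = N, ncol = p)          # one row per series
Y <- matrix(rnorm(T_len * N), nrow = T_len, ncol = N)  # one column per series

# Row i of X must describe column i of Y
shapes_ok <- is.matrix(X) && is.matrix(Y) &&
  is.numeric(X) && is.numeric(Y) &&
  nrow(X) == ncol(Y)

# If the series are stored one per row instead, transpose before use
Y_by_row <- t(Y)          # N x T_len layout
Y_fixed  <- t(Y_by_row)   # back to T_len x N
```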

control

A list of control parameters for tree construction:

minsplit

Minimum number of time series a node must contain before a split is attempted. Default: 90.

minbucket

Minimum number of time series in any terminal node. Default: 30.

alpha

Significance level for the permutation test. Default: 0.01.

R

Number of permutations for the hypothesis test. Default: 199.

parallel

Logical; if TRUE, enables parallel computation for permutation tests. Default: FALSE.

n_cores

Number of cores for parallel processing. If NULL (default), uses parallel::detectCores() - 1.
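
As a rough illustration of how a partial control list relates to the documented defaults, the sketch below merges user-supplied values over the defaults with utils::modifyList. Whether COR merges its control argument this way internally is an assumption. Only the default values themselves are taken from this page.

```r
# Defaults as documented above; the merge mechanism is an assumption
defaults <- list(minsplit = 90, minbucket = 30, alpha = 0.01, R = 199,
                 parallel = FALSE, n_cores = NULL)

# User overrides only what differs from the defaults
user <- list(R = 99, alpha = 0.05)
ctrl <- modifyList(defaults, user)

# Resolve n_cores as described: all cores but one, never fewer than one.
# detectCores() can return NA on some platforms, so guard against that.
cores <- parallel::detectCores()
n_workers <- if (!is.null(ctrl$n_cores)) {
  ctrl$n_cores
} else if (is.na(cores)) {
  1L
} else {
  max(1L, cores - 1L)
}
```

Unspecified entries keep their defaults, so list(R = 99, alpha = 0.05) still implies minsplit = 90 and minbucket = 30.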

Details

The algorithm recursively partitions the data by finding splits that minimize the average absolute correlation between time series in different child nodes. Statistical significance of each split is assessed via a permutation test.
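
The criterion described above can be sketched in a few lines of base R. The helper cross_abs_cor is hypothetical and is not a factree function. It computes the mean absolute Pearson correlation over all pairs with one series in each child node, which is one plausible reading of the description. The package's exact statistic may differ.

```r
# Hypothetical helper, not part of factree.
# Y: T x N matrix (one series per column); left: logical vector of length N
cross_abs_cor <- function(Y, left) {
  stopifnot(is.matrix(Y), length(left) == ncol(Y), any(left), any(!left))
  # Correlation block between left-child and right-child series
  C <- cor(Y[, left, drop = FALSE], Y[, !left, drop = FALSE])
  mean(abs(C))
}

set.seed(1)
T_len <- 200
f1 <- rnorm(T_len)
f2 <- rnorm(T_len)
# Two groups of 10 series, each driven by its own common factor plus noise
Y <- cbind(f1 + matrix(rnorm(T_len * 10, sd = 0.5), T_len),
           f2 + matrix(rnorm(T_len * 10, sd = 0.5), T_len))

good <- rep(c(TRUE, FALSE), each = 10)   # split that matches the true groups
bad  <- rep(c(TRUE, FALSE), times = 10)  # split that mixes the groups

stat_good <- cross_abs_cor(Y, good)
stat_bad  <- cross_abs_cor(Y, bad)
```

In this toy setup the split aligned with the true groups yields the smaller statistic, because the mixed split leaves strongly correlated series on opposite sides.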

At each node, the optimal split is found by exhaustively searching over all covariates and candidate split points. The permutation test shuffles the time series labels to generate a null distribution for the test statistic.
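
The search and permutation test described above might look roughly like the sketch below. All function names are hypothetical, and several details are assumptions rather than documented behavior: candidate split points are taken to be the observed covariate values, the search is repeated on each permuted data set, and the p-value counts permuted statistics at least as small as the observed one.

```r
# Hypothetical helpers for illustration; none of these are factree functions.
cross_abs_cor <- function(Y, left) {
  mean(abs(cor(Y[, left, drop = FALSE], Y[, !left, drop = FALSE])))
}

# Exhaustive search over covariates and split points, respecting minbucket
best_split <- function(X, Y, minbucket = 5) {
  best <- list(stat = Inf, var = NA_integer_, value = NA_real_)
  for (j in seq_len(ncol(X))) {
    cuts <- sort(unique(X[, j]))
    for (s in cuts[-length(cuts)]) {       # largest value cannot split
      left <- X[, j] <= s
      if (sum(left) < minbucket || sum(!left) < minbucket) next
      stat <- cross_abs_cor(Y, left)
      if (stat < best$stat) best <- list(stat = stat, var = j, value = s)
    }
  }
  best
}

# Permutation p-value: shuffle which series belongs to which covariate row,
# redo the search, and count permuted statistics as small as the observed one
perm_pvalue <- function(X, Y, R = 99, minbucket = 5) {
  observed <- best_split(X, Y, minbucket)$stat
  null <- replicate(R, {
    Y_perm <- Y[, sample(ncol(Y)), drop = FALSE]
    best_split(X, Y_perm, minbucket)$stat
  })
  (1 + sum(null <= observed)) / (R + 1)
}

# Toy data: group membership is determined by the sign of the first covariate
set.seed(2)
N <- 40; T_len <- 100
X <- cbind(x1 = sample(seq(-1, 1, length.out = N)), x2 = runif(N, -1, 1))
g <- X[, "x1"] > 0
Y <- matrix(rnorm(T_len * N, sd = 0.5), T_len, N)
Y[, g]  <- Y[, g]  + rnorm(T_len)   # common factor for group 1
Y[, !g] <- Y[, !g] + rnorm(T_len)   # common factor for group 2

fit <- best_split(X, Y)
p   <- perm_pvalue(X, Y, R = 99)
```

With R permutations the smallest attainable p-value is 1 / (R + 1), so R = 199 is the minimum needed to resolve the default alpha of 0.01.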

See Also

FACT for factor model-based clustering, gendata for generating synthetic data, and print.FACT and plot.FACT for printing and plotting a fitted tree.

Examples

# Generate synthetic data
data <- gendata(seed = 42, T = 100, N = c(50, 50, 50, 50))

# Build correlation-based tree
result <- COR(data$X, data$Y, control = list(R = 99, alpha = 0.05))

# Examine results
print(result)
plot(result)
table(result$membership, data$group)
