sync.cluster: Time Series Clustering based on Trend Synchronism

Description

Cluster time series with a common parametric trend using the sync.test function (Lyubchich and Gel, 2016; Ghahari et al., 2017).

Usage

sync.cluster(formula, rate = 1, alpha = 0.05, ...)

Arguments

formula

an object of class "formula" specifying the type of common trend used for clustering. The left-hand side is a \(T\) by \(N\) matrix of time series (time series in columns), which is passed to sync.test. Use the variable \(t\) to specify the form of the trend; \(t\) is created automatically within the function as a regular sequence of length \(T\) on the interval (0,1]. See 'Examples'.

rate

rate of removal of time series. Default is 1 (i.e., if the hypothesis of synchronism is rejected, one time series is removed at a time and the remaining time series are re-tested). Integer values above 1 are treated as the number of time series to be removed. Values from 0 to 1 are treated as a fraction of time series to be removed.

alpha

significance level for testing the hypothesis of a common trend (using sync.test) of the parametric form specified in formula.

...

arguments to be passed to sync.test, for example, number of bootstrap replications (B).
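
The conventions for \(t\) and rate described above can be sketched in base R. The helper names time_index and n_to_remove are hypothetical and not part of the package, and the rounding of fractional rates is an assumption made for illustration; the package may round differently.

```r
# Hypothetical helper: regular sequence of length n_obs on (0, 1],
# matching the description of the variable t in 'formula'.
time_index <- function(n_obs) {
  seq_len(n_obs) / n_obs
}

# Hypothetical helper: number of series to remove after a rejection.
# Assumption: fractional rates are rounded to the nearest integer,
# and at least one series is always removed.
n_to_remove <- function(rate, n_series) {
  if (rate >= 1) {
    as.integer(round(rate))
  } else {
    max(1L, as.integer(round(rate * n_series)))
  }
}

time_index(4)         # 0.25 0.50 0.75 1.00
n_to_remove(1, 10)    # 1
n_to_remove(2, 10)    # 2
n_to_remove(0.3, 10)  # 3
```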

Value

A list with the elements:

cluster

an integer vector indicating the cluster to which each time series is allocated. A label '0' is assigned to time series which do not have a common trend with other time series (that is, all time series labeled with '0' are separate one-element clusters).

elements

a list with names of the time series in each cluster.

The further elements combine results of sync.test for each cluster with at least two elements (that is, single-element clusters labeled with '0' are excluded):

estimate

a list with common parametric trend estimates obtained by sync.test for each cluster. The length of this list is max(cluster).

pval

a list of \(p\)-values of sync.test for each cluster. The length of this list is max(cluster).

statistic

a list with values of sync.test test statistic for each cluster. The length of this list is max(cluster).

ar_order

a list of AR filter orders used in sync.test for each time series. The results are grouped by cluster in the list of length max(cluster).

window_used

a list of local windows used in sync.test for each time series. The results are grouped by cluster in the list of length max(cluster).

all_considered_windows

a list of all windows considered in sync.test and corresponding test results, for each cluster. The length of this list is max(cluster).

WAVK_obs

a list of WAVK test statistics obtained in sync.test for each time series. The results are grouped by cluster in the list of length max(cluster).
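
As a small illustration of how the cluster labels relate to the per-cluster lists, the following base R sketch uses the label vector from the first sample output in 'Examples'. The series names are made up for illustration, and the object built here only mimics the cluster element; it is not produced by sync.cluster.

```r
# Cluster labels copied from the first sample output in 'Examples'.
cluster <- c(0, 1, 1, 1)
# Hypothetical series names, for illustration only.
series <- c("V1", "V2", "V3", "V4")

# Number of single-element clusters (label 0).
n_single <- sum(cluster == 0)

# Members of each multi-element cluster, grouped by label.
members <- split(series[cluster > 0], cluster[cluster > 0])

# The per-cluster result lists (estimate, pval, ...) have this length.
n_multi <- max(cluster)
```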

Details

The sync.cluster function recursively clusters time series having a pre-specified common parametric trend until there are no time series left. Starting with the given \(N\) time series, the sync.test function is used to test for a common trend. If the null hypothesis of a common trend is not rejected by sync.test, the time series are grouped together (i.e., assigned to a cluster). Otherwise, the time series with the largest contribution to the test statistic are temporarily removed (the number of time series to remove depends on the rate of removal) and sync.test is applied again. The contribution to the test statistic is assessed by the WAVK test statistic calculated for each time series.
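
The procedure can be sketched roughly as follows. This is an illustration under stated assumptions, not the package source: cluster_sketch and toy_test are hypothetical names, toy_test is a crude deterministic stand-in for sync.test (it compares least-squares slopes rather than running a bootstrap test), and the handling of fractional rates and of a group that shrinks to one series is assumed.

```r
# Hypothetical stand-in for sync.test: NOT a real statistical test.
# Returns a pseudo p-value and a per-series contribution measure.
toy_test <- function(X) {
  t <- seq_len(nrow(X)) / nrow(X)
  slopes <- apply(X, 2, function(x) coef(lm(x ~ t))[2])
  contrib <- abs(slopes - median(slopes))
  list(p.value = if (max(contrib) > 2) 0.01 else 0.5, contrib = contrib)
}

cluster_sketch <- function(X, test_fun, rate = 1, alpha = 0.05) {
  labels <- integer(ncol(X))        # 0 marks single-element clusters
  remaining <- seq_len(ncol(X))     # series not yet assigned
  k <- 0L
  while (length(remaining) > 1) {
    current <- remaining
    while (length(current) >= 2) {
      res <- test_fun(X[, current, drop = FALSE])
      if (res$p.value >= alpha) break          # common trend not rejected
      # Assumed rounding rule for fractional rates.
      n_out <- if (rate >= 1) round(rate) else max(1, round(rate * length(current)))
      n_out <- min(n_out, length(current) - 1) # always keep one series
      worst <- order(res$contrib, decreasing = TRUE)[seq_len(n_out)]
      current <- current[-worst]               # temporarily remove
    }
    if (length(current) >= 2) {
      k <- k + 1L
      labels[current] <- k                     # new multi-element cluster
    }
    # A lone leftover series keeps label 0; removed series are re-tested.
    remaining <- setdiff(remaining, current)
  }
  labels
}

# Three series with slope 5 and one flat series.
t <- (1:20) / 20
X <- cbind(a = 5 * t, b = 5 * t, c = 5 * t, d = 0 * t)
cluster_sketch(X, toy_test)  # 1 1 1 0
```

Passing the test as a function argument keeps the loop independent of the particular test, so the stand-in could be swapped for a wrapper around sync.test that returns the same two fields.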

References

Examples


# NOT RUN {
library(funtimes)

## Simulate 4 autoregressive time series,
## 1 having a linear trend and 3 without a trend:
set.seed(123)
T = 100 #length of time series
N = 4 #number of time series
X = sapply(1:N, function(x) arima.sim(n = T, 
           list(order = c(1, 0, 0), ar = c(0.6))))
X[,1] <- 5 * (1:T)/T + X[,1]
plot.ts(X)

# Finding clusters with common linear trends:
LinTrend <- sync.cluster(X ~ t) 
  
## Sample Output:
##[1] "Cluster labels:"
##[1] 0 1 1 1
##[1] "Number of single-element clusters (labeled with '0'): 1"

## Plot the time series in each cluster obtained
## (seq_len() skips the loop if there are no multi-element clusters)
for (i in seq_len(max(LinTrend$cluster))) {
    plot.ts(X[, LinTrend$cluster == i], 
            main = paste("Cluster", i))
}


## Simulate 7 autoregressive time series,
## where the first 4 time series have a linear trend added:
set.seed(234)
T = 100 #length of time series
a <- sapply(1:4, function(x) -10 + 0.1 * (1:T) + 
            arima.sim(n = T, list(order = c(1, 0, 0), ar = c(0.6))))
b <- sapply(1:3, function(x) arima.sim(n = T, 
            list(order = c(1, 0, 0), ar = c(0.6))))
Y <- cbind(a, b)
plot.ts(Y)

## Clustering based on linear trend with rate of removal = 2
## and significance level 0.1 for the synchronism test
LinTrend7 <- sync.cluster(Y ~ t, rate = 2, alpha = 0.1, B = 99)
   
## Sample output:
##[1] "Cluster labels:"
##[1] 1 1 1 0 2 0 2
##[1] "Number of single-element clusters (labeled with '0'): 2"
# }