CWindowCluster: Window-level time series clustering

Description

This function clusters time series at the window level, following Algorithm 2 of Ciampi et al. (2010).

Usage

CWindowCluster(X, Alpha = NULL, Beta = NULL, Delta = NULL, Theta = 0.8, p, w, s, Epsilon = 1)

Arguments

X

a matrix of time series to be clustered (time series in columns).

Alpha

lower limit of the time series domain, passed to CSlideCluster.

Beta

upper limit of the time series domain, passed to CSlideCluster.

Delta

closeness parameter, passed to CSlideCluster.

Theta

connectivity parameter, passed to CSlideCluster.

p

number of layers (time series observations) in each slide.

w

number of slides in each window.

s

step to shift a window, measured in slides. Recommended values are 1 (overlapping windows) or w (non-overlapping windows).

Epsilon

a real value in [0,1] used to identify each pair of time series that are clustered together over at least w*Epsilon slides within a window (see Definition 7 by Ciampi et al., 2010). Default is 1.
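
The w*Epsilon rule can be illustrated with a small base R sketch. It is not the package's internal code: the matrix slide_labels is made-up data standing in for slide-level cluster labels (slides in rows, time series in columns), and the names together and linked are hypothetical.

```r
# Hypothetical slide-level labels for one window: rows are slides, columns are
# time series. The values are made up and are not output of CSlideCluster.
slide_labels <- rbind(
  c(1, 1, 2, 2),
  c(1, 1, 1, 2),
  c(1, 1, 2, 2),
  c(1, 2, 2, 2)
)
colnames(slide_labels) <- paste0("TS", 1:4)

w <- nrow(slide_labels)  # number of slides in the window
Epsilon <- 0.75

# together[i, j] counts the slides in which series i and j share a label
n <- ncol(slide_labels)
together <- matrix(0, n, n,
                   dimnames = list(colnames(slide_labels), colnames(slide_labels)))
for (k in seq_len(w)) {
  together <- together + outer(slide_labels[k, ], slide_labels[k, ], "==")
}

# Pairs clustered together over at least w * Epsilon slides
linked <- together >= w * Epsilon
```

In this toy window, TS1 and TS2 share a label in 3 of 4 slides, so they count as linked at Epsilon = 0.75 but would not at the default Epsilon = 1, which requires agreement in every slide.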

Value

A vector (if X contains only one window) or matrix with cluster labels for each time series (columns) and window (rows).
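
As a rough illustration of how such a result might be inspected, the sketch below uses a made-up label matrix with windows in rows and time series in columns. The object labels and its values are hypothetical, not actual output of CWindowCluster.

```r
# Hypothetical return value: 2 windows (rows) by 5 time series (columns)
labels <- rbind(c(1, 1, 2, 1, 3),
                c(1, 2, 2, 1, 1))
colnames(labels) <- paste0("TS", 1:5)

# Cluster sizes in each window
sizes <- lapply(seq_len(nrow(labels)), function(i) table(labels[i, ]))

# Names of the time series assigned to cluster 1 in window 2
members <- colnames(labels)[labels[2, ] == 1]
```

When the result is a plain vector (a single window), the same indexing applies after wrapping it with matrix(x, nrow = 1).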

Details

This is the upper-level function for time series clustering. It uses the function CSlideCluster to cluster time series within each slide based on closeness and homogeneity measures, and then uses the slide-level cluster assignments to cluster time series within each window.

The total length of the time series (number of layers, i.e., nrow(X)) should be divisible by p.
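
The divisibility requirement and the resulting slide and window layout can be checked before calling the function. The sketch below is an assumption about how windows are laid out (a new window starts every s slides for as long as a full window of w slides fits). The names n_obs, window_starts and window_rows are illustrative and not part of the package.

```r
n_obs <- 104  # stands in for nrow(X)
p <- 4        # layers per slide
w <- 13       # slides per window
s <- 13       # shift between windows, in slides

# The series length must split into whole slides
stopifnot(n_obs %% p == 0)
n_slides <- n_obs %/% p

# Assumed layout: windows start at slides 1, 1 + s, ... while a full window fits
window_starts <- seq(1, n_slides - w + 1, by = s)

# Rows of X covered by each window
window_rows <- lapply(window_starts,
                      function(k) ((k - 1) * p + 1):((k + w - 1) * p))
```

With these values there are 26 slides and two non-overlapping windows, covering rows 1 to 52 and rows 53 to 104.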

References

Ciampi, A., Appice, A. and Malerba, D. (2010) Discovering trend-based clusters in spatially distributed data streams. In International Workshop of Mining Ubiquitous and Social Environments, pages 107-122.

Examples


#For example, weekly data come in slides of 4 weeks
p <- 4 #number of layers in each slide (data come in a slide)

#We want to analyze the trend clusters within a window of 1 year
w <- 13 #number of slides in each window
s <- w  #step to shift a window

#Simulate 26 autoregressive time series with two years of weekly data (52*2 weeks),
#discarding a 'burn-in' period of 300 observations.
N <- 26
n_obs <- 2*p*w #named n_obs rather than T, which would mask the shorthand for TRUE

set.seed(123)
phi <- c(0.5) #parameter of autoregression
X <- sapply(1:N, function(x) arima.sim(n=n_obs+300,
  model=list(order=c(length(phi),0,0),ar=phi)))[301:(n_obs+300),]
colnames(X) <- paste("TS", seq_len(ncol(X)), sep="")

tmp <- CWindowCluster(X, Delta=NULL, Theta=0.8, p=p, w=w, s=s, Epsilon=1)

#The time series were simulated with the same parameters but, depending on the clustering
#parameters, not all of them join the same cluster. We can plot the main cluster for each
#window, and the time series outside that cluster:
par(mfrow=c(2,2))
ts.plot(X[c(1:(p*w)),tmp[1,]==1], ylim=c(-4,4), 
  main="Time series cluster 1 in window 1")
ts.plot(X[c(1:(p*w)),tmp[1,]!=1], ylim=c(-4,4), 
  main="The rest of the time series in window 1")
ts.plot(X[c(1:(p*w))+s*p,tmp[2,]==1], ylim=c(-4,4), 
  main="Time series cluster 1 in window 2")
ts.plot(X[c(1:(p*w))+s*p,tmp[2,]!=1], ylim=c(-4,4), 
  main="The rest of the time series in window 2")
