e.agglo: ENERGY AGGLOMERATIVE

Description

An agglomerative hierarchical estimation algorithm for multiple change point analysis.

Usage

e.agglo(X, member=1:nrow(X), alpha=1, penalty=function(cps){0})

Arguments

A T x d matrix containing the length T time series with d-dimensional observations.

member

Initial membership vector for the time series.

alpha

Moment index used for determining the distance between and within clusters.

penalty

Function used to penalize the obtained goodness-of-fit statistics. This function takes as its input a vector of change point locations (cps).

Value

Returns a list with the following components.
mergedA (T-1) x 2 matrix indicating which segments were merged at each step of the agglomerative procedure.
fitVector showing the progression of the penalized goodness-of-fit statistic.
progressionA T x (T+1) matrix showing the progression of the set of change points.
clusterThe estimated cluster membership vector.
estimatesThe location of the estimated change points.

Details

Homogeneous clusters are created based on the initial clustering provided by the member argument. In each iteration, clusters are merged so as to maximize a goodness-of-fit statistic. The computational complexity of this method is O(T^2), where T is the number of observations.

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

Examples

Run this code

set.seed(100)
mem = rep(c(1,2,3,4),times=c(10,10,10,10))
x = as.matrix(c(rnorm(10,0,1),rnorm(20,2,1),rnorm(10,-1,1)))
y = e.agglo(X=x,member=mem,alpha=1,penalty=function(cp,Xts) 0)
y$estimates


# Multivariate spatio-temporal example
# You will need the following packages:
#	mvtnorm, combinat, and MASS
library(mvtnorm); library(combinat); library(MASS)
set.seed(2013)
lambda = 1500 #overall arrival rate per unit time
muA = c(-7,-7) ; muB = c(0,0) ; muC = c(5.5,0)
covA = 25*diag(2) ; covB = matrix(c(9,0,0,1),2) ; covC = matrix(c(9,.9,.9,9),2)
time.interval = matrix(c(0,1,3,4.5,1,3,4.5,7),4,2)
#mixing coefficents
mixing.coef = rbind(c(1/3,1/3,1/3),c(.2,.5,.3), c(.35,.3,.35), c(.2,.3,.5))
stppData = NULL
for(i in 1:4){
	count = rpois(1, lambda* diff(time.interval[i,]))
	Z = rmultz2(n = count, p = mixing.coef[i,])
	S = rbind(rmvnorm(Z[1],muA,covA), rmvnorm(Z[2],muB,covB),rmvnorm(Z[3],muC,covC))
	X = cbind(rep(i,count), runif(n = count, time.interval[i,1], time.interval[i,2]), S)
	stppData = rbind(stppData, X[order(X[,2]),])
}
member = as.numeric(cut(stppData[,2], breaks = seq(0,7,by=1/12)))
output = e.agglo(X=stppData[,3:4],member=member,alpha=1,penalty=function(cp,Xts) 0)

Run the code above in your browser using DataLab