# diss.NCD

##### Normalized Compression Distance

Computes the distance based on the sizes of the compressed time series.

##### Usage

`diss.NCD(x, y, type = "min")`

##### Arguments

- x
Numeric vector containing the first of the two time series.

- y
Numeric vector containing the second of the two time series.

- type
Character string selecting the compression algorithm: "gzip", "bzip2", "xz" or "min". May be abbreviated to a single letter; defaults to "min".

##### Details

The compression-based dissimilarity is calculated as $$ d(x,y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}} $$ where \(C(x)\) and \(C(y)\) are the sizes in bytes of the compressed series \(x\) and \(y\), and \(C(xy)\) is the size in bytes of the compressed concatenation of \(x\) and \(y\).

The compression algorithm is chosen with `type`, which can be "gzip", "bzip2" or "xz" (see `memCompress`). "min" selects the best of these (the smallest compressed size) separately for `x`, `y` and the concatenation.

Since the compression methods are character-based, a symbolic representation can be used; see the Examples section for an example using SAX as the symbolic representation.
The series are converted to text with `as.character` prior to compression, so small numeric differences may produce significantly different text representations.

While this dissimilarity is asymptotically symmetric, for short series the differences between `diss.NCD(x,y)` and `diss.NCD(y,x)` may be noticeable.
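The calculation can be approximated in base R. The following is a minimal sketch, not the package's implementation: it assumes the normalized compression distance as defined by Cilibrasi and Vitányi (2005), that is, \(C(xy)\) minus the smaller of \(C(x)\) and \(C(y)\), divided by the larger. The names `compressed_size`, `best_size` and `ncd_sketch` are hypothetical helpers introduced only for illustration.

```r
# Hypothetical helper: size in bytes of a series after conversion to text
# with as.character() and compression with one memCompress() method.
compressed_size <- function(series, method) {
  length(memCompress(as.character(series), type = method))
}

# Hypothetical helper: smallest compressed size over the candidate methods.
best_size <- function(series, methods) {
  min(vapply(methods, function(m) compressed_size(series, m), integer(1)))
}

# Sketch of the distance (assumption: standard NCD definition).
# With type = "min", the best method is picked separately for x, y
# and the concatenation c(x, y).
ncd_sketch <- function(x, y, type = c("min", "gzip", "bzip2", "xz")) {
  type <- match.arg(type)  # allows abbreviations such as "b" for "bzip2"
  methods <- if (type == "min") c("gzip", "bzip2", "xz") else type
  cx  <- best_size(x, methods)
  cy  <- best_size(y, methods)
  cxy <- best_size(c(x, y), methods)
  (cxy - min(cx, cy)) / max(cx, cy)
}

set.seed(1)
x <- rnorm(50)           # white noise
y <- cumsum(rnorm(50))   # Wiener process

d_xy <- ncd_sketch(x, y)
d_yx <- ncd_sketch(y, x)  # the order of concatenation may change the result
c(d_xy = d_xy, d_yx = d_yx)
```

Comparing `d_xy` and `d_yx` gives a feel for how much the order of the arguments matters for short series.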

##### Value

The computed distance.

##### References

Cilibrasi, R., & Vitányi, P. M. (2005). Clustering by compression. *Information Theory, IEEE Transactions on*, **51(4)**, 1523-1545.

Keogh, E., Lonardi, S., & Ratanamahatana, C. A. (2004). Towards parameter-free data mining. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 206-215).

Montero, P and Vilar, J.A. (2014) *TSclust: An R Package for Time Series Clustering.* Journal of Statistical Software, 62(1), 1-43. http://www.jstatsoft.org/v62/i01/.

##### See Also

`memCompress`, `diss`

##### Examples

```
library(TSclust)

n <- 50
# Generate sample series: white noise and a Wiener process
x <- rnorm(n)
y <- cumsum(rnorm(n))
diss.NCD(x, y)

z <- rnorm(n)
w <- cumsum(rnorm(n))
series <- rbind(x, y, z, w)
diss(series, "NCD", type = "bzip2")

# Symbolic representation prior to compression, using SAX.
# A simpler symbolization, such as round(), could also be used.

# Normalization function, required for SAX
z.normalize <- function(x) {
  (x - mean(x)) / sd(x)
}

sx <- convert.to.SAX.symbol(z.normalize(x), alpha = 4)
sy <- convert.to.SAX.symbol(z.normalize(y), alpha = 4)
sz <- convert.to.SAX.symbol(z.normalize(z), alpha = 4)
sw <- convert.to.SAX.symbol(z.normalize(w), alpha = 4)
diss(rbind(sx, sy, sz, sw), "NCD", type = "bzip2")
```
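
The example comments mention `round()` as a simpler symbolization. As a rough illustration in base R only, the sketch below shows how rounding shortens the text representation produced by `as.character` and, with it, the compressed size. The choice of one decimal place and of "gzip" is an arbitrary assumption made for the illustration.

```r
set.seed(42)
x <- cumsum(rnorm(200))  # sample series: a Wiener process

# Text representations with full precision and after rounding to one decimal
text_full    <- as.character(x)
text_rounded <- as.character(round(x, 1))

# Total number of characters in each representation
chars_full    <- sum(nchar(text_full))
chars_rounded <- sum(nchar(text_rounded))

# Compressed sizes in bytes
size_full    <- length(memCompress(text_full, type = "gzip"))
size_rounded <- length(memCompress(text_rounded, type = "gzip"))

c(chars_full = chars_full, chars_rounded = chars_rounded)
c(size_full = size_full, size_rounded = size_rounded)
```

With TSclust loaded, rounded series could then be compared with, for example, `diss.NCD(round(x, 1), round(y, 1))`.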

*Documentation reproduced from package TSclust, version 1.2.4, License: GPL-2*