clusters: Identify Clusters of Exceedences

Description

Identify clusters of exceedences.

Usage

clusters(data, u, r = 1, ulow = -Inf, rlow = 1, cmax = FALSE,
    keep.names = TRUE, plot = FALSE, xdata = seq(along = data),
    lvals = TRUE, lty = 1, lwd = 1, pch = par("pch"),
    col = if(n > 250) NULL else "grey", xlab = "Index", ylab = "Data", ...)

Arguments

data

A numeric vector, which may contain missing values.

u

A single value giving the threshold. If a time-varying threshold is used, u should instead be a vector of thresholds, typically with the same length as data (otherwise the usual recycling rules are applied).

r

A positive integer giving the clustering interval length. By default the interval length is one.

ulow

A single value giving the lower threshold. If a time-varying lower threshold is used, ulow should instead be a vector of lower thresholds, typically with the same length as data (otherwise the usual recycling rules are applied).

rlow

A positive integer giving the lower clustering interval length. The lower clustering interval length is only relevant if it is less than the clustering interval length r and if there exists a lower threshold (greater than -Inf).

cmax

Logical; if FALSE (the default), a list containing the clusters of exceedences is returned. If TRUE a numeric vector containing the cluster maxima is returned.

keep.names

Logical; if FALSE, the function makes no attempt to retain the names/indices of the observations within the returned object. If data contains a large number of observations, this can make the function run much faster.

plot

Logical; if TRUE a plot is given that depicts the identified clusters, and the clusters (if cmax is FALSE) or cluster maxima (if cmax is TRUE) are returned invisibly. If FA

xdata

A numeric vector with the same length as data, giving the values to be plotted on the x-axis.

lvals

Logical; should the values below the threshold and the line depicting the lower threshold be plotted?

lty, lwd

Line type and width for the lines depicting the threshold and the lower threshold.

pch

Plotting character.

col

Strips of colour col are used to identify the clusters. An observation is contained in the cluster if the centre of the corresponding plotting character is contained in the coloured strip. If col is NULL

xlab, ylab

Labels for the x and y axis.

...

Other graphics parameters.
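
As a small illustration of the time-varying thresholds described for u and ulow, the sketch below shows how a threshold vector shorter than data would be expanded under R's usual recycling. It uses only base R. The data values and the names x, u_short and u_full are made up for illustration, and whether clusters performs the expansion in exactly this way is an assumption.

```r
# Illustrative data and names only; nothing here calls evd.
x <- c(3.9, 4.3, 4.5, 4.0, 4.4, 4.1)

# A two-value threshold pattern, recycled to one threshold per observation.
u_short <- c(4.2, 4.0)
u_full <- rep(u_short, length.out = length(x))

# Element-wise comparison marks which observations exceed their own threshold.
exceeds <- x > u_full
print(u_full)
print(exceeds)
```

Supplying a threshold vector with the same length as data, as in the last example of this page, avoids relying on recycling altogether.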

Value

If cmax is FALSE (the default), a list with one component for each identified cluster. If cmax is TRUE, a numeric vector containing the cluster maxima. In any case, the returned object has an attribute acs, giving the average cluster size (where the cluster size is defined as the number of exceedences within a cluster), which will be NaN if there are no values above the threshold (and hence no clusters). If plot is TRUE, the list of clusters, or vector of cluster maxima, is returned invisibly.
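
To make the definition of the acs attribute concrete, here is a minimal base-R sketch that computes an average cluster size as the number of exceedences divided by the number of clusters, and shows why the result is NaN when nothing exceeds the threshold. The helper average_cluster_size and the object result are hypothetical and are not part of the package. They only mimic the shape of the returned value.

```r
# Hypothetical helper: 'sizes' holds the number of exceedences in each cluster.
average_cluster_size <- function(sizes) {
  sum(sizes) / length(sizes)
}

acs_two <- average_cluster_size(c(3, 2))       # two clusters, five exceedences
acs_none <- average_cluster_size(numeric(0))   # no clusters: 0 / 0

# Attributes travel with the object, as acs does with the returned value.
result <- structure(list(c(5, 6, 7), c(8, 9)), acs = acs_two)
print(attr(result, "acs"))
print(is.nan(acs_none))
```

With an object returned by clusters, the attribute would be read in the same way, using attr(result, "acs").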

Details

The clusters of exceedences are identified as follows. The first exceedence of the threshold initiates the first cluster. The first cluster then remains active until either r consecutive values fall below (or are equal to) the threshold, or until rlow consecutive values fall below (or are equal to) the lower threshold. The next exceedence of the threshold (if it exists) then initiates the second cluster, and so on. Missing values are allowed, in which case they are treated as falling below (or equal to) the threshold, but falling above the lower threshold.
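
The rule above can be sketched in base R for the simplest case: a single threshold u and clustering interval r, with no lower threshold. The function sketch_clusters is a hypothetical illustration and not the package's implementation. It returns only the indices of the exceedences in each cluster, and it treats missing values as non-exceedences, as described above.

```r
# Hypothetical helper, not part of evd: groups exceedence indices into
# clusters using one threshold u and clustering interval r (ulow, rlow ignored).
sketch_clusters <- function(x, u, r = 1) {
  exceeds <- !is.na(x) & x > u   # missing values count as non-exceedences
  clusters <- list()
  current <- integer(0)          # exceedence indices in the active cluster
  below <- 0                     # consecutive non-exceedences seen so far
  for (i in seq_along(x)) {
    if (exceeds[i]) {
      current <- c(current, i)   # starts or extends the active cluster
      below <- 0
    } else if (length(current) > 0) {
      below <- below + 1
      if (below >= r) {          # r consecutive values at or below u: close it
        clusters[[length(clusters) + 1]] <- current
        current <- integer(0)
        below <- 0
      }
    }
  }
  # A cluster still active at the end of the series is kept.
  if (length(current) > 0) clusters[[length(clusters) + 1]] <- current
  clusters
}

x <- c(1, 5, 6, 1, 7, 1, 1, 1, 8, NA, 9)
cl <- sketch_clusters(x, u = 4, r = 2)
print(cl)
```

With r = 2, the single value at position 4 does not end the first cluster, so positions 2, 3 and 5 are grouped together. The same data with r = 1 would split into four clusters.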

Examples

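library(evd)   # provides clusters() and the portpirie data set
data(portpirie)
# Clusters of exceedences of 4.2 with clustering interval r = 3
clusters(portpirie, 4.2, 3)
# Return the cluster maxima instead of the clusters
clusters(portpirie, 4.2, 3, cmax = TRUE)
# Add a lower threshold of 3.8 and plot the identified clusters
clusters(portpirie, 4.2, 3, 3.8, plot = TRUE)
# Same plot, without the values below the threshold and the lower threshold line
clusters(portpirie, 4.2, 3, 3.8, plot = TRUE, lvals = FALSE)
# Time-varying threshold of length 65 (20 + 25 + 20)
tvu <- c(rep(4.2, 20), rep(4.1, 25), rep(4.2, 20))
clusters(portpirie, tvu, 3, plot = TRUE)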