
SpatialVx (version 0.1-7)

clusterer: Cluster Analysis Verification

Description

Perform Cluster Analysis (CA) verification per Marzban and Sandgathe (2006).

Usage

clusterer(X, Y = NULL, ...)

## S3 method for class 'default': clusterer(X, Y = NULL, ..., xloc = NULL, xyp = TRUE, threshold = 1e-08, linkage.method = "complete", stand = TRUE, trans = "identity", a = NULL, verbose = FALSE)

## S3 method for class 'SpatialVx': clusterer(X, Y = NULL, ..., time.point = 1, model = 1, xyp = TRUE, threshold = 1e-08, linkage.method = "complete", stand = TRUE, trans = "identity", verbose = FALSE)

## S3 method for class 'clusterer': plot(x, ..., set.pw = FALSE, icol = c("gray", tim.colors(64)), horizontal = FALSE, loc.byrow = TRUE)

## S3 method for class 'summary.clusterer': plot(x, ...)

## S3 method for class 'clusterer': print(x, ...)

## S3 method for class 'clusterer': summary(object, ...)

Arguments

X,Y
For the default method, m by n matrices giving the verification and forecast fields, respectively.

For the SpatialVx method, X is an object of class SpatialVx and Y is not used.

object,x
list object of class clusterer as returned by clusterer (or summary.clusterer in the case of plot.summary.clusterer).
xloc
(optional) numeric (m*n) by 2 matrix giving the grid point locations. If NULL, this will be created using 1:m and 1:n.
xyp
logical, should the cluster analysis be performed on the locations and intensities (TRUE) or only the locations (FALSE)?
threshold
numeric of length one or two giving the threshold to apply to each field; only values >= threshold are retained. If the length is two, the first value is the threshold for the verification field and the second is the threshold for the forecast field.
linkage.method
character naming a valid linkage method accepted by hclust.
stand
logical, should the data matrices consisting of xloc and each field first be standardized before performing cluster analysis?
trans
character naming a function to be applied to the field intensities before performing the CA. Only used if xyp is TRUE. Default applies no transformation.
time.point
numeric or character indicating which time point from the SpatialVx verification set to select for analysis.
model
numeric indicating which forecast model to select for the analysis.
a
(optional) list giving object attributes associated with a SpatialVx class object. The clusterer method for SpatialVx objects calls the default method and uses this argument to pass along the attributes.
set.pw
logical, should the function determine and set up a panel of plots?
icol
color vector for image plots of fields after applying the threshold(s).
horizontal
logical, should the image plot color legend be placed horizontally (TRUE) or vertically (FALSE)? Only used for the image plots of the fields.
loc.byrow
logical, only used if the field is on a projection; determines how the locations are arranged into matrices.
verbose
logical, should progress information be printed to the screen?
...
optional arguments to the hclust function. In the case of the summary method function, z and/or sigma giving a numeric value used to find the cut-off, given by median + z*sigma, for determining matched objects.
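
This page does not show exactly how clusterer assembles the data it clusters. As a rough illustration of what the xloc, threshold, trans, stand and xyp arguments control, the base-R sketch below builds such a data matrix for one field and passes it to stats::hclust. The helper build_ca_matrix is hypothetical (not part of SpatialVx), and the column layout and the use of scale() for standardization are assumptions.

```r
# Hypothetical helper, NOT the SpatialVx implementation: assembles the matrix
# clustered for a single field, following the argument descriptions above.
build_ca_matrix <- function(field, threshold = 1e-08, xloc = NULL,
                            xyp = TRUE, trans = "identity", stand = TRUE) {
  m <- nrow(field)
  n <- ncol(field)
  # Default locations from 1:m and 1:n; the first index varies fastest,
  # matching the column-major order of c(field).
  if (is.null(xloc)) xloc <- as.matrix(expand.grid(x = 1:m, y = 1:n))
  z <- c(field)
  # Keep grid points that are >= threshold and non-missing (cf. idX, idY).
  id <- !is.na(z) & z >= threshold
  dat <- xloc[id, , drop = FALSE]
  if (xyp) {
    # Append the transformed intensities, e.g. trans = "log".
    f <- match.fun(trans)
    dat <- cbind(dat, intensity = f(z[id]))
  }
  # Assumed standardization: each column to mean 0 and standard deviation 1.
  if (stand) dat <- scale(dat)
  list(data = dat, id = id)
}

set.seed(1)
obs <- matrix(rexp(100), nrow = 10, ncol = 10)
ca <- build_ca_matrix(obs, threshold = 1, trans = "log")
hc <- hclust(dist(ca$data), method = "complete")
```

With xyp = FALSE the intensity column is dropped and only the locations are clustered.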

Value

  • A list object of class clusterer is returned with components:
  • linkage.method: character vector of length one or two giving the linkage method as passed into the function. The length is two only if the McQuitty method is chosen, in which case McQuitty is used for the CA but not for the inter-cluster differences across fields (average linkage is used for those instead).
  • trans: character naming the transformation function applied to the intensities.
  • N: numeric giving the size of the fields.
  • threshold: numeric of length two giving the threshold applied to each field.
  • NCo, NCf: numeric vectors giving the number of clusters at each iteration of the CA for the verification and forecast fields, respectively.
  • cluster.identifiers: a list with components X and Y giving lists of lists identifying specific CA components at each level of the CA for both fields.
  • idX, idY: logical vectors describing which grid points were included in the CA for each field (i.e., which grid points were >= threshold and had non-missing values).
  • cluster.objects: a list with components X and Y giving the objects returned by hclust for each field.
  • inter.cluster.dist: a list of lists whose components are NCf by NCo matrices giving the inter-cluster distances (between verification and forecast fields) for each iteration of the CA for each field.
  • min.intercluster.dists: numeric vector giving the minimum value of inter.cluster.dist at each iteration; used to determine the cut-off for matched objects.
  • The summary method function returns a list with the same components as above, plus the components:
  • cutoff: the cut-off value used for determining matches.
  • csi, AvgErr: NCo by NCf numeric matrices giving the critical success index (CSI) and the average inter-cluster error (distance) based on matched/unmatched objects.
  • HMF: NCo by NCf by 3 array giving the hits, misses and false alarms based on matched/unmatched objects.
  • If the argument a is not NULL, its contents are returned as attributes of the returned object. In the case of SpatialVx objects, the attributes are preserved.

    The plot and print methods do not return anything.
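
The csi component can be related to the HMF array through the standard definition CSI = hits / (hits + misses + false alarms). The sketch below assumes the summary method uses this standard definition; the counts are made-up illustration values, not output from clusterer.

```r
# Made-up counts: NCo = 2 verification levels (rows) by NCf = 3 forecast
# levels (columns), with hits, misses and false alarms along the third index.
HMF <- array(0, dim = c(2, 3, 3),
             dimnames = list(NULL, NULL, c("hits", "misses", "false.alarms")))
HMF[, , "hits"]         <- matrix(c(1, 2, 1, 2, 0, 1), nrow = 2)
HMF[, , "misses"]       <- matrix(c(0, 1, 0, 1, 1, 2), nrow = 2)
HMF[, , "false.alarms"] <- matrix(c(1, 0, 2, 1, 2, 1), nrow = 2)

# Standard CSI, computed element-wise for every pair of CA levels.
csi <- HMF[, , "hits"] /
  (HMF[, , "hits"] + HMF[, , "misses"] + HMF[, , "false.alarms"])

# Level pair (row = verification level, column = forecast level) with the best CSI.
best <- which(csi == max(csi), arr.ind = TRUE)
```

Here the best pair is row 2, column 1, with CSI = 2/3. A level pair with no hits, misses or false alarms would give NaN (0/0).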

Warning

Although some effort has been put into making the functions in this package as computationally efficient as possible, this approach involves a lot of bookkeeping, and the current functions are probably not as efficient as they could be. In any case, they will likely be slow for large data sets. The function can run quickly on large fields if an adequately high threshold is used (e.g., if threshold=10 is used instead of 16 in the example below, the function is VERY slow). Performing the actual cluster analysis on each field is fast because the hclust function from the fastcluster package is used. However, the bookkeeping after the CA is done employs many nested loops, which could possibly be made more efficient (and maybe someday will be).

To simply look at the CA for the two fields, the function hclust from fastcluster can be used directly. It replaces the hclust function from the stats package with a faster version but otherwise behaves the same in terms of what is returned, so the same method functions (e.g., plot, cutree, as.dendrogram) can be employed.
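
A minimal sketch of looking at the CA for a single field this way, using stats::hclust on the grid point locations of a synthetic thresholded field. The field, the threshold of 1 and the choice of two clusters are illustrative assumptions. Because fastcluster's hclust is a drop-in replacement, calling library(fastcluster) first (if the package is installed) should give the same result faster.

```r
set.seed(42)
# Synthetic 20 x 20 field with two separated rain blobs (illustrative only).
field <- matrix(0, nrow = 20, ncol = 20)
field[3:6, 3:6]     <- runif(16, 5, 10)
field[14:17, 12:16] <- runif(20, 5, 10)

# Row/column indices of the grid points at or above a threshold of 1.
loc <- which(field >= 1, arr.ind = TRUE)

# Complete-linkage CA on the locations (library(fastcluster) would mask this
# hclust with a faster version that returns the same kind of object).
hc <- hclust(dist(loc), method = "complete")

# The usual hclust tools apply: cut the tree into two clusters, build a dendrogram.
groups <- cutree(hc, k = 2)
dend   <- as.dendrogram(hc)
table(groups)
```

Since every within-blob distance is smaller than every between-blob distance, cutting at k = 2 recovers the two blobs (16 and 20 grid points).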

Details

This function performs cluster analysis (CA) on the values of each of the two fields in a verification set that are at or above the threshold (with the default threshold, essentially the positive values), using the hclust function from package fastcluster. clusterer performs the CA on both fields and then finds the inter-cluster distances across fields for every possible combination of clusters at each iteration of each CA. The summary method function finishes the analysis by determining the hits, misses and false alarms, and computing the CSI, for each combination of numbers of clusters in the two CAs. This is the verification approach described in Marzban and Sandgathe (2006).
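
The cross-field step can be illustrated for one pair of CA levels. The sketch below is a hypothetical helper rather than the package's code: it cuts each field's tree at a chosen number of clusters and fills an NCf by NCo matrix with complete-linkage distances (the largest pairwise distance between a forecast cluster and a verification cluster), consistent with the default linkage.method = "complete". Synthetic point sets stand in for the thresholded fields.

```r
# Hypothetical helper, not SpatialVx code: complete-linkage distance between
# every forecast cluster (rows) and every verification cluster (columns).
cross_cluster_dist <- function(xo, go, xf, gf) {
  ko <- sort(unique(go))
  kf <- sort(unique(gf))
  out <- matrix(NA_real_, nrow = length(kf), ncol = length(ko))
  for (i in seq_along(kf)) {
    for (j in seq_along(ko)) {
      a <- xf[gf == kf[i], , drop = FALSE]
      b <- xo[go == ko[j], , drop = FALSE]
      # All pairwise distances, keeping only the forecast-vs-verification block.
      d <- as.matrix(dist(rbind(a, b)))
      cross <- d[seq_len(nrow(a)), nrow(a) + seq_len(nrow(b)), drop = FALSE]
      out[i, j] <- max(cross)  # complete linkage: largest pairwise distance
    }
  }
  out
}

set.seed(3)
# Two well-separated groups of points in each synthetic "field".
xo <- rbind(matrix(rnorm(20, mean = 0), ncol = 2), matrix(rnorm(20, mean = 10), ncol = 2))
xf <- rbind(matrix(rnorm(20, mean = 1), ncol = 2), matrix(rnorm(20, mean = 11), ncol = 2))
go <- cutree(hclust(dist(xo), method = "complete"), k = 2)
gf <- cutree(hclust(dist(xf), method = "complete"), k = 2)

D <- cross_cluster_dist(xo, go, xf, gf)  # NCf by NCo = 2 by 2
min(D)  # roughly analogous to one entry of min.intercluster.dists
```

Repeating this for every pair of levels (k from each tree) gives a structure like inter.cluster.dist; the nested loops also hint at why the bookkeeping is the slow part.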

The plot method function creates a 4 by 2 panel of plots. The top two plots are image plots of the verification and forecast fields, with grid points below the threshold(s) shown as zero. The next two plots are dendrograms, as drawn by the plot method function for hclust (dendrogram) objects. The third row gives a histogram of the minimum inter-cluster distances, followed by box plots of the hits, misses and false alarms for every possible combination of levels of each CA. Finally, the bottom two plots show, for each combination of CA levels (i.e., numbers of clusters), the CSI and the average error (inter-cluster distance) for all matched objects. These last three plots are the ones made by the plot method for values returned from the summary method function.

The print method is currently not very informative, but it prevents a large, messy list from being printed to the screen.

References

Marzban, C. and Sandgathe, S. (2006) Cluster analysis for verification of precipitation fields. Wea. Forecasting, 21, 824--838.

See Also

hclust (stats), hclust (fastcluster), as.dendrogram, cutree, make.SpatialVx, CSIsamples

Examples

library(SpatialVx)
data(UKobs6)
data(UKfcst6)
look <- clusterer(X=UKobs6, Y=UKfcst6, threshold=16, trans="log", verbose=TRUE)
plot(look, set.pw=TRUE)

data(UKloc)

# Now, do the same thing, but using a "SpatialVx" object.
hold <- make.SpatialVx(UKobs6, UKfcst6, loc=UKloc, map=TRUE,
    field.type="Rainfall", units="mm/h",
    data.name=c("Nimrod", "obs 6", "fcst 6"))

look2 <- clusterer(hold, threshold=16, trans="log", verbose=TRUE)
plot(look2, set.pw=TRUE)
# Note that values differ because now we're using the
# actual locations instead of integer indicators of
# positions.
