nsRFA (version 0.7-16)

traceWminim: Cluster analysis: disjoint regions

Description

Formation of disjoint regions for Regional Frequency Analysis.

Usage

traceWminim(X, centers)
sumtraceW(clusters, X)
nearest(clusters, X)

Value

traceWminim gives a vector defining the subdivision of the elements characterized by X into n = centers clusters.

sumtraceW gives \(W\) (it is used by traceWminim).

nearest gives the site nearest to the center of mass of each cluster (it is used by traceWminim).

Arguments

X

a numeric matrix of characteristics, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns)

centers

the number of clusters

clusters

a numeric vector giving the subdivision of X into clusters

Details

The Euclidean distance is used. Given \(p\) different classification variables, the distance between two elements \(i\) and \(j\) is: $$d_{i j} = \sqrt{\frac{1}{p} \sum_{h=1}^{p} (x_{h i} - x_{h j})^2}$$ where \(x_{h i}\) is the value of the \(h\)-th variable of the \(i\)-th element.
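As an illustration of the distance above, here is a minimal base-R sketch. The helper name `scaled_dist` is hypothetical and is not part of nsRFA; it only shows how the \(1/p\) factor relates this distance to the ordinary Euclidean distance returned by `dist`.

```r
# Hypothetical helper (not an nsRFA function): matrix of distances d_ij.
# Rows of X are elements, columns are the p classification variables.
scaled_dist <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)
  # dist() returns sqrt(sum of squared differences); dividing by sqrt(p)
  # gives the root of the *mean* squared difference, as in the formula.
  as.matrix(dist(X, method = "euclidean")) / sqrt(p)
}

X <- rbind(a = c(0, 0), b = c(3, 4), c = c(6, 8))
d <- scaled_dist(X)
d["a", "b"]  # equals sqrt((3^2 + 4^2) / 2)
```

Because the \(1/p\) factor is the same for every pair of elements, it rescales all distances by a constant and does not change which element is closest to which.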

The function traceWminim combines a hierarchical algorithm, that of Ward (1963), with an optimisation procedure that minimises: $$W = \sum_{i=1}^k \left( \sum_{j=1}^{n_i} \delta_{i j}^2 \right)$$ where \(k\) is the number of clusters (obtained initially with Ward's algorithm), \(n_i\) is the number of sites in the \(i\)-th cluster and \(\delta_{i j}\) is the Euclidean distance between the \(j\)-th element of the \(i\)-th group and the center of mass of the \(i\)-th cluster. \(W\) is calculated with sumtraceW. The algorithm consists of moving a site from one cluster to another whenever the move makes \(W\) decrease.
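The procedure described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the nsRFA source: `trace_w` and `reallocate` are hypothetical names, the plain (unscaled) Euclidean distance is used, and the order in which moves are tried is an arbitrary choice. A constant \(1/p\) scaling would multiply every value of \(W\) by the same factor, so it would not change which moves decrease it.

```r
# Hypothetical helper: W = sum over clusters of the squared distances
# between each member and the cluster's center of mass.
trace_w <- function(clusters, X) {
  X <- as.matrix(X)
  sum(vapply(unique(clusters), function(g) {
    Xg <- X[clusters == g, , drop = FALSE]
    sum(sweep(Xg, 2, colMeans(Xg))^2)  # subtract centroid, square, sum
  }, numeric(1)))
}

# Hypothetical reallocation pass: try moving each site to every other
# cluster, keep a move only if W decreases, repeat until no move helps.
reallocate <- function(clusters, X) {
  best <- trace_w(clusters, X)
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    for (i in seq_along(clusters)) {
      for (g in setdiff(unique(clusters), clusters[i])) {
        if (sum(clusters == clusters[i]) == 1) next  # never empty a cluster
        trial <- clusters
        trial[i] <- g
        w <- trace_w(trial, X)
        if (w < best - 1e-12) {  # small tolerance guards against rounding
          clusters <- trial
          best <- w
          improved <- TRUE
        }
      }
    }
  }
  clusters
}

# Toy data: two well-separated groups of three sites each.
X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
# Initial partition from Ward's method ("ward.D2" is an assumption here).
start <- cutree(hclust(dist(X), method = "ward.D2"), k = 2)
final <- reallocate(start, X)
trace_w(final, X) <= trace_w(start, X)
```

Since every accepted move strictly lowers \(W\) and the number of possible partitions is finite, the loop stops at a partition that no single-site move can improve, which is a local minimum and not necessarily the global one.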

See Also

roi, AD.dist.

Examples

library(nsRFA)
data(hydroSIMN)
parameters
summary(parameters)

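# traceWminim
# Select two classification variables and standardise each column
# (subtract the column mean, divide by the column standard deviation)
param <- parameters[c("Hm","Ybar")]
n <- dim(param)[1]; k <- dim(param)[2]
param.norm <- (param - matrix(apply(param,2,mean),nrow=n,ncol=k,
               byrow=TRUE))/matrix(apply(param,2,sd),
               nrow=n,ncol=k,byrow=TRUE)
# Form 4 clusters and label each element with its site code
clusters <- traceWminim(param.norm,4)
names(clusters) <- parameters["cod"][,]
clusters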

annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]

# Annual flows of the sites assigned to cluster 1
fac <- factor(annualflows["cod"][,],
              levels=names(clusters[clusters==1]))
x1 <- annualflows[!is.na(fac),"dato"]
cod1 <- annualflows[!is.na(fac),"cod"]
#HW.tests(x1,cod1)          # it takes some time

# Annual flows of the sites assigned to cluster 3
fac <- factor(annualflows["cod"][,],
              levels=names(clusters[clusters==3]))
x3 <- annualflows[!is.na(fac),"dato"]
cod3 <- annualflows[!is.na(fac),"cod"]
#HW.tests(x3,cod3)          # it takes some time