nsRFA (version 0.7-15)

traceWminim: Cluster analysis: disjoint regions

Description

Formation of disjoint regions for Regional Frequency Analysis.

Usage

traceWminim (X, centers)
sumtraceW (clusters, X)
nearest (clusters, X)

Arguments

X

a numeric matrix of characteristics, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns)

centers

the number of clusters

clusters

a numeric vector giving the cluster to which each element (row) of X is assigned

Value

traceWminim returns a vector assigning each element characterized by X to one of n = centers clusters.

sumtraceW returns \(W\), defined in Details (it is used by traceWminim).

nearest returns the site nearest to the center of mass of each cluster (it is used by traceWminim).
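To illustrate what "the site nearest to the center of mass" means, here is a minimal sketch. The function `nearest_sketch` is hypothetical and is not the nsRFA source: it assumes clusters are labelled 1, 2, ..., and returns, for each label, the row index of the member site closest to that cluster's centroid. The \(1/p\) factor of the distance in Details is omitted because it does not change which site is nearest.

```r
# Hypothetical helper (not the nsRFA source): for each cluster label
# 1..max(clusters), return the row index of the member site closest to
# the cluster's center of mass.
nearest_sketch <- function(clusters, X) {
  X <- as.matrix(X)
  k <- max(clusters)
  out <- integer(k)
  for (i in seq_len(k)) {
    idx <- which(clusters == i)
    Xi <- X[idx, , drop = FALSE]
    g <- colMeans(Xi)                    # center of mass of cluster i
    d2 <- rowSums(sweep(Xi, 2, g)^2)     # squared distances to the centroid
    out[i] <- idx[which.min(d2)]
  }
  out
}

X <- matrix(c(0,   0,
              1,   0,
              0.4, 0,
              10,  10,
              12,  10,
              11,  11), ncol = 2, byrow = TRUE)
nearest_sketch(c(1, 1, 1, 2, 2, 2), X)
# [1] 3 6
```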

Details

The Euclidean distance is used. Given \(p\) different classification variables, the distance between two elements \(i\) and \(j\) is: $$d_{i j} = \sqrt{\frac{1}{p} \sum_{h=1}^{p} (x_{h i} - x_{h j})^2}$$ where \(x_{h i}\) is the value of the \(h\)-th variable of the \(i\)-th element.
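The distance above is the ordinary Euclidean distance divided by \(\sqrt{p}\). A minimal sketch of the full distance matrix follows; `rfa_dist` is a hypothetical name, and rows of `X` are assumed to be the elements (sites) and columns the \(p\) classification variables.

```r
# d_ij = sqrt( (1/p) * sum_h (x_hi - x_hj)^2 ) for all pairs of rows of X.
rfa_dist <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)
  # stats::dist gives the plain Euclidean distance; dividing by sqrt(p)
  # applies the 1/p factor inside the square root.
  as.matrix(dist(X, method = "euclidean")) / sqrt(p)
}

X <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)
D <- rfa_dist(X)
D[1, 2]   # 5 / sqrt(2), about 3.536
```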

The function traceWminim combines a hierarchical algorithm, that of Ward (1963), with an optimisation procedure that minimises $$W = \sum_{i=1}^k \left( \sum_{j=1}^{n_i} \delta_{i j}^2 \right)$$ where \(k\) is the number of clusters (the initial clusters are obtained with Ward's algorithm), \(n_i\) is the number of sites in the \(i\)-th cluster and \(\delta_{i j}\) is the Euclidean distance between the \(j\)-th element of the \(i\)-th cluster and the center of mass of the \(i\)-th cluster. \(W\) is calculated with sumtraceW. The optimisation moves a site from one cluster to another whenever the move decreases \(W\).
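The two stages described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the nsRFA code. It assumes the Ward step is `hclust(..., method = "ward.D2")` on plain Euclidean distances, that a site is never moved out of a cluster it is the only member of, and that moves are accepted greedily until no single move lowers \(W\). The \(1/p\) factor is dropped because it rescales \(W\) by a constant and does not change which moves decrease it.

```r
# W: sum over clusters of squared Euclidean distances of the members
# from the cluster's center of mass.
W_of <- function(clusters, X) {
  W <- 0
  for (i in unique(clusters)) {
    Xi <- X[clusters == i, , drop = FALSE]
    W <- W + sum(sweep(Xi, 2, colMeans(Xi))^2)
  }
  W
}

trace_w_sketch <- function(X, centers) {
  X <- as.matrix(X)
  # 1. Initial partition from Ward's hierarchical algorithm (assumed ward.D2).
  cl <- cutree(hclust(dist(X), method = "ward.D2"), k = centers)
  W <- W_of(cl, X)
  # 2. Move single sites between clusters while W keeps decreasing.
  repeat {
    improved <- FALSE
    for (s in seq_len(nrow(X))) {
      if (sum(cl == cl[s]) == 1) next      # never empty a cluster
      for (g in setdiff(seq_len(centers), cl[s])) {
        trial <- cl
        trial[s] <- g
        Wt <- W_of(trial, X)
        if (Wt < W - 1e-12) {              # accept only strict decreases
          cl <- trial; W <- Wt; improved <- TRUE
          break
        }
      }
    }
    if (!improved) break
  }
  list(clusters = cl, W = W)
}

set.seed(1)
X <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
res <- trace_w_sketch(X, 2)
table(res$clusters)
```

Because every accepted move strictly lowers \(W\) and there are finitely many partitions, the loop terminates; the result is a local, not necessarily global, minimum of \(W\).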

See Also

roi, AD.dist.

Examples

# NOT RUN {
data(hydroSIMN)
parameters
summary(parameters)

# traceWminim
# Two site characteristics, standardised to zero mean and unit variance
param <- parameters[c("Hm","Ybar")]
n <- dim(param)[1]; k <- dim(param)[2]
param.norm <- (param - matrix(apply(param,2,mean),nrow=n,ncol=k,
               byrow=TRUE))/matrix(apply(param,2,sd),
               nrow=n,ncol=k,byrow=TRUE)
# Subdivide the sites into 4 disjoint regions, labelled by site code
clusters <- traceWminim(param.norm,4)
names(clusters) <- parameters["cod"][,]
clusters

annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]

# Annual flows of the sites in region 1: codes of sites outside the
# region are not among the factor levels, so they become NA and are dropped
fac <- factor(annualflows["cod"][,],
              levels=names(clusters[clusters==1]))
x1 <- annualflows[!is.na(fac),"dato"]
cod1 <- annualflows[!is.na(fac),"cod"]
#HW.tests(x1,cod1)          # it takes some time

# Same selection for region 3
fac <- factor(annualflows["cod"][,],
              levels=names(clusters[clusters==3]))
x3 <- annualflows[!is.na(fac),"dato"]
cod3 <- annualflows[!is.na(fac),"cod"]
#HW.tests(x3,cod3)          # it takes some time
# }
