gowdis measures the Gower (1971) dissimilarity for mixed variables, including asymmetric binary variables. Variable weights can be specified. gowdis implements Podani's (1999) extension to ordinal variables.
gowdis(x, w, asym.bin = NULL, ord = c("podani", "metric", "classic"))
w: vector listing the weights for the variables in x. Can be missing, in which case all variables have equal weights.
asym.bin: vector listing the asymmetric binary variables in x.
ord: character string specifying the method to be used for ordinal (i.e. ordered) variables. "podani" refers to Eqs. 2a-b of Podani (1999), while "metric" refers to his Eq. 3 (see ‘details’); both options convert ordinal variables to ranks. "classic" simply treats ordinal variables as continuous variables. Can be abbreviated.
gowdis returns an object of class dist with the following attributes: Labels, Types (the variable types, where 'C' is continuous/numeric, 'O' is ordinal, 'B' is symmetric binary, 'A' is asymmetric binary, and 'N' is nominal), Size, Metric.
gowdis computes the Gower (1971) similarity coefficient exactly as described by Podani (1999), then converts it to a dissimilarity coefficient by using \(D = 1 - S\). It integrates variable weights as described by Legendre and Legendre (1998).
Let \(\mathbf{X} = \{x_{ij}\} \) be a matrix containing \(n\) objects (rows) and \(m\) variables (columns), where \(x_{ij}\) denotes the value of variable \(i\) for object \(j\). The similarity \(G_{jk}\) between objects \(j\) and \(k\) is computed as
$$G_{jk} = \frac{\sum_{i=1}^{m} w_{ijk} s_{ijk}}{\sum_{i=1}^{m} w_{ijk}},$$
where \(w_{ijk}\) is the weight of variable \(i\) for the \(j\)-\(k\) pair, and \(s_{ijk}\) is the partial similarity of variable \(i\) for the \(j\)-\(k\) pair,
and where \(w_{ijk} = 0\) if objects \(j\) and \(k\) cannot be compared because \(x_{ij}\) or \(x_{ik}\) is unknown (i.e. NA).
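As a minimal sketch of the weighted average above, the following base R snippet computes \(G_{jk}\) and \(D = 1 - G_{jk}\) for a single pair of objects. The helper name gower_pair and the numbers are made up for illustration; this is not the code used inside gowdis.

```r
# Hypothetical helper: weighted mean of partial similarities for one pair.
# s: partial similarities s_ijk, one per variable (NA if not comparable)
# w: variable weights
gower_pair <- function(s, w) {
  w[is.na(s)] <- 0   # a non-comparable variable gets weight 0
  s[is.na(s)] <- 0   # its similarity no longer matters
  sum(w * s) / sum(w)
}

s <- c(1, 0.5, NA, 0)  # made-up values; variable 3 is NA for one object
w <- c(1, 1, 1, 2)

G <- gower_pair(s, w)  # (1*1 + 1*0.5 + 2*0) / (1 + 1 + 2)
D <- 1 - G             # dissimilarity
```

Here the third variable drops out of both the numerator and the denominator, so the pair is compared on the remaining three variables only.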
For binary variables, \(s_{ijk} = 0\) if \(x_{ij} \neq x_{ik}\), and \(s_{ijk} = 1\) if \(x_{ij} = x_{ik} = 1\) or if \(x_{ij} = x_{ik} = 0\).
For asymmetric binary variables, the same rules apply, except that \(w_{ijk} = 0\) if \(x_{ij} = x_{ik} = 0\), so that joint absences are ignored.
For nominal variables, \(s_{ijk} = 0\) if \(x_{ij} \neq x_{ik}\) and \(s_{ijk} = 1\) if \(x_{ij} = x_{ik}\).
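The matching rules for binary, asymmetric binary and nominal variables can be sketched for one variable and one pair of values as follows. The function partial_match and its return format are hypothetical and only illustrate the rules stated above.

```r
# Hypothetical helper: partial similarity s and pair weight w for one
# binary or nominal variable and one pair of values a, b.
partial_match <- function(a, b, asymmetric = FALSE) {
  # the pair cannot be compared if either value is missing
  if (is.na(a) || is.na(b)) return(c(s = 0, w = 0))
  # asymmetric binary: a joint absence (0, 0) is ignored
  if (asymmetric && a == 0 && b == 0) return(c(s = 0, w = 0))
  # otherwise s is 1 for a match and 0 for a mismatch
  c(s = as.numeric(a == b), w = 1)
}

partial_match(1, 1)                     # match
partial_match(0, 0)                     # symmetric binary: 0-0 is a match
partial_match(0, 0, asymmetric = TRUE)  # asymmetric binary: weight 0
partial_match("red", "blue")            # nominal mismatch
```

When the weight is 0, the value of s is irrelevant because the variable drops out of the weighted average.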
For continuous variables,
$$s_{ijk} = 1 - \frac{|x_{ij} - x_{ik}|} {x_{i.max} - x_{i.min}} $$
where \(x_{i.max}\) and \(x_{i.min}\) are the maximum and minimum values of variable \(i\), respectively.
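For a single continuous variable, the range-standardized similarity can be sketched in base R as below. The values in x are made up for illustration.

```r
# Made-up values of one continuous variable for four objects
x <- c(2, 5, 11, 8)

# range of the variable: x_i.max - x_i.min
rng <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)

# matrix of partial similarities between all pairs of objects
s <- 1 - abs(outer(x, x, "-")) / rng
s
```

The two objects holding the minimum and maximum values get a partial similarity of 0, and every object has a similarity of 1 with itself.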
For ordinal variables, when ord = "podani" or ord = "metric", all \(x_{ij}\) are replaced by their ranks \(r_{ij}\) determined over all objects (such that ties are also considered), and then
if ord = "podani"
\(s_{ijk} = 1\) if \(r_{ij} = r_{ik}\), otherwise
$$ s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}| - (T_{ij} - 1)/2 - (T_{ik} - 1)/2 }{r_{i.max} - r_{i.min} - (T_{i.max} - 1)/2 - (T_{i.min}-1)/2 }$$
where \(T_{ij}\) is the number of objects which have the same rank score for variable \(i\) as object \(j\) (including \(j\) itself), \(T_{ik}\) is the number of objects which have the same rank score for variable \(i\) as object \(k\) (including \(k\) itself), \(r_{i.max}\) and \(r_{i.min}\) are the maximum and minimum ranks for variable \(i\), respectively, \(T_{i.max}\) is the number of objects with the maximum rank, and \(T_{i.min}\) is the number of objects with the minimum rank.
if ord = "metric"
$$s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}|}{r_{i.max} - r_{i.min}} $$
When ord = "classic", ordinal variables are simply treated as continuous variables.
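The two rank-based formulas can be compared on a small made-up ordinal variable. This is a sketch in base R, assuming tied values receive average ranks (the default of rank); the variable names are illustrative and the code is not taken from gowdis itself.

```r
# Made-up ordinal scores for six objects, with ties
x <- c(1, 2, 2, 3, 3, 3)

r  <- rank(x)                  # average ranks (assumption): 1, 2.5, 2.5, 5, 5, 5
Tn <- ave(x, x, FUN = length)  # T_ij: objects sharing each rank: 1, 2, 2, 3, 3, 3
d  <- abs(outer(r, r, "-"))    # |r_ij - r_ik| for all pairs

# ord = "metric": plain range standardization of the ranks
s_metric <- 1 - d / (max(r) - min(r))

# ord = "podani": numerator and denominator corrected for ties
T_max <- Tn[which.max(r)]      # objects with the maximum rank
T_min <- Tn[which.min(r)]      # objects with the minimum rank
num <- d - outer((Tn - 1) / 2, (Tn - 1) / 2, "+")
den <- max(r) - min(r) - (T_max - 1) / 2 - (T_min - 1) / 2
s_podani <- ifelse(d == 0, 1, 1 - num / den)  # s = 1 when ranks are equal
```

In this example the pair formed by objects 1 and 2 has a similarity of 0.625 under "metric" and 2/3 under "podani", while objects with the lowest and highest scores have a similarity of 0 under both.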
Gower, J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857-871.
Legendre, P. and L. Legendre (1998) Numerical Ecology. 2nd English edition. Amsterdam: Elsevier.
Podani, J. (1999) Extending Gower's general coefficient of similarity to ordinal characters. Taxon 48:331-340.
daisy (package cluster) is similar but less flexible, since it does not include variable weights and does not treat ordinal variables as described by Podani (1999). Using ord = "classic" reproduces the behaviour of daisy.
# NOT RUN {
library(FD)  # provides gowdis and the dummy and tussock data sets
ex1 <- gowdis(dummy$trait)
ex1
# check attributes
attributes(ex1)
# to include weights
w <- c(4,3,5,1,2,8,3,6)
ex2 <- gowdis(dummy$trait, w)
ex2
# variable 7 as asymmetric binary
ex3 <- gowdis(dummy$trait, asym.bin = 7)
ex3
# example with trait data from New Zealand vascular plant species
ex4 <- gowdis(tussock$trait)
# }