# gower.dist

##### Computes Gower's Distance

This function computes Gower's distance (dissimilarity) between units in a dataset or between observations in two distinct datasets.

- Keywords
- multivariate, cluster

##### Usage

`gower.dist(data.x, data.y=data.x, rngs=NULL, KR.corr=TRUE)`

##### Arguments

- data.x
- A matrix or a data frame containing the variables to be used in computing the distance. Columns of mode `numeric` are treated as interval-scaled variables; columns of mode `character` or class `factor` are treated as categorical nominal variables. See Details for how `logical` and `ordered` columns are handled.

- data.y
- A numeric matrix or data frame with the same variables, of the same type, as those in `data.x`. Dissimilarities between rows of `data.x` and rows of `data.y` are computed. If not provided, it defaults to `data.x`.

- rngs
- A vector with the ranges used to scale the variables. Its length must equal the number of variables in `data.x`. For nonnumeric variables, just put 1 or `NA`. When `rngs=NULL` (default), the range of a numeric variable is computed on the available data.

- KR.corr
- When `TRUE` (default), the extension of Gower's dissimilarity measure proposed by Kaufman and Rousseeuw (1990) is used. Otherwise, when `KR.corr=FALSE`, Gower's (1971) formula is used.

##### Details

This function computes distances among records when variables of different types (categorical and continuous) have been observed. To handle the different variable types, Gower's dissimilarity coefficient (Gower, 1971) is used. By default (`KR.corr=TRUE`), the Kaufman and Rousseeuw (1990) extension of Gower's dissimilarity coefficient is used.

The final dissimilarity between the *i*th and *j*th unit is obtained as a weighted average of the dissimilarities computed for each variable:
$$d(i,j) = \frac{\sum_k{\delta_{ijk} d_{ijk}}}{\sum_k{\delta_{ijk}}}$$

In particular, $d_{ijk}$ represents the distance between the *i*th and *j*th unit computed on the *k*th variable. It depends on the nature of the variable:

- `logical` columns are considered as asymmetric binary variables; in this case $d_{ijk}=0$ if $x_{ik} = x_{jk} = \mathrm{TRUE}$, 1 otherwise;
- `factor` or `character` columns are considered as categorical nominal variables and $d_{ijk}=0$ if $x_{ik}=x_{jk}$, 1 otherwise;
- `numeric` columns are considered as interval-scaled variables and $$d_{ijk}=\frac{\left|x_{ik}-x_{jk}\right|}{R_k}$$ where $R_k$ is the range of the *k*th variable. The range is the one supplied with the argument `rngs` (`rngs[k]`) or the one computed on the available data (when `rngs=NULL`);
- `ordered` columns are considered as categorical ordinal variables and the values are substituted with the corresponding position index, $r_{ik}$, in the factor levels. When `KR.corr=FALSE` these position indexes (which are different from the output of the R function `rank`) are transformed in the following manner: $$z_{ik}=\frac{r_{ik}-1}{\max\left(r_{ik}\right) - 1}$$ These new values, $z_{ik}$, are treated as observations of an interval-scaled variable.
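The per-variable distances described above can be sketched in base R. This is only an illustration of the formulas, not the StatMatch implementation: the helper names `d_numeric`, `d_nominal` and `z_ordered` and the sample data are hypothetical.

```r
# Hypothetical helpers illustrating the per-variable distances d_ijk.
# They are NOT part of StatMatch.

# Interval-scaled variable: absolute difference divided by the range R.
d_numeric <- function(xi, xj, R) abs(xi - xj) / R

# Nominal variable (factor or character): 0 if equal, 1 otherwise.
d_nominal <- function(xi, xj) as.numeric(xi != xj)

# Ordered factor: position index r in the levels, rescaled to [0, 1].
z_ordered <- function(f) {
  r <- as.integer(f)            # position of each value in levels(f)
  (r - 1) / (max(r) - 1)
}

x   <- c(2, 5, 11)
R_x <- diff(range(x))           # range computed on the available data (9)
d_numeric(x[1], x[2], R_x)      # 3 / 9
d_numeric(x[1], x[2], 20)       # same pair scaled by a user-supplied range

d_nominal("a", "b")             # 1
d_nominal("a", "a")             # 0

size <- factor(c("low", "high", "mid", "high"),
               levels = c("low", "mid", "high"), ordered = TRUE)
r <- as.integer(size)           # position indexes: 1 3 2 3
rank(r)                         # ranks average the tie: 1.0 3.5 2.0 3.5
z <- z_ordered(size)            # 0.0 1.0 0.5 1.0
```

The comparison with `rank()` shows why the position index is not the same thing as a rank: tied values share an averaged rank, whereas the position index depends only on the level.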

As far as the weight $\delta_{ijk}$ is concerned:

- $\delta_{ijk}=0$ if $x_{ik} = \mathrm{NA}$ or $x_{jk} = \mathrm{NA}$;
- $\delta_{ijk}=0$ if the variable is asymmetric binary and $x_{ik}=x_{jk}=0$ or $x_{ik} = x_{jk} = \mathrm{FALSE}$;
- $\delta_{ijk}=1$ in all the other cases.

In practice, `NA`s and pairs of cases with $x_{ik}=x_{jk}=\mathrm{FALSE}$ do not contribute to the distance computation.
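As a worked illustration of how the weights enter the final dissimilarity, the following base R sketch computes $d(i,j)$ for a single pair of records. The helper `gower_pair`, the two records and the ranges in `R` are all hypothetical; the code mirrors the formulas in this section rather than the package's internal code.

```r
# Hypothetical helper (not part of StatMatch): weighted average of the
# per-variable distances d, using the 0/1 weights delta.
gower_pair <- function(d, delta) sum(delta * d, na.rm = TRUE) / sum(delta)

# Two made-up records with a numeric, a nominal, a logical and a
# numeric-with-missing variable.
xi <- list(age = 30, colour = "red",  smoker = FALSE, income = NA_real_)
xj <- list(age = 40, colour = "blue", smoker = FALSE, income = 1200)
R  <- c(age = 40, income = 5000)    # assumed ranges for the numeric variables

d <- c(
  age    = abs(xi$age - xj$age) / R[["age"]],           # 10 / 40 = 0.25
  colour = as.numeric(xi$colour != xj$colour),          # categories differ: 1
  smoker = as.numeric(!(xi$smoker && xj$smoker)),       # 0 only if both TRUE
  income = abs(xi$income - xj$income) / R[["income"]]   # NA: value missing
)

delta <- c(
  age    = 1,
  colour = 1,
  smoker = as.numeric(xi$smoker || xj$smoker),          # 0 when both FALSE
  income = as.numeric(!is.na(xi$income) && !is.na(xj$income))  # 0 when NA
)

gower_pair(d, delta)    # (0.25 + 1) / 2 = 0.625
```

Only `age` and `colour` receive weight 1 here, so the denominator is 2: the shared `FALSE` and the missing `income` drop out of both the numerator and the denominator.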

##### Value

- A `matrix` object with distances between rows of `data.x` and those of `data.y`.
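To make the shape of such a matrix concrete, here is a minimal base R sketch for a single numeric variable. The helper `cross_dist` is hypothetical, and pooling both inputs to compute the default range is an assumption of this sketch, not a statement about how the package computes it.

```r
# Hypothetical helper (not part of StatMatch): range-scaled distances
# between every element of x (rows) and every element of y (columns).
# Assumption: the default range is taken over x and y pooled together.
cross_dist <- function(x, y = x, R = diff(range(c(x, y)))) {
  outer(x, y, function(a, b) abs(a - b) / R)
}

x <- c(1, 4, 9)
y <- c(2, 3, 5, 7)

D <- cross_dist(x, y)
dim(D)                  # 3 rows (units in x) by 4 columns (units in y)

S <- cross_dist(x)      # y defaults to x, giving a square matrix
isSymmetric(S)          # TRUE in this sketch
all(diag(S) == 0)       # TRUE: each unit is at distance 0 from itself
```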

##### References

Gower, J. C. (1971), "A general coefficient of similarity and some of its properties". *Biometrics*, **27**, 857–871.

Kaufman, L. and Rousseeuw, P.J. (1990), *Finding Groups in Data: An Introduction to Cluster Analysis.* Wiley, New York.

##### See Also

##### Examples

```
library(StatMatch)

# build a small data frame with one column of each supported type
x1 <- as.logical(rbinom(10, 1, 0.5))                  # asymmetric binary
x2 <- sample(letters, 10, replace = TRUE)             # nominal (character)
x3 <- rnorm(10)                                       # interval scaled
x4 <- ordered(cut(x3, -4:4, include.lowest = TRUE))   # ordinal
xx <- data.frame(x1, x2, x3, x4, stringsAsFactors = FALSE)

# matrix of distances among all observations in xx
gower.dist(xx)

# matrix of distances between the first three obs. in xx
# and the remaining ones
gower.dist(data.x = xx[1:3, ], data.y = xx[4:10, ])
```

*Documentation reproduced from package StatMatch, version 1.2.0, License: EUPL*

### Community examples

**pooja10838@gmail.com** at Dec 11, 2017, StatMatch v1.2.5

    mat = matrix(data = c(3.4, 1, 2, 5.2, 0, 3, 2.1, 1, 1), byrow = TRUE, nrow = 3, ncol = 3)
    mat_df = as.data.frame(mat)
    mat_gower = gower.dist(mat_df)
    head(mat_gower)
