CluMix (version 2.3.1)

similarity.variables: Similarity matrix for variables

Description

Get similarity matrix for variables of mixed types

Usage

similarity.variables(data, method = c("associationMeasures", "distcor"), 
associationFun = association, check.psd = TRUE, make.psd = TRUE)

Arguments

data

data frame with variables of interest

method

method used to calculate the similarities: a combination of association measures ('associationMeasures') or distance correlation ('distcor')

associationFun

only applies if method = 'associationMeasures': by default, an appropriate association measure is chosen for each pair of variables, see association for details. Alternatively, the user can supply a function that calculates a similarity measure for any two variables.

check.psd

only applies if method = 'associationMeasures': if TRUE, the variables' similarity matrix S is checked for positive semi-definiteness (p.s.d.). If S is not p.s.d. and make.psd = TRUE, it is transformed to a p.s.d. matrix by nearPD.

make.psd

only applies if method = 'associationMeasures': if TRUE and the similarity matrix is not positive semi-definite, it is transformed to a p.s.d. matrix by nearPD. Ignored if check.psd = FALSE.
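As a rough illustration of the kind of function associationFun describes, the sketch below defines a two-argument similarity function and fills a similarity matrix with it by hand in base R. The function name spearman_sim, the toy data frame, and the choice of squared Spearman correlation are all assumptions made for this example; they are not part of CluMix, and the exact interface that similarity.variables expects from a user-defined function should be checked against association.

```r
# Hypothetical two-variable similarity: squared Spearman correlation,
# which lies in [0, 1]. Factors are converted to their integer codes,
# so this is only sensible for numeric or ordered variables.
spearman_sim <- function(x, y) {
  cor(as.numeric(x), as.numeric(y),
      method = "spearman", use = "pairwise.complete.obs")^2
}

# Toy mixed-type data (made up for illustration)
df <- data.frame(
  height = c(150, 160, 170, 180, 190),
  weight = c(50, 58, 69, 80, 95),
  grade  = factor(c("low", "low", "mid", "high", "high"),
                  levels = c("low", "mid", "high"), ordered = TRUE)
)

# Fill a symmetric similarity matrix with unit diagonal, pair by pair
p <- ncol(df)
S <- matrix(1, p, p, dimnames = list(names(df), names(df)))
for (i in seq_len(p - 1)) {
  for (j in (i + 1):p) {
    S[i, j] <- S[j, i] <- spearman_sim(df[[i]], df[[j]])
  }
}
print(round(S, 3))
```

A function of this shape could presumably be passed as associationFun, but that usage is not shown here because the source does not state the expected signature.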

Value

Matrix of similarity values for each pair of variables

Details

A similarity matrix for variables can be derived either by combining different measures of association or by a distance correlation approach. In the association measure approach, a similarity coefficient s_ij is calculated for each pair of variables, see association for details. If the similarity matrix is (made) positive semi-definite, the distances d_ij = sqrt(1 - s_ij) have metric properties (Gower, 1971), which means, for instance, that the triangle inequality holds. The distance correlation approach uses generalized distance correlation based on Gower's similarity coefficient between sample elements.
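The link between positive semi-definiteness and the triangle inequality can be seen on a small made-up example. The sketch below is not the CluMix implementation: the toy matrix, the helper triangle_ok, the eigenvalue tolerance, and the call nearPD(..., corr = TRUE) from the Matrix package are assumptions chosen for illustration, and the options CluMix passes to nearPD may differ.

```r
library(Matrix)  # provides nearPD(); a recommended package shipped with R

# Toy similarity matrix (hypothetical values): symmetric, unit diagonal,
# but not positive semi-definite
S <- matrix(c(1.0, 0.9, 0.1,
              0.9, 1.0, 0.9,
              0.1, 0.9, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# p.s.d. check: all eigenvalues non-negative up to a small tolerance
is_psd <- function(M, tol = 1e-8) {
  all(eigen(M, symmetric = TRUE, only.values = TRUE)$values >= -tol)
}

# Distances d_ij = sqrt(1 - s_ij); pmax() guards against tiny negative
# values caused by floating-point error
sim_to_dist <- function(M) sqrt(pmax(0, 1 - M))

# Brute-force check of d_ij <= d_ik + d_kj for all triples
triangle_ok <- function(D, tol = 1e-8) {
  n <- nrow(D)
  for (i in seq_len(n)) for (j in seq_len(n)) for (k in seq_len(n)) {
    if (D[i, j] > D[i, k] + D[k, j] + tol) return(FALSE)
  }
  TRUE
}

psd_before <- is_psd(S)
tri_before <- triangle_ok(sim_to_dist(S))

# Nearest correlation-type matrix (unit diagonal kept via corr = TRUE)
S_psd <- as.matrix(nearPD(S, corr = TRUE)$mat)

psd_after <- is_psd(S_psd, tol = 1e-6)
tri_after <- triangle_ok(sim_to_dist(S_psd), tol = 1e-6)

print(c(psd_before = psd_before, tri_before = tri_before,
        psd_after = psd_after, tri_after = tri_after))
```

In the toy matrix, a and c are each very similar to b but dissimilar to each other, which is what breaks both positive semi-definiteness and the triangle inequality before the correction.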

References

Hummel M, Edelmann D, Kopp-Schneider A (2017). Clustering of samples and variables with mixed-type data. PLOS ONE, 12(11):e0188274.

Gower J (1971). A general coefficient of similarity and some of its properties. Biometrics, 27:857-871.

Szekely GJ, Rizzo ML, Bakirov NK (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769-2794.

Lyons R (2013). Distance covariance in metric spaces. The Annals of Probability, 41(5):3284-3305.

See Also

association, dist.variables, dendro.variables, dist.subjects, mix.heatmap

Examples

# NOT RUN {
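library(CluMix)
data(mixdata)

# default: combination of association measures
S1 <- similarity.variables(mixdata)
# alternative: distance correlation
S2 <- similarity.variables(mixdata, method = "distcor")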
# }