dist_mixed: Compute Gower dissimilarity for mixed-type data

Description

Internal helper function that computes pairwise dissimilarities for datasets containing a mix of continuous, binary, and categorical variables using Gower's method (Gower, 1971).

Usage

dist_mixed(
  x,
  continuous_cols = NULL,
  binary_cols = NULL,
  categorical_cols = NULL,
  binary_asym = FALSE
)

Value

A symmetric numeric matrix of pairwise dissimilarities in [0,1].

Arguments

x: A data frame with rows as observations and columns as variables.
continuous_cols: Optional numeric indices or column names for continuous variables.
binary_cols: Optional numeric indices or column names for binary variables.
categorical_cols: Optional numeric indices or column names for categorical/multiclass variables.
binary_asym: Logical; if TRUE, binary variables are treated as asymmetric (only 1/1 counts as a match). Defaults to FALSE, the symmetric treatment in which 0/0 and 1/1 both count as matches.

Details

Continuous, binary, and categorical columns can be automatically detected, or explicitly specified by the user via continuous_cols, binary_cols, and categorical_cols.

Continuous, binary, and categorical columns are combined into a single dissimilarity measure following Gower's approach.
Continuous variables are scaled by their range.
Binary variables can be treated as symmetric (0/0 and 1/1 both count as matches) or asymmetric (only 1/1 counts as a match).
Categorical variables are compared using simple matching.
Missing values are ignored pairwise: a variable is skipped for any pair of observations in which either value is missing.
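
The combination rule described above can be sketched in base R. This is a minimal illustration, not the dbrobust implementation: gower_sketch() is a hypothetical helper, binary columns are assumed to be coded 0/1, and the asymmetric branch assumes that joint absences (0/0) are simply left out of the comparison, which may differ from how dist_mixed() handles them.

```r
# Hypothetical helper for illustration only; not part of dbrobust.
gower_sketch <- function(x, continuous_cols = character(0),
                         binary_cols = character(0),
                         categorical_cols = character(0),
                         binary_asym = FALSE) {
  n <- nrow(x)
  num <- matrix(0, n, n)  # running sum of per-variable dissimilarities
  den <- matrix(0, n, n)  # number of variables compared for each pair

  for (col in continuous_cols) {
    v <- as.numeric(x[[col]])
    rng <- diff(range(v, na.rm = TRUE))
    d <- abs(outer(v, v, "-"))
    if (rng > 0) d <- d / rng       # scale by the observed range
    ok <- !is.na(d)                 # skip pairs with a missing value
    num[ok] <- num[ok] + d[ok]
    den[ok] <- den[ok] + 1
  }

  for (col in binary_cols) {
    v <- as.numeric(x[[col]])       # assumed to be coded 0/1
    d <- 1 * outer(v, v, "!=")
    ok <- !is.na(d)
    if (binary_asym) {
      # Assumption: joint absences (0/0) are dropped from the comparison
      ok <- ok & !outer(v == 0, v == 0, "&")
    }
    num[ok] <- num[ok] + d[ok]
    den[ok] <- den[ok] + 1
  }

  for (col in categorical_cols) {
    v <- as.character(x[[col]])
    d <- 1 * outer(v, v, "!=")      # simple matching: 0 if equal, 1 otherwise
    ok <- !is.na(d)
    num[ok] <- num[ok] + d[ok]
    den[ok] <- den[ok] + 1
  }

  d <- num / den                    # NaN where no variable could be compared
  dimnames(d) <- list(rownames(x), rownames(x))
  d
}

df <- data.frame(
  height = c(170, 160, 180),
  gender = factor(c("M", "F", "M")),
  smoker = c(1, 0, 1)
)

d <- gower_sketch(df, continuous_cols = "height",
                  binary_cols = "smoker", categorical_cols = "gender")
# Rows 1 and 2: (10/20 + 1 + 1) / 3 = 0.8333
# Rows 1 and 3: (10/20 + 0 + 0) / 3 = 0.1667
# Rows 2 and 3: (20/20 + 1 + 1) / 3 = 1
round(d, 4)
```

Dividing by the per-pair count of compared variables rather than by the total number of columns is what keeps the result in [0,1] when some values are missing.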

Advantages:

Low computational cost.
Works naturally with mixed-type data.

Limitations:

Neglects potential correlations among quantitative variables.
Sensitive to outliers, which can affect robustness.
May overemphasize categorical differences in mixed-data settings.
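
The outlier sensitivity follows from the range scaling of continuous variables. The short base R sketch below uses a hypothetical helper, range_scaled(), and made-up height values to show how one extreme observation shrinks the scaled distance between two unchanged observations.

```r
# Hypothetical helper: range-scaled absolute differences for one variable
range_scaled <- function(v) abs(outer(v, v, "-")) / diff(range(v))

heights <- c(160, 170, 180)
range_scaled(heights)[1, 2]        # 10 / 20 = 0.5

heights_out <- c(heights, 260)     # add a single extreme value
range_scaled(heights_out)[1, 2]    # 10 / 100 = 0.1
```

The first two observations are identical in both calls, yet their contribution to the dissimilarity drops from 0.5 to 0.1 once the range is stretched by the outlier.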

References

Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857-871.

Examples

# Small example: compute the classical Gower dissimilarity for a toy data frame
df <- data.frame(
  height = c(170, 160, 180),
  gender = factor(c("M", "F", "M")),
  smoker = c(1, 0, 1)
)

# Compute Gower dissimilarities automatically detecting types
dbrobust::dist_mixed(df)

# Manual type specification
cont_cols <- "height"
cat_cols <- NULL
bin_cols <- c("gender","smoker")
dbrobust::dist_mixed(
  x = df,
  continuous_cols = cont_cols,
  categorical_cols = cat_cols,
  binary_cols = bin_cols
)
