dbrobust (version 1.0.0)

dist_mixed: Compute Gower dissimilarity for mixed-type data

Description

Internal helper function to compute pairwise dissimilarities for datasets containing a mix of continuous, binary, and categorical variables using Gower's method (Gower, 1971).

Usage

dist_mixed(
  x,
  continuous_cols = NULL,
  binary_cols = NULL,
  categorical_cols = NULL,
  binary_asym = FALSE
)

Value

A symmetric numeric matrix of pairwise dissimilarities in [0,1].

Arguments

x

A data frame with rows as observations and columns as variables.

continuous_cols

Optional numeric indices or column names for continuous variables.

binary_cols

Optional numeric indices or column names for binary variables.

categorical_cols

Optional numeric indices or column names for categorical/multiclass variables.

binary_asym

Logical; if TRUE, binary variables are treated as asymmetric (only 1/1 counts as a match). Defaults to FALSE.

Details

Column types are detected automatically unless they are specified explicitly via continuous_cols, binary_cols, and categorical_cols.

  • Continuous, binary, and categorical columns are combined into a single dissimilarity measure following Gower's approach.

  • Continuous variables are scaled by their range.

  • Binary variables can be treated as symmetric (both 0/0 and 1/1 count as a match) or asymmetric (only 1/1 counts as a match).

  • Categorical variables are compared using simple matching.

  • Missing values are ignored pairwise: a variable is skipped for any pair of observations in which either value is missing.
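
The combination rule in the list above can be sketched in base R. This is an illustration only, not the dbrobust implementation: gower_sketch is a hypothetical helper, binary columns are assumed to be coded 0/1, and the asymmetric branch assumes the common convention of dropping 0/0 comparisons from the average, which may differ from what dist_mixed does internally. The toy data frame is made up for the example.

```r
# Hypothetical helper (not part of dbrobust): Gower-style dissimilarity for a
# data frame with numeric, 0/1 binary, and factor/character columns.
gower_sketch <- function(x, binary_cols = character(0), binary_asym = FALSE) {
  n <- nrow(x)
  num <- matrix(0, n, n)  # sum of per-variable dissimilarities for each pair
  den <- matrix(0, n, n)  # number of variables that contributed to each pair
  for (nm in names(x)) {
    v <- x[[nm]]
    ok <- outer(!is.na(v), !is.na(v), "&")  # TRUE where both values are observed
    if (nm %in% binary_cols) {
      d <- outer(v, v, "!=") * 1            # 1 on mismatch, 0 on match
      if (binary_asym) {
        # Assumed convention: 0/0 pairs carry no information and are skipped
        ok <- ok & !outer(v %in% 0, v %in% 0, "&")
      }
    } else if (is.numeric(v)) {
      rng <- diff(range(v, na.rm = TRUE))   # scale by the observed range
      d <- if (rng > 0) abs(outer(v, v, "-")) / rng else matrix(0, n, n)
    } else {
      v <- as.character(v)
      d <- outer(v, v, "!=") * 1            # simple matching
    }
    d[!ok] <- 0
    num <- num + d
    den <- den + ok
  }
  num / den  # NaN for a pair with no usable variables
}

# Made-up toy data
toy <- data.frame(
  height = c(170, 160, 180, 165),
  gender = factor(c("M", "F", "M", "F")),
  smoker = c(1, 0, 1, 0)
)

D_sym  <- gower_sketch(toy, binary_cols = "smoker")
D_asym <- gower_sketch(toy, binary_cols = "smoker", binary_asym = TRUE)

D_sym[2, 4]   # (0.25 + 0 + 0) / 3
D_asym[2, 4]  # (0.25 + 0) / 2, the 0/0 smoker pair is skipped
```

Rows 2 and 4 are both non-smokers, so the two settings differ only in whether that shared absence is counted as agreement.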

Advantages:

  • Low computational cost.

  • Works naturally with mixed-type data.

Limitations:

  • Neglects potential correlations among quantitative variables.

  • Sensitive to outliers, which can affect robustness.

  • May overemphasize categorical differences in mixed-data settings.
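
The outlier sensitivity noted above comes from range scaling, and a few lines of base R make it concrete. This is a sketch with made-up values: range_scaled is a hypothetical helper, not a dbrobust function.

```r
# Hypothetical helper: range-scaled absolute differences for one numeric variable
range_scaled <- function(v) abs(outer(v, v, "-")) / diff(range(v))

heights <- c(170, 160, 180)
d_clean <- range_scaled(heights)[1, 2]        # 10 / 20

heights_out <- c(heights, 260)                # one made-up extreme value
d_outlier <- range_scaled(heights_out)[1, 2]  # 10 / 100

c(clean = d_clean, with_outlier = d_outlier)
```

A single extreme value stretches the range, so the contribution of the same 10-unit difference shrinks from 0.5 to 0.1 and the remaining observations look more alike on that variable.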

References

Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857-871.

Examples

# Small example: Compute classical Gower for a simulated data frame
df <- data.frame(
  height = c(170, 160, 180),
  gender = factor(c("M", "F", "M")),
  smoker = c(1, 0, 1)
)

# Compute Gower dissimilarities automatically detecting types
dbrobust::dist_mixed(df)

# Manual type specification: gender is a two-level factor, so it is
# passed as binary here rather than categorical
cont_cols <- "height"
cat_cols <- NULL
bin_cols <- c("gender", "smoker")
dbrobust::dist_mixed(
  x = df,
  continuous_cols = cont_cols,
  categorical_cols = cat_cols,
  binary_cols = bin_cols
)