GAPR (version 0.1.0)

computeProximity: Compute Proximity Matrix

Description

This function takes a numeric matrix and computes a square proximity matrix (similarity or distance) between its rows or columns, using the measure selected by proxType.

Usage

computeProximity(data, proxType, side, isContainMissingValue)

Value

A square matrix representing the proximity between rows or columns, depending on the selected side.

Arguments

data

A numeric matrix with n rows and p columns. Each row typically represents an observation.

proxType

An integer code specifying the proximity measure to use; see Details for the available codes.

side

An integer indicating whether proximity is computed between rows or between columns; see Details.

isContainMissingValue

An integer indicating whether the input data contains missing values.

Details

proxType

Available proxType options include:

  • 0: Euclidean

  • 1: Pearson correlation

  • 2: Kendall correlation

  • 3: Spearman correlation

  • 4: Adjusted tangent correlation (atancorr)

  • 5: City-block (Manhattan) distance

  • 6: Absolute Pearson correlation

  • 7: Uncentered correlation

  • 8: Absolute uncentered correlation

  • 20: Hamman similarity (binary)

  • 21: Jaccard index (binary)

  • 22: Phi coefficient (binary)

  • 23: Rao coefficient (binary)

  • 24: Rogers-Tanimoto similarity (binary)

  • 25: Simple matching coefficient (binary)

  • 26: Sneath coefficient (binary)

  • 27: Yule's Q (binary)

Ensure the data type matches the selected method. For example, binary methods should only be used on binary (0/1) data.
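
As a minimal sketch of such a check, the base-R helper below tests whether a matrix contains only 0/1 values before a binary proxType (20 to 27) is chosen. The name is_binary_matrix is hypothetical and not part of GAPR; the helper also assumes that missing entries, if any, are coded as NA and should be ignored by the check.

```r
# Hypothetical helper (not part of GAPR): TRUE if m is numeric and every
# non-missing entry is exactly 0 or 1.
is_binary_matrix <- function(m) {
  is.numeric(m) && all(m[!is.na(m)] %in% c(0, 1))
}

bin_mat  <- matrix(c(0, 1, 1, 0, 1, 0), nrow = 2)        # only 0/1 values
cont_mat <- matrix(c(0.5, 1.2, 3.1, 0, 1, 2), nrow = 2)  # continuous values

is_binary_matrix(bin_mat)   # TRUE: a binary proxType is appropriate
is_binary_matrix(cont_mat)  # FALSE: use one of the proxType codes 0 to 8 instead
```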

side

Use 0 for row-wise proximity and 1 for column-wise proximity.
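
To illustrate what the two directions mean for the shape of the result, the sketch below uses base R's dist() rather than computeProximity(). It only demonstrates the row-versus-column idea: for an n x p input, proximity between rows gives an n x n matrix and proximity between columns gives a p x p matrix. The toy matrix x is made up for the illustration.

```r
# Toy input: n = 4 rows (observations), p = 3 columns (variables)
x <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 10,
              2, 0, 1), nrow = 4, byrow = TRUE)

# Between rows (the analogue of side = 0): 4 x 4 Euclidean distance matrix
row_dist <- as.matrix(dist(x))

# Between columns (the analogue of side = 1): transpose first, giving 3 x 3
col_dist <- as.matrix(dist(t(x)))

dim(row_dist)
dim(col_dist)
```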

isContainMissingValue

Set to 1 if the input data includes missing values; otherwise, use 0.
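
One way to derive this flag from the data instead of setting it by hand is sketched below in base R. It assumes missing values are coded as NA; this page does not state how GAPR expects them to be encoded. The variable names are illustrative.

```r
# Toy matrices: one with a missing entry, one complete
x_missing  <- matrix(c(1, NA, 3, 4), nrow = 2)
x_complete <- matrix(c(1, 2, 3, 4), nrow = 2)

# anyNA() returns TRUE/FALSE; as.integer() maps that to the 1/0 flag
flag_missing  <- as.integer(anyNA(x_missing))
flag_complete <- as.integer(anyNA(x_complete))

flag_missing   # 1
flag_complete  # 0
```

The resulting integer can then be passed as isContainMissingValue.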

Examples

# =======================
# Example 1: Crabs dataset with distance method (Euclidean distance)
# =======================
# Step 1: Compute proximity matrix
if (requireNamespace("MASS", quietly = TRUE)) {
  df_crabs <- as.matrix(MASS::crabs[, -c(1:3)])  # Use continuous variables only
  row_prox_crabs <- computeProximity(
    data = df_crabs,
    proxType = 0,               # 0 = Euclidean distance
    side = 0,                   # 0 = row-wise proximity
    isContainMissingValue = 0
  )

  # Step 2: Obtain R2E ordering
  r2e_order_crabs <- ellipse_sort(row_prox_crabs)  # R2E ordering

  # Step 3: Apply AVG-R2E ordering
  hctree_result_crabs <- hctree_sort(
    row_prox_crabs,                   # use distance matrix directly
    externalOrder = r2e_order_crabs,  # apply r2e order
    orderType = 2,                    # 2 = Average-linkage
    flipType = 1                      # 1 = Flip based on externalOrder
  )

  avg_r2e_order_crabs <- hctree_result_crabs$order + 1

  # Inspect results
  avg_r2e_order_crabs
}

# =======================
# Example 2: Crabs dataset with correlation method (Pearson correlation)
# =======================
# Step 1: Compute proximity matrix
if (requireNamespace("MASS", quietly = TRUE)) {
  df_crabs <- as.matrix(MASS::crabs[, -c(1:3)])  # Use continuous variables only
  row_prox_pearson <- computeProximity(
    data = df_crabs,
    proxType = 1,               # 1 = Pearson correlation
    side = 0,                   # 0 = row-wise proximity
    isContainMissingValue = 0
  )

  # Step 2: Obtain R2E ordering
  r2e_order_pearson <- ellipse_sort(row_prox_pearson)  # R2E ordering

  # Step 3: Convert correlation to distance and apply AVG-R2E ordering
  dist_pearson <- as.dist(1 - row_prox_pearson) # convert correlation matrix to distance matrix
  dist_pearson_MT <- as.matrix(dist_pearson)

  hctree_result_pearson <- hctree_sort(
    dist_pearson_MT,                    # use the converted distance matrix
    externalOrder = r2e_order_pearson,  # apply r2e order
    orderType = 2,                      # 2 = Average-linkage
    flipType = 1                        # 1 = Flip based on externalOrder
  )

  avg_r2e_order_pearson <- hctree_result_pearson$order + 1

  # Inspect results
  avg_r2e_order_pearson
}
