step_kmedoids: K-Medoids Clustering Variable Selection

Description

Creates a specification of a recipe step that will partition numeric variables according to k-medoids clustering and select the cluster medoids.

Usage

step_kmedoids(
  recipe,
  ...,
  k = 5,
  center = TRUE,
  scale = TRUE,
  metric = c("euclidean", "manhattan"),
  optimize = FALSE,
  replace = TRUE,
  prefix = "KMedoids",
  role = "predictor",
  skip = FALSE,
  id = recipes::rand_id("kmedoids")
)
# S3 method for step_kmedoids
tidy(x, ...)
tunable.step_kmedoids(x, ...)

Arguments

recipe

recipe object to which the step will be added.

...

one or more selector functions to choose which variables will be partitioned into clusters. See selections for more details. These are not currently used by the tidy method.

k

number of clusters into which the variables are partitioned by k-medoids clustering. The value of k is constrained to be between 1 and one less than the number of original variables.

center, scale

logicals indicating whether to mean-center the original variables and scale them by their median absolute deviation prior to cluster partitioning. Centering and scaling are not applied to the selected variables.

metric

character string specifying the distance metric for calculating dissimilarities between observations.

optimize

logical indicator or 0:5 integer level specifying optimization for the clustering algorithm. See the pamonce argument of pam for details.

replace

logical indicating whether to replace the original variables.

prefix

if the original variables are not replaced, a character string prefix added to a sequence of zero-padded integers to generate names for the resulting new variables; otherwise, the original variable names are retained.

role

analysis role that added step variables should be assigned. By default, they are designated as model predictors.

skip

logical indicating whether to skip the step when the recipe is baked. While all operations are baked when prep is run, some operations may not be applicable to new data (e.g. processing outcome variables). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

unique character string to identify the step.

x

step_kmedoids object.

Value

An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (selectors or variables selected), cluster assignments, medoid (logical indicator of cluster medoids), silhouette (silhouette values), and names for the new variables.

Details

K-medoids clustering partitions variables into k groups such that the dissimilarity between the variables and their assigned cluster medoids is minimized. Cluster medoids are then returned as a set of k variables.
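
The procedure described here can be approximated outside of a recipe. The following is a minimal sketch, not the step's actual implementation: it assumes that the clustering is done by pam from the cluster package (the optimize argument refers to its pamonce argument), that variables are clustered by transposing the data, and that scaling uses the default mad() constant. The object names (x_std, medoid_vars, and so on) are illustrative only.

```r
library(cluster)  # provides pam(); assumed here to be the underlying clusterer

# Predictors from the built-in attitude data (drop the outcome, rating)
x <- as.matrix(attitude[, -1])

# Mean-center and MAD-scale each variable
# (assumed to mirror center = TRUE and scale = TRUE)
x_std <- sweep(x, 2, colMeans(x), "-")
x_std <- sweep(x_std, 2, apply(x, 2, mad), "/")

# Transpose so that variables, rather than observations, are the objects clustered
fit <- pam(t(x_std), k = 3, metric = "euclidean")

medoid_vars <- colnames(x)[fit$id.med]     # one representative variable per cluster
clusters <- fit$clustering                 # cluster assignment for every variable
sil <- fit$silinfo$widths[, "sil_width"]   # silhouette value for every variable

# Keep the original (uncentered, unscaled) values of the medoid variables only
x_selected <- x[, medoid_vars, drop = FALSE]
```

In this sketch, medoid_vars, clusters, and sil loosely correspond to the medoid, cluster, and silhouette information reported by the tidy method, and x_selected to the variables retained when replace = TRUE.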

References

Reynolds A, Richards G, de la Iglesia B and Rayward-Smith V (2006). Clustering rules: a comparison of partitioning and hierarchical clustering algorithms. Journal of Mathematical Modelling and Algorithms 5, 475-504.

Examples

# NOT RUN {
library(recipes)
library(MachineShop)  # package providing step_kmedoids()

rec <- recipe(rating ~ ., data = attitude)
kmedoids_rec <- rec %>%
  step_kmedoids(all_predictors(), k = 3)
kmedoids_prep <- prep(kmedoids_rec, training = attitude)
kmedoids_data <- bake(kmedoids_prep, attitude)

pairs(kmedoids_data, lower.panel = NULL)

tidy(kmedoids_rec, number = 1)
tidy(kmedoids_prep, number = 1)

# }