volker (version 3.3.0)

add_clusters: Add cluster number to a data frame

Description

Clustering is performed using stats::kmeans.

[Experimental]

Usage

add_clusters(
  data,
  cols,
  newcol = NULL,
  k = 2,
  method = "kmeans",
  labels = TRUE,
  clean = TRUE
)

Value

The input tibble with an additional column containing the cluster assignments as a factor. The name of the new column is prefixed with "cls_". The new column carries three attributes: stats.kmeans.fit holds the fit result, stats.kmeans.items holds the names of the items used for clustering, and stats.kmeans.wss holds the clustering diagnostics (within-cluster and between-cluster sums of squares).
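Attributes on a column can be read with base::attr. The sketch below does not call volker. It mimics the described result in base R on a small made-up data frame, so the column name cls_item, the item names, and the way the attributes are attached are assumptions for illustration only, not the package's implementation. Only the attribute names stats.kmeans.fit and stats.kmeans.items are taken from the description above.

```r
# Illustrative sketch in base R; NOT volker's implementation.
set.seed(7)  # kmeans uses random starting centers

# Hypothetical items with two clearly separated groups
df <- data.frame(
  item_a = c(1, 2, 1, 8, 9, 8),
  item_b = c(2, 1, 2, 9, 8, 9)
)

# Scale the items, then fit k-means with two clusters
fit <- stats::kmeans(scale(df), centers = 2, nstart = 10)

# Store the assignment as a factor and attach attributes to it
cls <- factor(fit$cluster)
attr(cls, "stats.kmeans.fit")   <- fit
attr(cls, "stats.kmeans.items") <- names(df)
df$cls_item <- cls  # hypothetical column name

# Read the attributes back from the column
used_items <- attr(df$cls_item, "stats.kmeans.items")
stored_fit <- attr(df$cls_item, "stats.kmeans.fit")

print(used_items)
print(stored_fit$tot.withinss)
```

With a real add_clusters result, the same attr() calls would be applied to the generated "cls_" column. Note that attributes attached this way can be lost when the column is subset or transformed.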

Arguments

data

A data frame.

cols

A tidy selection of item columns.

newcol

Name of the new cluster column as a character vector. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "cls_".

k

Number of clusters to calculate. Set to NULL to output a scree plot for up to 10 clusters and automatically choose the number of clusters based on the elbow criterion. The within-cluster sums of squares for the scree plot are calculated by stats::kmeans.
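The values behind such a scree plot can be approximated in base R. This is a minimal sketch, assuming the built-in iris data as stand-in items and nstart = 10 as an arbitrary choice. It is not volker's code, and it does not reproduce how the package picks the elbow automatically.

```r
# Illustrative sketch in base R; NOT volker's implementation.
set.seed(42)  # kmeans uses random starting centers

# Stand-in items: four numeric columns of the built-in iris data, scaled
items <- scale(iris[, 1:4])

# Total within-cluster sum of squares for k = 1, ..., 10
ks  <- 1:10
wss <- sapply(ks, function(k) {
  stats::kmeans(items, centers = k, nstart = 10)$tot.withinss
})

# One row per candidate k; the "elbow" is where the decrease flattens
scree <- data.frame(k = ks, wss = wss)
print(scree)
```

Calling plot(scree$k, scree$wss, type = "b") draws the curve. The elbow is then read off visually as the point after which adding clusters reduces the within-cluster sum of squares only slightly.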

method

The clustering method as a character value. Currently, only "kmeans" is supported. All items are scaled using base::scale before the cluster analysis is performed.
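Scaling matters because k-means works on Euclidean distances, so an item measured in large units would otherwise dominate the result. The following base R sketch uses made-up columns (hours, income) to show what base::scale does before stats::kmeans is applied. It illustrates the idea only and is not taken from the package source.

```r
# Illustrative sketch in base R; NOT volker's implementation.
set.seed(1)  # kmeans uses random starting centers

# Hypothetical items measured on very different scales
items <- data.frame(
  hours  = c(1, 2, 1.5, 8, 9, 8.5),
  income = c(20000, 21000, 20500, 50000, 52000, 51000)
)

# Before scaling, the variance of income dwarfs that of hours
raw_var <- sapply(items, var)

# base::scale centers each column to mean 0 and rescales it to sd 1
scaled     <- base::scale(items)
scaled_var <- apply(scaled, 2, var)

# Cluster the scaled items
fit <- stats::kmeans(scaled, centers = 2, nstart = 10)

print(raw_var)
print(scaled_var)
print(fit$cluster)
```

After scaling, both columns have unit variance and therefore contribute equally to the distances that k-means minimizes.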

labels

Whether to derive the label of the cluster column from the common prefix of the item column labels.

clean

Whether to prepare the data with data_clean.

Examples

library(volker)
ds <- volker::chatgpt

volker::add_clusters(ds, starts_with("cg_adoption"), k = 3)