cluster.fcm: Fuzzy C-Means Clustering for Functional Data

Description

Performs fuzzy c-means clustering on functional data. Each curve receives a degree of membership in every cluster rather than a single hard assignment.

Usage

cluster.fcm(fdataobj, ncl, m = 2, max.iter = 100, tol = 1e-06, seed = NULL)

Value

A list of class 'fuzzycmeans.fd' with components:

membership: Matrix of membership degrees (n x ncl). Each row sums to 1.
cluster: Hard cluster assignments (argmax of membership).
centers: An fdata object containing the cluster centers.
objective: Final value of the objective function.
fdataobj: The input functional data object.

Arguments

fdataobj: An object of class 'fdata'.
ncl: Number of clusters (k in the formulas under Details).
m: Fuzziness parameter (default 2). Must be > 1. Higher values give softer cluster boundaries.
max.iter: Maximum number of iterations (default 100).
tol: Convergence tolerance (default 1e-6).
seed: Optional random seed for reproducibility.

Details

Fuzzy c-means minimizes the objective function $$J = \sum_{i=1}^n \sum_{c=1}^k u_{ic}^m \|X_i - v_c\|^2$$ where n is the number of curves, k = ncl is the number of clusters, u_ic is the membership of curve X_i in cluster c, v_c is the center of cluster c, and m > 1 is the fuzziness parameter.
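
As an illustration of the objective above, the following minimal sketch evaluates J for curves stored as rows of a matrix. It is not the package's implementation: the helper name fcm_objective is hypothetical, and plain Euclidean distance on the evaluation grid is assumed in place of whatever functional norm cluster.fcm uses internally.

```r
# Sketch only: Euclidean distance on the grid stands in for the functional
# norm; the package may use an integrated L2 metric instead.
fcm_objective <- function(X, V, U, m = 2) {
  # X: n x p matrix of curves, V: k x p matrix of centers,
  # U: n x k matrix of memberships
  n <- nrow(X)
  k <- nrow(V)
  D2 <- matrix(0, n, k)
  for (cl in seq_len(k)) {
    # squared distance from every curve to center cl
    D2[, cl] <- rowSums(sweep(X, 2, V[cl, ])^2)
  }
  sum(U^m * D2)
}

X <- rbind(c(0, 0), c(2, 0))        # two toy "curves" on a two-point grid
V <- rbind(c(0, 0), c(2, 0))        # centers placed exactly on the curves
U_crisp <- rbind(c(1, 0), c(0, 1))  # each curve fully in its own cluster
U_fuzzy <- matrix(0.5, 2, 2)        # each curve split evenly

J_crisp <- fcm_objective(X, V, U_crisp)  # 0: every curve sits on its center
J_fuzzy <- fcm_objective(X, V, U_fuzzy)  # 0.25 * (0 + 4 + 4 + 0) = 2
```

In this toy case the crisp memberships give the smaller objective because each curve coincides with a center; with noisy curves the minimizing memberships are generally fractional.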

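The membership degrees are updated as $$u_{ic} = 1 / \sum_{j=1}^k (d_{ic}/d_{ij})^{2/(m-1)}$$ where d_ic = ||X_i - v_c|| is the distance from curve i to the center of cluster c. With this update, the memberships of each curve sum to 1 across clusters.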
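
The update rule can be sketched in a few lines of R, given a matrix of curve-to-center distances. The helper name update_membership is hypothetical, and the sketch assumes all distances are strictly positive (a curve that coincides with a center would need separate handling); it is not taken from the package source.

```r
# D: n x k matrix of distances d_ic = ||X_i - v_c||, all assumed > 0
update_membership <- function(D, m = 2) {
  stopifnot(m > 1, all(D > 0))
  p <- 2 / (m - 1)
  # 1 / sum_j (d_ic / d_ij)^p  is the same as  d_ic^(-p) / sum_j d_ij^(-p)
  W <- D^(-p)
  W / rowSums(W)
}

# Hypothetical distances: curve 1 is closer to center 1,
# curve 2 is equidistant from both centers
D <- rbind(c(1, 2),
           c(3, 3))
U <- update_membership(D, m = 2)
print(U)
```

For m = 2 the first curve gets memberships 0.8 and 0.2, and the equidistant curve gets 0.5 and 0.5.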

As m approaches 1 from above, the memberships approach 0 or 1 and FCM behaves like hard k-means. As m increases, the clusters become softer (more overlap), and for very large m every membership approaches 1/k. m = 2 is the most common choice.
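
The effect of m can be seen by applying the membership update to a single curve and two clusters. This is a minimal sketch: the helper membership_near and the distances 1 and 2 are hypothetical values chosen for illustration.

```r
# Membership of one curve in the nearer of two clusters, from the update rule
# with k = 2. d_near and d_far are hypothetical distances to the two centers.
membership_near <- function(d_near, d_far, m) {
  1 / (1 + (d_near / d_far)^(2 / (m - 1)))
}

m_values <- c(1.1, 2, 5)
u_near <- sapply(m_values, function(m) membership_near(1, 2, m))
print(round(u_near, 3))
```

The membership in the nearer cluster is about 1.000 for m = 1.1, 0.800 for m = 2, and 0.586 for m = 5, moving from a nearly hard assignment toward an even split.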

Examples

# Assumes the package providing fdata(), cluster.fcm() and cluster.kmeans() is attached
# Create functional data with THREE groups - one genuinely overlapping
set.seed(42)
t <- seq(0, 1, length.out = 50)
n <- 45
X <- matrix(0, n, 50)

# Group 1: Sine waves centered at 0
for (i in 1:15) X[i, ] <- sin(2*pi*t) + rnorm(50, sd = 0.2)
# Group 2: Sine waves centered at 1.5 (clearly separated from group 1)
for (i in 16:30) X[i, ] <- sin(2*pi*t) + 1.5 + rnorm(50, sd = 0.2)
# Group 3: Between groups 1 and 2 (true overlap - ambiguous membership)
for (i in 31:45) X[i, ] <- sin(2*pi*t) + 0.75 + rnorm(50, sd = 0.3)

fd <- fdata(X, argvals = t)

# Fuzzy clustering reveals the overlap
fcm <- cluster.fcm(fd, ncl = 3, seed = 123)

# Curves in group 3 (31-45) lie between the other two groups, so they are the
# ones expected to show split membership - this is the key benefit!
cat("Membership for curves 31-35 (overlap region):\n")
print(round(fcm$membership[31:35, ], 2))

# Compare to hard clustering which forces a decision
km <- cluster.kmeans(fd, ncl = 3, seed = 123)
cat("\nHard vs Fuzzy assignment for curve 35:\n")
cat("K-means cluster:", km$cluster[35], "\n")
cat("FCM memberships:", round(fcm$membership[35, ], 2), "\n")